11 Commits

Author SHA1 Message Date
Martin v. Löwis 1c63f6e489 Correct various errors:
- Use substring search, not re search for user-agent and paths.
- Consider * entry last. Unquote, then requote URLs.
- Treat empty Disallow as "allow everything".
Add test cases. Fixes #523041
2002-02-28 15:24:47 +00:00
Andrew M. Kuchling e7abf97903 Remove unused import (PyChecker) 2001-08-13 14:43:43 +00:00
Tim Peters 0e6d213177 Whitespace normalization. 2001-02-15 23:56:39 +00:00
Skip Montanaro 5bba231d1e The bulk of the credit for these changes goes to Bastian Kleineidam
* restores urllib as the file fetcher (closes bug #132000)
* allows checking URLs with empty paths (closes patches #103511 and 103721)
* properly handle user agents with versions (e.g., SpamMeister/1.5)
* added several more tests
2001-02-12 20:58:30 +00:00
Eric S. Raymond 141971f22a String method conversion. 2001-02-09 08:40:40 +00:00
Tim Peters dfc538acae Whitespace normalization. 2001-01-21 04:49:16 +00:00
Skip Montanaro e99d5ea25b added __all__ lists to a number of Python modules
added test script and expected output file as well
this closes patch 103297.
__all__ attributes will be added to other modules without first submitting
a patch, just adding the necessary line to the test script to verify
more-or-less correct implementation.
2001-01-20 19:54:20 +00:00
Skip Montanaro 663f6c2ad2 rewrite of robotparser.py by Bastian Kleineidam. Closes patch 102229. 2001-01-20 15:59:25 +00:00
Guido van Rossum dc8b7980e0 Skip Montanaro:
The robotparser.py module currently lives in Tools/webchecker.  In
preparation for its migration to Lib, I made the following changes:

    * renamed the test() function _test
    * corrected the URLs in _test() so they refer to actual documents
    * added an "if __name__ == '__main__'" catcher to invoke _test()
      when run as a main program
    * added doc strings for the two main methods, parse and can_fetch
    * replaced usage of regsub and regex with corresponding re code
2000-03-27 19:29:31 +00:00
Guido van Rossum 986abac1ba Give in to tabnanny 1998-04-06 14:29:28 +00:00
Guido van Rossum bbf8c2fafd Skip Montanaro's robots.txt parser. 1997-01-30 03:18:23 +00:00
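
The matching rules listed in commit 1c63f6e489 (substring matching on the user agent, the * entry considered last, an empty Disallow meaning "allow everything") can be sketched with the present-day Python 3 module urllib.robotparser, which descends from this file. This is an illustration under assumptions, not the 2002 code itself: the robots.txt content, the example.com URLs, and the FriendlyBot and OtherBot agent names are made up, and only the SpamMeister/1.5 agent string comes from the log above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
# The "*" entry is deliberately listed first to show that it is
# still consulted only after the more specific entries.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: SpamMeister
Disallow: /

User-agent: FriendlyBot
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text from memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A user agent with a version suffix matches the entry named "SpamMeister".
spam_allowed = parser.can_fetch("SpamMeister/1.5", "http://example.com/index.html")

# An empty Disallow line is treated as "allow everything" for that agent.
friendly_allowed = parser.can_fetch(
    "FriendlyBot/2.0", "http://example.com/private/page.html"
)

# An agent with no entry of its own falls back to the "*" entry.
other_private = parser.can_fetch("OtherBot", "http://example.com/private/page.html")
other_public = parser.can_fetch("OtherBot", "http://example.com/index.html")

print(spam_allowed, friendly_allowed, other_private, other_public)
```

With the current standard library this should report that SpamMeister is refused everywhere, that FriendlyBot may fetch even the /private/ page, and that OtherBot is refused only under /private/. Feeding the rules through parse() rather than set_url() and read() keeps the sketch independent of any real server.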