- Use substring search, not re search for user-agent and paths.
- Consider * entry last. Unquote, then requote URLs.
- Treat empty Disallow as "allow everything".
Add test cases. Fixes#523041
* restores urllib as the file fetcher (closes bug #132000)
* allows checking URLs with empty paths (closes patches #103511 and 103721)
* properly handle user agents with versions (e.g., SpamMeister/1.5)
* added several more tests
added test script and expected output file as well
this closes patch 103297.
__all__ attributes will be added to other modules without first submitting
a patch, just adding the necessary line to the test script to verify
more-or-less correct implementation.
The robotparser.py module currently lives in Tools/webchecker. In
preparation for its migration to Lib, I made the following changes:
* renamed the test() function _test
* corrected the URLs in _test() so they refer to actual documents
* added an "if __name__ == '__main__'" catcher to invoke _test()
when run as a main program
* added doc strings for the two main methods, parse and can_fetch
* replaced usage of regsub and regex with corresponding re code