cpython/Lib/urllib
Raymond Hettinger 122541bece Issue 21469: Mitigate risk of false positives with robotparser.
* Repair the broken link to norobots-rfc.txt.

* HTTP response codes >= 500 are treated as a failed read rather than as a
not found.  Not found means that we can assume the entire site is allowed.  A
5xx server error tells us nothing.

* A successful read() or parse() updates the mtime (which is defined to be "the
  time the robots.txt file was last fetched").

* The can_fetch() method returns False unless we've had a read() with a 2xx or
4xx response.  This avoids false positives in the case where a user calls
can_fetch() before calling read().

* I don't see any easy way to test this patch without hitting internet
resources that might change or without use of mock objects that wouldn't
provide much reassurance.
2014-05-12 21:56:33 -07:00
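
The can_fetch() and mtime behavior described in the commit message can be checked offline with a minimal sketch like the one below. It assumes a Python version whose urllib.robotparser includes this change, and it feeds rules through parse() instead of read() so that no network access is needed. The example.com URLs and the robots.txt rules are made up for illustration.

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()

# Nothing has been read or parsed yet, so mtime() is still 0 and
# can_fetch() answers False instead of giving a false positive.
mtime_before = rp.mtime()
allowed_before = rp.can_fetch("*", "http://example.com/page")

# A successful parse() updates the mtime; the rules below are
# hypothetical and stand in for a fetched robots.txt file.
rp.parse(["User-agent: *", "Disallow: /private/"])
mtime_after = rp.mtime()

allowed_page = rp.can_fetch("*", "http://example.com/page")
allowed_private = rp.can_fetch("*", "http://example.com/private/x")

print(allowed_before, allowed_page, allowed_private)
```

The 5xx handling in read() is not exercised here, since that would require a live or mocked HTTP server.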
..
__init__.py
error.py Replace IOError with OSError (#16715) 2012-12-25 16:47:37 +02:00
parse.py Issue #20879: Delay the initialization of encoding and decoding tables for 2014-03-17 22:38:41 +01:00
request.py Convert urllib.request parse_proxy doctests to unittests. 2014-04-14 16:32:20 -04:00
response.py urllib.response object to use _TemporaryFileWrapper (and _TemporaryFileCloser) 2014-04-20 09:41:29 -07:00
robotparser.py Issue 21469: Mitigate risk of false positives with robotparser. 2014-05-12 21:56:33 -07:00