This auth handler adds the Authorization header to the first
HTTP request rather than waiting for a HTTP 401 Unauthorized
response from the server as the default HTTPBasicAuthHandler
does.
This allows working with websites like https://api.github.com which do
not follow the strict interpretation of RFC, but more the dicta in the
end of section 2 of RFC 2617:
> A client MAY preemptively send the corresponding Authorization
> header with requests for resources in that space without receipt
> of another challenge from the server. Similarly, when a client
> sends a request to a proxy, it may reuse a userid and password in
> the Proxy-Authorization header field without receiving another
> challenge from the proxy server. See section 4 for security
> considerations associated with Basic authentication.
Patch by Matej Cepl.
* Repair the broken link to norobots-rfc.txt.
* HTTP response codes >= 500 treated as a failed read rather than as a not
found. Not found means that we can assume the entire site is allowed. A 5xx
server error tells us nothing.
* A successful read() or parse() updates the mtime (which is defined to be "the
time the robots.txt file was last fetched").
* The can_fetch() method returns False unless we've had a read() with a 2xx or
4xx response. This avoids false positives in the case where a user calls
can_fetch() before calling read().
* I don't see any easy way to test this patch without hitting internet
resources that might change or without use of mock objects that wouldn't
provide must reassurance.
base32, ascii85 and base85 codecs in the base64 module, and delay the
initialization of the unquote_to_bytes() table of the urllib.parse module, to
not waste memory if these modules are not used.
Fix#17967 - Fix related to regression on Windows.
os.path.join(*self.dirs) produces an invalid path on windows.
ftp paths are always forward-slash seperated like this. /pub/dir.
Fix thishost helper funtion in urllib. Returns the ipaddress of localhost when
hostname is resolvable by socket.gethostname for local machine. This all fixes
certain freebsd builtbot failures.
Fix#17967: For ftp urls CWD to target instead of hopping to each directory
towards target. This fixes a bug where target is accessible, but parent
directories are restricted.
#17403: urllib.parse.robotparser normalizes the urls before adding to
ruleline. This helps in handling certain types invalid urls in a conservative
manner. Patch contributed by Mher Movsisyan.