Some servers send Location header fields with non-ASCII bytes, but "http.
client" requires the request target to be ASCII-encodable, otherwise a
UnicodeEncodeError is raised. Based on patch by Christian Heimes.
Python 2 does not suffer any problem because it allows non-ASCII bytes in the
HTTP request target.
This fix is a superset of the functionality introduced by the issue #19494
enhancement, and supersedes that fix. Instead of a new handler, we have a new
password manager that tracks whether we should send the auth for a given uri.
This allows us to say "always send", satisfying #19494, or track that we've
succeeded in auth and send the creds right away on every *subsequent* request.
The support for using the password manager is added to AbstractBasicAuth,
which means the proxy handler also now can handle prior auth if passed
the new password manager.
Patch by Akshit Khurana, docs mostly by me.
This auth handler adds the Authorization header to the first
HTTP request rather than waiting for a HTTP 401 Unauthorized
response from the server as the default HTTPBasicAuthHandler
does.
This allows working with websites like https://api.github.com which do
not follow the strict interpretation of RFC, but more the dicta in the
end of section 2 of RFC 2617:
> A client MAY preemptively send the corresponding Authorization
> header with requests for resources in that space without receipt
> of another challenge from the server. Similarly, when a client
> sends a request to a proxy, it may reuse a userid and password in
> the Proxy-Authorization header field without receiving another
> challenge from the proxy server. See section 4 for security
> considerations associated with Basic authentication.
Patch by Matej Cepl.
* Repair the broken link to norobots-rfc.txt.
* HTTP response codes >= 500 treated as a failed read rather than as a not
found. Not found means that we can assume the entire site is allowed. A 5xx
server error tells us nothing.
* A successful read() or parse() updates the mtime (which is defined to be "the
time the robots.txt file was last fetched").
* The can_fetch() method returns False unless we've had a read() with a 2xx or
4xx response. This avoids false positives in the case where a user calls
can_fetch() before calling read().
* I don't see any easy way to test this patch without hitting internet
resources that might change or without use of mock objects that wouldn't
provide must reassurance.
base32, ascii85 and base85 codecs in the base64 module, and delay the
initialization of the unquote_to_bytes() table of the urllib.parse module, to
not waste memory if these modules are not used.