Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character.
RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`
RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A`
The WHATWG URL spec defines a scheme like this:
`"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
* improve performance of get_proxies_environment when there are many environment variables
* 📜🤖 Added by blurb_it.
* fix case of short env name
* fix formatting
* fix whitespace
* whitespace
* Update Lib/urllib/request.py
Co-authored-by: Carl Meyer <carl@oddbird.net>
* Update Lib/urllib/request.py
Co-authored-by: Carl Meyer <carl@oddbird.net>
* Update Lib/urllib/request.py
Co-authored-by: Carl Meyer <carl@oddbird.net>
* Update Lib/urllib/request.py
Co-authored-by: Carl Meyer <carl@oddbird.net>
* whitespace
* Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst
Co-authored-by: Carl Meyer <carl@oddbird.net>
* Update Lib/urllib/request.py
Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@oddbird.net>
The urllib.request no longer uses the deprecated check_hostname
parameter of the http.client module.
Add private http.client._create_https_context() helper to http.client,
used by urllib.request.
Remove the now redundant check on check_hostname and verify_mode in
http.client: the SSLContext.check_hostname setter already implements
the check.
Fix a bug in urllib.request.HTTPPasswordMgr.find_user_password() and
urllib.request.HTTPPasswordMgrWithPriorAuth.is_authenticated() which
allowed to bypass authorization. For example, access to URI "example.org/foobar"
was allowed if the user was authorized for URI "example.org/foo".
* Support HTTP response status code 308 in urllib.
HTTP response status code 308 is defined in https://tools.ietf.org/html/rfc7538 to be the permanent redirect variant of 307 (temporary redirect).
* Update documentation to include http_error_308()
* Add blurb for bpo-40321 fix
Co-authored-by: Roland Crosby <roland@rolandcrosby.com>
Switch to lru_cache in urllib.parse.
urllib.parse now uses functool.lru_cache for its internal URL splitting and
quoting caches instead of rolling its own like its the 90s.
The undocumented internal Quoted class API is now deprecated
as it had no reason to be public and no existing OSS users were found.
The clear_cache() API remains undocumented but gets an explicit test as it
is used in a few projects' (twisted, gevent) tests as well as our own regrtest.
Fix Regular Expression Denial of Service (ReDoS) vulnerability in
urllib.request.AbstractBasicAuthHandler. The ReDoS-vulnerable regex
has quadratic worst-case complexity and it allows cause a denial of
service when identifying crafted invalid RFCs. This ReDoS issue is on
the client side and needs remote attackers to control the HTTP server.
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().
urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.
Co-authored-by: Éric Araujo <merwok@netwok.org>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Éric Araujo <merwok@netwok.org>
The AbstractBasicAuthHandler class of the urllib.request module uses
an inefficient regular expression which can be exploited by an
attacker to cause a denial of service. Fix the regex to prevent the
catastrophic backtracking. Vulnerability reported by Ben Caller
and Matt Schwager.
AbstractBasicAuthHandler of urllib.request now parses all
WWW-Authenticate HTTP headers and accepts multiple challenges per
header: use the realm of the first Basic challenge.
Co-Authored-By: Serhiy Storchaka <storchaka@gmail.com>
* bpo-39548: Fix handling of 'WWW-Authenticate' header for Digest authentication
- The 'qop' value in the 'WWW-Authenticate' header is optional. The
presence of 'qop' in the header should be checked before its value
is parsed with 'split'.
Signed-off-by: Stephen Balousek <stephen@balousek.net>
* bpo-39548: Fix handling of 'WWW-Authenticate' header for Digest authentication
- Add NEWS item
Signed-off-by: Stephen Balousek <stephen@balousek.net>
* Update Misc/NEWS.d/next/Library/2020-02-06-05-33-52.bpo-39548.DF4FFe.rst
Co-Authored-By: Brandt Bucher <brandtbucher@gmail.com>
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
* fix HTTP Digest handling in request.py
There is a bug triggered when server replies to a request with `WWW-Authenticate: Digest` where `qop="auth,auth-int"` rather than mere `qop="auth"`. Having both `auth` and `auth-int` is legitimate according to the `qop-options` rule in §3.2.1 of [[https://www.ietf.org/rfc/rfc2617.txt|RFC 2617]]:
> qop-options = "qop" "=" <"> 1#qop-value <">
> qop-value = "auth" | "auth-int" | token
> **qop-options**: [...] If present, it is a quoted string **of one or more** tokens indicating the "quality of protection" values supported by the server. The value `"auth"` indicates authentication; the value `"auth-int"` indicates authentication with integrity protection
This is description confirmed by the definition of the [//n//]`#`[//m//]//rule// extended-BNF pattern defined in §2.1 of [[https://www.ietf.org/rfc/rfc2616.txt|RFC 2616]] as 'a comma-separated list of //rule// with at least //n// and at most //m// items'.
When this reply is parsed by `get_authorization`, request.py only tests for identity with `'auth'`, failing to recognize it as one of the supported modes the server announced, and claims that `"qop 'auth,auth-int' is not supported"`.
* 📜🤖 Added by blurb_it.
* bpo-38686 review fix: remember why.
* fix trailing space in Lib/urllib/request.py
Co-Authored-By: Brandt Bucher <brandtbucher@gmail.com>
Capturing exceptions into names can lead to reference cycles though the __traceback__ attribute of the exceptions in some obscure cases that have been reported previously and fixed individually. As these variables are not used anyway, we can remove the binding to reduce the chances of creating reference cycles.
See for example GH-13135
* bpo-27657: Fix urlparse() with numeric paths
Revert parsing decision from bpo-754016 in favor of the documented
consensus in bpo-16932 of how to treat strings without a // to
designate the netloc.
* bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
CVE-2019-9948: Avoid file reading as disallowing the unnecessary URL
scheme in URLopener().open() and URLopener().retrieve()
of urllib.request.
Co-Authored-By: SH <push0ebp@gmail.com>
Fixes some mistakes and misleadings in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.