Commit Graph

439 Commits

Author SHA1 Message Date
Ben Kallus 439b9cfaf4
gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. (#99421)
Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character.

RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`
RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A`

The WHATWG URL spec defines a scheme like this:
`"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
2022-11-13 10:25:55 -08:00
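The scheme grammar quoted in this commit message can be written as a regular expression. The following is a minimal standalone sketch, not the code from the commit; the names `_SCHEME_RE` and `is_valid_scheme` are hypothetical and chosen for illustration only.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA is restricted to ASCII letters (%x41-5A / %x61-7A).
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Return True if `scheme` matches the RFC 3986 scheme grammar."""
    return _SCHEME_RE.fullmatch(scheme) is not None
```

Under this grammar `http` and `svn+ssh` are accepted, while `1http` and the empty string are not.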
Ben Kallus 6f15ca8c7a
gh-96035: Make urllib.parse.urlparse reject non-numeric ports (#98273)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
2022-10-20 14:00:56 -07:00
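One reason a port check needs more than `int()` is that `int()` also accepts strings such as `"+80"`, `" 80"`, and `"8_0"`. The sketch below shows one possible strict check; it is an illustration under that assumption, not the code from the commit, and `parse_port` is a hypothetical helper name.

```python
def parse_port(port: str) -> int:
    """Parse a URL port, accepting only ASCII decimal digits in 0..65535."""
    # int() alone would accept "+80", " 80", "8_0" and non-ASCII digits,
    # so the string is validated before conversion.
    if not (port.isascii() and port.isdigit()):
        raise ValueError(f"Port could not be cast to integer value as {port!r}")
    value = int(port)
    if not 0 <= value <= 65535:
        raise ValueError("Port out of range 0-65535")
    return value
```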
Carl Meyer ad817cd5c4
bpo-43564: preserve original exception in args of FTP URLError (#24938)
* bpo-43564: preserve original error in args of FTP URLError

* Add NEWS blurb

Co-authored-by: Carl Meyer <carljm@instagram.com>
2022-10-09 18:59:07 -07:00
Pieter Eendebak aeb28f5130
gh-91539: improve performance of get_proxies_environment (#91566)
* improve performance of get_proxies_environment when there are many environment variables

* 📜🤖 Added by blurb_it.

* fix case of short env name

* fix formatting

* fix whitespace

* whitespace

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* whitespace

* Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@oddbird.net>
2022-10-05 10:57:52 -07:00
Gregory P. Smith e61ca22431
gh-95865: Further reduce quote_from_bytes memory consumption (#96860)
on large input values.  Based on Dennis Sweeney's chunking idea.
2022-09-19 16:06:25 -07:00
Dennis Sweeney 8ba22b90ca
gh-95865: Speed up urllib.parse.quote_from_bytes() (GH-95872) 2022-08-30 21:39:51 -04:00
Victor Stinner 37118fa2e3
gh-94172: urllib.request avoids deprecated key_file/cert_file (#94232)
The urllib.request module no longer uses the deprecated key_file and
cert_file parameter of the http.client module.
2022-06-26 10:43:21 +02:00
Victor Stinner f0b234e6ed
gh-94172: urllib.request avoids deprecated check_hostname (#94193)
The urllib.request no longer uses the deprecated check_hostname
parameter of the http.client module.

Add private http.client._create_https_context() helper to http.client,
used by urllib.request.

Remove the now redundant check on check_hostname and verify_mode in
http.client: the SSLContext.check_hostname setter already implements
the check.
2022-06-24 17:45:28 +02:00
Victor Stinner 259dd71c32
gh-84623: Remove unused imports in stdlib (#93773) 2022-06-13 16:28:41 +02:00
狂男风 b69297ea23
bpo-42627: Fix incorrect parsing of Windows registry proxy settings (GH-26307) 2022-05-11 19:17:17 +01:00
Oleg Iarygin a03a09e068
Replace with_traceback() with exception chaining and reraising (GH-32074) 2022-03-30 15:28:20 +03:00
Serhiy Storchaka e2e72567a1
bpo-46756: Fix authorization check in urllib.request (GH-31353)
Fix a bug in urllib.request.HTTPPasswordMgr.find_user_password() and
urllib.request.HTTPPasswordMgrWithPriorAuth.is_authenticated() which
allowed authorization to be bypassed. For example, access to URI "example.org/foobar"
was allowed if the user was authorized for URI "example.org/foo".
2022-02-25 13:31:03 +02:00
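The class of bug described here can be illustrated with two standalone functions: a plain string-prefix test, and a test that only matches on a path-segment boundary. This is a sketch of the general idea, not the patched standard-library code; both function names are hypothetical.

```python
def naive_is_suburi(base: str, test: str) -> bool:
    """Plain prefix match: treats 'example.org/foobar' as below 'example.org/foo'."""
    return test.startswith(base)


def path_aware_is_suburi(base: str, test: str) -> bool:
    """Match only when `test` equals `base` or lies below it on a '/' boundary."""
    if base == test:
        return True
    prefix = base if base.endswith("/") else base + "/"
    return test.startswith(prefix)
```

With these definitions, `naive_is_suburi("example.org/foo", "example.org/foobar")` is true, while the path-aware variant rejects that pair and still accepts `"example.org/foo/bar"`.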
Christian Sattler e6fe10d340
bpo-45874: Handle empty query string correctly in urllib.parse.parse_qsl (#29716) 2021-12-12 10:41:12 +02:00
Łukasz Langa f528045f69
bpo-40321: Add missing test, slightly expand documentation (GH-28760) 2021-10-06 17:28:16 +02:00
Jochem Schulenklopper c379bc5ec9
bpo-40321: Support HTTP response status code 308 in urllib.request (#19588)
* Support HTTP response status code 308 in urllib.

HTTP response status code 308 is defined in https://tools.ietf.org/html/rfc7538 to be the permanent redirect variant of 307 (temporary redirect).

* Update documentation to include http_error_308()

* Add blurb for bpo-40321 fix

Co-authored-by: Roland Crosby <roland@rolandcrosby.com>
2021-10-05 19:02:58 -07:00
Noah Kantrowitz be42c06bb0
Update URLs in comments and metadata to use HTTPS (GH-27458) 2021-07-30 15:54:46 +02:00
Gregory P. Smith d597fdc5fd
bpo-44002: Switch to lru_cache in urllib.parse. (GH-25798)
Switch to lru_cache in urllib.parse.

urllib.parse now uses functools.lru_cache for its internal URL splitting and
quoting caches instead of rolling its own like it's the 90s.

The undocumented internal Quoted class API is now deprecated
as it had no reason to be public and no existing OSS users were found.

The clear_cache() API remains undocumented but gets an explicit test as it
is used in a few projects' (twisted, gevent) tests as well as our own regrtest.
2021-05-11 17:01:44 -07:00
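The pattern this commit describes, a memoized parsing helper plus a function that empties the cache, might look roughly like the sketch below. `split_scheme` is a hypothetical toy function used only for illustration, and the `clear_cache` shown here is a local stand-in rather than the urllib.parse implementation.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def split_scheme(url: str):
    """Toy splitter: return (lowercased scheme, rest), cached per input string."""
    scheme, sep, rest = url.partition(":")
    return (scheme.lower(), rest) if sep else ("", url)


def clear_cache() -> None:
    """Drop all memoized results, e.g. between test runs."""
    split_scheme.cache_clear()
```

Calling `split_scheme` twice with the same argument serves the second call from the cache; `split_scheme.cache_info()` reports the hit and miss counts.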
Senthil Kumaran 985ac01637
bpo-43882 Remove the newline, and tab early. From query and fragments. (GH-25921) 2021-05-05 15:50:05 -07:00
Dong-hee Na 6143fcdf8b
bpo-43979: Remove unnecessary operation from urllib.parse.parse_qsl (GH-25756)
Automerge-Triggered-By: GH:gpshead
2021-04-30 12:01:55 -07:00
Senthil Kumaran 76cd81d603
bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595)
* issue43882 - urllib.parse should sanitize urls containing ASCII newline and tabs.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2021-04-29 10:16:50 -07:00
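Sanitizing a URL in the sense of this commit title means removing ASCII tab and newline characters before the URL is split. A minimal standalone sketch of that step follows; the names `_REMOVE_TABLE` and `strip_tabs_and_newlines` are hypothetical, and the real parser may differ in detail.

```python
# Translation table that deletes ASCII tab, carriage return and line feed.
_REMOVE_TABLE = str.maketrans("", "", "\t\r\n")


def strip_tabs_and_newlines(url: str) -> str:
    """Return `url` with every ASCII tab, CR and LF character removed."""
    return url.translate(_REMOVE_TABLE)
```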
Ken Jin b38601d496
bpo-42967: coerce bytes separator to string in urllib.parse_qs(l) (#24818)
* coerce bytes separator to string

* Add news

* Update Misc/NEWS.d/next/Library/2021-03-11-00-31-41.bpo-42967.2PeQRw.rst
2021-04-11 06:26:09 -07:00
Yeting Li 7215d1ae25
bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391)
Fix Regular Expression Denial of Service (ReDoS) vulnerability in
urllib.request.AbstractBasicAuthHandler. The ReDoS-vulnerable regex
has quadratic worst-case complexity and can cause a denial of
service when identifying crafted invalid RFCs. This ReDoS issue is on
the client side and needs remote attackers to control the HTTP server.
2021-04-07 13:27:41 +02:00
Ken Jin a2f0654b0a
bpo-42967: Fix urllib.parse docs and make logic clearer (GH-24536) 2021-02-15 09:00:20 -08:00
Adam Goldschmidt fcbe0cb04d
bpo-42967: only use '&' as a query string separator (#24297)
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only use "&" as the query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument `separator` with default value "&" is added to specify the separator.


Co-authored-by: Éric Araujo <merwok@netwok.org>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
2021-02-14 14:41:57 -08:00
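The effect of the `separator` argument can be seen with a short example, assuming a Python version that includes this change. The variable names are illustrative only.

```python
from urllib.parse import parse_qsl

query = "a=1;b=2"

# Default separator "&": the ";" is treated as part of the value.
default_split = parse_qsl(query)

# Explicitly opting in to ";" as the separator.
semicolon_split = parse_qsl(query, separator=";")
```

Here `default_split` is `[('a', '1;b=2')]` and `semicolon_split` is `[('a', '1'), ('b', '2')]`.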
Senthil Kumaran 030a713183
Allow / character in username,password fields in _PROXY envvars. (#23973) 2020-12-29 04:18:42 -08:00
Christian Heimes f97406be4c
bpo-40968: Send http/1.1 ALPN extension (#20959)
Signed-off-by: Christian Heimes <christian@python.org>
2020-11-13 16:37:52 +01:00
Ronald Oussoren 93a1ccabde
bpo-41471: Ignore invalid prefix lengths in system proxy settings on macOS (GH-22762) 2020-10-19 20:16:21 +02:00
Batuhan Taşkaya 0361556537
bpo-39481: PEP 585 for a variety of modules (GH-19423)
- concurrent.futures
- ctypes
- http.cookies
- multiprocessing
- queue
- tempfile
- unittest.case
- urllib.parse
2020-04-10 07:46:36 -07:00
Victor Stinner 0b297d4ff1
bpo-39503: CVE-2020-8492: Fix AbstractBasicAuthHandler (GH-18284)
The AbstractBasicAuthHandler class of the urllib.request module uses
an inefficient regular expression which can be exploited by an
attacker to cause a denial of service. Fix the regex to prevent the
catastrophic backtracking. Vulnerability reported by Ben Caller
and Matt Schwager.

AbstractBasicAuthHandler of urllib.request now parses all
WWW-Authenticate HTTP headers and accepts multiple challenges per
header: use the realm of the first Basic challenge.

Co-Authored-By: Serhiy Storchaka <storchaka@gmail.com>
2020-04-02 02:52:20 +02:00
Stephen Balousek 5e260e0fde
bpo-39548: Fix handling of 'WWW-Authenticate' header for Digest Auth (GH-18338)
* bpo-39548: Fix handling of 'WWW-Authenticate' header for Digest authentication

 - The 'qop' value in the 'WWW-Authenticate' header is optional. The
   presence of 'qop' in the header should be checked before its value
   is parsed with 'split'.

Signed-off-by: Stephen Balousek <stephen@balousek.net>

* bpo-39548: Fix handling of 'WWW-Authenticate' header for Digest authentication

 - Add NEWS item

Signed-off-by: Stephen Balousek <stephen@balousek.net>

* Update Misc/NEWS.d/next/Library/2020-02-06-05-33-52.bpo-39548.DF4FFe.rst

Co-Authored-By: Brandt Bucher <brandtbucher@gmail.com>

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
2020-02-29 12:31:58 -08:00
idomic c33bdbb20c
bpo-37970: update and improve urlparse and urlsplit doc-strings (GH-16458) 2020-02-16 21:17:58 +02:00
Serhiy Storchaka 6a265f0d0c
bpo-39057: Fix urllib.request.proxy_bypass_environment(). (GH-17619)
Ignore leading dots and no longer ignore a trailing newline.
2020-01-05 14:14:31 +02:00
PypeBros 14a89c4798 bpo-38686: fix HTTP Digest handling in request.py (#17045)
* fix HTTP Digest handling in request.py

There is a bug triggered when server replies to a request with `WWW-Authenticate: Digest` where `qop="auth,auth-int"` rather than mere `qop="auth"`. Having both `auth` and `auth-int` is legitimate according to the `qop-options` rule in §3.2.1 of [[https://www.ietf.org/rfc/rfc2617.txt|RFC 2617]]:
>      qop-options       = "qop" "=" <"> 1#qop-value <">
>      qop-value         = "auth" | "auth-int" | token
> **qop-options**: [...] If present, it is a quoted string **of one or more** tokens indicating the "quality of protection" values supported by the server.  The value `"auth"` indicates authentication; the value `"auth-int"` indicates authentication with integrity protection

This description is confirmed by the definition of the [//n//]`#`[//m//]//rule// extended-BNF pattern defined in §2.1 of [[https://www.ietf.org/rfc/rfc2616.txt|RFC 2616]] as 'a comma-separated list of //rule// with at least //n// and at most //m// items'.

When this reply is parsed by `get_authorization`, request.py only tests for identity with `'auth'`, failing to recognize it as one of the supported modes the server announced, and claims that `"qop 'auth,auth-int' is not supported"`.

* 📜🤖 Added by blurb_it.

* bpo-38686 review fix: remember why.

* fix trailing space in Lib/urllib/request.py

Co-Authored-By: Brandt Bucher <brandtbucher@gmail.com>
2019-11-22 15:19:08 -08:00
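The two Digest fixes above both concern how the `qop` value is read: it may be absent, and it may be a comma-separated list. The sketch below shows one way to handle both cases; `select_qop` is a hypothetical helper, not the code in `Lib/urllib/request.py`.

```python
from typing import Optional


def select_qop(qop_value: Optional[str]) -> Optional[str]:
    """Pick a supported qop from a challenge value such as 'auth,auth-int'.

    Returns None when the server sent no qop at all (qop is optional).
    """
    if qop_value is None:
        return None
    # qop-options is a comma-separated list of one or more tokens.
    offered = [token.strip() for token in qop_value.split(",")]
    if "auth" in offered:
        return "auth"
    raise ValueError(f"qop {qop_value!r} is not supported")
```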
Pablo Galindo 293dd23477
Remove binding of captured exceptions when not used to reduce the chances of creating cycles (GH-17246)
Capturing exceptions into names can lead to reference cycles through the __traceback__ attribute of the exceptions in some obscure cases that have been reported previously and fixed individually. As these variables are not used anyway, we can remove the binding to reduce the chances of creating reference cycles.

See for example GH-13135
2019-11-19 21:34:03 +00:00
Tim Graham 5a88d50ff0 bpo-27657: Fix urlparse() with numeric paths (#661)
* bpo-27657: Fix urlparse() with numeric paths

Revert parsing decision from bpo-754016 in favor of the documented
consensus in bpo-16932 of how to treat strings without a // to
designate the netloc.

* bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
2019-10-18 06:07:20 -07:00
Stein Karlsen aad2ee0156 bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768) 2019-10-14 13:36:29 +03:00
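As a brief illustration of the behavior named in this commit title, and assuming a Python version that includes the change, `unquote` can be called with either `str` or `bytes` and returns a `str` in both cases:

```python
from urllib.parse import unquote

# Percent-encoded UTF-8 for "café", once as str and once as bytes.
from_str = unquote("caf%C3%A9")
from_bytes = unquote(b"caf%C3%A9")
```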
Zackery Spytz b761e3aed1 bpo-25068: urllib.request.ProxyHandler now lowercases the dict keys (GH-13489) 2019-09-13 15:07:07 +01:00
Ashwin Ramaswami ff2e182865 bpo-12707: deprecate info(), geturl(), getcode() methods in favor of headers, url, and status properties for HTTPResponse and addinfourl (GH-11447)
Co-Authored-By: epicfaace <aramaswamis@gmail.com>
2019-09-13 12:40:07 +01:00
Rémi Lapeyre 8047e0e1c6 bpo-35922: Fix RobotFileParser when robots.txt has no relevant crawl delay or request rate (GH-11791)
Co-Authored-By: Tal Einat <taleinat+github@gmail.com>
2019-06-16 09:48:57 +03:00
Steve Dower 8d0ef0b5ed bpo-36742: Corrects fix to handle decomposition in usernames (#13812) 2019-06-04 17:55:29 +02:00
Rémi Lapeyre 674ee12600 bpo-35397: Remove deprecation and document urllib.parse.unwrap (GH-11481) 2019-05-27 09:43:45 -04:00
Steve Dower b82e17e626
bpo-36842: Implement PEP 578 (GH-12613)
Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
2019-05-23 08:45:22 -07:00
Victor Stinner 0c2b6a3943
bpo-35907, CVE-2019-9948: urllib rejects local_file:// scheme (GH-13474)
CVE-2019-9948: Avoid file reading by disallowing the unnecessary URL
scheme in URLopener().open() and URLopener().retrieve()
of urllib.request.

Co-Authored-By: SH <push0ebp@gmail.com>
2019-05-22 22:15:01 +02:00
Xtreak c661b30f89 bpo-36948: Fix NameError in urllib.request.URLopener.retrieve (GH-13389) 2019-05-19 16:40:05 +03:00
Steve Dower d537ab0ff9
bpo-36742: Fixes handling of pre-normalization characters in urlsplit() (GH-13017) 2019-04-30 12:03:02 +00:00
Jörn Hees 750d74fac5 bpo-12910: update and correct quote docstring (#2568)
Fixes some mistakes and misleading statements in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
2019-04-09 17:31:18 -07:00
Serhiy Storchaka da0847048a
bpo-36431: Use PEP 448 dict unpacking for merging two dicts. (GH-12553) 2019-03-27 08:02:28 +02:00
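PEP 448 dict unpacking merges two dicts in a single expression, with later entries overriding earlier ones. The dict names and values below are made up for illustration.

```python
defaults = {"http": "proxy-a:3128", "https": "proxy-a:3128"}
overrides = {"https": "proxy-b:3128"}

# Keys from `overrides` win because they are unpacked last.
merged = {**defaults, **overrides}
```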
Steve Dower 16e6f7dee7
bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) 2019-03-07 08:02:26 -08:00
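The idea behind this check is that some non-ASCII characters turn into URL separators under NFKC normalization; for example U+2100 normalizes to `a/c`. The following is a simplified standalone sketch of such a check, not the urllib.parse implementation; `normalizes_to_separator` is a hypothetical name.

```python
import unicodedata

_SEPARATORS = "/?#@:"


def normalizes_to_separator(netloc: str) -> bool:
    """Return True if any non-ASCII character in `netloc` yields a URL
    separator after NFKC normalization."""
    for ch in netloc:
        if ord(ch) < 128:
            continue  # ASCII separators already present are not the concern here
        normalized = unicodedata.normalize("NFKC", ch)
        if any(sep in normalized for sep in _SEPARATORS):
            return True
    return False
```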
Boštjan Mejak 158695817d closes bpo-35309: cpath should be capath (GH-10699) 2018-11-25 12:32:50 -06:00
matthewbelisle-wf 209144831b bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660)
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
2018-10-19 03:52:59 -07:00
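The commit message names `cgi.FieldStorage`; in current Python versions `urllib.parse.parse_qsl` also accepts a `max_num_fields` argument, which is used here because it needs no request environment. The wrapper `parse_limited` is a hypothetical helper for illustration.

```python
from urllib.parse import parse_qsl


def parse_limited(query: str, limit: int):
    """Parse `query`, returning None if it holds more than `limit` fields."""
    try:
        return parse_qsl(query, max_num_fields=limit)
    except ValueError:
        # parse_qsl raises ValueError when the field count exceeds the limit.
        return None
```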