cpython

Commit Graph

Author	SHA1	Message	Date
Serhiy Storchaka	fc897fcc01	gh-76960: Fix urljoin() and urldefrag() for URIs with empty components (GH-123273) * urljoin() with relative reference "?" sets empty query and removes fragment. * Preserve empty components (authority, params, query, fragment) in urljoin(). * Preserve empty components (authority, params, query) in urldefrag(). Also refactor the code and get rid of double _coerce_args() and _coerce_result() calls in urljoin(), urldefrag(), urlparse() and urlunparse().	2024-08-31 12:42:08 +03:00
Serhiy Storchaka	90c892efea	gh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() (GH-123179)	2024-08-21 10:17:38 +03:00
Jeremy Hylton	77133f570d	gh-122909: Pass ftp error strings to URLError constructor (#122913) * pass the original string error message from the ftplib error to URLError() * Update request.py Change error string for ftp error to be consistent with other errors reported for ftp * Add NEWS entry for change to urllib.request for ftp errors. * Track the change in the ftp error message in the test.	2024-08-20 00:35:05 +00:00
Victor Stinner	6ae254aaa0	gh-120417: Add #noqa to used imports in the stdlib (#120421) Tools such as ruff can ignore "imported but unused" warnings if a line ends with "# noqa: F401". It avoids the temptation to remove an import which is used effectively.	2024-06-13 16:14:50 +02:00
Nikita Sobolev	84c3191954	gh-118827: Remove `Quoter` from `urllib.parse` (#118828) Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>	2024-06-03 10:50:29 +03:00
Serhiy Storchaka	e237b25a4f	gh-67693: Fix urlunparse() and urlunsplit() for URIs with path starting with multiple slashes and no authority (GH-113563)	2024-05-14 12:24:37 +03:00
Harmen Stoppels	759e8e7ab8	gh-99730: urllib.request: Keep HEAD method on redirect (GH-99731)	2024-05-01 18:01:47 +02:00
Serhiy Storchaka	1069a462f6	gh-116764: Fix regressions in urllib.parse.parse_qsl() (GH-116801) * Restore support of None and other false values. * Raise TypeError for non-zero integers and non-empty sequences. The regressions were introduced in gh-74668 (`bdba8ef42b`).	2024-03-16 12:36:05 +02:00
Serhiy Storchaka	bdba8ef42b	gh-74668: Fix support of bytes in urllib.parse.parse_qsl() (GH-115771) urllib.parse functions parse_qs() and parse_qsl() now support bytes arguments containing raw and percent-encoded non-ASCII data.	2024-03-05 17:49:50 +02:00
Weii Wang	c43b26d02e	gh-115197: Stop resolving host in urllib.request proxy bypass (GH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree.	2024-02-28 12:15:52 -08:00
Raphaël Marinier	5094690efd	gh-91539: Small performance improvement of urllib.request.getproxies_environment() (#108771) Small performance improvement of getproxies_environment() when there are many environment variables. In a benchmark with 5k environment variables not related to proxies, and 5 specifying proxies, we get a 10% walltime improvement.	2024-01-15 15:45:01 -08:00
zentarim	f3266c05b6	GH-104554: Add RTSPS support to `urllib/parse.py` (#104605) * GH-104554: Add RTSPS support to `urllib/parse.py` RTSPS is the permanent scheme defined in https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml alongside RTSP and RTSPU schemes. * 📜🤖 Added by blurb_it. --------- Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>	2023-06-13 16:45:47 -07:00
Victor Stinner	2587b9f64e	gh-105382: Remove urllib.request cafile parameter (#105384) Remove cafile, capath and cadefault parameters of the urllib.request.urlopen() function, deprecated in Python 3.6.	2023-06-06 21:17:45 +00:00
Illia Volochii	2f630e1ce1	gh-102153: Start stripping C0 control and space chars in `urlsplit` (#102508) `urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit #25595. This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/#url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329). --------- Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>	2023-05-17 01:49:20 -07:00
JohnJamesUtley	29f348e232	gh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (#103849) * Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format --------- Co-authored-by: Gregory P. Smith <greg@krypto.org>	2023-05-10 00:18:35 +00:00
Gregory P. Smith	82f789be3b	gh-104139: Add itms-services to uses_netloc urllib.parse. (#104312) Teach unsplit to retain the `"//"` when assembling `itms-services://?action=generate-bugs` style [Apple Platform Deployment](https://support.apple.com/en-gb/guide/deployment/depce7cefc4d/web) URLs.	2023-05-09 07:04:50 -07:00
Dan Hemberger	e38bebb9ee	gh-81403: Fix for CacheFTPHandler in urllib (#13951) bpo-37222: Fix for CacheFTPHandler in urllib A call to FTP.ntransfercmd must be followed by FTP.voidresp to clear the "end transfer" message. Without this, the client and server get out of sync, which will result in an error if the FTP instance is reused to open a second URL. This scenario occurs for even the most basic usage of CacheFTPHandler. Reverts the patch merged as a resolution to bpo-16270 and adds a test case for the CacheFTPHandler in test_urllib2net.py. Co-authored-by: Senthil Kumaran <senthil@python.org>	2023-04-22 21:41:23 -07:00
Wheeler Law	5c00a6224d	gh-99352: Respect `http.client.HTTPConnection.debuglevel` in `urllib.request.AbstractHTTPHandler` (#99353) * bugfix: let the HTTP- and HTTPSHandlers respect the value of http.client.HTTPConnection.debuglevel * add tests * add news * ReSTify NEWS and reword a bit. * Address Review Comments. * Use mock.patch.object instead of setting the module level value. * Used test values to assert the debuglevel. --------- Co-authored-by: Gregory P. Smith <greg@krypto.org> Co-authored-by: Senthil Kumaran <senthil@python.org>	2023-04-20 19:04:25 -07:00
Vo Hoang Long	0d4c7fcd4f	gh-101936: Update the default value of fp from io.StringIO to io.BytesIO (gh-102100) Co-authored-by: Long Vo <long.vo@linecorp.com>	2023-02-22 00:14:41 +09:00
Gregory P. Smith	2e279e85fe	gh-88500: Reduce memory use of `urllib.unquote` (#96763) `urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result depending on the input provided. As Python objects are relatively large, this could consume a lot of ram. This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations. Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for unquote and unquote_to_bytes and no different for typical inputs that are short or lack much unicode or % escaping. But the functions are already quite fast anyways so not a big deal. The slowdown scales consistently linear with input size as expected. Memory usage observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unittesting memory consumption is difficult and does not seem worthwhile. Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.	2022-12-10 16:17:39 -08:00
Dong-hee Na	dc8a86893d	gh-98778: Update HTTPError to initialize properly even if fp is None (gh-99966)	2022-12-08 11:20:34 +09:00
Nick Drozd	024ac542d7	bpo-45975: Simplify some while-loops with walrus operator (GH-29347)	2022-11-26 14:33:25 -08:00
Ben Kallus	439b9cfaf4	gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. (#99421) Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character. RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )` RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A` The WHATWG URL spec defines a scheme like this: `"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`	2022-11-13 10:25:55 -08:00
Ben Kallus	6f15ca8c7a	gh-96035: Make urllib.parse.urlparse reject non-numeric ports (#98273) Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>	2022-10-20 14:00:56 -07:00
Carl Meyer	ad817cd5c4	bpo-43564: preserve original exception in args of FTP URLError (#24938) * bpo-43564: preserve original error in args of FTP URLError * Add NEWS blurb Co-authored-by: Carl Meyer <carljm@instagram.com>	2022-10-09 18:59:07 -07:00
Pieter Eendebak	aeb28f5130	gh-91539: improve performance of get_proxies_environment (#91566 ) * improve performance of get_proxies_environment when there are many environment variables * 📜🤖 Added by blurb_it. * fix case of short env name * fix formatting * fix whitespace * whitespace * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> * whitespace * Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst Co-authored-by: Carl Meyer <carl@oddbird.net> * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Carl Meyer <carl@oddbird.net>	2022-10-05 10:57:52 -07:00
Gregory P. Smith	e61ca22431	gh-95865: Further reduce quote_from_bytes memory consumption (#96860) on large input values. Based on Dennis Sweeney's chunking idea.	2022-09-19 16:06:25 -07:00
Dennis Sweeney	8ba22b90ca	gh-95865: Speed up urllib.parse.quote_from_bytes() (GH-95872)	2022-08-30 21:39:51 -04:00
Victor Stinner	37118fa2e3	gh-94172: urllib.request avoids deprecated key_file/cert_file (#94232) The urllib.request module no longer uses the deprecated key_file and cert_file parameter of the http.client module.	2022-06-26 10:43:21 +02:00
Victor Stinner	f0b234e6ed	gh-94172: urllib.request avoids deprecated check_hostname (#94193) The urllib.request no longer uses the deprecated check_hostname parameter of the http.client module. Add private http.client._create_https_context() helper to http.client, used by urllib.request. Remove the now redundant check on check_hostname and verify_mode in http.client: the SSLContext.check_hostname setter already implements the check.	2022-06-24 17:45:28 +02:00
Victor Stinner	259dd71c32	gh-84623: Remove unused imports in stdlib (#93773)	2022-06-13 16:28:41 +02:00
狂男风	b69297ea23	bpo-42627: Fix incorrect parsing of Windows registry proxy settings (GH-26307)	2022-05-11 19:17:17 +01:00
Oleg Iarygin	a03a09e068	Replace with_traceback() with exception chaining and reraising (GH-32074)	2022-03-30 15:28:20 +03:00
Serhiy Storchaka	e2e72567a1	bpo-46756: Fix authorization check in urllib.request (GH-31353) Fix a bug in urllib.request.HTTPPasswordMgr.find_user_password() and urllib.request.HTTPPasswordMgrWithPriorAuth.is_authenticated() which allowed to bypass authorization. For example, access to URI "example.org/foobar" was allowed if the user was authorized for URI "example.org/foo".	2022-02-25 13:31:03 +02:00
Christian Sattler	e6fe10d340	bpo-45874: Handle empty query string correctly in urllib.parse.parse_qsl (#29716)	2021-12-12 10:41:12 +02:00
Łukasz Langa	f528045f69	bpo-40321: Add missing test, slightly expand documentation (GH-28760)	2021-10-06 17:28:16 +02:00
Jochem Schulenklopper	c379bc5ec9	bpo-40321: Support HTTP response status code 308 in urllib.request (#19588) * Support HTTP response status code 308 in urllib. HTTP response status code 308 is defined in https://tools.ietf.org/html/rfc7538 to be the permanent redirect variant of 307 (temporary redirect). * Update documentation to include http_error_308() * Add blurb for bpo-40321 fix Co-authored-by: Roland Crosby <roland@rolandcrosby.com>	2021-10-05 19:02:58 -07:00
Noah Kantrowitz	be42c06bb0	Update URLs in comments and metadata to use HTTPS (GH-27458)	2021-07-30 15:54:46 +02:00
Gregory P. Smith	d597fdc5fd	bpo-44002: Switch to lru_cache in urllib.parse. (GH-25798) Switch to lru_cache in urllib.parse. urllib.parse now uses functools.lru_cache for its internal URL splitting and quoting caches instead of rolling its own like it's the 90s. The undocumented internal Quoter class API is now deprecated as it had no reason to be public and no existing OSS users were found. The clear_cache() API remains undocumented but gets an explicit test as it is used in a few projects' (twisted, gevent) tests as well as our own regrtest.	2021-05-11 17:01:44 -07:00
Senthil Kumaran	985ac01637	bpo-43882 Remove the newline, and tab early. From query and fragments. (GH-25921)	2021-05-05 15:50:05 -07:00
Dong-hee Na	6143fcdf8b	bpo-43979: Remove unnecessary operation from urllib.parse.parse_qsl (GH-25756) Automerge-Triggered-By: GH:gpshead	2021-04-30 12:01:55 -07:00
Senthil Kumaran	76cd81d603	bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) * issue43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. Co-authored-by: Gregory P. Smith <greg@krypto.org> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2021-04-29 10:16:50 -07:00
Ken Jin	b38601d496	bpo-42967: coerce bytes separator to string in urllib.parse_qs(l) (#24818) * coerce bytes separator to string * Add news * Update Misc/NEWS.d/next/Library/2021-03-11-00-31-41.bpo-42967.2PeQRw.rst	2021-04-11 06:26:09 -07:00
Yeting Li	7215d1ae25	bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391) Fix Regular Expression Denial of Service (ReDoS) vulnerability in urllib.request.AbstractBasicAuthHandler. The ReDoS-vulnerable regex has quadratic worst-case complexity and can cause a denial of service when identifying crafted invalid RFCs. This ReDoS issue is on the client side and needs remote attackers to control the HTTP server.	2021-04-07 13:27:41 +02:00
Ken Jin	a2f0654b0a	bpo-42967: Fix urllib.parse docs and make logic clearer (GH-24536)	2021-02-15 09:00:20 -08:00
Adam Goldschmidt	fcbe0cb04d	bpo-42967: only use '&' as a query string separator (#24297) bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl(). urllib.parse will only use "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument separator with default value "&" is added to specify the separator. Co-authored-by: Éric Araujo <merwok@netwok.org> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>	2021-02-14 14:41:57 -08:00
Senthil Kumaran	030a713183	Allow / character in username,password fields in _PROXY envvars. (#23973)	2020-12-29 04:18:42 -08:00
Christian Heimes	f97406be4c	bpo-40968: Send http/1.1 ALPN extension (#20959) Signed-off-by: Christian Heimes <christian@python.org>	2020-11-13 16:37:52 +01:00
Ronald Oussoren	93a1ccabde	bpo-41471: Ignore invalid prefix lengths in system proxy settings on macOS (GH-22762)	2020-10-19 20:16:21 +02:00
Batuhan Taşkaya	0361556537	bpo-39481: PEP 585 for a variety of modules (GH-19423) - concurrent.futures - ctypes - http.cookies - multiprocessing - queue - tempfile - unittest.case - urllib.parse	2020-04-10 07:46:36 -07:00


461 Commits
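
Two of the behaviour changes listed above (bpo-42967, the "&"-only query separator, and bpo-43882, removal of ASCII tab and newline characters from URLs) can be illustrated with a short sketch. It assumes a Python release that already includes both commits; the helper names split_query and path_and_query are hypothetical and exist only for this illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def split_query(query, separator="&"):
    """Parse a query string into (key, value) pairs.

    Hypothetical helper: it only forwards to parse_qsl() so the
    separator choice is explicit at the call site.
    """
    return parse_qsl(query, separator=separator)


def path_and_query(url):
    """Return the path and query that urlsplit() extracts from url.

    Hypothetical helper used to show that ASCII tab and newline
    characters no longer survive splitting.
    """
    parts = urlsplit(url)
    return parts.path, parts.query


if __name__ == "__main__":
    # With the default separator, ";" is ordinary data inside the value.
    print(split_query("a=1;b=2"))
    # Callers that still need ";" must opt in explicitly.
    print(split_query("a=1;b=2", separator=";"))
    # Tab and newline characters are dropped before the URL is split.
    print(path_and_query("https://example.com/pa\tth?q=1\n"))
```

On interpreters that predate these commits the results differ, so code that depends on either behaviour should check the version it targets.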