Commit Graph

117 Commits

Author SHA1 Message Date
Gregory P. Smith d597fdc5fd
bpo-44002: Switch to lru_cache in urllib.parse. (GH-25798)
Switch to lru_cache in urllib.parse.

urllib.parse now uses functool.lru_cache for its internal URL splitting and
quoting caches instead of rolling its own like its the 90s.

The undocumented internal Quoted class API is now deprecated
as it had no reason to be public and no existing OSS users were found.

The clear_cache() API remains undocumented but gets an explicit test as it
is used in a few projects' (twisted, gevent) tests as well as our own regrtest.
2021-05-11 17:01:44 -07:00
Senthil Kumaran 985ac01637
bpo-43882 Remove the newline, and tab early. From query and fragments. (GH-25921) 2021-05-05 15:50:05 -07:00
Dong-hee Na 6143fcdf8b
bpo-43979: Remove unnecessary operation from urllib.parse.parse_qsl (GH-25756)
Automerge-Triggered-By: GH:gpshead
2021-04-30 12:01:55 -07:00
Senthil Kumaran 76cd81d603
bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595)
* issue43882 - urllib.parse should sanitize urls containing ASCII newline and tabs.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2021-04-29 10:16:50 -07:00
Ken Jin b38601d496
bpo-42967: coerce bytes separator to string in urllib.parse_qs(l) (#24818)
* coerce bytes separator to string

* Add news

* Update Misc/NEWS.d/next/Library/2021-03-11-00-31-41.bpo-42967.2PeQRw.rst
2021-04-11 06:26:09 -07:00
Ken Jin a2f0654b0a
bpo-42967: Fix urllib.parse docs and make logic clearer (GH-24536) 2021-02-15 09:00:20 -08:00
Adam Goldschmidt fcbe0cb04d
bpo-42967: only use '&' as a query string separator (#24297)
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.


Co-authored-by: Éric Araujo <merwok@netwok.org>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Éric Araujo <merwok@netwok.org>
2021-02-14 14:41:57 -08:00
Batuhan Taşkaya 0361556537
bpo-39481: PEP 585 for a variety of modules (GH-19423)
- concurrent.futures
- ctypes
- http.cookies
- multiprocessing
- queue
- tempfile
- unittest.case
- urllib.parse
2020-04-10 07:46:36 -07:00
idomic c33bdbb20c
bpo-37970: update and improve urlparse and urlsplit doc-strings (GH-16458) 2020-02-16 21:17:58 +02:00
Serhiy Storchaka 6a265f0d0c
bpo-39057: Fix urllib.request.proxy_bypass_environment(). (GH-17619)
Ignore leading dots and no longer ignore a trailing newline.
2020-01-05 14:14:31 +02:00
Tim Graham 5a88d50ff0 bpo-27657: Fix urlparse() with numeric paths (#661)
* bpo-27657: Fix urlparse() with numeric paths

Revert parsing decision from bpo-754016 in favor of the documented
consensus in bpo-16932 of how to treat strings without a // to
designate the netloc.

* bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
2019-10-18 06:07:20 -07:00
Stein Karlsen aad2ee0156 bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768) 2019-10-14 13:36:29 +03:00
Steve Dower 8d0ef0b5ed bpo-36742: Corrects fix to handle decomposition in usernames (#13812) 2019-06-04 17:55:29 +02:00
Rémi Lapeyre 674ee12600 bpo-35397: Remove deprecation and document urllib.parse.unwrap (GH-11481) 2019-05-27 09:43:45 -04:00
Steve Dower d537ab0ff9
bpo-36742: Fixes handling of pre-normalization characters in urlsplit() (GH-13017) 2019-04-30 12:03:02 +00:00
Jörn Hees 750d74fac5 bpo-12910: update and correct quote docstring (#2568)
Fixes some mistakes and misleadings in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
2019-04-09 17:31:18 -07:00
Steve Dower 16e6f7dee7
bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) 2019-03-07 08:02:26 -08:00
matthewbelisle-wf 209144831b bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660)
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
2018-10-19 03:52:59 -07:00
Cheryl Sabella 0250de4819 bpo-27485: Rename and deprecate undocumented functions in urllib.parse (GH-2205) 2018-04-25 16:51:54 -07:00
Matt Eaton 2cb4661707 bpo-33034: Improve exception message when cast fails for {Parse,Split}Result.port (GH-6078) 2018-03-20 09:41:37 +03:00
Коренберг Марк fbd605151f bpo-32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value (#4867) 2017-12-21 14:16:17 +02:00
Oren Milman 8df44ee8e0 remove a redundant lower in urllib.parse.urlsplit (#3008) 2017-09-02 21:51:39 -07:00
postmasters 90e01e50ef urllib: Simplify splithost by calling into urlparse. (#1849)
The current regex based splitting produces a wrong result. For example::

  http://abc#@def

Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``#@def``.
2017-06-20 15:02:44 +02:00
Senthil Kumaran 906f5330b9 bpo-29976: urllib.parse clarify '' in scheme values. (GH-984) 2017-05-17 21:48:59 -07:00
Senthil Kumaran 257b980b31 correct parse_qs and parse_qsl test case descriptions. (#968)
* correct parse_qs and parse_qsl test case descriptions.
2017-04-04 21:19:43 -07:00
Ratnadeep Debnath 21024f0662 bpo-16285: Update urllib quoting to RFC 3986 (#173)
* bpo-16285: Update urllib quoting to RFC 3986

urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.

Patch by Christian Theune and Ratnadeep Debnath.
2017-02-25 19:00:28 +10:00
Serhiy Storchaka 8cbd3df3ce Issue #28992: Use bytes.fromhex(). 2016-12-21 12:59:28 +02:00
Berker Peksag f8479eeb34 Issue #25895: Merge from 3.5 2016-09-16 14:45:15 +03:00
Berker Peksag f676748a05 Issue #25895: Enable WebSocket URL schemes in urllib.parse.urljoin
Patch by Gergely Imreh and Markus Holtermann.
2016-09-16 14:43:58 +03:00
Senthil Kumaran 0b57f0adde merge from 3.5
Remove unnecessary test case comment in urllib.parse.py. These are asserted as test cases.
2016-01-25 18:54:37 -08:00
Senthil Kumaran d4e51f45a9 Remove unnecessary test case comment in urllib.parse.py. These are asserted as test cases. 2016-01-25 18:53:34 -08:00
Senthil Kumaran 86f7109dad Issue #25822: Add docstrings to the fields of urllib.parse results.
Patch contributed by Swati Jaiswal.
2016-01-14 00:11:39 -08:00
Robert Collins dfa95c9a8f Issue #20059: urllib.parse raises ValueError on all invalid ports.
Patch by Martin Panter.
2015-08-10 09:53:30 +12:00
R David Murray c17686f071 Issue #13866: add *quote_via* argument to urlencode.
Patch by samwyse, completed by Arnon Yaari, and reviewed by
Martin Panter.
2015-05-17 20:44:50 -04:00
Berker Peksag 20416f7994 Issue #23703: Fix a regression in urljoin() introduced in 901e4e52b20a.
Patch by Demian Brecht.
2015-04-16 02:31:14 +03:00
Serhiy Storchaka 1515450440 Issue #23411: Added DefragResult, ParseResult, SplitResult, DefragResultBytes,
ParseResultBytes, and SplitResultBytes to urllib.parse.__all__.
Patch by Martin Panter.
2015-04-07 19:09:01 +03:00
Serhiy Storchaka 44eceb6e2a Issue #23563: Optimized utility functions in urllib.parse. 2015-03-03 20:21:35 +02:00
R David Murray 3ab6ba4744 Merge: #23040: Clarify treatment of encoding and errors when component is bytes. 2014-12-24 21:24:07 -05:00
R David Murray 8c4e112afc #23040: Clarify treatment of encoding and errors when component is bytes.
Patch by Wojtek Ruszczewski.
2014-12-24 21:23:18 -05:00
Senthil Kumaran a66e3885fb Issue #22278: Fix urljoin problem with relative urls, a regression observed
after changes to issue22118 were submitted.

Patch contributed by Demian Brecht and reviewed by Antoine Pitrou.
2014-09-22 15:49:16 +08:00
Antoine Pitrou 55ac5b3f7b Issue #22118: Switch urllib.parse to use RFC 3986 semantics for the resolution of relative URLs, rather than RFCs 1808 and 2396.
Patch by Demian Brecht.
2014-08-21 19:16:17 -04:00
Serhiy Storchaka 465e60e654 Issue #22033: Reprs of most Python implemened classes now contain actual
class name instead of hardcoded one.
2014-07-25 23:36:00 +03:00
Victor Stinner d6a91a7ab6 Issue #20879: Delay the initialization of encoding and decoding tables for
base32, ascii85 and base85 codecs in the base64 module, and delay the
initialization of the unquote_to_bytes() table of the urllib.parse module, to
not waste memory if these modules are not used.
2014-03-17 22:38:41 +01:00
Serhiy Storchaka 5d83d1a814 Issue #20270: urllib.urlparse now supports empty ports. 2014-01-18 18:31:41 +02:00
Serhiy Storchaka ff97b08d00 Issue #20270: urllib.urlparse now supports empty ports. 2014-01-18 18:30:33 +02:00
Senthil Kumaran d80f7be580 merge from 3.3
Improve urlencode docstring. Patch by Brian Brazil.
Closes issue #15350
2013-09-05 21:43:53 -07:00
Senthil Kumaran 324ae385fe Improve urlencode docstring. Patch by Brian Brazil. 2013-09-05 21:42:38 -07:00
Raymond Hettinger 56b0a3d89a Remove redundant imports 2013-04-06 20:53:12 -07:00
Serhiy Storchaka 8ea4616f16 Issue #1285086: Get rid of the refcounting hack and speed up
urllib.parse.unquote() and urllib.parse.unquote_to_bytes().
2013-03-14 21:31:37 +02:00
Senthil Kumaran ed30199e78 Fix issue16713 - tel url parsing with params 2012-12-24 14:00:20 -08:00