cpython

Gregory P. Smith 2e279e85fe gh-88500: Reduce memory use of `urllib.unquote` (#96763) `urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result depending on the input provided. As Python objects are relatively large, this could consume a lot of ram. This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations. Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for unquote and unquote_to_bytes and no different for typical inputs that are short or lack much unicode or % escaping. But the functions are already quite fast anyways so not a big deal. The slowdown scales consistently linear with input size as expected. Memory usage observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unittesting memory consumption is difficult and does not seem worthwhile. Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.		2022-12-10 16:17:39 -08:00
..
__init__.py	…
error.py	gh-98778: Update HTTPError to initialize properly even if fp is None (gh-99966)	2022-12-08 11:20:34 +09:00
parse.py	gh-88500: Reduce memory use of `urllib.unquote` (#96763)	2022-12-10 16:17:39 -08:00
request.py	bpo-45975: Simplify some while-loops with walrus operator (GH-29347)	2022-11-26 14:33:25 -08:00
response.py	bpo-12707: deprecate info(), geturl(), getcode() methods in favor of headers, url, and status properties for HTTPResponse and addinfourl (GH-11447)	2019-09-13 12:40:07 +01:00
robotparser.py	bpo-35922: Fix RobotFileParser when robots.txt has no relevant crawl delay or request rate (GH-11791)	2019-06-16 09:48:57 +03:00