cpython

Commit Graph

Author	SHA1	Message	Date
John Sloboda	649857a157	gh-85287: Change codecs to raise precise UnicodeEncodeError and UnicodeDecodeError (#113674) Co-authored-by: Inada Naoki <songofacandy@gmail.com>	2024-03-17 04:58:42 +00:00
Zackery Spytz	d180b507c4	gh-63283: IDNA prefix should be case insensitive (GH-17726) Any capitalization of "xn--" should be acceptable for the ACE prefix (see https://tools.ietf.org/html/rfc3490#section-5). Co-authored-by: Pepijn de Vos <pepijndevos@gmail.com> Co-authored-by: Erlend E. Aasland <erlend@python.org> Co-authored-by: Petr Viktorin <encukou@gmail.com>	2024-03-15 15:38:13 +01:00
Masayuki Moriyama	1476ac2c58	gh-102388: Add windows_31j to aliases for cp932 codec (#102389) The charset name "Windows-31J" is registered in the IANA Charset Registry[1] and is implemented in Python as the cp932 codec. [1] https://www.iana.org/assignments/charset-reg/windows-31J Signed-off-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>	2024-02-19 17:01:35 +09:00
Gregory P. Smith	d315722564	gh-98433: Fix quadratic time idna decoding. (#99092) There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear. This also adds an early length check in IDNA decoding to outright reject huge inputs early on given the ultimate result is defined to be 63 or fewer characters.	2022-11-07 16:54:41 -08:00
Victor Stinner	ccbe8045fa	bpo-46659: Fix the MBCS codec alias on Windows (GH-31218)	2022-02-22 22:04:07 +01:00
Victor Stinner	04dd60e50c	bpo-46659: Update the test on the mbcs codec alias (GH-31168) encodings registers the _alias_mbcs() codec search function before the search_function() codec search function. Previously, the _alias_mbcs() was never used. Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code page, not a fake ANSI code page number. Remove the test_site.test_aliasing_mbcs() test: the alias is now implemented in the encodings module, no longer in the site module.	2022-02-06 21:50:09 +01:00
Serhiy Storchaka	39aa98346d	bpo-45467: Fix IncrementalDecoder and StreamReader in the "raw-unicode-escape" codec (GH-28944) They now support splitting escape sequences between input chunks. Add the third parameter "final" in codecs.raw_unicode_escape_decode(). It is True by default to match the former behavior.	2021-10-14 20:04:19 +03:00
Serhiy Storchaka	c96d1546b1	bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939) They now support splitting escape sequences between input chunks. Add the third parameter "final" in codecs.unicode_escape_decode(). It is True by default to match the former behavior.	2021-10-14 13:17:00 +03:00
Hai Shi	c5b049b91c	bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters (GH-22219)	2020-10-14 17:43:31 +02:00
Berker Peksag	ba22e8f174	bpo-30566: Fix IndexError when using punycode codec (GH-18632) Trying to decode an invalid string with the punycode codec should raise UnicodeError.	2020-02-25 06:19:03 +03:00
Matthew Rollings	a62ad4730c	bpo-38945: UU Encoding: Don't let newline in filename corrupt the output format (#17418)	2019-12-02 14:25:21 -08:00
Michael Osipov	a828514cc3	bpo-34519: Add additional aliases for HP Roman 8 (GH-8956) HP Roman 8 is known under more aliases than listed in aliases.py. Patch by Michael Osipov.	2019-09-11 14:08:41 +01:00
Inada Naoki	cb65202520	bpo-35551: remove mac_centeuro encoding (GH-13856) It is now an alias of mac_latin2.	2019-06-06 14:38:52 +09:00
Ashwin Ramaswami	c4c15ed7a2	bpo-35551: encodings update (GH-11446)	2019-06-05 18:18:06 -04:00
Xtreak	0d70227e41	Fix typos in docs and docstrings (GH-13745)	2019-06-03 01:12:33 +02:00
Victor Stinner	d267ac20c3	bpo-36778: cp65001 encoding becomes an alias to utf_8 (GH-13230)	2019-05-10 03:19:54 +02:00
Inada Naoki	6a16b18224	bpo-36297: remove "unicode_internal" codec (GH-12342)	2019-03-18 15:44:11 +09:00
Anthony Sottile	ed2e9ab804	Remove obsolete comment about latin-1 in `normalize_encoding` (GH-8739) This docstring has drifted since python2: `ca079a3ea3/Lib/encodings/__init__.py (L68)`	2018-09-10 17:54:37 -07:00
Xiang Zhang	e4ce9fa89c	bpo-32943: Fix confusing error message for rot13 codec (GH-5869)	2018-03-25 12:09:21 +08:00
Victor Stinner	91106cd9ff	bpo-29240: PEP 540: Add a new UTF-8 Mode (#855) * Add -X utf8 command line option, PYTHONUTF8 environment variable and a new sys.flags.utf8_mode flag. * If the LC_CTYPE locale is "C" at startup: enable automatically the UTF-8 mode. * Add _winapi.GetACP(). encodings._alias_mbcs() now calls _winapi.GetACP() to get the ANSI code page * locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 mode. As a side effect, open() now uses the UTF-8 encoding by default in this mode. * Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding in the UTF-8 Mode. * Update subprocess._args_from_interpreter_flags() to handle -X utf8 * Skip some tests relying on the current locale if the UTF-8 mode is enabled. * Add test_utf8mode.py. * _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to return also the length (number of wide characters). * pymain_get_global_config() and pymain_set_global_config() now always copy flag values, rather than only copying if the new value is greater than the old value.	2017-12-13 12:29:09 +01:00
Steve Dower	18591e4189	Revert #27959: ImportError within an encoding module should also skip the encoding	2016-09-09 08:56:37 -07:00
Steve Dower	ef37dfcd84	Issue #28005: Allow ImportErrors in encoding implementation to propagate.	2016-09-07 17:27:33 -07:00
Steve Dower	fe8f4c9e87	Issue #27959: Prevent ImportError from escaping codec search function	2016-09-07 09:31:52 -07:00
Steve Dower	f5aba58480	Issue #27959: Adds oem encoding, alias ansi to mbcs, move aliasmbcs to codec lookup	2016-09-06 19:42:27 -07:00
Victor Stinner	1a05d6c04d	PEP 7 style for if/else in C Add also a newline for readability in normalize_encoding().	2016-09-02 12:12:23 +02:00
Martin Panter	46f50726a0	Issue #27076: Doc, comment and tests spelling fixes Most fixes to Doc/ and Lib/ directories by Ville Skyttä.	2016-05-26 05:35:26 +00:00
Brett Cannon	07b954d148	Add some "used with permission" mentions where external resources are referenced. Permission was validated prior to adding these markings.	2016-01-15 09:53:51 -08:00
Martin Panter	9ab96946ee	Issue #16473: Merge codecs doc and test from 3.4 into 3.5	2015-09-12 01:22:17 +00:00
Martin Panter	06171bd52a	Issue #16473: Fix byte transform codec documentation; test quotetabs=True This changes the equivalent functions listed for the Base-64, hex and Quoted-Printable codecs to reflect the functions actually used. Also mention and test the "quotetabs" setting for Quoted-Printable encoding.	2015-09-12 00:34:28 +00:00
Serhiy Storchaka	cd4a5cc339	Added forgotten new files for issues #22681 and #22682.	2015-05-13 00:34:53 +03:00
Serhiy Storchaka	ad8a1c3fb2	Issue #22682: Added support for the kz1048 encoding.	2015-05-12 23:16:55 +03:00
Serhiy Storchaka	85e7066278	Issue #22406: Fixed the uu_codec codec incorrectly ported to 3.x. Based on patch by Martin Panter.	2014-11-07 14:06:19 +02:00
Serhiy Storchaka	519114df42	Issue #22406: Fixed the uu_codec codec incorrectly ported to 3.x. Based on patch by Martin Panter.	2014-11-07 14:04:37 +02:00
Serhiy Storchaka	9c5553e122	Issue #21171: Fixed undocumented filter API of the rot13 codec. Patch by Berker Peksag.	2014-04-13 17:08:51 +03:00
Serhiy Storchaka	a39938ff44	Issue #21171: Fixed undocumented filter API of the rot13 codec. Patch by Berker Peksag.	2014-04-13 17:07:04 +03:00
Victor Stinner	7d00cc1a64	Issue #20574: Implement incremental decoder for cp65001 code (Windows code page 65001, Microsoft UTF-8).	2014-03-17 23:08:06 +01:00
R David Murray	fb2c2db0fb	Merge #7475: Remove references to '.transform' from transform codec docstrings.	2014-03-13 20:55:09 -04:00
R David Murray	e5cb836d4c	#7475: Remove references to '.transform' from transform codec docstrings.	2014-03-13 20:54:30 -04:00
R David Murray	47d083cf1a	whatsnew: cp273 codec (#10907797) Also updated the docs and added the aliases mentioned by the references.	2014-03-07 21:00:34 -05:00
Serhiy Storchaka	94ee389308	Issue #19619: Blacklist non-text codecs in method API str.encode, bytes.decode and bytearray.decode now use an internal API to throw LookupError for known non-text encodings, rather than attempting the encoding or decoding operation and then throwing a TypeError for an unexpected output type. The latter mechanism remains in place for third party non-text encodings. Backported changeset d68df99d7a57.	2014-02-24 14:43:03 +02:00
Serhiy Storchaka	e7f87e1262	Fixed incorrectly applying a patch for issue19668.	2013-11-23 19:50:47 +02:00
Serhiy Storchaka	be0c3250b1	Issue #19668: Added support for the cp1125 encoding.	2013-11-23 18:52:23 +02:00
Nick Coghlan	9c1aed8f94	Close #7475: Restore binary & text transform codecs The codecs themselves were restored in Python 3.2, this completes the restoration by adding back the convenience aliases. These aliases were originally left out due to confusing errors when attempting to use them with the text encoding specific convenience methods. Python 3.4 includes several improvements to those errors, thus permitting the aliases to be restored as well.	2013-11-23 11:13:36 +10:00
Nick Coghlan	c72e4e6dcc	Issue #19619: Blacklist non-text codecs in method API str.encode, bytes.decode and bytearray.decode now use an internal API to throw LookupError for known non-text encodings, rather than attempting the encoding or decoding operation and then throwing a TypeError for an unexpected output type. The latter mechanism remains in place for third party non-text encodings.	2013-11-22 22:39:36 +10:00
Andrew Kuchling	ad8156e9b2	#1097797: Add CP273 codec, and exercise it in the test suite	2013-11-10 13:44:30 -05:00
Brett Cannon	cd171c8e92	Issue #18200: Back out usage of ModuleNotFoundError (8d28d44f3a9a)	2013-07-04 17:43:24 -04:00
Brett Cannon	0a140668fa	Issue #18200: Update the stdlib (except tests) to use ModuleNotFoundError.	2013-06-13 20:57:26 -04:00
Victor Stinner	03c3e35d42	Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings: cp037, cp500 and iso8859_1 codecs	2013-04-09 21:53:09 +02:00
Antoine Pitrou	7e19337ebc	Normalize whitespace	2012-06-16 22:50:54 +02:00
Antoine Pitrou	aaefac76dd	Issue #14874: Restore charmap decoding speed to pre-PEP 393 levels. Patch by Serhiy Storchaka.	2012-06-16 22:48:21 +02:00

192 Commits