Commit Graph

193 Commits

Author SHA1 Message Date
Serhiy Storchaka f7ef0203d4
gh-123803: Support arbitrary code page encodings on Windows (GH-123804)
If the cpXXX encoding is not directly implemented in Python, fall back
to use the Windows-specific API codecs.code_page_encode() and
codecs.code_page_decode().
2024-11-18 17:45:25 +00:00
John Sloboda 649857a157
gh-85287: Change codecs to raise precise UnicodeEncodeError and UnicodeDecodeError (#113674)
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2024-03-17 04:58:42 +00:00
Zackery Spytz d180b507c4
gh-63283: IDNA prefix should be case insensitive (GH-17726)
Any capitalization of "xn--" should be acceptable for the ACE prefix
(see https://tools.ietf.org/html/rfc3490#section-5).

Co-authored-by: Pepijn de Vos <pepijndevos@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
2024-03-15 15:38:13 +01:00
Masayuki Moriyama 1476ac2c58
gh-102388: Add windows_31j to aliases for cp932 codec (#102389)
The charset name "Windows-31J" is registered in the IANA Charset Registry[1]
and is implemented in Python as the cp932 codec.

[1] https://www.iana.org/assignments/charset-reg/windows-31J

Signed-off-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>
2024-02-19 17:01:35 +09:00
Gregory P. Smith d315722564
gh-98433: Fix quadratic time idna decoding. (#99092)
There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

This also adds an early length check in IDNA decoding to outright reject
huge inputs early on given the ultimate result is defined to be 63 or fewer
characters.
2022-11-07 16:54:41 -08:00
Victor Stinner ccbe8045fa
bpo-46659: Fix the MBCS codec alias on Windows (GH-31218) 2022-02-22 22:04:07 +01:00
Victor Stinner 04dd60e50c
bpo-46659: Update the test on the mbcs codec alias (GH-31168)
encodings registers the _alias_mbcs() codec search function before
the search_function() codec search function. Previously, the
_alias_mbcs() was never used.

Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code
page, not a fake ANSI code page number.

Remove the test_site.test_aliasing_mbcs() test: the alias is now
implemented in the encodings module, no longer in the site module.
2022-02-06 21:50:09 +01:00
Serhiy Storchaka 39aa98346d
bpo-45467: Fix IncrementalDecoder and StreamReader in the "raw-unicode-escape" codec (GH-28944)
They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
2021-10-14 20:04:19 +03:00
Serhiy Storchaka c96d1546b1
bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939)
They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.unicode_escape_decode().
It is True by default to match the former behavior.
2021-10-14 13:17:00 +03:00
Hai Shi c5b049b91c
bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters (GH-22219) 2020-10-14 17:43:31 +02:00
Berker Peksag ba22e8f174
bpo-30566: Fix IndexError when using punycode codec (GH-18632)
Trying to decode an invalid string with the punycode codec
shoud raise UnicodeError.
2020-02-25 06:19:03 +03:00
Matthew Rollings a62ad4730c bpo-38945: UU Encoding: Don't let newline in filename corrupt the output format (#17418) 2019-12-02 14:25:21 -08:00
Michael Osipov a828514cc3 bpo-34519: Add additional aliases for HP Roman 8 (GH-8956)
* bpo-34519: Add additional aliases for HP Roman 8

HP Roman 8 is known under mode aliases than listed in aliases.py.

Patch by Michael Osipov.
2019-09-11 14:08:41 +01:00
Inada Naoki cb65202520
bpo-35551: remove mac_centeuro encoding (GH-13856)
It is alias to mac_latin2 now.
2019-06-06 14:38:52 +09:00
Ashwin Ramaswami c4c15ed7a2 bpo-35551: encodings update (GH-11446) 2019-06-05 18:18:06 -04:00
Xtreak 0d70227e41 Fix typos in docs and docstrings (GH-13745) 2019-06-03 01:12:33 +02:00
Victor Stinner d267ac20c3
bpo-36778: cp65001 encoding becomes an alias to utf_8 (GH-13230) 2019-05-10 03:19:54 +02:00
Inada Naoki 6a16b18224
bpo-36297: remove "unicode_internal" codec (GH-12342) 2019-03-18 15:44:11 +09:00
Anthony Sottile ed2e9ab804 Remove obsolete comment about latin-1 in `normalize_encoding` (GH-8739)
This docstring has drifted since python2: ca079a3ea3/Lib/encodings/__init__.py (L68)
2018-09-10 17:54:37 -07:00
Xiang Zhang e4ce9fa89c
bpo-32943: Fix confusing error message for rot13 codec (GH-5869) 2018-03-25 12:09:21 +08:00
Victor Stinner 91106cd9ff
bpo-29240: PEP 540: Add a new UTF-8 Mode (#855)
* Add -X utf8 command line option, PYTHONUTF8 environment variable
  and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: enable automatically the
  UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
  _winapi.GetACP() to get the ANSI code page
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
  mode. As a side effect, open() now uses the UTF-8 encoding by
  default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
  in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8
* Skip some tests relying on the current locale if the UTF-8 mode is
  enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
  return also the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
  always copy flag values, rather than only copying if the new value
  is greater than the old value.
2017-12-13 12:29:09 +01:00
Steve Dower 18591e4189 Revert #27959: ImportError within an encoding module should also skip the encoding 2016-09-09 08:56:37 -07:00
Steve Dower ef37dfcd84 Issue #28005: Allow ImportErrors in encoding implementation to propagate. 2016-09-07 17:27:33 -07:00
Steve Dower fe8f4c9e87 Issue #27959: Prevent ImportError from escaping codec search function 2016-09-07 09:31:52 -07:00
Steve Dower f5aba58480 Issue #27959: Adds oem encoding, alias ansi to mbcs, move aliasmbcs to codec lookup 2016-09-06 19:42:27 -07:00
Victor Stinner 1a05d6c04d PEP 7 style for if/else in C
Add also a newline for readability in normalize_encoding().
2016-09-02 12:12:23 +02:00
Martin Panter 46f50726a0 Issue #27076: Doc, comment and tests spelling fixes
Most fixes to Doc/ and Lib/ directories by Ville Skyttä.
2016-05-26 05:35:26 +00:00
Brett Cannon 07b954d148 Add some "used with permission" mentions where external resources are referenced.
Permission was validated prior to adding these markings.
2016-01-15 09:53:51 -08:00
Martin Panter 9ab96946ee Issue #16473: Merge codecs doc and test from 3.4 into 3.5 2015-09-12 01:22:17 +00:00
Martin Panter 06171bd52a Issue #16473: Fix byte transform codec documentation; test quotetabs=True
This changes the equivalent functions listed for the Base-64, hex and Quoted-
Printable codecs to reflect the functions actually used. Also mention and
test the "quotetabs" setting for Quoted-Printable encoding.
2015-09-12 00:34:28 +00:00
Serhiy Storchaka cd4a5cc339 Added forgotten new files for issues #22681 and #22682. 2015-05-13 00:34:53 +03:00
Serhiy Storchaka ad8a1c3fb2 Issue #22682: Added support for the kz1048 encoding. 2015-05-12 23:16:55 +03:00
Serhiy Storchaka 85e7066278 Issue #22406: Fixed the uu_codec codec incorrectly ported to 3.x.
Based on patch by Martin Panter.
2014-11-07 14:06:19 +02:00
Serhiy Storchaka 519114df42 Issue #22406: Fixed the uu_codec codec incorrectly ported to 3.x.
Based on patch by Martin Panter.
2014-11-07 14:04:37 +02:00
Serhiy Storchaka 9c5553e122 Issue #21171: Fixed undocumented filter API of the rot13 codec.
Patch by Berker Peksag.
2014-04-13 17:08:51 +03:00
Serhiy Storchaka a39938ff44 Issue #21171: Fixed undocumented filter API of the rot13 codec.
Patch by Berker Peksag.
2014-04-13 17:07:04 +03:00
Victor Stinner 7d00cc1a64 Issue #20574: Implement incremental decoder for cp65001 code
(Windows code page 65001, Microsoft UTF-8).
2014-03-17 23:08:06 +01:00
R David Murray fb2c2db0fb Merge #7475: Remove references to '.transform' from transform codec docstrings. 2014-03-13 20:55:09 -04:00
R David Murray e5cb836d4c #7475: Remove references to '.transform' from transform codec docstrings. 2014-03-13 20:54:30 -04:00
R David Murray 47d083cf1a whatsnew: cp273 codec (#10907797)
Also updated the docs and added the aliases mentioned by the
references.
2014-03-07 21:00:34 -05:00
Serhiy Storchaka 94ee389308 Issue #19619: Blacklist non-text codecs in method API
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.

The latter mechanism remains in place for third party non-text
encodings.

Backported changeset d68df99d7a57.
2014-02-24 14:43:03 +02:00
Serhiy Storchaka e7f87e1262 Fixed incorrectly applying a patch for issue19668. 2013-11-23 19:50:47 +02:00
Serhiy Storchaka be0c3250b1 Issue #19668: Added support for the cp1125 encoding. 2013-11-23 18:52:23 +02:00
Nick Coghlan 9c1aed8f94 Close #7475: Restore binary & text transform codecs
The codecs themselves were restored in Python 3.2, this
completes the restoration by adding back the convenience
aliases.

These aliases were originally left out due to confusing
errors when attempting to use them with the text encoding
specific convenience methods. Python 3.4 includes several
improvements to those errors, thus permitting the aliases
to be restored as well.
2013-11-23 11:13:36 +10:00
Nick Coghlan c72e4e6dcc Issue #19619: Blacklist non-text codecs in method API
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.

The latter mechanism remains in place for third party non-text
encodings.
2013-11-22 22:39:36 +10:00
Andrew Kuchling ad8156e9b2 #1097797: Add CP273 codec, and exercise it in the test suite 2013-11-10 13:44:30 -05:00
Brett Cannon cd171c8e92 Issue #18200: Back out usage of ModuleNotFoundError (8d28d44f3a9a) 2013-07-04 17:43:24 -04:00
Brett Cannon 0a140668fa Issue #18200: Update the stdlib (except tests) to use
ModuleNotFoundError.
2013-06-13 20:57:26 -04:00
Victor Stinner 03c3e35d42 Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings:
cp037, cp500 and iso8859_1 codecs
2013-04-09 21:53:09 +02:00
Antoine Pitrou 7e19337ebc Normalize whitespace 2012-06-16 22:50:54 +02:00