cpython

Commit Graph

Author	SHA1	Message	Date
Miss Islington (bot)	735a960ac9	bpo-36311: Fixes decoding multibyte characters around chunk boundaries and improves decoding performance (GH-15083) (cherry picked from commit `7ebdda0dbe`) Co-authored-by: Steve Dower <steve.dower@python.org>	2019-08-21 16:55:57 -07:00
Miss Islington (bot)	c755ca89c7	[3.7] bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304) (GH-14369) * bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304) * The UTF-8 incremental decoder now fails fast if it encounters a sequence that can't be handled by the error handler. * The UTF-16 incremental decoders with the surrogatepass error handler now decode a lone low surrogate with final=False. (cherry picked from commit `894263ba80`) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2019-06-25 12:29:18 +02:00
Miss Islington (bot)	a6dc5d4e1c	bpo-33361: Fix bug with seeking in StreamRecoders (GH-8278) (cherry picked from commit `a6ec1ce1ac`) Co-authored-by: Ammar Askar <ammar_askar@hotmail.com>	2019-05-31 23:03:22 +03:00
Jelle Zijlstra	81c5ec9e41	[3.7] bpo-33482: fix codecs.StreamRecoder.writelines (GH-6779) (GH-13502) A very simple fix. I found this while writing typeshed stubs for StreamRecoder. https://bugs.python.org/issue33482. (cherry picked from commit `b3be407288`) Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com> https://bugs.python.org/issue33482	2019-05-22 09:28:38 -07:00
Miss Islington (bot)	bd48280cb6	bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603) (GH-12627) The bug occurred when the encoded surrogate character is passed to the incremental decoder in two chunks. (cherry picked from commit `7a465cb5ee`) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2019-03-30 15:52:41 +02:00
Miss Islington (bot)	74829b7323	bpo-36312: Fix decoders for some code pages. (GH-12369) (cherry picked from commit `c1e2c288f4`) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2019-03-20 21:31:57 -07:00
Miss Islington (bot)	bdeb56cd21	bpo-35372: Fix the code page decoder for input > 2 GiB. (GH-10848) (cherry picked from commit `4013c17911`) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2018-12-03 01:09:11 -08:00
Victor Stinner	91106cd9ff	bpo-29240: PEP 540: Add a new UTF-8 Mode (#855 ) * Add -X utf8 command line option, PYTHONUTF8 environment variable and a new sys.flags.utf8_mode flag. * If the LC_CTYPE locale is "C" at startup: enable automatically the UTF-8 mode. * Add _winapi.GetACP(). encodings._alias_mbcs() now calls _winapi.GetACP() to get the ANSI code page * locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 mode. As a side effect, open() now uses the UTF-8 encoding by default in this mode. * Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding in the UTF-8 Mode. * Update subprocess._args_from_interpreter_flags() to handle -X utf8 * Skip some tests relying on the current locale if the UTF-8 mode is enabled. * Add test_utf8mode.py. * _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to return also the length (number of wide characters). * pymain_get_global_config() and pymain_set_global_config() now always copy flag values, rather than only copying if the new value is greater than the old value.	2017-12-13 12:29:09 +01:00
Serhiy Storchaka	219c2de5ad	bpo-32110: codecs.StreamReader.read(n) now returns not more than n (#4499 ) characters/bytes for non-negative n. This makes it compatible with read() methods of other file-like objects.	2017-11-29 01:30:00 +02:00
Serhiy Storchaka	56cb465cc9	bpo-31825: Fixed OverflowError in the 'unicode-escape' codec (#4058) and in codecs.escape_decode() when decoding an escaped non-ASCII byte.	2017-10-20 17:08:15 +03:00
Berker Peksag	7b4bcd2004	Issue #25270 : Merge from 3.5	2016-09-16 17:32:06 +03:00
Berker Peksag	4a72a7b6c4	Issue #25270 : Prevent codecs.escape_encode() from raising SystemError when an empty bytestring is passed	2016-09-16 17:31:06 +03:00
R David Murray	110b6fecbb	#27364: Deprecate invalid escape strings in str/bytes. Patch by Emanuel Barry, reviewed by Serhiy Storchaka and Martin Panter.	2016-09-08 15:34:08 -04:00
R David Murray	44b548dda8	#27364: fix "incorrect" uses of escape character in the stdlib. And most of the tools. Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and Martin Panter.	2016-09-08 13:59:53 -04:00
Steve Dower	f5aba58480	Issue #27959 : Adds oem encoding, alias ansi to mbcs, move aliasmbcs to codec lookup	2016-09-06 19:42:27 -07:00
Serhiy Storchaka	e437a10d15	Issue #23277 : Remove unused imports in tests.	2016-04-24 21:41:02 +03:00
Martin Panter	8b04a945ef	Merge typo fixes from 3.5	2016-04-16 09:29:17 +00:00
Martin Panter	119e502277	Fix typos in code comments and documentation	2016-04-16 09:28:57 +00:00
Martin Panter	cda80940ed	Issue #15984 : Merge PyUnicode doc from 3.5	2016-04-15 02:27:11 +00:00
Martin Panter	6245cb3c01	Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc This affects documentation, code comments, and a debugging messages.	2016-04-15 02:14:19 +00:00
Martin Panter	e56a919100	Issue #25523 : Merge a-to-an corrections from 3.5	2015-11-02 04:27:17 +00:00
Martin Panter	2eb819f7a8	Issue #25523 : Merge "a" to "an" fixes from 3.4 into 3.5	2015-11-02 04:04:57 +00:00
Martin Panter	7462b64911	Issue #25523 : Correct "a" article to "an" article This changes the main documentation, doc strings, source code comments, and a couple error messages in the test suite. In some cases the word was removed or edited some other way to fix the grammar.	2015-11-02 03:37:02 +00:00
Victor Stinner	797485e101	Issue #25318 : Avoid sprintf() in backslashreplace() Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors(). Add also unit tests for non-BMP characters.	2015-10-09 03:17:30 +02:00
Victor Stinner	1d65d9192d	Issue #25301 : The UTF-8 decoder is now up to 15 times as fast for error handlers: ``ignore``, ``replace`` and ``surrogateescape``.	2015-10-05 13:43:50 +02:00
Serhiy Storchaka	29e68edbf4	Issue #24848 : Fixed bugs in UTF-7 decoding of misformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate. 3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).	2015-10-02 13:14:03 +03:00
Serhiy Storchaka	58c8f2bb6d	Issue #24848 : Fixed bugs in UTF-7 decoding of misformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate. 3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).	2015-10-02 13:13:14 +03:00
Serhiy Storchaka	28b21e50c8	Issue #24848 : Fixed bugs in UTF-7 decoding of misformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate.	2015-10-02 13:07:28 +03:00
Victor Stinner	01ada3996b	Issue #25267 : The UTF-8 encoder is now up to 75 times as fast for error handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``. Patch co-written with Serhiy Storchaka.	2015-10-01 21:54:51 +02:00
Victor Stinner	c3713e9706	Optimize ascii/latin1+surrogateescape encoders Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape`` error handler: the encoders are now up to 3 times as fast. Initial patch written by Serhiy Storchaka.	2015-09-29 12:32:13 +02:00
Victor Stinner	f96418de05	Issue #24870 : Optimize the ASCII decoder for error handlers: surrogateescape, ignore and replace. Initial patch written by Naoki Inada. The decoder is now up to 60 times as fast for these error handlers. Add also unit tests for the ASCII decoder.	2015-09-21 23:06:27 +02:00
Martin Panter	9ab96946ee	Issue #16473 : Merge codecs doc and test from 3.4 into 3.5	2015-09-12 01:22:17 +00:00
Martin Panter	06171bd52a	Issue #16473 : Fix byte transform codec documentation; test quotetabs=True This changes the equivalent functions listed for the Base-64, hex and Quoted- Printable codecs to reflect the functions actually used. Also mention and test the "quotetabs" setting for Quoted-Printable encoding.	2015-09-12 00:34:28 +00:00
Serhiy Storchaka	f0eeedf0d8	Issue #22681 : Added support for the koi8_t encoding.	2015-05-12 23:24:19 +03:00
Serhiy Storchaka	ad8a1c3fb2	Issue #22682 : Added support for the kz1048 encoding.	2015-05-12 23:16:55 +03:00
Serhiy Storchaka	8490f5acfe	Issue #23001 : Few functions in modules mmap, ossaudiodev, socket, ssl, and codecs, that accepted only read-only bytes-like object now accept writable bytes-like object too.	2015-03-20 09:00:36 +02:00
Victor Stinner	f2be23d329	Issue #22286 , #23321 : Fix failing test on Windows code page 932 There was a bug which was fixed. The unit test was also wrong.	2015-01-26 23:26:11 +01:00
Serhiy Storchaka	07985ef387	Issue #22286: The "backslashreplace" error handler now works with decoding and translating.	2015-01-25 22:56:57 +02:00
Nick Coghlan	582acb75e9	Merge issue 19548 changes from 3.4	2015-01-07 00:37:01 +10:00
Nick Coghlan	b9fdb7a452	Issue 19548: update codecs module documentation - clarified the distinction between text encodings and other codecs - clarified relationship with builtin open and the io module - consolidated documentation of error handlers into one section - clarified type constraints of some behaviours - added tests for some of the new statements in the docs	2015-01-07 00:22:00 +10:00
Serhiy Storchaka	f65d1d3b02	Issue #23071 : "namereplace_errors" was added only in 3.5.	2014-12-20 18:53:01 +02:00
Serhiy Storchaka	4d33ff6183	Issue #23071 : Added missing names to codecs.__all__. Patch by Martin Panter.	2014-12-20 17:46:05 +02:00
Serhiy Storchaka	de3ee5b94f	Issue #23071 : Added missing names to codecs.__all__. Patch by Martin Panter.	2014-12-20 17:42:38 +02:00
Serhiy Storchaka	166ebc4e5d	Issue #19676 : Added the "namereplace" error handler.	2014-11-25 13:57:17 +02:00
Serhiy Storchaka	85e7066278	Issue #22406 : Fixed the uu_codec codec incorrectly ported to 3.x. Based on patch by Martin Panter.	2014-11-07 14:06:19 +02:00
Serhiy Storchaka	519114df42	Issue #22406 : Fixed the uu_codec codec incorrectly ported to 3.x. Based on patch by Martin Panter.	2014-11-07 14:04:37 +02:00
Nick Coghlan	a0f33759fa	Merge fix for issue #22166 from 3.4	2014-09-15 23:55:16 +12:00
Nick Coghlan	8fad1676a2	Issue #22166 : clear codec caches in test_codecs	2014-09-15 23:50:44 +12:00
Victor Stinner	0d4e01ca07	Issue #13916 : Fix surrogatepass error handler on Windows	2014-05-16 14:46:20 +02:00
Serhiy Storchaka	88d8fb6af6	Issue #13916 : Disallowed the surrogatepass error handler for non UTF-* encodings.	2014-05-15 14:37:42 +03:00
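
Several of the commits above concern incremental decoding across chunk boundaries and the "namereplace" and "backslashreplace" error handlers. The following is a minimal sketch of that behaviour using only the public `codecs` API on Python 3.5 or later; it is an illustration, not code taken from any of these commits, and the variable names are chosen here for the example.

```python
import codecs

# Feed one multibyte UTF-8 character ("€" is E2 82 AC) in two chunks.
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(b"\xe2\x82")            # incomplete sequence is buffered
second = decoder.decode(b"\xac", final=True)   # last byte completes the character

# "namereplace" (encoding only) substitutes \N{...} for unencodable characters.
named = "é".encode("ascii", errors="namereplace")

# "backslashreplace" can also be used when decoding.
escaped = b"caf\xe9".decode("ascii", errors="backslashreplace")

print(repr(first), repr(second), named, escaped)
```

The incremental decoder returns an empty string for the first chunk and emits the character only once the final byte arrives.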

227 Commits