Commit Graph

204 Commits

Author SHA1 Message Date
Victor Stinner 797485e101 Issue #25318: Avoid sprintf() in backslashreplace()
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().

Add also unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
Victor Stinner 1d65d9192d Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
Serhiy Storchaka 29e68edbf4 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka 58c8f2bb6d Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka 28b21e50c8 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Victor Stinner 01ada3996b Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Victor Stinner c3713e9706 Optimize ascii/latin1+surrogateescape encoders
Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.

Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00
Victor Stinner f96418de05 Issue #24870: Optimize the ASCII decoder for error handlers: surrogateescape,
ignore and replace. Initial patch written by Naoki Inada.

The decoder is now up to 60 times as fast for these error handlers.

Add also unit tests for the ASCII decoder.
2015-09-21 23:06:27 +02:00
Martin Panter 9ab96946ee Issue #16473: Merge codecs doc and test from 3.4 into 3.5 2015-09-12 01:22:17 +00:00
Martin Panter 06171bd52a Issue #16473: Fix byte transform codec documentation; test quotetabs=True
This changes the equivalent functions listed for the Base-64, hex and Quoted-
Printable codecs to reflect the functions actually used. Also mention and
test the "quotetabs" setting for Quoted-Printable encoding.
2015-09-12 00:34:28 +00:00
Serhiy Storchaka f0eeedf0d8 Issue #22681: Added support for the koi8_t encoding. 2015-05-12 23:24:19 +03:00
Serhiy Storchaka ad8a1c3fb2 Issue #22682: Added support for the kz1048 encoding. 2015-05-12 23:16:55 +03:00
Serhiy Storchaka 8490f5acfe Issue #23001: Few functions in modules mmap, ossaudiodev, socket, ssl, and
codecs, that accepted only read-only bytes-like object now accept writable
bytes-like object too.
2015-03-20 09:00:36 +02:00
Victor Stinner f2be23d329 Issue #22286, #23321: Fix failing test on Windows code page 932
There was a bug which was fixed. The unit test was also wrong.
2015-01-26 23:26:11 +01:00
Serhiy Storchaka 07985ef387 Issue #22286: The "backslashreplace" error handlers now works with
decoding and translating.
2015-01-25 22:56:57 +02:00
Nick Coghlan 582acb75e9 Merge issue 19548 changes from 3.4 2015-01-07 00:37:01 +10:00
Nick Coghlan b9fdb7a452 Issue 19548: update codecs module documentation
- clarified the distinction between text encodings and other codecs
- clarified relationship with builtin open and the io module
- consolidated documentation of error handlers into one section
- clarified type constraints of some behaviours
- added tests for some of the new statements in the docs
2015-01-07 00:22:00 +10:00
Serhiy Storchaka f65d1d3b02 Issue #23071: "namereplace_errors" was added only in 3.5. 2014-12-20 18:53:01 +02:00
Serhiy Storchaka 4d33ff6183 Issue #23071: Added missing names to codecs.__all__. Patch by Martin Panter. 2014-12-20 17:46:05 +02:00
Serhiy Storchaka de3ee5b94f Issue #23071: Added missing names to codecs.__all__. Patch by Martin Panter. 2014-12-20 17:42:38 +02:00
Serhiy Storchaka 166ebc4e5d Issue #19676: Added the "namereplace" error handler. 2014-11-25 13:57:17 +02:00
Serhiy Storchaka 85e7066278 Issue #22406: Fixed the uu_codec codec incorrectly ported to 3.x.
Based on patch by Martin Panter.
2014-11-07 14:06:19 +02:00
Serhiy Storchaka 519114df42 Issue #22406: Fixed the uu_codec codec incorrectly ported to 3.x.
Based on patch by Martin Panter.
2014-11-07 14:04:37 +02:00
Nick Coghlan a0f33759fa Merge fix for issue #22166 from 3.4 2014-09-15 23:55:16 +12:00
Nick Coghlan 8fad1676a2 Issue #22166: clear codec caches in test_codecs 2014-09-15 23:50:44 +12:00
Victor Stinner 0d4e01ca07 Issue #13916: Fix surrogatepass error handler on Windows 2014-05-16 14:46:20 +02:00
Serhiy Storchaka 88d8fb6af6 Issue #13916: Disallowed the surrogatepass error handler for non UTF-*
encodings.
2014-05-15 14:37:42 +03:00
Victor Stinner a57dfd033c Issue #21488: Add support of keyword arguments for codecs.encode and codecs.decode 2014-05-14 17:13:14 +02:00
Victor Stinner 07beb375b7 Issue #20574: Remove duplicated test failing on Windows XP 2014-03-18 01:40:22 +01:00
Victor Stinner f8cbf78bbd Issue #20574: Add more tests for cp65001 2014-03-17 23:16:02 +01:00
Victor Stinner 7d00cc1a64 Issue #20574: Implement incremental decoder for cp65001 code
(Windows code page 65001, Microsoft UTF-8).
2014-03-17 23:08:06 +01:00
Victor Stinner 3633ce3301 Issue #20571: skip test_readline() of test_codecs for Windows code page 65001.
The decoder does not support partial decoding yet for this code page.
2014-02-09 13:11:53 +01:00
Serhiy Storchaka 6cbf151032 Issue #20538: UTF-7 incremental decoder produced inconsistant string when
input was truncated in BASE64 section.
2014-02-08 14:06:33 +02:00
Serhiy Storchaka 016a3f33a5 Issue #20538: UTF-7 incremental decoder produced inconsistant string when
input was truncated in BASE64 section.
2014-02-08 14:01:29 +02:00
Nick Coghlan 96252cd724 Issue 20542: Temporarily skip failing test 2014-02-07 23:34:41 +10:00
Serhiy Storchaka f28ba369dd Issue #20532: Tests which use _testcapi now are marked as CPython only. 2014-02-07 10:10:55 +02:00
Serhiy Storchaka 5cfc79deae Issue #20532: Tests which use _testcapi now are marked as CPython only. 2014-02-07 10:06:39 +02:00
Serhiy Storchaka 3dcb0cf9b1 Issue #20520: Fixed readline test in test_codecs. 2014-02-06 09:27:28 +02:00
Serhiy Storchaka 5b4fab1ad7 Issue #20520: Fixed readline test in test_codecs. 2014-02-06 09:26:56 +02:00
Serhiy Storchaka dbe0982bc5 Issue #8260: The read(), readline() and readlines() methods of
codecs.StreamReader returned incomplete data when were called after
readline() or read(size).  Based on patch by Amaury Forgeot d'Arc.
2014-01-26 19:27:56 +02:00
Serhiy Storchaka 8003850e22 Issue #8260: The read(), readline() and readlines() methods of
codecs.StreamReader returned incomplete data when were called after
readline() or read(size).  Based on patch by Amaury Forgeot d'Arc.
2014-01-26 19:21:00 +02:00
Nick Coghlan 77b286b2cc Close #20105: set __traceback__ when chaining exceptions in C 2014-01-27 00:53:38 +10:00
Zachary Ware efa2e04033 Issue19619: skip zlib error test when zlib not available 2013-12-30 14:54:11 -06:00
Serhiy Storchaka 2480c2ed59 Issue #15204: Silence and check the 'U' mode deprecation warnings in tests.
Changed deprecation message in the fileinput module.
2013-11-24 23:13:26 +02:00
Serhiy Storchaka be0c3250b1 Issue #19668: Added support for the cp1125 encoding. 2013-11-23 18:52:23 +02:00
Nick Coghlan 9c1aed8f94 Close #7475: Restore binary & text transform codecs
The codecs themselves were restored in Python 3.2, this
completes the restoration by adding back the convenience
aliases.

These aliases were originally left out due to confusing
errors when attempting to use them with the text encoding
specific convenience methods. Python 3.4 includes several
improvements to those errors, thus permitting the aliases
to be restored as well.
2013-11-23 11:13:36 +10:00
Nick Coghlan c72e4e6dcc Issue #19619: Blacklist non-text codecs in method API
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.

The latter mechanism remains in place for third party non-text
encodings.
2013-11-22 22:39:36 +10:00
Nick Coghlan f1de55fb33 Also chain codec exceptions that allow weakrefs
The zlib and hex codecs throw custom exception types with
weakref support if the input type is valid, but the data
fails validation. Make sure the exception chaining in the
codec infrastructure can wrap those as well.
2013-11-19 22:33:10 +10:00
Serhiy Storchaka 58cf607d13 Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.

Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
Nick Coghlan 4e553e2e52 Avoid triggering the refleak detector 2013-11-16 00:35:34 +10:00