71 Commits

Author SHA1 Message Date
Bénédikt Tran c00964ecd5
gh-124665: Add `_PyCodec_UnregisterError` and `_codecs._unregister_error` (#124677) 2024-09-29 02:25:23 +02:00
Serhiy Storchaka 18b07d773e
bpo-36819: Fix crashes in built-in encoders with weird error handlers (GH-28593)
If the error handler returns a position less than or equal to the starting
position of the non-encodable characters, most of the built-in encoders didn't
properly re-size the output buffer. This led to out-of-bounds writes,
and segfaults.
2022-05-02 12:37:48 +03:00
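For context on the bpo-36819 entry above, here is a minimal sketch of a custom encode error handler registered through `codecs.register_error`. The handler name `demo.questionmark` is hypothetical and chosen only for illustration. It resumes at `exc.end`, which is the well-behaved case, not the "less than or equal to the starting position" case that the fix addresses.

```python
import codecs


def question_mark_handler(exc):
    """Replace each non-encodable character with '?' and resume after the run."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # Return (replacement, position at which encoding should resume).
    return ("?" * (exc.end - exc.start), exc.end)


# "demo.questionmark" is a made-up handler name for this example.
codecs.register_error("demo.questionmark", question_mark_handler)

encoded = "a\u20acb".encode("ascii", "demo.questionmark")
print(encoded)  # b'a?b'
```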
Victor Stinner 8f4ef3b019
Remove unused imports in tests (GH-14518) 2019-07-01 18:28:25 +02:00
Inada Naoki 6a16b18224
bpo-36297: remove "unicode_internal" codec (GH-12342) 2019-03-18 15:44:11 +09:00
Xiang Zhang 2c7fd46e11
bpo-32583: Fix possible crashing in builtin Unicode decoders (#5325)
When using customized decode error handlers, it is possible for builtin decoders
to write out-of-bounds and then crash.
2018-01-31 20:48:05 +08:00
Xiang Zhang 370d04d1dc
bpo-32618: Fix test_mutatingdecodehandler not testing test.mutating (#5269)
* bpo-32618: Fix test_mutatingdecodehandler not testing test.mutating

It should test both test.replacing and test.mutating instead of testing test.replacing twice.
2018-01-23 22:50:50 +08:00
R David Murray 44b548dda8 #27364: fix "incorrect" uses of escape character in the stdlib.
And most of the tools.

Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
2016-09-08 13:59:53 -04:00
Serhiy Storchaka e437a10d15 Issue #23277: Remove unused imports in tests. 2016-04-24 21:41:02 +03:00
Serhiy Storchaka c0937f79ec Issue #24102: Fixed exception type checking in standard error handlers. 2015-05-18 16:10:40 +03:00
Serhiy Storchaka ca7fecb038 Issue #24102: Fixed exception type checking in standard error handlers. 2015-05-18 16:08:52 +03:00
Serhiy Storchaka b8a78d3d85 Use non-zero and non-last positions in error handler tests. 2015-03-16 08:31:38 +02:00
Serhiy Storchaka 05d54730da Use non-zero and non-last positions in error handler tests. 2015-03-16 08:29:47 +02:00
Serhiy Storchaka 93f4d4c1d6 Increased coverage of standard codec error handlers. 2015-03-15 23:43:34 +02:00
Serhiy Storchaka 98d156b2b2 Increased coverage of standard codec error handlers. 2015-03-15 23:41:37 +02:00
Serhiy Storchaka 07985ef387 Issue #22286: The "backslashreplace" error handler now works with
decoding and translating.
2015-01-25 22:56:57 +02:00
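A short sketch of what Issue #22286 above enables, assuming Python 3.5 or later: `backslashreplace` applied while decoding, and the same handler called directly with a `UnicodeTranslateError`. The input values and the `"demo"` reason string are illustrative only.

```python
import codecs

# Decoding: the undecodable byte becomes a \xNN escape in the result.
decoded = b"a\xffb".decode("ascii", "backslashreplace")
print(decoded)  # a\xffb

# Translating: call the handler directly with a UnicodeTranslateError.
# Arguments are (object, start, end, reason).
exc = UnicodeTranslateError("a\u3042b", 1, 2, "demo")
replacement, resume_at = codecs.backslashreplace_errors(exc)
print(replacement, resume_at)  # \u3042 2
```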
Serhiy Storchaka 166ebc4e5d Issue #19676: Added the "namereplace" error handler. 2014-11-25 13:57:17 +02:00
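As an illustration of the "namereplace" handler added in Issue #19676 above (available from Python 3.5), a non-encodable character is replaced by a `\N{...}` escape carrying its Unicode name. The sample string is an arbitrary choice.

```python
# The euro sign cannot be encoded as ASCII, so its Unicode name is used.
named = "price: \u20ac5".encode("ascii", "namereplace")
print(named)  # b'price: \\N{EURO SIGN}5'
```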
Victor Stinner e49a95fe05 Issue #21118: str.translate() now raises a ValueError, not a TypeError, if the
replacement character is bigger than U+10ffff code point.
2014-04-05 15:35:01 +02:00
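The behavior described in Issue #21118 above can be checked with a small sketch. The mapping below is a made-up example that points a character at 0x110000, one past the largest valid code point.

```python
caught = None
try:
    # 0x110000 is just beyond U+10FFFF, the largest valid code point.
    "a".translate({ord("a"): 0x110000})
except ValueError as exc:
    caught = exc

print(type(caught).__name__)  # ValueError
```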
Brett Cannon 3e9a9ae09d Update various test modules to use unittest.main() for test discovery
instead of manually listing tests for test.support.run_unittest().
2013-06-12 21:25:59 -04:00
Serhiy Storchaka 24193debd4 Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder. 2013-01-29 10:28:07 +02:00
Serhiy Storchaka d679377be7 Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder. 2013-01-29 10:20:44 +02:00
Antoine Pitrou 6f80f5d444 Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-09-23 19:55:21 +02:00
Ezio Melotti adc417ce36 #13406: fix more deprecation warnings and move the deprecation of unicode-internal earlier in the code. 2011-11-17 12:23:34 +02:00
Victor Stinner 040e16e3e8 "unicode_internal" codec has been deprecated: fix related tests 2011-11-15 22:44:05 +01:00
Martin v. Löwis 3d325191bf Port code page codec to Unicode API. 2011-11-04 18:23:06 +01:00
Antoine Pitrou 00b2c86d09 Fix test failures when ctypes is not available
(followup to Victor's 85d11cf67aa8 and 7a50e549bd11)
2011-10-05 13:01:41 +02:00
Ezio Melotti a9860aeb08 #13054: fix usage of sys.maxunicode after PEP-393. 2011-10-04 19:06:00 +03:00
Victor Stinner ef17f12a39 Fix test_codeccallbacks for Windows: check size of wchar_t, not sys.maxunicode 2011-09-29 20:01:55 +02:00
Martin v. Löwis d63a3b8beb Implement PEP 393. 2011-09-28 07:41:54 +02:00
Ezio Melotti b3aedd4862 #9424: Replace deprecated assert* methods in the Python test suite. 2010-11-20 19:04:17 +00:00
Antoine Pitrou e4a189274f Issue #9804: ascii() now always represents unicode surrogate pairs as
a single `\UXXXXXXXX`, regardless of whether the character is printable
or not.  Also, the "backslashreplace" error handler now joins surrogate
pairs into a single character on UCS-2 builds.
2010-09-09 20:30:23 +00:00
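A brief sketch of the Issue #9804 behavior above as it appears on current CPython, where narrow (UCS-2) builds no longer exist: a non-BMP character is rendered as a single `\UXXXXXXXX` escape by both `ascii()` and the "backslashreplace" handler. The chosen character is arbitrary.

```python
ch = "\U0001F600"  # a single code point outside the BMP

ascii_repr = ascii(ch)
escaped = ch.encode("ascii", "backslashreplace")

print(ascii_repr)  # '\U0001f600'
print(escaped)     # b'\\U0001f600'
```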
Ezio Melotti 57221d02ba Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
   valid continuation bytes are now replaced by U+FFFD, instead of replacing
   the number of bytes specified by the start byte.
   See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
   in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
   and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
2010-07-01 07:32:02 +00:00
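Points 1 to 3 of the RFC 3629 commit above can be illustrated with a small sketch. The byte strings are arbitrary examples, and the helper name `decode_reason` is hypothetical.

```python
def decode_reason(data):
    """Return the UnicodeDecodeError reason for strict UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# 0xC3 starts a 2-byte sequence, but "(" is not a continuation byte.
# Only the start byte is replaced; "(" survives.
replaced = b"\xc3(".decode("utf-8", "replace")
print(repr(replaced))  # '\ufffd('

bad_continuation = decode_reason(b"\xc3(")
bad_start = decode_reason(b"\xff")
# 0xF8 would begin a 5-byte sequence, which is no longer valid UTF-8.
five_byte_start = decode_reason(b"\xf8\x88\x80\x80\x80")

print(bad_continuation)  # invalid continuation byte
print(bad_start)         # invalid start byte
print(five_byte_start)   # invalid start byte
```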
Mark Dickinson 934896dc09 Merged revisions 69846 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r69846 | mark.dickinson | 2009-02-21 20:27:01 +0000 (Sat, 21 Feb 2009) | 2 lines

  Issue #5341: Fix a variety of spelling errors.
........
2009-02-21 20:59:32 +00:00
Benjamin Peterson b58dda7bdb Merged revisions 68633,68648,68667,68706,68718,68720-68721,68724-68727,68739 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r68633 | thomas.heller | 2009-01-16 12:53:44 -0600 (Fri, 16 Jan 2009) | 3 lines

  Change an example in the docs to avoid a mistake when the code is copy
  pasted and changed afterwards.
........
  r68648 | benjamin.peterson | 2009-01-16 22:28:57 -0600 (Fri, 16 Jan 2009) | 1 line

  use enumerate
........
  r68667 | amaury.forgeotdarc | 2009-01-17 14:18:59 -0600 (Sat, 17 Jan 2009) | 3 lines

  #4077: No need to append \n when calling Py_FatalError
  + fix a declaration to make it match the one in pythonrun.h
........
  r68706 | benjamin.peterson | 2009-01-17 19:28:46 -0600 (Sat, 17 Jan 2009) | 1 line

  fix grammar
........
  r68718 | georg.brandl | 2009-01-18 04:42:35 -0600 (Sun, 18 Jan 2009) | 1 line

  #4976: union() and intersection() take multiple args, but talk about "the other".
........
  r68720 | georg.brandl | 2009-01-18 04:45:22 -0600 (Sun, 18 Jan 2009) | 1 line

  #4974: fix redundant mention of lists and tuples.
........
  r68721 | georg.brandl | 2009-01-18 04:48:16 -0600 (Sun, 18 Jan 2009) | 1 line

  #4914: trunc is in math.
........
  r68724 | georg.brandl | 2009-01-18 07:24:10 -0600 (Sun, 18 Jan 2009) | 1 line

  #4979: correct result range for some random functions.
........
  r68725 | georg.brandl | 2009-01-18 07:47:26 -0600 (Sun, 18 Jan 2009) | 1 line

  #4857: fix augmented assignment target spec.
........
  r68726 | georg.brandl | 2009-01-18 08:41:52 -0600 (Sun, 18 Jan 2009) | 1 line

  #4923: clarify what was added.
........
  r68727 | georg.brandl | 2009-01-18 12:25:30 -0600 (Sun, 18 Jan 2009) | 1 line

  #4986: augassigns are not expressions.
........
  r68739 | benjamin.peterson | 2009-01-18 15:11:38 -0600 (Sun, 18 Jan 2009) | 1 line

  fix test that wasn't working as expected #4990
........
2009-01-18 22:27:04 +00:00
Benjamin Peterson ee8712cda4 #2621 rename test.test_support to test.support 2008-05-20 21:35:26 +00:00
Fred Drake 3c50ea4303 rename HTMLParser to html.parser and htmlentitydefs to html.entities;
includes merge of trunk revision 63432
2008-05-17 22:02:32 +00:00
Guido van Rossum 254348e201 Rename buffer -> bytearray. 2007-11-21 19:29:53 +00:00
Guido van Rossum 98297ee781 Merging the py3k-pep3137 branch back into the py3k branch.
No detailed change log; just check out the change log for the py3k-pep3137
branch.  The most obvious changes:

  - str8 renamed to bytes (PyString at the C level);
  - bytes renamed to buffer (PyBytes at the C level);
  - PyString and PyUnicode are no longer compatible.

I.e. we now have an immutable bytes type and a mutable bytes type.

The behavior of PyString was modified quite a bit, to make it more
bytes-like.  Some changes are still on the to-do list.
2007-11-06 21:34:58 +00:00
Georg Brandl edbcc1332f Remove a test case which is no longer valid. 2007-10-24 21:25:34 +00:00
Georg Brandl bd1c68c94f Patch #1303: Adapt str8 constructor to bytes (now buffer) one. 2007-10-24 18:55:37 +00:00
Guido van Rossum 3172c5d263 Patch #1258 by Christian Heimes: kill basestring.
I like this because it makes the code shorter! :-)
2007-10-16 18:12:55 +00:00
Guido van Rossum 09549f4407 Changes in anticipation of stricter str vs. bytes enforcement. 2007-08-27 20:40:10 +00:00
Walter Dörwald 41980caf64 Apply SF patch #1775604: This adds three new codecs (utf-32, utf-32-le and
utf-32-be). On narrow builds the codecs combine surrogate pairs in the unicode
object into one codepoint on encoding and create surrogate pairs for
codepoints outside the BMP on decoding. Lone surrogates are passed through
unchanged in all cases.

Backport to the trunk will follow.
2007-08-16 21:55:45 +00:00
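A minimal sketch of the three UTF-32 codecs named in the commit above, run on current CPython (no narrow builds, so the surrogate-pair handling is not observable here). The sample character is an arbitrary non-BMP code point.

```python
ch = "\U0001F600"  # one code point outside the BMP

little = ch.encode("utf-32-le")
big = ch.encode("utf-32-be")
with_bom = ch.encode("utf-32")  # native byte order, prefixed with a BOM

print(little)         # b'\x00\xf6\x01\x00'
print(big)            # b'\x00\x01\xf6\x00'
print(len(with_bom))  # 8  (4-byte BOM + 4-byte code unit)

# Each form decodes back to the original character.
round_trip = with_bom.decode("utf-32")
```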
Walter Dörwald e78178e2c0 Bytes (which are the input for decoding) are mutable now. If a decoding
error callback changes the bytes object in the exception the decoder might
use memory that's no longer in use. Change unicode_decode_call_errorhandler()
so that it fetches the addresses of the bytes array (start and end) from the
exception object and passes them back to the caller.
2007-07-30 13:31:40 +00:00
Walter Dörwald 32a4c71419 Patch by Ron Adam: Don't use u prefix in unicode error messages
and remove u prefix from some comments in test_codecs.py.
2007-06-20 09:25:34 +00:00
Walter Dörwald fee1af9d1c Fix test_codeccallbacks.py: bytes has no % operator. 2007-06-06 15:17:22 +00:00
Walter Dörwald d2034310d6 Add 'U'/'U#' format characters to Py_BuildValue (and thus
to PyObject_CallFunction()) that take a char * (and a size
in the case of 'U#') and create a unicode object out of it.

Add functions PyUnicode_FromFormat() and PyUnicode_FromFormatV()
that work similar to PyString_FromFormat(), but create a unicode
object (also a %U format character has been added, that takes
a PyObject *, which must point to a unicode object).

Change the encoding and reason attributes of UnicodeEncodeError,
UnicodeDecodeError and UnicodeTranslateError to be unicode
objects.
2007-05-18 16:29:38 +00:00
Walter Dörwald 00048f0c22 test_codeccallbacks.py passes again. 2007-05-09 10:44:06 +00:00
Guido van Rossum 805365ee39 Merged revisions 55007-55179 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/p3yk

........
  r55077 | guido.van.rossum | 2007-05-02 11:54:37 -0700 (Wed, 02 May 2007) | 2 lines

  Use the new print syntax, at least.
........
  r55142 | fred.drake | 2007-05-04 21:27:30 -0700 (Fri, 04 May 2007) | 1 line

  remove old cruftiness
........
  r55143 | fred.drake | 2007-05-04 21:52:16 -0700 (Fri, 04 May 2007) | 1 line

  make this work with the new Python
........
  r55162 | neal.norwitz | 2007-05-06 22:29:18 -0700 (Sun, 06 May 2007) | 1 line

  Get asdl code gen working with Python 2.3.  Should continue to work with 3.0
........
  r55164 | neal.norwitz | 2007-05-07 00:00:38 -0700 (Mon, 07 May 2007) | 1 line

  Verify checkins to p3yk (sic) branch go to 3000 list.
........
  r55166 | neal.norwitz | 2007-05-07 00:12:35 -0700 (Mon, 07 May 2007) | 1 line

  Fix this test so it runs again by importing warnings_test properly.
........
  r55167 | neal.norwitz | 2007-05-07 01:03:22 -0700 (Mon, 07 May 2007) | 8 lines

  So long xrange.  range() now supports values that are outside
  -sys.maxint to sys.maxint.  floats raise a TypeError.

  This has been sitting for a long time.  It probably has some problems and
  needs cleanup.  Objects/rangeobject.c now uses 4-space indents since
  it is almost completely new.
........
  r55171 | guido.van.rossum | 2007-05-07 10:21:26 -0700 (Mon, 07 May 2007) | 4 lines

  Fix two tests that were previously depending on significant spaces
  at the end of a line (and before that on Python 2.x print behavior
  that has no exact equivalent in 3.0).
........
2007-05-07 22:24:25 +00:00
Guido van Rossum 84fc66dd02 Rename 'unicode' to 'str' in its tp_name field. Rename 'str' to 'str8'.
Change all occurrences of unichr to chr.
2007-05-03 17:18:26 +00:00
Guido van Rossum ef87d6ed94 Rip out all the u"..." literals and calls to unicode(). 2007-05-02 19:09:54 +00:00