If the error handler returns position less or equal than the starting
position of non-encodable characters, most of built-in encoders didn't
properly re-size the output buffer. This led to out-of-bounds writes,
and segfaults.
* bpo-32618: Fix test_mutatingdecodehandler not testing test.mutating
It should test both test.replacing and test.mutating instead of test test.replacing twice.
a single `\UXXXXXXXX`, regardless of whether the character is printable
or not. Also, the "backslashreplace" error handler now joins surrogate
pairs into a single character on UCS-2 builds.
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r68633 | thomas.heller | 2009-01-16 12:53:44 -0600 (Fri, 16 Jan 2009) | 3 lines
Change an example in the docs to avoid a mistake when the code is copy
pasted and changed afterwards.
........
r68648 | benjamin.peterson | 2009-01-16 22:28:57 -0600 (Fri, 16 Jan 2009) | 1 line
use enumerate
........
r68667 | amaury.forgeotdarc | 2009-01-17 14:18:59 -0600 (Sat, 17 Jan 2009) | 3 lines
#4077: No need to append \n when calling Py_FatalError
+ fix a declaration to make it match the one in pythonrun.h
........
r68706 | benjamin.peterson | 2009-01-17 19:28:46 -0600 (Sat, 17 Jan 2009) | 1 line
fix grammar
........
r68718 | georg.brandl | 2009-01-18 04:42:35 -0600 (Sun, 18 Jan 2009) | 1 line
#4976: union() and intersection() take multiple args, but talk about "the other".
........
r68720 | georg.brandl | 2009-01-18 04:45:22 -0600 (Sun, 18 Jan 2009) | 1 line
#4974: fix redundant mention of lists and tuples.
........
r68721 | georg.brandl | 2009-01-18 04:48:16 -0600 (Sun, 18 Jan 2009) | 1 line
#4914: trunc is in math.
........
r68724 | georg.brandl | 2009-01-18 07:24:10 -0600 (Sun, 18 Jan 2009) | 1 line
#4979: correct result range for some random functions.
........
r68725 | georg.brandl | 2009-01-18 07:47:26 -0600 (Sun, 18 Jan 2009) | 1 line
#4857: fix augmented assignment target spec.
........
r68726 | georg.brandl | 2009-01-18 08:41:52 -0600 (Sun, 18 Jan 2009) | 1 line
#4923: clarify what was added.
........
r68727 | georg.brandl | 2009-01-18 12:25:30 -0600 (Sun, 18 Jan 2009) | 1 line
#4986: augassigns are not expressions.
........
r68739 | benjamin.peterson | 2009-01-18 15:11:38 -0600 (Sun, 18 Jan 2009) | 1 line
fix test that wasn't working as expected #4990
........
No detailed change log; just check out the change log for the py3k-pep3137
branch. The most obvious changes:
- str8 renamed to bytes (PyString at the C level);
- bytes renamed to buffer (PyBytes at the C level);
- PyString and PyUnicode are no longer compatible.
I.e. we now have an immutable bytes type and a mutable bytes type.
The behavior of PyString was modified quite a bit, to make it more
bytes-like. Some changes are still on the to-do list.
ut-32-be). On narrow builds the codecs combine surrogate pairs in the unicode
object into one codepoint on encoding and create surrogate pairs for
codepoints outside the BMP on decoding. Lone surrogates are passed through
unchanged in all cases.
Backport to the trunk will follow.
error callback changes the bytes object in the exception the decoder might
use memory that's no longer in use. Change unicode_decode_call_errorhandler()
so that it fetches the adresses of the bytes array (start and end) from the
exception object and passes them back to the caller.
to PyObject_CallFunction()) that take a char * (and a size
in the case of 'U#') and create a unicode object out of it.
Add functions PyUnicode_FromFormat() and PyUnicode_FromFormatV()
that work similar to PyString_FromFormat(), but create a unicode
object (also a %U format character has been added, that takes
a PyObject *, which must point to a unicode object).
Change the encoding and reason attributes of UnicodeEncodeError,
UnicodeDecodeError and UnicodeTranslateError to be unicode
objects.
svn+ssh://pythondev@svn.python.org/python/branches/p3yk
........
r55077 | guido.van.rossum | 2007-05-02 11:54:37 -0700 (Wed, 02 May 2007) | 2 lines
Use the new print syntax, at least.
........
r55142 | fred.drake | 2007-05-04 21:27:30 -0700 (Fri, 04 May 2007) | 1 line
remove old cruftiness
........
r55143 | fred.drake | 2007-05-04 21:52:16 -0700 (Fri, 04 May 2007) | 1 line
make this work with the new Python
........
r55162 | neal.norwitz | 2007-05-06 22:29:18 -0700 (Sun, 06 May 2007) | 1 line
Get asdl code gen working with Python 2.3. Should continue to work with 3.0
........
r55164 | neal.norwitz | 2007-05-07 00:00:38 -0700 (Mon, 07 May 2007) | 1 line
Verify checkins to p3yk (sic) branch go to 3000 list.
........
r55166 | neal.norwitz | 2007-05-07 00:12:35 -0700 (Mon, 07 May 2007) | 1 line
Fix this test so it runs again by importing warnings_test properly.
........
r55167 | neal.norwitz | 2007-05-07 01:03:22 -0700 (Mon, 07 May 2007) | 8 lines
So long xrange. range() now supports values that are outside
-sys.maxint to sys.maxint. floats raise a TypeError.
This has been sitting for a long time. It probably has some problems and
needs cleanup. Objects/rangeobject.c now uses 4-space indents since
it is almost completely new.
........
r55171 | guido.van.rossum | 2007-05-07 10:21:26 -0700 (Mon, 07 May 2007) | 4 lines
Fix two tests that were previously depending on significant spaces
at the end of a line (and before that on Python 2.x print behavior
that has no exact equivalent in 3.0).
........