Xiang Zhang
ea1cf87030
Issue #29044 : Fix a use-after-free in string '%c' formatter.
2016-12-22 15:30:47 +08:00
Xiang Zhang
b211068f5c
Issue #28822 : Adjust indices handling of PyUnicode_FindChar().
2016-12-20 22:52:33 +08:00
Xavier de Gaye
31eaf49ed9
Merge 3.6.
2016-12-15 21:01:52 +01:00
Xavier de Gaye
76febd0792
Issue #26919 : On Android, operating system data is now always encoded/decoded
to/from UTF-8, instead of the locale encoding to avoid inconsistencies with
os.fsencode() and os.fsdecode() which are already using UTF-8.
2016-12-15 20:59:58 +01:00
Serhiy Storchaka
fb3134f4d4
Issue #28808 : PyUnicode_CompareWithASCIIString() now never raises exceptions.
2016-12-06 00:20:26 +02:00
Serhiy Storchaka
9a953dbb34
Issue #28808 : PyUnicode_CompareWithASCIIString() now never raises exceptions.
2016-12-06 00:17:45 +02:00
Serhiy Storchaka
419967b832
Issue #28808 : PyUnicode_CompareWithASCIIString() now never raises exceptions.
2016-12-06 00:13:34 +02:00
Victor Stinner
de4ae3d486
Backed out changeset b9c9691c72c5
Issue #28858 : The change b9c9691c72c5 introduced a regression. It seems like
_PyObject_CallArg1() uses more stack memory than
PyObject_CallFunctionObjArgs().
2016-12-04 22:59:09 +01:00
Victor Stinner
27580c1fb5
Replace PyObject_CallFunctionObjArgs() with fastcall
* PyObject_CallFunctionObjArgs(func, NULL) => _PyObject_CallNoArg(func)
* PyObject_CallFunctionObjArgs(func, arg, NULL) => _PyObject_CallArg1(func, arg)
PyObject_CallFunctionObjArgs() allocates 40 bytes on the C stack and requires
extra work to "parse" C arguments to build a C array of PyObject*.
_PyObject_CallNoArg() and _PyObject_CallArg1() are simpler and don't allocate
memory on the C stack.
This change is part of the fastcall project. The change on listsort() is
related to the issue #23507 .
2016-12-01 14:43:22 +01:00
Serhiy Storchaka
99250d5c63
Issue #28774 : Simplified encoding a str result of an error handler in ASCII
and Latin1 encoders.
2016-11-23 15:13:00 +02:00
Xiang Zhang
d04d8474df
Issue #28774 : Fix start/end pos in unicode_encode_ucs1().
Fix error position of the unicode error in ASCII and Latin1
encoders when a string returned by the error handler contains multiple
non-encodable characters (non-ASCII for the ASCII codec, characters out
of the U+0000-U+00FF range for Latin1).
2016-11-23 19:34:01 +08:00
Serhiy Storchaka
50911476f5
Issue #28760 : Clean up and fix comments in PyUnicode_AsUnicodeEscapeString().
Patch by Xiang Zhang.
2016-11-21 11:47:16 +02:00
Serhiy Storchaka
ac0720eaa4
Issue #28760 : Clean up and fix comments in PyUnicode_AsUnicodeEscapeString().
Patch by Xiang Zhang.
2016-11-21 11:46:51 +02:00
Serhiy Storchaka
460bd0d284
Issue #19569 : Compiler warnings are now emitted when most deprecated
functions are used.
2016-11-20 12:16:46 +02:00
Serhiy Storchaka
27b74244fb
Issue #28701 : _PyUnicode_EqualToASCIIId and _PyUnicode_EqualToASCIIString now
require ASCII right argument and assert this condition in debug build.
2016-11-16 20:03:03 +02:00
Serhiy Storchaka
a83a6a3275
Issue #28701 : _PyUnicode_EqualToASCIIId and _PyUnicode_EqualToASCIIString now
require ASCII right argument and assert this condition in debug build.
2016-11-16 20:02:44 +02:00
Serhiy Storchaka
e6d6131f78
Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701 ).
2016-11-16 16:13:13 +02:00
Serhiy Storchaka
df66b9c425
Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701 ).
2016-11-16 16:12:56 +02:00
Serhiy Storchaka
292dd1b2ad
Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701 ).
2016-11-16 16:12:34 +02:00
Serhiy Storchaka
503db266a5
Issue #21449 : Removed private function _PyUnicode_CompareWithId.
2016-11-16 15:56:50 +02:00
Serhiy Storchaka
dddec81b2d
Issue #21449 : Removed private function _PyUnicode_CompareWithId.
2016-11-16 15:56:27 +02:00
Serhiy Storchaka
29a5447360
Issue #28701 : Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
The latter function is more readable, faster and doesn't raise exceptions.
Based on patch by Xiang Zhang.
2016-11-16 15:41:31 +02:00
Serhiy Storchaka
fab6acd9f5
Issue #28701 : Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
The latter function is more readable, faster and doesn't raise exceptions.
Based on patch by Xiang Zhang.
2016-11-16 15:41:11 +02:00
Serhiy Storchaka
f5894dd646
Issue #28701 : Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
The latter function is more readable, faster and doesn't raise exceptions.
Based on patch by Xiang Zhang.
2016-11-16 15:40:39 +02:00
Serhiy Storchaka
1a73bf365e
Issue #28701 : Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString.
The latter function is more readable, faster and doesn't raise exceptions.
2016-11-16 10:19:57 +02:00
Serhiy Storchaka
3b73ea1278
Issue #28701 : Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString.
The latter function is more readable, faster and doesn't raise exceptions.
2016-11-16 10:19:20 +02:00
Serhiy Storchaka
f4934ea77d
Issue #28701 : Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString.
The latter function is more readable, faster and doesn't raise exceptions.
2016-11-16 10:17:58 +02:00
Serhiy Storchaka
616034eb73
Issue #28648 : Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:37:11 +02:00
Serhiy Storchaka
babe4f8e5e
Issue #28648 : Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:36:02 +02:00
Serhiy Storchaka
6b4b6e956e
Issue #28648 : Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:35:46 +02:00
Serhiy Storchaka
84293aff9f
Issue #28648 : Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:29:48 +02:00
Serhiy Storchaka
b626643734
Issue #28648 : Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:28:06 +02:00
Steve Dower
257a4c1503
Closes #27781 : Removes special cases for the experimental aspect of PEP 529
2016-11-06 19:35:24 -08:00
Steve Dower
78057b4159
Closes #27781 : Removes special cases for the experimental aspect of PEP 529
2016-11-06 19:35:08 -08:00
Eric V. Smith
5646648678
Issue 28128: Print out better error/warning messages for invalid string escapes. Backport to 3.6.
2016-10-31 14:46:26 -04:00
Eric V. Smith
42454af094
Issue 28128: Print out better error/warning messages for invalid string escapes.
2016-10-31 09:22:08 -04:00
Serhiy Storchaka
2edcd1cba4
Issue #28426 : Deprecated undocumented functions PyUnicode_AsEncodedObject(),
PyUnicode_AsDecodedObject(), PyUnicode_AsDecodedUnicode() and
PyUnicode_AsEncodedUnicode().
2016-10-27 21:08:00 +03:00
Serhiy Storchaka
0093907f0e
Issue #28426 : Deprecated undocumented functions PyUnicode_AsEncodedObject(),
PyUnicode_AsDecodedObject(), PyUnicode_AsDecodedUnicode() and
PyUnicode_AsEncodedUnicode().
2016-10-27 21:05:49 +03:00
Serhiy Storchaka
a4f8823063
Issue #28408 : Fixed a leak and removed redundant code in _PyUnicodeWriter_Finish().
Patch by Xiang Zhang.
2016-10-25 13:25:04 +03:00
Serhiy Storchaka
c8bc3d1c07
Issue #28408 : Fixed a leak and removed redundant code in _PyUnicodeWriter_Finish().
Patch by Xiang Zhang.
2016-10-25 13:23:56 +03:00
Serhiy Storchaka
d7e5ff13bb
Issue #28426 : Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
2016-10-25 10:18:16 +03:00
Serhiy Storchaka
c4a3e90aa8
Issue #28426 : Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
2016-10-25 10:17:33 +03:00
Serhiy Storchaka
839023f12c
Issue #28426 : Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
2016-10-25 10:13:43 +03:00
Serhiy Storchaka
77eede35fc
Issue #28426 : Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
2016-10-25 10:07:51 +03:00
Serhiy Storchaka
2fbc019c8c
Issue #28439 : Remove redundant checks in PyUnicode_EncodeLocale and
PyUnicode_DecodeLocaleAndSize. Patch by Xiang Zhang.
2016-10-23 15:41:36 +03:00
Serhiy Storchaka
f8d7d41507
Issue #28511 : Use the "U" format instead of "O!" in PyArg_Parse*.
2016-10-23 15:12:25 +03:00
Serhiy Storchaka
523c449ca0
Issue #28504 : Cleanup unicode_decode_call_errorhandler_wchar/writer.
Patch by Xiang Zhang.
2016-10-22 23:18:31 +03:00
Serhiy Storchaka
14ab277632
Issue #28410 : Added _PyErr_FormatFromCause() -- the helper for raising
new exception with setting current exception as __cause__.
_PyErr_FormatFromCause(exception, format, args...) is equivalent to Python
raise exception(format % args) from sys.exc_info()[1]
2016-10-21 17:10:42 +03:00
Serhiy Storchaka
467ab194fc
Issue #28410 : Added _PyErr_FormatFromCause() -- the helper for raising
new exception with setting current exception as __cause__.
_PyErr_FormatFromCause(exception, format, args...) is equivalent to Python
raise exception(format % args) from sys.exc_info()[1]
2016-10-21 17:09:17 +03:00
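The Python equivalence quoted in the two commits above can be tried directly. The following is a minimal sketch, not the C implementation: `format_from_cause` and `parse_port` are hypothetical names introduced only for illustration, and the helper assumes it is called while an exception is being handled.

```python
import sys


def format_from_cause(exc_type, fmt, *args):
    """Hypothetical Python stand-in for the C helper described above.

    Raises exc_type(fmt % args) with the exception currently being
    handled attached as __cause__.
    """
    cause = sys.exc_info()[1]  # the exception being handled, or None
    raise exc_type(fmt % args) from cause


def parse_port(text):
    """Illustrative caller: turn a low-level ValueError into a RuntimeError."""
    try:
        return int(text)
    except ValueError:
        format_from_cause(RuntimeError, "invalid port: %r", text)
```

A caller that catches the `RuntimeError` can still reach the original `ValueError` through `__cause__`, which is the point of chaining.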
Benjamin Peterson
d6d49f16f4
merge 3.6 ( #28454 )
2016-10-16 15:42:33 -07:00
Benjamin Peterson
3aa75528a1
merge 3.5 ( #28454 )
2016-10-16 15:42:24 -07:00
Benjamin Peterson
8d761ff045
remove extra PyErr_Format arguments ( closes #28454 )
Patch from Xiang Zhang.
2016-10-16 15:41:46 -07:00
Victor Stinner
5a33759fba
Merge 3.6
2016-10-12 13:59:13 +02:00
Victor Stinner
ebe17e0347
Fix _Py_normalize_encoding() comment
It's not exactly the same as encodings.normalize_encoding(): the C function
also converts to lowercase.
2016-10-12 13:57:45 +02:00
Benjamin Peterson
8a3748290a
merge 3.6 ( #28417 )
2016-10-11 23:01:12 -07:00
Benjamin Peterson
b329e1bb5b
va_end vargs2 once ( closes #28417 )
2016-10-11 23:00:58 -07:00
Serhiy Storchaka
2e58f1a52a
Issue #28400 : Removed unnecessary checks in unicode_char and resize_copy.
1. In resize_copy we don't need to PyUnicode_READY(unicode) since when
it's not PyUnicode_WCHAR_KIND it should be ready.
2. In unicode_char, PyUnicode_1BYTE_KIND is handled by get_latin1_char.
Patch by Xiang Zhang.
2016-10-09 23:44:48 +03:00
Serhiy Storchaka
21d9f10c94
Merge from 3.5.
2016-10-08 22:46:01 +03:00
Serhiy Storchaka
9c0e1f83af
Issue #28379 : Added sanity checks and tests for PyUnicode_CopyCharacters().
Patch by Xiang Zhang.
2016-10-08 22:45:38 +03:00
Victor Stinner
44f4874e68
Merge 3.5
2016-09-21 14:13:53 +02:00
Victor Stinner
1ddf53d496
Fix PyUnicode_FromFormatV() error handling
Issue #28233 : Fix a memory leak if the format string contains a non-ASCII
character: destroy the unicode writer.
2016-09-21 14:13:14 +02:00
Christian Heimes
2f2fee19ec
va_end() all va_copy()ed va_lists.
2016-09-21 11:37:27 +02:00
Benjamin Peterson
0c21214f3e
replace usage of Py_VA_COPY with the (C99) standard va_copy
2016-09-20 20:39:33 -07:00
Christian Heimes
f051e43b22
Issue #28126 : Replace Py_MEMCPY with memcpy(). Visual Studio can properly optimize memcpy().
2016-09-13 20:22:02 +02:00
Benjamin Peterson
621b430a14
remove all usage of Py_LOCAL
2016-09-09 13:54:34 -07:00
Benjamin Peterson
33d2a492d0
promote some shifts to unsigned, so as not to invoke undefined behavior
2016-09-06 20:40:04 -07:00
R David Murray
110b6fecbb
#27364 : Deprecate invalid escape strings in str/bytes.
Patch by Emanuel Barry, reviewed by Serhiy Storchaka and Martin Panter.
2016-09-08 15:34:08 -04:00
Steve Dower
cc16be85c0
Issue #27781 : Change file system encoding on Windows to UTF-8 (PEP 529)
2016-09-08 10:35:16 -07:00
Benjamin Peterson
47ff0734b8
more PY_LONG_LONG to long long
2016-09-08 09:15:54 -07:00
Benjamin Peterson
2e7c5e9c11
replace some Py_LOCAL_INLINE with the inline keyword
2016-09-07 15:33:32 -07:00
Benjamin Peterson
4b9abf3a27
merge 3.5
2016-09-06 20:42:17 -07:00
Brett Cannon
a571120410
Issue #27182 : Add support for path-like objects to PyUnicode_FSDecoder().
2016-09-06 19:36:01 -07:00
Victor Stinner
62ec3317d2
Optimize unicode_escape and raw_unicode_escape
Issue #16334 . Patch written by Serhiy Storchaka.
2016-09-06 17:04:34 -07:00
Victor Stinner
2740e46089
_PyUnicodeWriter: assert that max character <= MAX_UNICODE
2016-09-06 16:58:36 -07:00
Brett Cannon
ec6ce879c7
Issue #26027 : Support path-like objects in PyUnicode_FSConverter().
This is to add support for os.exec*() and os.spawn*() functions. Part
of PEP 519.
2016-09-06 15:50:29 -07:00
Benjamin Peterson
9b3d77052f
replace Python aliases for standard integer types with the standard integer types ( #17884 )
2016-09-06 13:24:00 -07:00
Serhiy Storchaka
ea525a2d1a
Issue #27078 : Added BUILD_STRING opcode. Optimized f-strings evaluation.
2016-09-06 22:07:53 +03:00
Benjamin Peterson
af580dff4a
replace PY_LONG_LONG with long long
2016-09-06 10:46:49 -07:00
Benjamin Peterson
ed4aa83ff7
require a long long data type ( closes #27961 )
2016-09-05 17:44:18 -07:00
Victor Stinner
942889aae2
Issue #27938 : Add a fast-path for us-ascii encoding
Other changes:
* Rewrite _Py_normalize_encoding() as a C implementation of
encodings.normalize_encoding(). For example, " utf-8 " is now normalized to
"utf_8". So the fast path is now used for more name variants of the same
encoding.
* Avoid strcpy() when encoding is NULL: call the UTF-8 codec directly
2016-09-05 15:40:10 -07:00
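The normalization described in the commit above can be sketched in Python. This is an illustration under assumptions, not the C function: `normalize_encoding_sketch` is a hypothetical name, and the lowercasing step follows the later commit in this log which notes that the C function also converts to lowercase.

```python
def normalize_encoding_sketch(name):
    """Hypothetical sketch of the encoding-name normalization.

    Runs of characters other than alphanumerics and '.' collapse into a
    single '_', leading and trailing runs are dropped, and the result is
    lowercased.
    """
    parts = []
    pending_separator = False
    for ch in name:
        if ch.isalnum() or ch == ".":
            # Emit one '_' for the whole run of separators seen so far,
            # but never at the very start of the name.
            if pending_separator and parts:
                parts.append("_")
            parts.append(ch.lower())
            pending_separator = False
        else:
            pending_separator = True
    return "".join(parts)


variants = [" utf-8 ", "UTF-8", "utf_8", "Utf 8"]
normalized = {normalize_encoding_sketch(v) for v in variants}
```

All four spellings collapse to the same key, which is why more name variants can reach the same fast path.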
Victor Stinner
1a05d6c04d
PEP 7 style for if/else in C
Also add a newline for readability in normalize_encoding().
2016-09-02 12:12:23 +02:00
Raymond Hettinger
15f44ab043
Issue #27895 : Spelling fixes (Contributed by Ville Skyttä).
2016-08-30 10:47:49 -07:00
Serhiy Storchaka
febc332056
Issue #26754 : Undocumented support of general bytes-like objects
as path in compile() and similar functions is now deprecated.
2016-08-06 23:29:29 +03:00
Berker Peksag
ced8d4c6eb
Issue #27454 : Use PyDict_SetDefault in PyUnicode_InternInPlace
Patch by INADA Naoki.
2016-07-25 04:40:39 +03:00
Serhiy Storchaka
f95de0e8cc
Issue #26754 : PyUnicode_FSDecoder() accepted a filename argument encoded as
an iterable of integers. Now only strings and bytes-like objects are accepted.
2016-06-18 13:56:16 +03:00
Serhiy Storchaka
9305d83425
Issue #26754 : PyUnicode_FSDecoder() accepted a filename argument encoded as
an iterable of integers. Now only strings and bytes-like objects are accepted.
2016-06-18 13:53:36 +03:00
Martin Panter
0b7d84de6b
Issue #27171 : Merge typo fixes from 3.5
2016-06-02 10:11:18 +00:00
Martin Panter
e26da7c03a
Issue #27171 : Fix typos in documentation, comments, and test function names
2016-06-02 10:07:09 +00:00
Serhiy Storchaka
dd40fc3e57
Issue #26765 : Moved common code and docstrings for bytes and bytearray methods
to bytes_methods.c.
2016-05-04 22:23:26 +03:00
Martin Panter
cda80940ed
Issue #15984 : Merge PyUnicode doc from 3.5
2016-04-15 02:27:11 +00:00
Martin Panter
6245cb3c01
Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc
This affects documentation, code comments, and debugging messages.
2016-04-15 02:14:19 +00:00
Serhiy Storchaka
21a663ea28
Issue #26057 : Got rid of unneeded use of PyUnicode_FromObject().
2016-04-13 15:37:23 +03:00
Serhiy Storchaka
f01e408c16
Issue #26200 : Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
in places where Py_DECREF was used.
2016-04-10 18:12:01 +03:00
Serhiy Storchaka
57a01d3a0e
Issue #26200 : Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
in places where Py_DECREF was used.
2016-04-10 18:05:40 +03:00
Serhiy Storchaka
ec39756960
Issue #22570 : Renamed Py_SETREF to Py_XSETREF.
2016-04-06 09:50:03 +03:00
Serhiy Storchaka
48842714b9
Issue #22570 : Renamed Py_SETREF to Py_XSETREF.
2016-04-06 09:45:48 +03:00
Serhiy Storchaka
ab479c49d3
Issue #26494 : Fixed crash on iterating exhausting iterators.
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:41:15 +03:00
Serhiy Storchaka
fbb1c5ee06
Issue #26494 : Fixed crash on iterating exhausting iterators.
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:40:02 +03:00
Victor Stinner
f2192855dd
Merge 3.5
2016-03-01 22:07:53 +01:00
Victor Stinner
337986740f
Issue #26464 : Fix unicode_fast_translate() again
Initialize i variable if the string is non-ASCII.
2016-03-01 21:59:58 +01:00
Victor Stinner
3d9d77a3dc
Merge 3.5
2016-03-01 21:30:50 +01:00
Victor Stinner
6c9aa8f2bf
Fix str.translate()
Issue #26464 : Fix str.translate() when the string is ASCII and the first replacement
removes a character, but the next replacement uses a non-ASCII character or a string
longer than 1 character. Regression introduced in Python 3.5.0.
2016-03-01 21:30:30 +01:00
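The input shape named in the commit above can be written down as a small check. This sketch only shows the expected behaviour on a fixed interpreter; the table and input string are made up for illustration.

```python
# ASCII input whose first mapped character is deleted, followed by
# replacements that are non-ASCII or longer than one character.
table = {
    ord("a"): None,   # delete 'a'
    ord("b"): "é",    # non-ASCII replacement
    ord("c"): "xyz",  # replacement longer than one character
}
result = "abcd".translate(table)
print(result)  # éxyzd
```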
Victor Stinner
5b96f17b1c
Merge 3.5
2016-01-27 17:01:13 +01:00
Victor Stinner
5bc03a6d4d
Fix resize_compact()
Issue #26217 : resize_compact() must set wstr_length to 0 after freeing the wstr
string. Otherwise, an assertion fails in _PyUnicode_CheckConsistency().
2016-01-27 16:56:53 +01:00
Serhiy Storchaka
726fc139a5
Issue #20440 : More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:44:33 +02:00
Serhiy Storchaka
191321d11b
Issue #20440 : More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:41:34 +02:00
Serhiy Storchaka
ef1585eb9a
Issue #25923 : Added more const qualifiers to signatures of static and private functions.
2015-12-25 20:01:53 +02:00
Serhiy Storchaka
2d06e84455
Issue #25923 : Added the const qualifier to static constant arrays.
2015-12-25 19:53:18 +02:00
Serhiy Storchaka
f006940351
Issue #20440 : Massive replacement of unsafe attribute setting code with the special
macro Py_SETREF.
2015-12-24 10:39:57 +02:00
Serhiy Storchaka
5a57ade58e
Issue #20440 : Massive replacement of unsafe attribute setting code with the special
macro Py_SETREF.
2015-12-24 10:35:59 +02:00
Serhiy Storchaka
9b3a2eec1c
Issues #25890 , #25891 , #25892 : Removed unused variables in Windows code.
Reported by Alexander Riccio.
2015-12-18 10:03:13 +02:00
Serhiy Storchaka
7c088a9b5c
Issue #25709 : Fixed problem with in-place string concatenation and utf-8 cache.
2015-12-03 01:05:52 +02:00
Serhiy Storchaka
6648bf5661
Issue #25709 : Fixed problem with in-place string concatenation and utf-8 cache.
2015-12-03 01:04:37 +02:00
Serhiy Storchaka
31b9410654
Issue #25709 : Fixed problem with in-place string concatenation and utf-8 cache.
2015-12-03 01:02:03 +02:00
Serhiy Storchaka
7aa690860e
Issue #25709 : Fixed problem with in-place string concatenation and utf-8 cache.
2015-12-03 01:02:03 +02:00
Benjamin Peterson
d798dc1034
merge 3.5 ( #25630 )
2015-11-15 21:57:50 -08:00
Benjamin Peterson
a4d33b3428
make the PyUnicode_FSConverter cleanup set the decrefed argument to NULL ( closes #25630 )
2015-11-15 21:57:39 -08:00
Serhiy Storchaka
413fdcea21
Issue #24821 : Refactor STRINGLIB(fastsearch_memchr_1char) and split it into
STRINGLIB(find_char) and STRINGLIB(rfind_char) that can be used independently
without special preconditions.
2015-11-14 15:42:17 +02:00
Serhiy Storchaka
4a7c03aab4
Issue #25523 : Merge a-to-an corrections from 3.5.
2015-11-02 14:44:29 +02:00
Serhiy Storchaka
a84f6c3dd3
Issue #25523 : Merge a-to-an corrections from 3.4.
2015-11-02 14:39:05 +02:00
Serhiy Storchaka
d65c9496da
Issue #25523 : Further a-to-an corrections.
2015-11-02 14:10:23 +02:00
Victor Stinner
358af13526
Issue #25353 : Optimize unicode escape and raw unicode escape encoders to use
the new _PyBytesWriter API.
2015-10-12 22:36:57 +02:00
Victor Stinner
6c2cdae9e6
Writer APIs: use empty string singletons
Modify _PyBytesWriter_Finish() and _PyUnicodeWriter_Finish() to return the
empty bytes/Unicode string if the string is empty.
2015-10-12 13:29:43 +02:00
Victor Stinner
6bd525b656
Optimize error handlers of ASCII and Latin1 encoders when the replacement
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.
Cleanup unicode_encode_ucs1():
* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
2015-10-09 13:10:05 +02:00
Victor Stinner
ce179bf6ba
Add _PyBytesWriter_WriteBytes() to factorize the code
2015-10-09 12:57:22 +02:00
Victor Stinner
ad7715891e
_PyBytesWriter: simplify code to avoid "prealloc" parameters
Subtract preallocated bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner
3fa36ff5e4
Issue #25318 : Fix backslashreplace()
Fix code to estimate the needed space.
2015-10-09 03:37:11 +02:00
Victor Stinner
797485e101
Issue #25318 : Avoid sprintf() in backslashreplace()
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().
Also add unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
Victor Stinner
0016507c16
Issue #25318 : Move _PyBytesWriter to bytesobject.c
Also declare the private API in bytesobject.h.
2015-10-09 01:53:21 +02:00
Victor Stinner
e7bf86cd7d
Optimize backslashreplace error handler
Issue #25318 : Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Also optimize the backslashreplace error handler for ASCII and
Latin1 encoders.
Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. This avoids creating an exception and calling the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
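For readers unfamiliar with the two error handlers named in the commit above, the sketch below shows what they produce from Python code. It illustrates only the observable output, not the optimized C paths; the sample string is made up.

```python
text = "caf\u00e9 \u20ac \U0001f600"  # é, euro sign, one non-BMP character

# backslashreplace writes a Python-style escape for each unencodable character.
ascii_backslash = text.encode("ascii", "backslashreplace")
latin1_backslash = text.encode("latin-1", "backslashreplace")

# xmlcharrefreplace writes a decimal XML character reference instead.
ascii_xml = text.encode("ascii", "xmlcharrefreplace")

# The UTF-8 encoder only calls an error handler for lone surrogates.
utf8_backslash = "\udc80".encode("utf-8", "backslashreplace")

print(ascii_backslash)   # b'caf\\xe9 \\u20ac \\U0001f600'
print(latin1_backslash)  # b'caf\xe9 \\u20ac \\U0001f600'
print(ascii_xml)         # b'caf&#233; &#8364; &#128512;'
print(utf8_backslash)    # b'\\udc80'
```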
Victor Stinner
fdfbf78114
Issue #25318 : Add _PyBytesWriter API
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.
Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.
unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Victor Stinner
74e8fac3c8
Issue #25301 : Fix compatibility with ISO C90
2015-10-05 13:49:26 +02:00
Victor Stinner
1d65d9192d
Issue #25301 : The UTF-8 decoder is now up to 15 times as fast for error
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
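The three decoder error handlers named in the commit above behave as follows on an invalid byte. This is a small illustrative sketch of their visible results, with a made-up input; it does not measure the speedup.

```python
data = b"ab\xffcd"  # 0xff can never appear in valid UTF-8

ignored = data.decode("utf-8", "ignore")            # drop the bad byte
replaced = data.decode("utf-8", "replace")          # substitute U+FFFD
escaped = data.decode("utf-8", "surrogateescape")   # map 0xff to U+DCFF

# surrogateescape is reversible: encoding with the same handler
# restores the original bytes.
round_trip = escaped.encode("utf-8", "surrogateescape")
```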
Victor Stinner
eb36fdaad8
Fix _PyUnicodeWriter_PrepareKind()
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() correctly handles a read-only buffer: copy the
buffer.
2015-10-03 01:55:51 +02:00
Serhiy Storchaka
29e68edbf4
Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka
58c8f2bb6d
Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka
28b21e50c8
Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Victor Stinner
3222da26fe
Make _PyUnicode_TranslateCharmap() symbol private
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
2015-10-01 22:07:32 +02:00
Victor Stinner
01ada3996b
Issue #25267 : The UTF-8 encoder is now up to 75 times as fast for error
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Victor Stinner
c3713e9706
Optimize ascii/latin1+surrogateescape encoders
Issue #25227 : Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.
Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00
Victor Stinner
0030cd52da
Issue #25227 : Cleanup unicode_encode_ucs1() error handler
* Change limit type from unsigned int to Py_UCS4, to use the same type as the
"ch" variable (a Unicode character).
* Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE
* Add some newlines for readability
2015-09-24 14:45:00 +02:00
Victor Stinner
54385b206d
Issue #24870 : revert unwanted change
Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(
2015-09-22 10:46:52 +02:00
Victor Stinner
5ebae87628
Issue #25207 , #14626 : Fix my commit.
It doesn't work to use "#define XXX defined(YYY)" and then "#ifdef XXX"
to check YYY.
2015-09-22 01:29:33 +02:00
Victor Stinner
6174474bea
_PyUnicodeWriter_PrepareInternal(): make the assertion more strict
2015-09-22 01:01:17 +02:00
Victor Stinner
ca9381ea01
Issue #24870 : Add _PyUnicodeWriter_PrepareKind() macro
Add a macro which ensures that the writer has at least the requested kind.
2015-09-22 00:58:32 +02:00
Victor Stinner
5014920cb7
Issue #24870 : Reuse the new _Py_error_handler enum
Factorize code with the new get_error_handler() function.
Add some empty lines for readability.
2015-09-22 00:26:54 +02:00
Victor Stinner
f96418de05
Issue #24870 : Optimize the ASCII decoder for error handlers: surrogateescape,
ignore and replace. Initial patch written by Naoki Inada.
The decoder is now up to 60 times as fast for these error handlers.
Also add unit tests for the ASCII decoder.
2015-09-21 23:06:27 +02:00
Zachary Ware
070bd62cfa
Closes #21279 : Merge with 3.5
2015-08-06 00:05:13 -05:00
Zachary Ware
d987a81d29
Issue #21279 : Merge with 3.4
2015-08-06 00:04:23 -05:00
Zachary Ware
79b98df023
Issue #21279 : Flesh out str.translate docs
Initial patch by Kinga Farkas, Martin Panter, and John Posner.
2015-08-05 23:54:15 -05:00