Commit Graph

1274 Commits

Author SHA1 Message Date
Martin Panter cda80940ed Issue #15984: Merge PyUnicode doc from 3.5 2016-04-15 02:27:11 +00:00
Martin Panter 6245cb3c01 Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc
This affects documentation, code comments, and a debugging messages.
2016-04-15 02:14:19 +00:00
Serhiy Storchaka 21a663ea28 Issue #26057: Got rid of nonneeded use of PyUnicode_FromObject(). 2016-04-13 15:37:23 +03:00
Serhiy Storchaka f01e408c16 Issue #26200: Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
in places where Py_DECREF was used.
2016-04-10 18:12:01 +03:00
Serhiy Storchaka 57a01d3a0e Issue #26200: Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
in places where Py_DECREF was used.
2016-04-10 18:05:40 +03:00
Serhiy Storchaka ec39756960 Issue #22570: Renamed Py_SETREF to Py_XSETREF. 2016-04-06 09:50:03 +03:00
Serhiy Storchaka 48842714b9 Issue #22570: Renamed Py_SETREF to Py_XSETREF. 2016-04-06 09:45:48 +03:00
Serhiy Storchaka ab479c49d3 Issue #26494: Fixed crash on iterating exhausting iterators.
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:41:15 +03:00
Serhiy Storchaka fbb1c5ee06 Issue #26494: Fixed crash on iterating exhausting iterators.
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:40:02 +03:00
Victor Stinner f2192855dd Merge 3.5 2016-03-01 22:07:53 +01:00
Victor Stinner 337986740f Issue #26464: Fix unicode_fast_translate() again
Initialize i variable if the string is non-ASCII.
2016-03-01 21:59:58 +01:00
Victor Stinner 3d9d77a3dc Merge 3.5 2016-03-01 21:30:50 +01:00
Victor Stinner 6c9aa8f2bf Fix str.translate()
Issue #26464: Fix str.translate() when string is ASCII and first replacements
removes character, but next replacement uses a non-ASCII character or a string
longer than 1 character. Regression introduced in Python 3.5.0.
2016-03-01 21:30:30 +01:00
Victor Stinner 5b96f17b1c Merge 3.5 2016-01-27 17:01:13 +01:00
Victor Stinner 5bc03a6d4d Fix resize_compact()
Issue #26217: resize_compact() must set wstr_length to 0 after freeing the wstr
string. Otherwise, an assertion fails in _PyUnicode_CheckConsistency().
2016-01-27 16:56:53 +01:00
Serhiy Storchaka 726fc139a5 Issue #20440: More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:44:33 +02:00
Serhiy Storchaka 191321d11b Issue #20440: More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:41:34 +02:00
Serhiy Storchaka ef1585eb9a Issue #25923: Added more const qualifiers to signatures of static and private functions. 2015-12-25 20:01:53 +02:00
Serhiy Storchaka 2d06e84455 Issue #25923: Added the const qualifier to static constant arrays. 2015-12-25 19:53:18 +02:00
Serhiy Storchaka f006940351 Issue #20440: Massive replacing unsafe attribute setting code with special
macro Py_SETREF.
2015-12-24 10:39:57 +02:00
Serhiy Storchaka 5a57ade58e Issue #20440: Massive replacing unsafe attribute setting code with special
macro Py_SETREF.
2015-12-24 10:35:59 +02:00
Serhiy Storchaka 9b3a2eec1c Issues #25890, #25891, #25892: Removed unused variables in Windows code.
Reported by Alexander Riccio.
2015-12-18 10:03:13 +02:00
Serhiy Storchaka 7c088a9b5c Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:05:52 +02:00
Serhiy Storchaka 6648bf5661 Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:04:37 +02:00
Serhiy Storchaka 7aa690860e Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:02:03 +02:00
Benjamin Peterson d798dc1034 merge 3.5 (#25630) 2015-11-15 21:57:50 -08:00
Benjamin Peterson a4d33b3428 make the PyUnicode_FSConverter cleanup set the decrefed argument to NULL (closes #25630) 2015-11-15 21:57:39 -08:00
Serhiy Storchaka 413fdcea21 Issue #24821: Refactor STRINGLIB(fastsearch_memchr_1char) and split it on
STRINGLIB(find_char) and STRINGLIB(rfind_char) that can be used independedly
without special preconditions.
2015-11-14 15:42:17 +02:00
Serhiy Storchaka 4a7c03aab4 Issue #25523: Merge a-to-an corrections from 3.5. 2015-11-02 14:44:29 +02:00
Serhiy Storchaka a84f6c3dd3 Issue #25523: Merge a-to-an corrections from 3.4. 2015-11-02 14:39:05 +02:00
Serhiy Storchaka d65c9496da Issue #25523: Further a-to-an corrections. 2015-11-02 14:10:23 +02:00
Victor Stinner 358af13526 Issue #25353: Optimize unicode escape and raw unicode escape encoders to use
the new _PyBytesWriter API.
2015-10-12 22:36:57 +02:00
Victor Stinner 6c2cdae9e6 Writer APIs: use empty string singletons
Modify _PyBytesWriter_Finish() and _PyUnicodeWriter_Finish() to return the
empty bytes/Unicode string if the string is empty.
2015-10-12 13:29:43 +02:00
Victor Stinner 6bd525b656 Optimize error handlers of ASCII and Latin1 encoders when the replacement
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.

Cleanup unicode_encode_ucs1():

* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
2015-10-09 13:10:05 +02:00
Victor Stinner ce179bf6ba Add _PyBytesWriter_WriteBytes() to factorize the code 2015-10-09 12:57:22 +02:00
Victor Stinner ad7715891e _PyBytesWriter: simplify code to avoid "prealloc" parameters
Substract preallocate bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner 3fa36ff5e4 Issue #25318: Fix backslashreplace()
Fix code to estimate the needed space.
2015-10-09 03:37:11 +02:00
Victor Stinner 797485e101 Issue #25318: Avoid sprintf() in backslashreplace()
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().

Add also unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
Victor Stinner 0016507c16 Issue #25318: Move _PyBytesWriter to bytesobject.c
Declare also the private API in bytesobject.h.
2015-10-09 01:53:21 +02:00
Victor Stinner e7bf86cd7d Optimize backslashreplace error handler
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.

Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
Victor Stinner fdfbf78114 Issue #25318: Add _PyBytesWriter API
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.

Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.

unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Victor Stinner 74e8fac3c8 Issue #25301: Fix compatibility with ISO C90 2015-10-05 13:49:26 +02:00
Victor Stinner 1d65d9192d Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
Victor Stinner eb36fdaad8 Fix _PyUnicodeWriter_PrepareKind()
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() handles correctly read-only buffer: copy the
buffer.
2015-10-03 01:55:51 +02:00
Serhiy Storchaka 29e68edbf4 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka 58c8f2bb6d Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka 28b21e50c8 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Victor Stinner 3222da26fe Make _PyUnicode_TranslateCharmap() symbol private
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
2015-10-01 22:07:32 +02:00
Victor Stinner 01ada3996b Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Victor Stinner c3713e9706 Optimize ascii/latin1+surrogateescape encoders
Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.

Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00