cpython

Commit Graph

Author	SHA1	Message	Date
Victor Stinner	3fa36ff5e4	Issue #25318 : Fix backslashreplace() Fix code to estimate the needed space.	2015-10-09 03:37:11 +02:00
Victor Stinner	797485e101	Issue #25318 : Avoid sprintf() in backslashreplace() Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors(). Add also unit tests for non-BMP characters.	2015-10-09 03:17:30 +02:00
Victor Stinner	0016507c16	Issue #25318 : Move _PyBytesWriter to bytesobject.c Declare also the private API in bytesobject.h.	2015-10-09 01:53:21 +02:00
Victor Stinner	e7bf86cd7d	Optimize backslashreplace error handler Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and Latin1 encoders. Use the new _PyBytesWriter API to optimize these error handlers for the encoders. It avoids creating an exception and calling the slow implementation of the error handler.	2015-10-09 01:39:28 +02:00
Victor Stinner	fdfbf78114	Issue #25318 : Add _PyBytesWriter API Add a new private API to optimize Unicode encoders. It uses a small buffer allocated on the stack and supports overallocation. Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable overallocation for the UTF-8 encoder with error handlers. unicode_encode_ucs1(): initialize collend to collstart+1 to not check the current character twice, we already know that it is not ASCII.	2015-10-09 00:33:49 +02:00
Victor Stinner	74e8fac3c8	Issue #25301 : Fix compatibility with ISO C90	2015-10-05 13:49:26 +02:00
Victor Stinner	1d65d9192d	Issue #25301 : The UTF-8 decoder is now up to 15 times as fast for error handlers: ``ignore``, ``replace`` and ``surrogateescape``.	2015-10-05 13:43:50 +02:00
Victor Stinner	eb36fdaad8	Fix _PyUnicodeWriter_PrepareKind() Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that _PyUnicodeWriter_PrepareKind() handles correctly read-only buffer: copy the buffer.	2015-10-03 01:55:51 +02:00
Serhiy Storchaka	29e68edbf4	Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate. 3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).	2015-10-02 13:14:03 +03:00
Serhiy Storchaka	58c8f2bb6d	Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate. 3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).	2015-10-02 13:13:14 +03:00
Serhiy Storchaka	28b21e50c8	Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate.	2015-10-02 13:07:28 +03:00
Victor Stinner	3222da26fe	Make _PyUnicode_TranslateCharmap() symbol private unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().	2015-10-01 22:07:32 +02:00
Victor Stinner	01ada3996b	Issue #25267 : The UTF-8 encoder is now up to 75 times as fast for error handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``. Patch co-written with Serhiy Storchaka.	2015-10-01 21:54:51 +02:00
Victor Stinner	c3713e9706	Optimize ascii/latin1+surrogateescape encoders Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape`` error handler: the encoders are now up to 3 times as fast. Initial patch written by Serhiy Storchaka.	2015-09-29 12:32:13 +02:00
Victor Stinner	0030cd52da	Issue #25227 : Cleanup unicode_encode_ucs1() error handler * Change limit type from unsigned int to Py_UCS4, to use the same type as the "ch" variable (a Unicode character). * Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE * Add some newlines for readability	2015-09-24 14:45:00 +02:00
Victor Stinner	54385b206d	Issue #24870 : revert unwanted change Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(	2015-09-22 10:46:52 +02:00
Victor Stinner	5ebae87628	Issue #25207 , #14626 : Fix my commit. It doesn't work to use "#define XXX defined(YYY)" and then "#ifdef XXX" to check YYY.	2015-09-22 01:29:33 +02:00
Victor Stinner	6174474bea	_PyUnicodeWriter_PrepareInternal(): make the assertion more strict	2015-09-22 01:01:17 +02:00
Victor Stinner	ca9381ea01	Issue #24870 : Add _PyUnicodeWriter_PrepareKind() macro Add a macro which ensures that the writer has at least the requested kind.	2015-09-22 00:58:32 +02:00
Victor Stinner	5014920cb7	Issue #24870 : Reuse the new _Py_error_handler enum Factorize code with the new get_error_handler() function. Add some empty lines for readability.	2015-09-22 00:26:54 +02:00
Victor Stinner	f96418de05	Issue #24870 : Optimize the ASCII decoder for error handlers: surrogateescape, ignore and replace. Initial patch written by Naoki Inada. The decoder is now up to 60 times as fast for these error handlers. Add also unit tests for the ASCII decoder.	2015-09-21 23:06:27 +02:00
Zachary Ware	070bd62cfa	Closes #21279 : Merge with 3.5	2015-08-06 00:05:13 -05:00
Zachary Ware	d987a81d29	Issue #21279 : Merge with 3.4	2015-08-06 00:04:23 -05:00
Zachary Ware	79b98df023	Issue #21279 : Flesh out str.translate docs Initial patch by Kinga Farkas, Martin Panter, and John Posner.	2015-08-05 23:54:15 -05:00
Raymond Hettinger	ac2ef65c32	Make the unicode equality test an external function rather than in-lining it. The real benefit of the unicode specialized function comes from bypassing the overhead of PyObject_RichCompareBool() and not from being in-lined (especially since there was almost no shared data between the caller and callee). Also, the in-lining was having a negative effect on code generation for the callee.	2015-07-04 16:04:44 -07:00
Serhiy Storchaka	d4ea03c785	Issue #24284 : The startswith and endswith methods of the str class no longer return True when finding the empty string and the indexes are completely out of range.	2015-05-31 09:15:51 +03:00
Antoine Pitrou	873e0df946	Fix some compilation warnings when using gcc (-Wmaybe-uninitialized).	2015-05-19 21:06:04 +02:00
Antoine Pitrou	f6d1f1fa8a	Fix some compilation warnings when using gcc (-Wmaybe-uninitialized).	2015-05-19 21:04:33 +02:00
Serhiy Storchaka	0d4df752ac	Issue #15027 : The UTF-32 encoder is now 3x to 7x faster.	2015-05-12 23:12:45 +03:00
Serhiy Storchaka	7e9d1d1a1b	Issue #23908 : os functions now reject paths with embedded null character on Windows instead of silently truncating them. Removed no longer used _PyUnicode_HasNULChars().	2015-04-20 10:12:28 +03:00
Serhiy Storchaka	1009bf18b3	Issue #23501 : Argument Clinic now generates code into separate files by default.	2015-04-03 23:53:51 +03:00
Victor Stinner	1912b39def	_PyUnicodeWriter_WriteStr() now checks that the input string is consistent in debug mode to detect bugs earlier. _PyUnicodeWriter_Finish() doesn't check if the read only string is consistent, whereas it does check consistency for strings built by itself.	2015-03-26 09:37:23 +01:00
Serhiy Storchaka	d9d769fcdd	Issue #23573 : Increased performance of string search operations (str.find, str.index, str.count, the in operator, str.split, str.partition) with arguments of different kinds (UCS1, UCS2, UCS4).	2015-03-24 21:55:47 +02:00
Victor Stinner	f50e187724	Fix compiler warnings: comparison between signed and unsigned numbers	2015-03-20 11:32:24 +01:00
Victor Stinner	0c39b1b970	Initialize variables to prevent GCC warnings	2015-03-18 15:02:06 +01:00
Steve Dower	3e96f324dc	Issue #23451 : Update pyconfig.h for Windows to require Vista headers and remove unnecessary version checks.	2015-03-02 08:01:10 -08:00
Serhiy Storchaka	78a8249127	Issue #23490 : Fixed possible crashes related to interoperability between old-style and new API for string with 2**30-1 characters.	2015-02-20 21:34:39 +02:00
Serhiy Storchaka	e55181f517	Issue #23490 : Fixed possible crashes related to interoperability between old-style and new API for string with 2**30-1 characters.	2015-02-20 21:34:06 +02:00
Serhiy Storchaka	4d0d982985	Issue #23446 : Use PyMem_New instead of PyMem_Malloc to avoid possible integer overflows. Added few missed PyErr_NoMemory().	2015-02-16 13:33:32 +02:00
Serhiy Storchaka	1a1ff29659	Issue #23446 : Use PyMem_New instead of PyMem_Malloc to avoid possible integer overflows. Added few missed PyErr_NoMemory().	2015-02-16 13:28:22 +02:00
Victor Stinner	29dacf2e97	Issue #15859 : PyUnicode_EncodeFSDefault(), PyUnicode_EncodeMBCS() and PyUnicode_EncodeCodePage() now raise an exception if the object is not a Unicode object. For PyUnicode_EncodeFSDefault(), it was already the case on platforms other than Windows. Patch written by Campbell Barton.	2015-01-26 16:41:32 +01:00
Serhiy Storchaka	bbd3aa8ece	Issue #23321 : Fixed a crash in str.decode() when error handler returned replacement string longer than malformed input data.	2015-01-26 01:24:31 +02:00
Serhiy Storchaka	7e4b9057b3	Issue #23321 : Fixed a crash in str.decode() when error handler returned replacement string longer than malformed input data.	2015-01-26 01:22:54 +02:00
Ethan Furman	b95b56150f	Issue #20284 : Implement PEP 461	2015-01-23 20:05:18 -08:00
Serhiy Storchaka	82e07b92b3	Issue #23181 : More "codepoint" -> "code point".	2015-01-18 11:33:31 +02:00
Serhiy Storchaka	d3faf43f9b	Issue #23181 : More "codepoint" -> "code point".	2015-01-18 11:28:37 +02:00
Serhiy Storchaka	b757c83ec6	Issue #22581 : Use more "bytes-like object" throughout the docs and comments.	2014-12-05 22:25:22 +02:00
Serhiy Storchaka	133b11b566	Issue #22975 : Close block at right place.	2014-12-01 18:56:28 +02:00
Serhiy Storchaka	92bf919ed0	Issue #22581 : Use more "bytes-like object" throughout the docs and comments.	2014-12-05 22:26:10 +02:00
Serhiy Storchaka	407249c62b	Issue #22975 : Close block at right place.	2014-12-01 18:56:54 +02:00

1238 Commits