cpython

Commit Graph

Author	SHA1	Message	Date
Victor Stinner	0016507c16	Issue #25318 : Move _PyBytesWriter to bytesobject.c Declare also the private API in bytesobject.h.	2015-10-09 01:53:21 +02:00
Victor Stinner	e7bf86cd7d	Optimize backslashreplace error handler Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and Latin1 encoders. Use the new _PyBytesWriter API to optimize these error handlers for the encoders. It avoids to create an exception and call the slow implementation of the error handler.	2015-10-09 01:39:28 +02:00
Victor Stinner	fdfbf78114	Issue #25318 : Add _PyBytesWriter API Add a new private API to optimize Unicode encoders. It uses a small buffer allocated on the stack and supports overallocation. Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable overallocation for the UTF-8 encoder with error handlers. unicode_encode_ucs1(): initialize collend to collstart+1 to not check the current character twice, we already know that it is not ASCII.	2015-10-09 00:33:49 +02:00
Victor Stinner	74e8fac3c8	Issue #25301 : Fix compatibility with ISO C90	2015-10-05 13:49:26 +02:00
Victor Stinner	1d65d9192d	Issue #25301 : The UTF-8 decoder is now up to 15 times as fast for error handlers: ``ignore``, ``replace`` and ``surrogateescape``.	2015-10-05 13:43:50 +02:00
Victor Stinner	eb36fdaad8	Fix _PyUnicodeWriter_PrepareKind() Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that _PyUnicodeWriter_PrepareKind() handles correctly read-only buffer: copy the buffer.	2015-10-03 01:55:51 +02:00
Serhiy Storchaka	29e68edbf4	Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate. 3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).	2015-10-02 13:14:03 +03:00
Serhiy Storchaka	58c8f2bb6d	Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate. 3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).	2015-10-02 13:13:14 +03:00
Serhiy Storchaka	28b21e50c8	Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate.	2015-10-02 13:07:28 +03:00
Victor Stinner	3222da26fe	Make _PyUnicode_TranslateCharmap() symbol private unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().	2015-10-01 22:07:32 +02:00
Victor Stinner	01ada3996b	Issue #25267 : The UTF-8 encoder is now up to 75 times as fast for error handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``. Patch co-written with Serhiy Storchaka.	2015-10-01 21:54:51 +02:00
Victor Stinner	c3713e9706	Optimize ascii/latin1+surrogateescape encoders Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape`` error handler: the encoders are now up to 3 times as fast. Initial patch written by Serhiy Storchaka.	2015-09-29 12:32:13 +02:00
Victor Stinner	0030cd52da	Issue #25227 : Cleanup unicode_encode_ucs1() error handler * Change limit type from unsigned int to Py_UCS4, to use the same type than the "ch" variable (an Unicode character). * Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE * Add some newlines for readability	2015-09-24 14:45:00 +02:00
Victor Stinner	54385b206d	Issue #24870 : revert unwanted change Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(	2015-09-22 10:46:52 +02:00
Victor Stinner	5ebae87628	Issue #25207 , #14626 : Fix my commit. It doesn't work to use "#define XXX defined(YYY)" and then "#ifdef XXX" to check YYY.	2015-09-22 01:29:33 +02:00
Victor Stinner	6174474bea	_PyUnicodeWriter_PrepareInternal(): make the assertion more strict	2015-09-22 01:01:17 +02:00
Victor Stinner	ca9381ea01	Issue #24870 : Add _PyUnicodeWriter_PrepareKind() macro Add a macro which ensures that the writer has at least the requested kind.	2015-09-22 00:58:32 +02:00
Victor Stinner	5014920cb7	Issue #24870 : Reuse the new _Py_error_handler enum Factorize code with the new get_error_handler() function. Add some empty lines for readability.	2015-09-22 00:26:54 +02:00
Victor Stinner	f96418de05	Issue #24870 : Optimize the ASCII decoder for error handlers: surrogateescape, ignore and replace. Initial patch written by Naoki Inada. The decoder is now up to 60 times as fast for these error handlers. Add also unit tests for the ASCII decoder.	2015-09-21 23:06:27 +02:00
Zachary Ware	070bd62cfa	Closes #21279 : Merge with 3.5	2015-08-06 00:05:13 -05:00
Zachary Ware	d987a81d29	Issue #21279 : Merge with 3.4	2015-08-06 00:04:23 -05:00
Zachary Ware	79b98df023	Issue #21279 : Flesh out str.translate docs Initial patch by Kinga Farkas, Martin Panter, and John Posner.	2015-08-05 23:54:15 -05:00
Raymond Hettinger	ac2ef65c32	Make the unicode equality test an external function rather than in-lining it. The real benefit of the unicode specialized function comes from bypassing the overhead of PyObject_RichCompareBool() and not from being in-lined (especially since there was almost no shared data between the caller and callee). Also, the in-lining was having a negative effect on code generation for the callee.	2015-07-04 16:04:44 -07:00
Serhiy Storchaka	d4ea03c785	Issue #24284 : The startswith and endswith methods of the str class no longer return True when finding the empty string and the indexes are completely out of range.	2015-05-31 09:15:51 +03:00
Antoine Pitrou	873e0df946	Fix some compilation warnings when using gcc (-Wmaybe-uninitialized).	2015-05-19 21:06:04 +02:00
Antoine Pitrou	f6d1f1fa8a	Fix some compilation warnings when using gcc (-Wmaybe-uninitialized).	2015-05-19 21:04:33 +02:00
Serhiy Storchaka	0d4df752ac	Issue #15027 : The UTF-32 encoder is now 3x to 7x faster.	2015-05-12 23:12:45 +03:00
Serhiy Storchaka	7e9d1d1a1b	Issue #23908 : os functions now reject paths with embedded null character on Windows instead of silently truncate them. Removed no longer used _PyUnicode_HasNULChars().	2015-04-20 10:12:28 +03:00
Serhiy Storchaka	1009bf18b3	Issue #23501 : Argument Clinic now generates code into separate files by default.	2015-04-03 23:53:51 +03:00
Victor Stinner	1912b39def	_PyUnicodeWriter_WriteStr() now checks that the input string is consistent in debug mode to detect bugs earlier. _PyUnicodeWriter_Finish() doesn't check if the read only string is consistent, whereas it does check consistency for strings built by itself.	2015-03-26 09:37:23 +01:00
Serhiy Storchaka	d9d769fcdd	Issue #23573 : Increased performance of string search operations (str.find, str.index, str.count, the in operator, str.split, str.partition) with arguments of different kinds (UCS1, UCS2, UCS4).	2015-03-24 21:55:47 +02:00
Victor Stinner	f50e187724	Fix compiler warnings: comparison between signed and unsigned numbers	2015-03-20 11:32:24 +01:00
Victor Stinner	0c39b1b970	Initialize variables to prevent GCC warnings	2015-03-18 15:02:06 +01:00
Benjamin Peterson	e5a853c390	use PyMem_NEW to detect overflow (closes #23362 )	2015-03-02 13:23:25 -05:00
Steve Dower	3e96f324dc	Issue #23451 : Update pyconfig.h for Windows to require Vista headers and remove unnecessary version checks.	2015-03-02 08:01:10 -08:00
Serhiy Storchaka	78a8249127	Issue #23490 : Fixed possible crashes related to interoperability between old-style and new API for string with 2**30-1 characters.	2015-02-20 21:34:39 +02:00
Serhiy Storchaka	e55181f517	Issue #23490 : Fixed possible crashes related to interoperability between old-style and new API for string with 2**30-1 characters.	2015-02-20 21:34:06 +02:00
Serhiy Storchaka	4d0d982985	Issue #23446 : Use PyMem_New instead of PyMem_Malloc to avoid possible integer overflows. Added few missed PyErr_NoMemory().	2015-02-16 13:33:32 +02:00
Serhiy Storchaka	1a1ff29659	Issue #23446 : Use PyMem_New instead of PyMem_Malloc to avoid possible integer overflows. Added few missed PyErr_NoMemory().	2015-02-16 13:28:22 +02:00
Serhiy Storchaka	4dbc305002	Issue #23055 : Fixed a buffer overflow in PyUnicode_FromFormatV. Analysis and fix by Guido Vranken.	2015-01-27 22:18:46 +02:00
Victor Stinner	29dacf2e97	Issue #15859 : PyUnicode_EncodeFSDefault(), PyUnicode_EncodeMBCS() and PyUnicode_EncodeCodePage() now raise an exception if the object is not an Unicode object. For PyUnicode_EncodeFSDefault(), it was already the case on platforms other than Windows. Patch written by Campbell Barton.	2015-01-26 16:41:32 +01:00
Serhiy Storchaka	bbd3aa8ece	Issue #23321 : Fixed a crash in str.decode() when error handler returned replacement string longer than malformed input data.	2015-01-26 01:24:31 +02:00
Serhiy Storchaka	7e4b9057b3	Issue #23321 : Fixed a crash in str.decode() when error handler returned replacement string longer than malformed input data.	2015-01-26 01:22:54 +02:00
Ethan Furman	b95b56150f	Issue20284: Implement PEP461	2015-01-23 20:05:18 -08:00
Serhiy Storchaka	82e07b92b3	Issue #23181 : More "codepoint" -> "code point".	2015-01-18 11:33:31 +02:00
Serhiy Storchaka	d3faf43f9b	Issue #23181 : More "codepoint" -> "code point".	2015-01-18 11:28:37 +02:00
Serhiy Storchaka	b757c83ec6	Issue #22581 : Use more "bytes-like object" throughout the docs and comments.	2014-12-05 22:25:22 +02:00
Serhiy Storchaka	133b11b566	Issue #22975 : Close block at right place.	2014-12-01 18:56:28 +02:00
Serhiy Storchaka	92bf919ed0	Issue #22581 : Use more "bytes-like object" throughout the docs and comments.	2014-12-05 22:26:10 +02:00
Serhiy Storchaka	407249c62b	Issue #22975 : Close block at right place.	2014-12-01 18:56:54 +02:00
Victor Stinner	3aa979e0cd	Issue #20948 : Inline makefmt() in unicode_fromformat_arg()	2014-11-18 21:40:51 +01:00
Antoine Pitrou	b6dc9b7554	Fixed signed/unsigned comparison warning	2014-10-15 23:14:53 +02:00
Antoine Pitrou	4e334241b7	Fixed signed/unsigned comparison warning	2014-10-15 23:14:53 +02:00
Benjamin Peterson	736982d36d	merge 3.4 (closes #22643 )	2014-10-15 12:17:47 -04:00
Benjamin Peterson	9c422f3c3d	merge 3.3	2014-10-15 12:17:33 -04:00
Benjamin Peterson	1e211ff10d	it suffices to check for PY_SSIZE_T_MAX overflow (#22643 )	2014-10-15 12:17:21 -04:00
Benjamin Peterson	315aa40403	Merge 3.4	2014-10-15 11:51:17 -04:00
Benjamin Peterson	60d7a73194	Merge 3.3	2014-10-15 11:51:12 -04:00
Benjamin Peterson	c0e64f5027	make sure length is unsigned	2014-10-15 11:51:05 -04:00
Benjamin Peterson	6925264334	merge 3.4 (#22643 )	2014-10-15 11:49:15 -04:00
Benjamin Peterson	1cbb3fe775	merge 3.3 (#22643 )	2014-10-15 11:48:41 -04:00
Benjamin Peterson	e1bd38c03c	fix integer overflow in unicode case operations (closes #22643 )	2014-10-15 11:47:36 -04:00
Gregory P. Smith	8486f9b134	Fix "warning: comparison between signed and unsigned integer expressions" -Wsign-compare warnings in unicodeobject.c. These were all a result of sizeof() being unsigned and being compared to a Py_ssize_t. Not actual problems.	2014-09-30 00:33:24 -07:00
Benjamin Peterson	fd97a6fb2d	merge 3.4 (#22520 )	2014-09-29 23:02:56 -04:00
Benjamin Peterson	43030ee780	merge 3.3 (#22520 )	2014-09-29 23:02:35 -04:00
Benjamin Peterson	736b8012b4	prevent overflow in unicode_repr (closes #22520 )	2014-09-29 23:02:15 -04:00
Benjamin Peterson	10e4b2545e	merge 3.4 (closes #22518 )	2014-09-29 18:53:58 -04:00
Benjamin Peterson	2b76ce6d27	merge 3.3 (closes #22518 )	2014-09-29 18:50:06 -04:00
Benjamin Peterson	a1c1be4e03	cleanup overflowing handling in unicode_decode_call_errorhandler and unicode_encode_ucs1 (closes #22518 )	2014-09-29 18:18:57 -04:00
Serhiy Storchaka	20b39b27d9	Removed redundant casts to `char *`. Corresponding functions now accept `const char *` (issue #1772673).	2014-09-28 11:27:24 +03:00
Benjamin Peterson	fa5021699a	Merge 3.3	2014-10-15 23:58:32 -04:00
Serhiy Storchaka	d8a1447c99	Issue #22215 : Now ValueError is raised instead of TypeError when str or bytes argument contains not permitted null character or byte.	2014-09-06 20:07:17 +03:00
Victor Stinner	12174a5dca	Issue #22156 : Fix "comparison between signed and unsigned integers" compiler warnings in the Objects/ subdirectory. PyType_FromSpecWithBases() and PyType_FromSpec() now reject explicitly negative slot identifiers.	2014-08-15 23:17:38 +02:00
Victor Stinner	f6a271ae98	Issue #18395 : Rename ``_Py_char2wchar()`` to :c:func:`Py_DecodeLocale`, rename ``_Py_wchar2char()`` to :c:func:`Py_EncodeLocale`, and document these functions.	2014-08-01 12:28:48 +02:00
Victor Stinner	e1f17c6c0b	unicodeobject.c: fix a compiler warning on Windows 64 bits	2014-07-25 14:03:03 +02:00
Victor Stinner	c68b7fba86	(Merge 3.4) Issue #21892 , #21893 : Partial revert of changeset 4f55e802baf0, PyErr_Format() uses "%zd" for Py_ssize_t, not PY_FORMAT_SIZE_T	2014-07-04 22:50:13 +02:00
Victor Stinner	a33bce0945	Issue #21892 , #21893 : Partial revert of changeset 4f55e802baf0, PyErr_Format() uses "%zd" for Py_ssize_t, not PY_FORMAT_SIZE_T	2014-07-04 22:47:46 +02:00
Victor Stinner	9f43505f3d	(Merge 3.4) Closes #21892 , #21893 : Use PY_FORMAT_SIZE_T instead of %zi or %zu to format C size_t, because %zi/%u is not supported on all platforms.	2014-07-01 08:57:54 +02:00
Victor Stinner	293f3f526d	Closes #21892 , #21893 : Use PY_FORMAT_SIZE_T instead of %zi or %zu to format C size_t, because %zi/%u is not supported on all platforms.	2014-07-01 08:57:10 +02:00
Serhiy Storchaka	48070c1248	Issue #23803 : Fixed str.partition() and str.rpartition() when a separator is wider than partitioned string.	2015-03-29 19:21:02 +03:00
Benjamin Peterson	92ce1b4392	merge 3.3 (#23362 )	2015-03-02 13:23:41 -05:00
Victor Stinner	4dd25256e2	Issue #21118 : PyLong_AS_LONG() result type is long Even if PyLong_AS_LONG() cannot fail, I prefer to use the right type.	2014-04-08 09:14:21 +02:00
Benjamin Peterson	1365de764e	fix reference leaks in the translate fast path (closes #21175 ) Patch by Josh Rosenberg.	2014-04-07 20:15:41 -04:00
Victor Stinner	872b291b96	Issue #21118 : Optimize also str.translate() for ASCII => ASCII deletion	2014-04-05 14:27:07 +02:00
Victor Stinner	4ff33af257	Issue #21118 : Add unit test for invalid character replacement (code point higher than U+10ffff)	2014-04-05 11:56:37 +02:00
Victor Stinner	89a76abf20	Issue #21118 : Optimize str.translate() for ASCII => ASCII translation	2014-04-05 11:44:04 +02:00
Victor Stinner	8a4422e78d	Issue #21118 : Remove unused variable	2014-04-05 00:15:52 +02:00
Victor Stinner	1194ea020c	Issue #21118 : Use _PyUnicodeWriter API in str.translate() to simplify and factorize the code	2014-04-04 19:37:40 +02:00
Ethan Furman	9ab748013b	Issue19995: more informative error message; spelling corrections; use operator.mod instead of __mod__	2014-03-21 06:38:46 -07:00
Ethan Furman	38d872ee5d	Issue19995: passing a non-int to %o, %c, %x, or %X now raises an exception	2014-03-19 08:38:52 -07:00
Victor Stinner	7d00cc1a64	Issue #20574 : Implement incremental decoder for cp65001 code (Windows code page 65001, Microsoft UTF-8).	2014-03-17 23:08:06 +01:00
Kristján Valur Jónsson	25dded041f	Make the various iterators' "setstate" silently and consistently clip the index. This avoids the possibility of setting an iterator to an invalid state.	2014-03-05 13:47:57 +00:00
Kristján Valur Jónsson	c5cc5011ac	Make the various iterators' "setstate" silently and consistently clip the index. This avoids the possibility of setting an iterator to an invalid state.	2014-03-05 15:23:07 +00:00
Serhiy Storchaka	94ee389308	Issue #19619 : Blacklist non-text codecs in method API str.encode, bytes.decode and bytearray.decode now use an internal API to throw LookupError for known non-text encodings, rather than attempting the encoding or decoding operation and then throwing a TypeError for an unexpected output type. The latter mechanism remains in place for third party non-text encodings. Backported changeset d68df99d7a57.	2014-02-24 14:43:03 +02:00
Benjamin Peterson	4267869ad8	merge 3.3 (#20507 )	2014-02-15 13:03:20 -05:00
Benjamin Peterson	9743b2c2b5	give non-iterable TypeError a message (closes #20507 )	2014-02-15 13:02:52 -05:00
Serhiy Storchaka	dfe98a102e	Issue #20437 : Fixed 22 potential bugs when deleting objects references.	2014-02-09 13:46:20 +02:00
Serhiy Storchaka	505ff755d7	Issue #20437 : Fixed 21 potential bugs when deleting objects references.	2014-02-09 13:33:53 +02:00
Larry Hastings	2623c8c23c	Issue #20530 : Argument Clinic's signature format has been revised again. The new syntax is highly human readable while still preventing false positives. The syntax also extends Python syntax to denote "self" and positional-only parameters, allowing inspect.Signature objects to be totally accurate for all supported builtins in Python 3.4.	2014-02-08 22:15:29 -08:00
Serhiy Storchaka	6cbf151032	Issue #20538 : UTF-7 incremental decoder produced inconsistent string when input was truncated in BASE64 section.	2014-02-08 14:06:33 +02:00
Serhiy Storchaka	016a3f33a5	Issue #20538 : UTF-7 incremental decoder produced inconsistent string when input was truncated in BASE64 section.	2014-02-08 14:01:29 +02:00
Larry Hastings	581ee3618c	Issue #20326 : Argument Clinic now uses a simple, unique signature to annotate text signatures in docstrings, resulting in fewer false positives. "self" parameters are also explicitly marked, allowing inspect.Signature() to authoritatively detect (and skip) said parameters. Issue #20326: Argument Clinic now generates separate checksums for the input and output sections of the block, allowing external tools to verify that the input has not changed (and thus the output is not out-of-date).	2014-01-28 05:00:08 -08:00
Larry Hastings	c20472640c	Issue #20390 : Small fixes and improvements for Argument Clinic.	2014-01-25 20:43:29 -08:00
Larry Hastings	5c66189e88	Issue #20189 : Four additional builtin types (PyTypeObject, PyMethodDescr_Type, _PyMethodWrapper_Type, and PyWrapperDescr_Type) have been modified to provide introspection information for builtins. Also: many additional Lib, test suite, and Argument Clinic fixes.	2014-01-24 06:17:25 -08:00
Ethan Furman	a70805e1fa	Issue19995: fixed typo; switched from test.support.check_warnings to assertWarns	2014-01-12 08:42:35 -08:00
Ethan Furman	f9bba9c67f	Issue19995: issue deprecation warning for non-integer values to %c, %o, %x, %X	2014-01-11 23:20:58 -08:00
Larry Hastings	61272b77b0	Issue #19273 : The marker comments Argument Clinic uses have been changed to improve readability.	2014-01-07 12:41:53 -08:00
Ethan Furman	df3ed242c0	Issue19995: %o, %x, %X now only accept ints	2014-01-05 06:50:30 -08:00
Serhiy Storchaka	3079328d29	Reverted changeset b72c5573c5e7 (issue #15027 ).	2014-01-04 22:44:01 +02:00
Serhiy Storchaka	583a93943c	Issue #15027 : Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster.	2014-01-04 19:25:37 +02:00
Victor Stinner	fa4e68d425	Remove deadcode (HASH macro is no more defined)	2014-01-03 17:42:18 +01:00
Victor Stinner	92a419eea4	Remove now unused variables	2014-01-03 17:39:40 +01:00
Victor Stinner	f3b46b4a66	unicode_char() uses get_latin1_char() to get latin1 singleton characters	2014-01-03 13:16:00 +01:00
Victor Stinner	985a82a6d2	add unicode_char() in unicodeobject.c to factorize code	2014-01-03 12:53:47 +01:00
Larry Hastings	44e2eaab54	Issue #19674 : inspect.signature() now produces a correct signature for some builtins.	2013-11-23 15:37:55 -08:00
Larry Hastings	ebdcb50b8a	Issue #19730 : Argument Clinic now supports all the existing PyArg "format units" as legacy converters, as well as two new features: "self converters" and the "version" directive.	2013-11-23 14:54:00 -08:00
Nick Coghlan	c72e4e6dcc	Issue #19619 : Blacklist non-text codecs in method API str.encode, bytes.decode and bytearray.decode now use an internal API to throw LookupError for known non-text encodings, rather than attempting the encoding or decoding operation and then throwing a TypeError for an unexpected output type. The latter mechanism remains in place for third party non-text encodings.	2013-11-22 22:39:36 +10:00
Christian Heimes	985ecdcfc2	Issue #19183 : Implement PEP 456 'secure and interchangeable hash algorithm'. Python now uses SipHash24 on all major platforms.	2013-11-20 11:46:18 +01:00
Victor Stinner	4a58707a34	Add _PyUnicodeWriter_WriteASCIIString() function	2013-11-19 12:54:53 +01:00
Serhiy Storchaka	58cf607d13	Issue #12892 : The utf-16* and utf-32* codecs now reject (lone) surrogates. The utf-16* and utf-32* encoders no longer allow surrogate code points (U+D800-U+DFFF) to be encoded. The utf-32* decoders no longer decode byte sequences that correspond to surrogate code points. The surrogatepass error handler now works with the utf-16* and utf-32* codecs. Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.	2013-11-19 11:32:41 +02:00
Victor Stinner	6989ba0174	Issue #19581 : Change the overallocation factor of _PyUnicodeWriter on Windows On Windows, a factor of 50% gives best performances.	2013-11-18 21:08:39 +01:00
Larry Hastings	ed4a1c5703	Argument Clinic: rename "self" to "module" for module-level functions.	2013-11-18 09:32:13 -08:00
Ezio Melotti	745d54d2fa	#17806 : Added keyword-argument support for "tabsize" to str/bytes.expandtabs().	2013-11-16 19:10:57 +02:00
Nick Coghlan	8b097b4ed7	Close #17828 : better handling of codec errors - output type errors now redirect users to the type-neutral convenience functions in the codecs module - stateless errors that occur during encoding and decoding will now be automatically wrapped in exceptions that give the name of the codec involved	2013-11-13 23:49:21 +10:00
Victor Stinner	66b3270975	_Py_normalize_encoding(): explain how the value 6 was computed	2013-11-07 23:12:23 +01:00
Victor Stinner	df23e30bea	Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8" if the input string is NULL	2013-11-07 13:33:36 +01:00
Victor Stinner	ad14ccd047	Issue #19512 : add _PyUnicode_CompareWithId() function _PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString() when both strings are equal and interned. Add also _PyId_builtins identifier for "builtins" common string.	2013-11-07 00:46:04 +01:00
Victor Stinner	21ea21ef6d	Issue #19424 : PyUnicode_CompareWithASCIIString() normalizes memcmp() result to -1, 0, 1	2013-11-04 11:28:26 +01:00
Victor Stinner	f0c7b2af05	Issue #16286 : remove duplicated identity check from unicode_compare() Move the test to PyUnicode_Compare()	2013-11-04 11:27:14 +01:00
Victor Stinner	fd9e44db37	Issue #16286 : optimize PyUnicode_RichCompare() for identical strings (same pointer) for any operator, not only Py_EQ and Py_NE. Code of bytes_richcompare() and PyUnicode_RichCompare() is now closer.	2013-11-04 11:23:05 +01:00
Victor Stinner	c8bc5377ac	Issue #16286 : write a new subfunction bytes_compare_eq() * cleanup bytes_richcompare() * PyUnicode_RichCompare(): replace a test with a XOR	2013-11-04 11:08:10 +01:00
Victor Stinner	e1b1592fd4	Issue #19424 : Fix a compiler warning on comparing signed/unsigned size_t Patch written by Zachary Ware.	2013-11-03 13:53:12 +01:00
Victor Stinner	a6b9b071a3	Issue #19424 : Fix a compiler warning memcmp() just takes raw pointers	2013-10-30 18:27:13 +01:00
Victor Stinner	602f7cf0b9	Issue #19424 : Optimize PyUnicode_CompareWithASCIIString() Use fast memcmp() instead of a loop using the slow PyUnicode_READ() macro. strlen() is still necessary to check Unicode string containing null bytes.	2013-10-29 23:31:50 +01:00
Victor Stinner	68b674c9d4	Issue #19437 : Fix _PyUnicode_New() (constructor of legacy string), set all attributes before checking for error. The destructor expects all attributes to be set. It is now safe to call Py_DECREF(unicode) in the constructor.	2013-10-29 19:31:43 +01:00
Victor Stinner	fa3ba4c3bc	Issue #18609 : Add a fast-path for "iso8859-1" encoding On AIX, the locale encoding may be "iso8859-1", which was not a known syntax of the legacy ISO 8859-1 encoding. Using a C codec instead of a Python codec is faster but also avoids tricky issues during Python startup or complex code.	2013-10-29 11:34:05 +01:00
Victor Stinner	a5afb58986	Issue #18408 : Fix PyUnicode_AsUTF8AndSize(), raise MemoryError exception on memory allocation failure	2013-10-29 01:28:23 +01:00
Serhiy Storchaka	c679227e31	Issue #1772673 : The type of `char*` arguments now changed to `const char*`.	2013-10-19 21:03:34 +03:00
Serhiy Storchaka	55e092f545	Issue #19279 : UTF-7 decoder no more produces illegal strings.	2013-10-19 20:39:28 +03:00
Serhiy Storchaka	35804e4c63	Issue #19279 : UTF-7 decoder no more produces illegal strings.	2013-10-19 20:38:19 +03:00
Larry Hastings	3182680210	Issue #16612 : Add "Argument Clinic", a compile-time preprocessor for C files to generate argument parsing code. (See PEP 436.)	2013-10-19 00:09:25 -07:00
Ethan Furman	fb13721b1b	Close #18780 : %-formatting now prints value for int subclasses with %d, %i, and %u codes.	2013-08-31 10:18:55 -07:00
Antoine Pitrou	9ed5f27266	Issue #18722 : Remove uses of the "register" keyword in C code.	2013-08-13 20:18:52 +02:00
Raymond Hettinger	e56666d17f	Silence compiler warning about an uninitialized variable	2013-08-04 11:51:03 -07:00
Raymond Hettinger	5ed1b38a7d	merge	2013-08-04 11:51:35 -07:00
Christian Heimes	b578735dff	Check return value of PyType_Ready(&EncodingMapType) CID 486654	2013-07-20 14:57:28 +02:00
Christian Heimes	26532f7519	Check return value of PyType_Ready(&EncodingMapType) CID 486654	2013-07-20 14:57:16 +02:00
Victor Stinner	e699e5a218	Issue #18408 : Don't check unicode consistency in _PyUnicode_HAS_UTF8_MEMORY() and _PyUnicode_HAS_WSTR_MEMORY() macros These macros are called in unicode_dealloc(), whereas the unicode object can be "inconsistent" if the creation of the object failed. For example, when unicode_subtype_new() fails on a memory allocation, _PyUnicode_CheckConsistency() fails with an assertion error because data is NULL.	2013-07-15 18:22:47 +02:00
Victor Stinner	9e6b4d715c	Issue #18408 : _PyUnicodeWriter_Finish() now clears its buffer attribute in all cases, so _PyUnicodeWriter_Dealloc() can be called after finish.	2013-07-09 00:37:24 +02:00
Victor Stinner	15a0bd3965	Issue #18408 : Fix _PyUnicodeWriter_Finish(): clear writer->buffer, so _PyUnicodeWriter_Dealloc() can be called on the writer after finish.	2013-07-08 22:29:55 +02:00
Victor Stinner	6f8eeee7b9	Issue #18203 : Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar()	2013-07-07 22:57:45 +02:00
Victor Stinner	1a7425f67a	Issue #18203 : Replace malloc() with PyMem_RawMalloc() at Python initialization * Replace malloc() with PyMem_RawMalloc() * Replace PyMem_Malloc() with PyMem_RawMalloc() where the GIL is not held. * _Py_char2wchar() now returns a buffer allocated by PyMem_RawMalloc(), instead of PyMem_Malloc()	2013-07-07 16:25:15 +02:00
Christian Heimes	d47802eef7	Fix ref leak in error case of unicode find, count, formatlong CID 983315: Resource leak (RESOURCE_LEAK) CID 983316: Resource leak (RESOURCE_LEAK) CID 983317: Resource leak (RESOURCE_LEAK)	2013-06-29 21:33:36 +02:00
Christian Heimes	d47a0456b1	Fix ref leak in error case of unicode index CID 983319 (#1 of 2): Resource leak (RESOURCE_LEAK) leaked_storage: Variable substring going out of scope leaks the storage it points to.	2013-06-29 21:21:37 +02:00
Christian Heimes	ea71a525c3	Fix ref leak in error case of unicode rindex and rfind CID 983320: Resource leak (RESOURCE_LEAK) CID 983321: Resource leak (RESOURCE_LEAK) leaked_storage: Variable substring going out of scope leaks the storage it points to.	2013-06-29 21:17:34 +02:00
Christian Heimes	305e49e17e	Fix memory leak in endswith CID 1040368 (#1 of 1): Resource leak (RESOURCE_LEAK) leaked_storage: Variable substring going out of scope leaks the storage it points to.	2013-06-29 20:41:06 +02:00
Serhiy Storchaka	c89533f72f	Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise OverflowError when an argument of %c format is out of range.	2013-06-23 20:21:16 +03:00
Serhiy Storchaka	8eeae2126c	Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise OverflowError when an argument of %c format is out of range.	2013-06-23 20:12:14 +03:00
Benjamin Peterson	3164f5d565	merge 3.3 (#18183 )	2013-06-10 09:24:01 -07:00
Benjamin Peterson	7e30373126	remove MAX_MAXCHAR because it's unsafe for computing maximum code point value (see #18183 )	2013-06-10 09:19:46 -07:00
Victor Stinner	9f067f490f	Issue #9566 : Fix compiler warning on Windows 64-bit	2013-06-05 00:21:31 +02:00
Antoine Pitrou	7ce35a1816	Issue #17237 : Fix crash in the ASCII decoder on m68k.	2013-05-11 15:59:37 +02:00
Antoine Pitrou	8b0e98426d	Issue #17237 : Fix crash in the ASCII decoder on m68k.	2013-05-11 15:58:34 +02:00
Victor Stinner	f4f24248dc	Fix uninitialized value in charmap_decode_mapping()	2013-05-07 01:01:31 +02:00
Victor Stinner	8cecc8c262	Issue #7330 : Implement width and precision (ex: "%5.3s") for the format string of PyUnicode_FromFormat() function, original patch written by Ysj Ray.	2013-05-06 23:11:54 +02:00
Victor Stinner	bb4503f61e	Partial revert of changeset 9744b2df134c PyUnicode_Append() cannot call directly resize_compact(): I forgot that a string can be ready and not compact (a legacy string can also be ready).	2013-04-18 09:41:34 +02:00
Victor Stinner	fb161b1b6d	Split PyUnicode_DecodeCharmap() into subfunction for readability	2013-04-18 01:44:27 +02:00
Victor Stinner	170ca6f84b	Fix bug in Unicode decoders related to _PyUnicodeWriter Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.	2013-04-18 00:25:28 +02:00
Victor Stinner	376cfa122d	Fix typo in unicode_decode_call_errorhandler_writer() Bug introduced by changeset 7ed9993d53b4.	2013-04-17 23:58:16 +02:00
Victor Stinner	8f674ccd64	Close #17694 : Add minimum length to _PyUnicodeWriter * Add also min_char attribute to _PyUnicodeWriter structure (currently unused) * _PyUnicodeWriter_Init() has no more argument (except the writer itself): min_length and overallocate must be set explicitly * In error handlers, only enable overallocation if the replacement string is longer than 1 character * CJK decoders don't use overallocation anymore * Set min_length, instead of preallocating memory using _PyUnicodeWriter_Prepare(), in many decoders * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow	2013-04-17 23:02:17 +02:00
Victor Stinner	77282cb4f8	Cleanup PyUnicode_Contains() * No need to double-check that strings are ready: test already done by PyUnicode_FromObject() * Remove useless kind variable (use kind1 instead)	2013-04-14 19:22:47 +02:00
Victor Stinner	d92e078c8d	Minor change: fix character in do_strip() for the ASCII case	2013-04-14 19:17:42 +02:00
Victor Stinner	f033510fee	Cleanup PyUnicode_Append() * Check also that right is a Unicode object * call directly resize_compact() instead of unicode_resize() for a more explicit error handling, and to avoid testing some properties twice (ex: unicode_modifiable())	2013-04-14 19:13:03 +02:00
Victor Stinner	4560f9c63f	PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code	2013-04-14 18:56:46 +02:00
Victor Stinner	55c08781e8	Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped	2013-04-14 18:45:39 +02:00
Victor Stinner	af03757d20	Optimize ascii(str): don't encode/decode repr if repr is already ASCII	2013-04-14 18:44:10 +02:00
Victor Stinner	8a1a6cffd6	Add _PyUnicodeWriter_WriteCharInline()	2013-04-14 02:35:33 +02:00
Serhiy Storchaka	e2cef885a2	Issue #16061 : Speed up str.replace() for replacing 1-character strings.	2013-04-13 22:45:04 +03:00
Victor Stinner	a0dd0213cc	Close #17693 : Rewrite CJK decoders to use the _PyUnicodeWriter API instead of the legacy Py_UNICODE API. Add also a new _PyUnicodeWriter_WriteChar() function.	2013-04-11 22:09:04 +02:00
Victor Stinner	247109e74d	Issue #17615 : On Windows (VS2010), Performances of wmemcmp() to compare Unicode strings are not convincing. For UCS2 (16-bit wchar_t type), use a dummy loop instead of wmemcmp(). The dummy loop is as fast, or a little bit faster. wchar_t is only 16-bit long on Windows. wmemcmp() is still used for 32-bit wchar_t.	2013-04-09 23:53:26 +02:00
Victor Stinner	0cff4b16d9	replace(): only call PyUnicode_DATA(u) once	2013-04-09 22:52:48 +02:00
Victor Stinner	cc7af72192	Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII	2013-04-09 22:39:24 +02:00
Victor Stinner	f50a4e9bc9	Don't call macros in PyUnicode_WRITE() parameters PyUnicode_WRITE() expands some parameters twice or more.	2013-04-09 22:38:52 +02:00
Victor Stinner	9c79e41fc5	Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call it twice	2013-04-09 22:21:08 +02:00
Victor Stinner	b3a6014504	Fix _PyUnicode_XStrip() Inline the BLOOM_MEMBER() to call PyUnicode_READ() only once (per loop iteration). Store also the length of the separator in a variable to avoid calls to PyUnicode_GET_LENGTH().	2013-04-09 22:19:21 +02:00
Victor Stinner	63d5c1a14a	Optimize PyUnicode_DecodeCharmap() Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers instead.	2013-04-09 22:13:33 +02:00
Victor Stinner	a85af502a4	Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip() Write specialized functions per Unicode kind to avoid the expensive PyUnicode_READ() macro.	2013-04-09 21:53:54 +02:00
Victor Stinner	69ed0f4c86	Use PyUnicode_READ() instead of PyUnicode_READ_CHAR() "PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls PyUnicode_KIND() and might call it twice." according to its documentation.	2013-04-09 21:48:24 +02:00
Victor Stinner	03c3e35d42	Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings: cp037, cp500 and iso8859_1 codecs	2013-04-09 21:53:09 +02:00
Victor Stinner	cd777eaf53	Issue #17615 : Comparing two Unicode strings now uses wmemcmp() when possible wmemcmp() is twice faster than a dummy loop (342 usec vs 744 usec) on Fedora 18/x86_64, GCC 4.7.2.	2013-04-08 22:43:44 +02:00
Victor Stinner	c1302bba4c	Issue #17615 : Expand expensive PyUnicode_READ() macro in unicode_compare(): write specialized functions for each combination of Unicode kinds.	2013-04-08 21:50:54 +02:00
Victor Stinner	207dd38726	fix unused variable	2013-04-03 03:14:58 +02:00
Victor Stinner	eb4b5ac8af	Close #16757 : Avoid calling the expensive _PyUnicode_FindMaxChar() function when possible	2013-04-03 02:02:33 +02:00
Victor Stinner	cfc4c13b04	Add _PyUnicodeWriter_WriteSubstring() function Write a function to enable more optimizations: * If the substring is the whole string and overallocation is disabled, just keep a reference to the string, don't copy characters * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when possible	2013-04-03 01:48:39 +02:00
Raymond Hettinger	51612fd803	merge	2013-03-23 08:21:52 -07:00
Raymond Hettinger	378170d5d9	Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords.	2013-03-23 08:21:12 -07:00
Victor Stinner	fb84b5d48d	(Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character	2013-03-06 19:29:09 +01:00
Victor Stinner	2cb16aa3cb	_PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character	2013-03-06 19:28:37 +01:00
Victor Stinner	cf77da9fb5	Backed out changeset b9f7b1bf36aa	2013-03-06 01:09:24 +01:00
Victor Stinner	313cac88c5	Issue #17223 : Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type) to reject invalid UTF-16 surrogate.	2013-03-06 00:41:50 +01:00

1386 Commits
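
Many of the commits listed above optimize codecs for specific error handlers (``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``, ``backslashreplace``, ``xmlcharrefreplace``). As a rough illustration of what those handlers do at the Python level, here is a minimal sketch. The sample strings and variable names are made up for the example and are not taken from the commits; the sketch shows behaviour only, not the C implementation or its performance.

```python
# Sample text with two characters that ASCII cannot represent (e-acute, euro sign).
text = "caf\u00e9 \u20ac"

# Encode with several error handlers; each one treats the bad characters differently.
encoded = {
    handler: text.encode("ascii", errors=handler)
    for handler in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace")
}

# Sample bytes ending in 0xff, a byte that is never valid in UTF-8.
raw = b"abc\xff"

# Decode with the handlers named in the UTF-8 and ASCII decoder commits.
decoded = {
    handler: raw.decode("utf-8", errors=handler)
    for handler in ("ignore", "replace", "surrogateescape")
}

# surrogateescape is reversible: the undecodable byte becomes a lone
# surrogate (U+DCFF) that encodes back to the original byte.
round_trip = decoded["surrogateescape"].encode("utf-8", errors="surrogateescape")

# surrogatepass lets a lone surrogate through the UTF-8 encoder.
lone = "\ud800".encode("utf-8", errors="surrogatepass")

# ascii() keeps the output printable even when a result holds a lone surrogate.
for name, value in {**encoded, **decoded}.items():
    print(name, ascii(value))
print("round trip", round_trip, "surrogatepass", lone)
```

Without an explicit handler the same calls use ``strict`` and raise ``UnicodeEncodeError`` or ``UnicodeDecodeError``.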