cpython

Commit Graph

Author	SHA1	Message	Date
Serhiy Storchaka	413fdcea21	Issue #24821: Refactor STRINGLIB(fastsearch_memchr_1char) and split it into STRINGLIB(find_char) and STRINGLIB(rfind_char), which can be used independently without special preconditions.	2015-11-14 15:42:17 +02:00
Victor Stinner	6bd525b656	Optimize error handlers of ASCII and Latin1 encoders when the replacement string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual character. Cleanup unicode_encode_ucs1(): * Rename repunicode to rep * Clear rep object on error * Factorize code between bytes and unicode path	2015-10-09 13:10:05 +02:00
Victor Stinner	ce179bf6ba	Add _PyBytesWriter_WriteBytes() to factorize the code	2015-10-09 12:57:22 +02:00
Victor Stinner	ad7715891e	_PyBytesWriter: simplify code to avoid "prealloc" parameters Subtract preallocated bytes from min_size before calling _PyBytesWriter_Prepare().	2015-10-09 12:38:53 +02:00
Victor Stinner	e7bf86cd7d	Optimize backslashreplace error handler Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and Latin1 encoders. Use the new _PyBytesWriter API to optimize these error handlers for the encoders. This avoids creating an exception and calling the slow implementation of the error handler.	2015-10-09 01:39:28 +02:00
Victor Stinner	fdfbf78114	Issue #25318: Add _PyBytesWriter API Add a new private API to optimize Unicode encoders. It uses a small buffer allocated on the stack and supports overallocation. Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable overallocation for the UTF-8 encoder with error handlers. unicode_encode_ucs1(): initialize collend to collstart+1 to not check the current character twice, we already know that it is not ASCII.	2015-10-09 00:33:49 +02:00
Victor Stinner	01ada3996b	Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``. Patch co-written with Serhiy Storchaka.	2015-10-01 21:54:51 +02:00
Eric V. Smith	ab2aa6dc91	Fixed an incorrect comment.	2015-08-26 14:10:32 -04:00
Serhiy Storchaka	9ce71a6475	Fixed typos in comments.	2015-05-18 22:20:18 +03:00
Serhiy Storchaka	7e29eea926	Fixed typos in comments.	2015-05-18 22:19:42 +03:00
Serhiy Storchaka	0d4df752ac	Issue #15027: The UTF-32 encoder is now 3x to 7x faster.	2015-05-12 23:12:45 +03:00
Serhiy Storchaka	d9d769fcdd	Issue #23573: Increased performance of string search operations (str.find, str.index, str.count, the in operator, str.split, str.partition) with arguments of different kinds (UCS1, UCS2, UCS4).	2015-03-24 21:55:47 +02:00
Serhiy Storchaka	009b811d67	Removed unintentional trailing spaces in non-external and non-generated C files.	2015-03-18 21:53:15 +02:00
Serhiy Storchaka	4fdb68491e	Issue #22896: Avoid using PyObject_AsCharBuffer(), PyObject_AsReadBuffer() and PyObject_AsWriteBuffer().	2015-02-03 01:21:08 +02:00
Serhiy Storchaka	b757c83ec6	Issue #22581: Use more "bytes-like object" throughout the docs and comments.	2014-12-05 22:25:22 +02:00
Benjamin Peterson	1cc9520327	s/stringobject/bytesobject/ (closes #22036) Patch by Martin Matusiak.	2014-07-23 21:39:37 -07:00
Benjamin Peterson	d455ce4fd4	merge 3.3	2014-03-30 19:52:39 -04:00
Benjamin Peterson	0ad6098b67	merge 3.2	2014-03-30 19:52:22 -04:00
Benjamin Peterson	23cf403ca1	fix expandtabs overflow detection to be consistent and not rely on signed overflow	2014-03-30 19:47:57 -04:00
Serhiy Storchaka	3079328d29	Reverted changeset b72c5573c5e7 (issue #15027).	2014-01-04 22:44:01 +02:00
Serhiy Storchaka	583a93943c	Issue #15027: Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster.	2014-01-04 19:25:37 +02:00
Benjamin Peterson	0ee22bf774	fix format spec recursive expansion (closes #19729)	2013-11-26 19:22:36 -06:00
Serhiy Storchaka	dc2fd5101a	Remove dead code committed in issue #12892.	2013-11-19 15:56:05 +02:00
Serhiy Storchaka	58cf607d13	Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates. The utf-16* and utf-32* encoders no longer allow surrogate code points (U+D800-U+DFFF) to be encoded. The utf-32* decoders no longer decode byte sequences that correspond to surrogate code points. The surrogatepass error handler now works with the utf-16* and utf-32* codecs. Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.	2013-11-19 11:32:41 +02:00
Ezio Melotti	745d54d2fa	#17806: Added keyword-argument support for "tabsize" to str/bytes.expandtabs().	2013-11-16 19:10:57 +02:00
Victor Stinner	cc64eb5b9f	Issue #18408: Fix bytearrayiter.partition()/rpartition(), handle PyByteArray_FromStringAndSize() failure (ex: on memory allocation failure)	2013-10-29 03:15:37 +01:00
Serhiy Storchaka	8fa8ee3970	Issue #18701: Remove support of old CPython versions (<3.0) from C code.	2013-08-17 00:48:02 +03:00
Raymond Hettinger	d06eeb4a24	merge	2013-08-13 18:20:55 -07:00
Raymond Hettinger	b1b915c796	Issue 18719: Remove a false optimization Remove an unused early-out test from the critical path for dict and set lookups. When the strings already have matching lengths, kinds, and hashes, there is no additional information gained by checking the first characters (the probability of a mismatch is already known to be less than 1 in 2**64).	2013-08-13 18:16:34 -07:00
Antoine Pitrou	9ed5f27266	Issue #18722: Remove uses of the "register" keyword in C code.	2013-08-13 20:18:52 +02:00
Benjamin Peterson	d2b58a9880	only recursively expand in the format spec (closes #17644)	2013-05-17 17:34:30 -05:00
Benjamin Peterson	4d94474ba3	rewrite the parsing of field names to be more consistent wrt recursive expansion	2013-05-17 18:22:31 -05:00
Benjamin Peterson	48953632df	merge 3.3	2013-05-17 17:35:28 -05:00
Ezio Melotti	5263c13801	Merge removal of trailing whitespace from 3.3.	2013-04-21 04:08:18 +03:00
Ezio Melotti	6b02772c13	Remove trailing whitespace.	2013-04-21 04:07:51 +03:00
Victor Stinner	8f674ccd64	Close #17694: Add minimum length to _PyUnicodeWriter * Add also min_char attribute to _PyUnicodeWriter structure (currently unused) * _PyUnicodeWriter_Init() has no more argument (except the writer itself): min_length and overallocate must be set explicitly * In error handlers, only enable overallocation if the replacement string is longer than 1 character * CJK decoders don't use overallocation anymore * Set min_length, instead of preallocating memory using _PyUnicodeWriter_Prepare(), in many decoders * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow	2013-04-17 23:02:17 +02:00
Victor Stinner	76b3b2726c	stringlib: remove unused STRINGLIB_RESIZE macro	2013-04-14 16:29:09 +02:00
Serhiy Storchaka	e2cef885a2	Issue #16061: Speed up str.replace() for replacing 1-character strings.	2013-04-13 22:45:04 +03:00
Victor Stinner	7efa3b8242	Close #13126: "Simplify" FASTSEARCH() code to help the compiler to emit more efficient machine code. Patch written by Antoine Pitrou. Without this change, str.find() was 10% slower than str.rfind() in the worst case.	2013-04-08 00:26:43 +02:00
Victor Stinner	cfc4c13b04	Add _PyUnicodeWriter_WriteSubstring() function Write a function to enable more optimizations: * If the substring is the whole string and overallocation is disabled, just keep a reference to the string, don't copy characters * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when possible	2013-04-03 01:48:39 +02:00
Serhiy Storchaka	06b16f879f	Remove unused defines.	2013-02-23 14:49:09 +02:00
Serhiy Storchaka	18809fa94e	Remove unused defines.	2013-02-23 14:48:16 +02:00
Antoine Pitrou	4de7457009	Issue #17173: Remove uses of locale-dependent C functions (isalpha() etc.) in the interpreter. I've left a couple of them in: zlib (third-party lib), getaddrinfo.c (doesn't include Python.h, and probably obsolete), _sre.c (legitimate use for the re.LOCALE flag).	2013-02-09 23:11:27 +01:00
Serhiy Storchaka	b946af5897	Check for NULL before the pointer aligning in fastsearch_memchr_1char. There is no guarantee that NULL is aligned.	2013-01-15 13:32:41 +02:00
Serhiy Storchaka	18ba40b945	Check for NULL before the pointer aligning in fastsearch_memchr_1char. There is no guarantee that NULL is aligned.	2013-01-15 13:27:28 +02:00
Christian Heimes	5f7e8dab11	Issue #16592: stringlib_bytes_join doesn't raise MemoryError on allocation failure	2012-12-02 07:56:42 +01:00
Victor Stinner	6caa6fb535	(Merge 3.3) Issue #8271: Fix compilation on Windows	2012-11-05 00:00:50 +01:00
Victor Stinner	ab60de478d	Issue #8271: Fix compilation on Windows	2012-11-04 23:59:15 +01:00
Ezio Melotti	cfa9636404	#8271: merge with 3.3.	2012-11-04 23:23:09 +02:00
Ezio Melotti	f7ed5d111b	#8271: the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti.	2012-11-04 23:21:38 +02:00

204 Commits