Serhiy Storchaka
998c9cdd42
Issue #28561 : Clean up UTF-8 encoder: remove dead code, update comments, etc.
...
Patch by Xiang Zhang.
2016-10-30 18:25:27 +02:00
Christian Heimes
f051e43b22
Issue #28126 : Replace Py_MEMCPY with memcpy(). Visual Studio can properly optimize memcpy().
2016-09-13 20:22:02 +02:00
Benjamin Peterson
621b430a14
remove all usage of Py_LOCAL
2016-09-09 13:54:34 -07:00
Victor Stinner
1a05d6c04d
PEP 7 style for if/else in C
...
Add also a newline for readability in normalize_encoding().
2016-09-02 12:12:23 +02:00
Raymond Hettinger
15f44ab043
Issue #27895 : Spelling fixes (Contributed by Ville Skyttä).
2016-08-30 10:47:49 -07:00
Serhiy Storchaka
e09132f2c7
Backed out changeset b0087e17cd5e (issue #26765 )
...
For unknown reasons it perhaps caused a crash on 32-bit Windows (issue #).
2016-07-03 13:57:48 +03:00
Serhiy Storchaka
355048970b
Issue #26765 : Moved wrappers for bytes and bytearray methods to common header
...
file.
2016-07-01 17:57:30 +03:00
Serhiy Storchaka
bcde10aa7e
Issue #26765 : Ensure that bytes- and unicode-specific stringlib files are used
...
with correct type.
2016-05-16 09:42:29 +03:00
Serhiy Storchaka
fb81d3cbe7
Issue #26765 : Moved common code for the replace() method of bytes and bytearray
...
to a template file.
2016-05-05 09:26:07 +03:00
Serhiy Storchaka
dd40fc3e57
Issue #26765 : Moved common code and docstrings for bytes and bytearray methods
...
to bytes_methods.c.
2016-05-04 22:23:26 +03:00
Serhiy Storchaka
b6a9c9761c
Issue #26778 : Fixed "a/an/and" typos in code comment, documentation and error
...
messages.
2016-04-17 09:39:28 +03:00
Serhiy Storchaka
6a7b3a77b4
Issue #26778 : Fixed "a/an/and" typos in code comment and documentation.
2016-04-17 08:32:47 +03:00
Serhiy Storchaka
21a663ea28
Issue #26057 : Got rid of nonneeded use of PyUnicode_FromObject().
2016-04-13 15:37:23 +03:00
Serhiy Storchaka
413fdcea21
Issue #24821 : Refactor STRINGLIB(fastsearch_memchr_1char) and split it on
...
STRINGLIB(find_char) and STRINGLIB(rfind_char) that can be used independedly
without special preconditions.
2015-11-14 15:42:17 +02:00
Victor Stinner
6bd525b656
Optimize error handlers of ASCII and Latin1 encoders when the replacement
...
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.
Cleanup unicode_encode_ucs1():
* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
2015-10-09 13:10:05 +02:00
Victor Stinner
ce179bf6ba
Add _PyBytesWriter_WriteBytes() to factorize the code
2015-10-09 12:57:22 +02:00
Victor Stinner
ad7715891e
_PyBytesWriter: simplify code to avoid "prealloc" parameters
...
Substract preallocate bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner
e7bf86cd7d
Optimize backslashreplace error handler
...
Issue #25318 : Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.
Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
Victor Stinner
fdfbf78114
Issue #25318 : Add _PyBytesWriter API
...
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.
Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.
unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Victor Stinner
01ada3996b
Issue #25267 : The UTF-8 encoder is now up to 75 times as fast for error
...
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Eric V. Smith
ab2aa6dc91
Fixed an incorrect comment.
2015-08-26 14:10:32 -04:00
Serhiy Storchaka
9ce71a6475
Fixed typos in comments.
2015-05-18 22:20:18 +03:00
Serhiy Storchaka
7e29eea926
Fixed typos in comments.
2015-05-18 22:19:42 +03:00
Serhiy Storchaka
0d4df752ac
Issue #15027 : The UTF-32 encoder is now 3x to 7x faster.
2015-05-12 23:12:45 +03:00
Serhiy Storchaka
d9d769fcdd
Issue #23573 : Increased performance of string search operations (str.find,
...
str.index, str.count, the in operator, str.split, str.partition) with
arguments of different kinds (UCS1, UCS2, UCS4).
2015-03-24 21:55:47 +02:00
Serhiy Storchaka
009b811d67
Removed unintentional trailing spaces in non-external and non-generated C files.
2015-03-18 21:53:15 +02:00
Serhiy Storchaka
4fdb68491e
Issue #22896 : Avoid to use PyObject_AsCharBuffer(), PyObject_AsReadBuffer()
...
and PyObject_AsWriteBuffer().
2015-02-03 01:21:08 +02:00
Serhiy Storchaka
b757c83ec6
Issue #22581 : Use more "bytes-like object" throughout the docs and comments.
2014-12-05 22:25:22 +02:00
Benjamin Peterson
1cc9520327
s/stringobject/bytesobject/ ( closes #22036 )
...
Patch by Martin Matusiak.
2014-07-23 21:39:37 -07:00
Benjamin Peterson
d455ce4fd4
merge 3.3
2014-03-30 19:52:39 -04:00
Benjamin Peterson
0ad6098b67
merge 3.2
2014-03-30 19:52:22 -04:00
Benjamin Peterson
23cf403ca1
fix expandtabs overflow detection to be consistent and not rely on signed overflow
2014-03-30 19:47:57 -04:00
Serhiy Storchaka
3079328d29
Reverted changeset b72c5573c5e7 (issue #15027 ).
2014-01-04 22:44:01 +02:00
Serhiy Storchaka
583a93943c
Issue #15027 : Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster.
2014-01-04 19:25:37 +02:00
Benjamin Peterson
0ee22bf774
fix format spec recursive expansion ( closes #19729 )
2013-11-26 19:22:36 -06:00
Serhiy Storchaka
dc2fd5101a
Remove dead code committed in issue #12892 .
2013-11-19 15:56:05 +02:00
Serhiy Storchaka
58cf607d13
Issue #12892 : The utf-16* and utf-32* codecs now reject (lone) surrogates.
...
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.
Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
Ezio Melotti
745d54d2fa
#17806 : Added keyword-argument support for "tabsize" to str/bytes.expandtabs().
2013-11-16 19:10:57 +02:00
Victor Stinner
cc64eb5b9f
Issue #18408 : Fix bytearrayiter.partition()/rpartition(), handle
...
PyByteArray_FromStringAndSize() failure (ex: on memory allocation failure)
2013-10-29 03:15:37 +01:00
Serhiy Storchaka
8fa8ee3970
Issue #18701 : Remove support of old CPython versions (<3.0) from C code.
2013-08-17 00:48:02 +03:00
Raymond Hettinger
d06eeb4a24
merge
2013-08-13 18:20:55 -07:00
Raymond Hettinger
b1b915c796
Issue 18719: Remove a false optimization
...
Remove an unused early-out test from the critical path for
dict and set lookups.
When the strings already have matching lengths, kinds, and hashes,
there is no additional information gained by checking the first
characters (the probability of a mismatch is already known to
be less than 1 in 2**64).
2013-08-13 18:16:34 -07:00
Antoine Pitrou
9ed5f27266
Issue #18722 : Remove uses of the "register" keyword in C code.
2013-08-13 20:18:52 +02:00
Benjamin Peterson
d2b58a9880
only recursively expand in the format spec ( closes #17644 )
2013-05-17 17:34:30 -05:00
Benjamin Peterson
4d94474ba3
rewrite the parsing of field names to be more consistent wrt recursive expansion
2013-05-17 18:22:31 -05:00
Benjamin Peterson
48953632df
merge 3.3
2013-05-17 17:35:28 -05:00
Ezio Melotti
5263c13801
Merge removal of trailing whitespace from 3.3.
2013-04-21 04:08:18 +03:00
Ezio Melotti
6b02772c13
Remove trailing whitespace.
2013-04-21 04:07:51 +03:00
Victor Stinner
8f674ccd64
Close #17694 : Add minimum length to _PyUnicodeWriter
...
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() has no more argument (except the writer itself):
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner
76b3b2726c
stringlib: remove unused STRINGLIB_RESIZE macro
2013-04-14 16:29:09 +02:00