Miss Islington (bot)
c755ca89c7
[3.7] bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304) (GH-14369)
* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
* The UTF-8 incremental decoder now fails fast if it encounters
a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoder with the surrogatepass error
handler now decodes a lone low surrogate with final=False.
(cherry picked from commit 894263ba80)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2019-06-25 12:29:18 +02:00
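The second bullet of the bpo-24214 entry above can be illustrated with a short sketch. It assumes an interpreter that includes this fix; the sample code point U+DC02 and the variable names are chosen here for illustration only.

```python
import codecs

# A lone low surrogate, encoded to two bytes with the surrogatepass handler.
data = "\udc02".encode("utf-16-le", "surrogatepass")

decoder = codecs.getincrementaldecoder("utf-16-le")("surrogatepass")

# Feed the bytes one at a time; final defaults to False on both calls.
first = decoder.decode(data[:1])   # incomplete code unit, nothing emitted yet
second = decoder.decode(data[1:])  # the lone low surrogate is emitted here

print(repr(first), repr(second))
```

With the fix in place the second call should return the surrogate without waiting for final=True.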
Victor Stinner
6f5fa1b4be
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) (GH-10718)
Fix str.format(), float.__format__() and complex.__format__() methods
for non-ASCII decimal point when using the "n" formatter.
Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires
a _PyUnicodeWriter object for the buffer and a Python str object
for digits.
(cherry picked from commit 59423e3ddd)
2018-11-26 14:17:01 +01:00
INADA Naoki
a49ac99029
bpo-32677: Add .isascii() to str, bytes and bytearray (GH-5342)
2018-01-27 14:06:21 +09:00
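A small usage sketch for the isascii() methods added in bpo-32677 (Python 3.7 and later). The sample values are illustrative only.

```python
# str.isascii() is True for empty strings and strings of code points < 128.
samples = ["hello", "h\u00e9llo", ""]
results = [s.isascii() for s in samples]

# The same method exists on bytes and bytearray.
bytes_ok = b"abc".isascii()
bytes_bad = b"\xff".isascii()
bytearray_ok = bytearray(b"abc").isascii()

print(results, bytes_ok, bytes_bad, bytearray_ok)
```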
Barry Warsaw
b2e5794870
bpo-31338 (#3374)
* Add Py_UNREACHABLE() as an alias to abort().
* Use Py_UNREACHABLE() instead of assert(0)
* Convert more unreachable code to use Py_UNREACHABLE()
* Document Py_UNREACHABLE() and a few other macros.
2017-09-14 18:13:16 -07:00
Stefan Krah
f432a3234f
bpo-30923: Silence fall-through warnings included in -Wextra since gcc-7.0. (#3157)
2017-08-21 13:09:59 +02:00
Serhiy Storchaka
5075416b8f
bpo-30978: str.format_map() now passes key lookup exceptions through. (#2790)
Previously any exception was replaced with a KeyError exception.
2017-08-03 11:45:23 +03:00
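A sketch of the behaviour described in the bpo-30978 entry above, assuming an interpreter that includes the change. FailingMapping and try_format are hypothetical names introduced for this illustration.

```python
class FailingMapping(dict):
    """Hypothetical mapping whose failed lookups raise a non-KeyError exception."""

    def __missing__(self, key):
        raise ValueError(f"cannot resolve {key!r}")


def try_format(template, mapping):
    """Return the formatted string, or the name of the exception that escaped."""
    try:
        return template.format_map(mapping)
    except Exception as exc:
        return type(exc).__name__


# The ValueError raised during key lookup is passed through unchanged.
custom = try_format("{name}", FailingMapping())
# A plain dict with a missing key still raises KeyError.
plain = try_format("{name}", {})

print(custom, plain)
```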
Serhiy Storchaka
0a58f72762
bpo-24821: Fixed a slowdown of up to 25 times when searching for some unlucky Unicode characters. (#505)
2017-03-30 09:11:10 +03:00
Serhiy Storchaka
d1302c0154
Issue #28999: Use Py_RETURN_NONE, Py_RETURN_TRUE and Py_RETURN_FALSE wherever
possible in places that Coccinelle did not find.
2017-01-23 10:23:58 +02:00
Xiang Zhang
7a4da324dc
Issue #29145: Merge 3.6.
2017-01-10 10:56:38 +08:00
Serhiy Storchaka
998c9cdd42
Issue #28561: Clean up UTF-8 encoder: remove dead code, update comments, etc.
Patch by Xiang Zhang.
2016-10-30 18:25:27 +02:00
Christian Heimes
f051e43b22
Issue #28126: Replace Py_MEMCPY with memcpy(). Visual Studio can properly optimize memcpy().
2016-09-13 20:22:02 +02:00
Benjamin Peterson
621b430a14
remove all usage of Py_LOCAL
2016-09-09 13:54:34 -07:00
Victor Stinner
1a05d6c04d
PEP 7 style for if/else in C
Also add a newline for readability in normalize_encoding().
2016-09-02 12:12:23 +02:00
Raymond Hettinger
15f44ab043
Issue #27895: Spelling fixes (Contributed by Ville Skyttä).
2016-08-30 10:47:49 -07:00
Serhiy Storchaka
e09132f2c7
Backed out changeset b0087e17cd5e (issue #26765)
For unknown reasons, it may have caused a crash on 32-bit Windows (issue #).
2016-07-03 13:57:48 +03:00
Serhiy Storchaka
355048970b
Issue #26765: Moved wrappers for bytes and bytearray methods to a common header file.
2016-07-01 17:57:30 +03:00
Serhiy Storchaka
bcde10aa7e
Issue #26765: Ensure that bytes- and unicode-specific stringlib files are used with the correct type.
2016-05-16 09:42:29 +03:00
Serhiy Storchaka
fb81d3cbe7
Issue #26765: Moved common code for the replace() method of bytes and bytearray to a template file.
2016-05-05 09:26:07 +03:00
Serhiy Storchaka
dd40fc3e57
Issue #26765: Moved common code and docstrings for bytes and bytearray methods to bytes_methods.c.
2016-05-04 22:23:26 +03:00
Serhiy Storchaka
b6a9c9761c
Issue #26778: Fixed "a/an/and" typos in code comments, documentation and error messages.
2016-04-17 09:39:28 +03:00
Serhiy Storchaka
6a7b3a77b4
Issue #26778: Fixed "a/an/and" typos in code comments and documentation.
2016-04-17 08:32:47 +03:00
Serhiy Storchaka
21a663ea28
Issue #26057: Got rid of unneeded use of PyUnicode_FromObject().
2016-04-13 15:37:23 +03:00
Serhiy Storchaka
413fdcea21
Issue #24821: Refactor STRINGLIB(fastsearch_memchr_1char) and split it into
STRINGLIB(find_char) and STRINGLIB(rfind_char), which can be used independently
without special preconditions.
2015-11-14 15:42:17 +02:00
Victor Stinner
6bd525b656
Optimize error handlers of ASCII and Latin1 encoders when the replacement
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
characters.
Cleanup unicode_encode_ucs1():
* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
2015-10-09 13:10:05 +02:00
Victor Stinner
ce179bf6ba
Add _PyBytesWriter_WriteBytes() to factorize the code
2015-10-09 12:57:22 +02:00
Victor Stinner
ad7715891e
_PyBytesWriter: simplify code to avoid "prealloc" parameters
Subtract preallocated bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner
e7bf86cd7d
Optimize backslashreplace error handler
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Also optimize the backslashreplace error handler for ASCII and
Latin1 encoders.
Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. This avoids creating an exception and calling the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
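For reference, the observable output of the two error handlers optimized in the entry above can be sketched as follows. The commit changes speed, not results; the sample strings are illustrative assumptions.

```python
text = "caf\u00e9 \u20ac"      # contains U+00E9 and U+20AC
lone = "a\udc80b"              # contains a lone surrogate, not encodable as UTF-8

# ASCII encoder: every non-ASCII character goes through the handler.
ascii_backslash = text.encode("ascii", "backslashreplace")
ascii_xml = text.encode("ascii", "xmlcharrefreplace")

# Latin1 encoder: U+00E9 is encodable, only U+20AC is replaced.
latin1_backslash = text.encode("latin-1", "backslashreplace")

# UTF-8 encoder: only the lone surrogate triggers the handler.
utf8_backslash = lone.encode("utf-8", "backslashreplace")
utf8_xml = lone.encode("utf-8", "xmlcharrefreplace")

for value in (ascii_backslash, ascii_xml, latin1_backslash, utf8_backslash, utf8_xml):
    print(value)
```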
Victor Stinner
fdfbf78114
Issue #25318: Add _PyBytesWriter API
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.
Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.
unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Victor Stinner
01ada3996b
Issue #25267: The UTF-8 encoder is now up to 75 times as fast for the error
handlers ``ignore``, ``replace``, ``surrogateescape`` and ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
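The four error handlers named in the entry above differ in what they emit for a character the UTF-8 encoder rejects. A minimal sketch, using a lone surrogate as the offending character (the sample string is an assumption made for illustration):

```python
text = "a\udc80b"  # U+DC80 is a lone surrogate and cannot be encoded strictly

handlers = ("ignore", "replace", "surrogateescape", "surrogatepass")
encoded = {name: text.encode("utf-8", name) for name in handlers}

# The default strict handler raises instead of producing output.
try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

for name, value in encoded.items():
    print(name, value)
```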
Eric V. Smith
ab2aa6dc91
Fixed an incorrect comment.
2015-08-26 14:10:32 -04:00
Serhiy Storchaka
9ce71a6475
Fixed typos in comments.
2015-05-18 22:20:18 +03:00
Serhiy Storchaka
7e29eea926
Fixed typos in comments.
2015-05-18 22:19:42 +03:00
Serhiy Storchaka
0d4df752ac
Issue #15027: The UTF-32 encoder is now 3x to 7x faster.
2015-05-12 23:12:45 +03:00
Serhiy Storchaka
d9d769fcdd
Issue #23573: Increased performance of string search operations (str.find,
str.index, str.count, the in operator, str.split, str.partition) with
arguments of different kinds (UCS1, UCS2, UCS4).
2015-03-24 21:55:47 +02:00
Serhiy Storchaka
009b811d67
Removed unintentional trailing spaces in non-external, non-generated C files.
2015-03-18 21:53:15 +02:00
Serhiy Storchaka
4fdb68491e
Issue #22896: Avoid using PyObject_AsCharBuffer(), PyObject_AsReadBuffer()
and PyObject_AsWriteBuffer().
2015-02-03 01:21:08 +02:00
Serhiy Storchaka
b757c83ec6
Issue #22581: Use more "bytes-like object" throughout the docs and comments.
2014-12-05 22:25:22 +02:00
Benjamin Peterson
1cc9520327
s/stringobject/bytesobject/ (closes #22036)
Patch by Martin Matusiak.
2014-07-23 21:39:37 -07:00
Benjamin Peterson
d455ce4fd4
merge 3.3
2014-03-30 19:52:39 -04:00
Benjamin Peterson
0ad6098b67
merge 3.2
2014-03-30 19:52:22 -04:00
Benjamin Peterson
23cf403ca1
fix expandtabs overflow detection to be consistent and not rely on signed overflow
2014-03-30 19:47:57 -04:00
Serhiy Storchaka
3079328d29
Reverted changeset b72c5573c5e7 (issue #15027).
2014-01-04 22:44:01 +02:00
Serhiy Storchaka
583a93943c
Issue #15027: Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster.
2014-01-04 19:25:37 +02:00
Benjamin Peterson
0ee22bf774
fix format spec recursive expansion (closes #19729)
2013-11-26 19:22:36 -06:00
Serhiy Storchaka
dc2fd5101a
Remove dead code committed in issue #12892.
2013-11-19 15:56:05 +02:00
Serhiy Storchaka
58cf607d13
Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.
Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
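The behaviour described in the issue #12892 entry above can be checked with a short sketch. The helper name encodes and the sample code point U+D800 are assumptions made for this illustration.

```python
lone = "\ud800"  # a lone high surrogate


def encodes(codec, errors="strict"):
    """Return the encoded bytes, or None if the codec rejects the surrogate."""
    try:
        return lone.encode(codec, errors)
    except UnicodeEncodeError:
        return None


# Strict encoding is rejected; surrogatepass lets the code point through.
strict_utf16 = encodes("utf-16-le")
passed_utf16 = encodes("utf-16-le", "surrogatepass")
strict_utf32 = encodes("utf-32-le")
passed_utf32 = encodes("utf-32-le", "surrogatepass")

# The utf-32 decoder likewise rejects bytes that correspond to a surrogate.
try:
    b"\x00\xd8\x00\x00".decode("utf-32-le")
    utf32_decode_rejected = False
except UnicodeDecodeError:
    utf32_decode_rejected = True

# With surrogatepass the value round-trips.
round_trip = passed_utf16.decode("utf-16-le", "surrogatepass")

print(strict_utf16, passed_utf16, strict_utf32, passed_utf32, utf32_decode_rejected)
```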
Ezio Melotti
745d54d2fa
#17806: Added keyword-argument support for "tabsize" to str/bytes.expandtabs().
2013-11-16 19:10:57 +02:00
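A brief usage sketch for the keyword form added in #17806; the sample inputs are illustrative only.

```python
# tabsize may be passed by keyword as well as positionally.
str_kw = "a\tb".expandtabs(tabsize=4)
str_pos = "a\tb".expandtabs(4)

# The bytes and bytearray methods accept the same keyword.
bytes_kw = b"a\tb".expandtabs(tabsize=4)
bytearray_kw = bytearray(b"a\tb").expandtabs(tabsize=4)

print(repr(str_kw), repr(bytes_kw), repr(bytearray_kw))
```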
Victor Stinner
cc64eb5b9f
Issue #18408: Fix bytearrayiter.partition()/rpartition(), handle
PyByteArray_FromStringAndSize() failure (e.g. on memory allocation failure)
2013-10-29 03:15:37 +01:00
Serhiy Storchaka
8fa8ee3970
Issue #18701: Remove support of old CPython versions (<3.0) from C code.
2013-08-17 00:48:02 +03:00
Raymond Hettinger
d06eeb4a24
merge
2013-08-13 18:20:55 -07:00