Don't add parenthesis to type names. Add also quotes around the type names.
Before:
TypeError: unorderable types: int() < NoneType()
After:
TypeError: '<' not supported between instances of 'int' and 'NoneType'
* Don't overallocate by 400% when recode is needed: only overallocate on demand
using _PyBytesWriter.
* Use _PyLong_DigitValue to convert hexadecimal digit to int
* Create _PyBytes_DecodeEscapeRecode() subfunction
Issue #25401: Optimize bytes.fromhex() and bytearray.fromhex(): they are now
between 2x and 3.5x faster. Changes:
* Use a fast-path working on a char* string for ASCII string
* Use a slow-path for non-ASCII string
* Replace slow hex_digit_to_int() function with a O(1) lookup in
_PyLong_DigitValue precomputed table
* Use _PyBytesWriter API to handle the buffer
* Add unit tests to check the error position in error messages
Issue #25399: Don't create temporary bytes objects: modify _PyBytes_Format() to
create work directly on bytearray objects.
* Rename _PyBytes_Format() to _PyBytes_FormatEx() just in case if something
outside CPython uses it
* _PyBytes_FormatEx() now uses (char*, Py_ssize_t) for the input string, so
bytearray_format() doesn't need tot create a temporary input bytes object
* Add use_bytearray parameter to _PyBytes_FormatEx() which is passed to
_PyBytesWriter, to create a bytearray buffer instead of a bytes buffer
Most formatting operations are now between 2.5 and 5 times faster.
* Add much more unit tests on PyBytes_FromFormatV()
* Remove the first loop to compute the length of the output string
* Use _PyBytesWriter to handle the bytes buffer, use overallocation
* Cleanup the code to make simpler and easier to review
Don't require _PyBytesWriter pointer to be a "char *". Same change for
_PyBytesWriter_WriteBytes() parameter.
For example, binascii uses "unsigned char*".
Optimize bytes.__mod__(args) for integere formats: %d (%i, %u), %o, %x and %X.
_PyBytesWriter is now used to format directly the integer into the writer
buffer, instead of using a temporary bytes object.
Formatting is between 30% and 50% faster on a microbenchmark.
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.
Cleanup unicode_encode_ucs1():
* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
* Thanks to the _PyBytesWriter API, output smaller than 512 bytes are allocated
on the stack and so avoid calling _PyBytes_Resize(). Because of that, change
the default buffer size to fmtcnt instead of fmtcnt+100.
* Rely on _PyBytesWriter algorithm to overallocate the buffer instead of using
a custom code. For example, _PyBytesWriter uses a different overallocation
factor (25% or 50%) depending on the platform to get best performances.
* Disable overallocation for the last write.
* Replace C loops to fill characters with memset()
* Add also many comments to _PyBytes_Format()
* Remove unused FORMATBUFLEN constant
* Avoid the creation of a temporary bytes object when formatting a floating
point number (when no custom formatting option is used)
* Fix also reference leaks on error handling
* Use Py_MEMCPY() to copy bytes between two formatters (%)
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.
Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.
Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.
unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).