cpython

Commit Graph

Author	SHA1	Message	Date
Victor Stinner	c3d2bc19e4	Use _PyBytesWriter in _PyBytes_FromIterator()	2015-10-14 14:15:49 +02:00
Victor Stinner	c5c3ba4bec	Add _PyBytesWriter_Resize() function. This function gives control over the buffer size without using min_size.	2015-10-14 13:56:47 +02:00
Victor Stinner	3c50ce39bf	Factorize _PyBytes_FromList() and _PyBytes_FromTuple() code using a C macro	2015-10-14 13:50:40 +02:00
Victor Stinner	f2eafa323b	Split PyBytes_FromObject() into subfunctions	2015-10-14 13:44:29 +02:00
Victor Stinner	2ec8063cc9	Modify _PyBytes_DecodeEscapeRecode() to use the _PyBytesWriter API. * Don't overallocate by 400% when recode is needed: only overallocate on demand using _PyBytesWriter. * Use _PyLong_DigitValue to convert a hexadecimal digit to an int * Create _PyBytes_DecodeEscapeRecode() subfunction	2015-10-14 13:32:13 +02:00
Victor Stinner	1285e5c805	Fix compiler warnings (uninitialized variables), false alarms in fact	2015-10-14 12:10:20 +02:00
Victor Stinner	f6358a7e4c	_PyBytesWriter_Alloc(): only use 10 bytes of the small buffer in debug mode, to make buffer underflows and overflows easier to detect.	2015-10-14 12:02:39 +02:00
Victor Stinner	f091033b14	Issue #25401: Remove now unused hex_digit_to_int() function	2015-10-14 11:59:46 +02:00
Victor Stinner	2bf8993db9	Optimize bytes.fromhex() and bytearray.fromhex(). Issue #25401: Optimize bytes.fromhex() and bytearray.fromhex(): they are now between 2x and 3.5x faster. Changes: * Use a fast path working on a char* string for ASCII strings * Use a slow path for non-ASCII strings * Replace the slow hex_digit_to_int() function with an O(1) lookup in the precomputed _PyLong_DigitValue table * Use the _PyBytesWriter API to handle the buffer * Add unit tests to check the error position in error messages	2015-10-14 11:25:33 +02:00
Victor Stinner	772b2b09f2	Optimize bytearray % args. Issue #25399: Don't create temporary bytes objects: modify _PyBytes_Format() to work directly on bytearray objects. * Rename _PyBytes_Format() to _PyBytes_FormatEx() in case something outside CPython uses it * _PyBytes_FormatEx() now uses (char*, Py_ssize_t) for the input string, so bytearray_format() doesn't need to create a temporary input bytes object * Add use_bytearray parameter to _PyBytes_FormatEx(), which is passed to _PyBytesWriter to create a bytearray buffer instead of a bytes buffer. Most formatting operations are now between 2.5 and 5 times faster.	2015-10-14 09:56:53 +02:00
Victor Stinner	661aaccf9d	Add use_bytearray attribute to _PyBytesWriter. Issue #25399: Add a new use_bytearray attribute to _PyBytesWriter to use a bytearray buffer, instead of using a bytes object.	2015-10-14 09:41:48 +02:00
Victor Stinner	199c9a6f4b	Fix long_format_binary(). Issue #25399: Fix long_format_binary(), allocate bytes for the bytes writer.	2015-10-14 09:47:23 +02:00
Victor Stinner	03dab786b2	Rewrite PyBytes_FromFormatV() using _PyBytesWriter API. * Add many more unit tests on PyBytes_FromFormatV() * Remove the first loop to compute the length of the output string * Use _PyBytesWriter to handle the bytes buffer, use overallocation * Clean up the code to make it simpler and easier to review	2015-10-14 00:21:35 +02:00
Victor Stinner	358af13526	Issue #25353: Optimize unicode escape and raw unicode escape encoders to use the new _PyBytesWriter API.	2015-10-12 22:36:57 +02:00
Victor Stinner	e9aa5950bb	Fix compilation error in _PyBytesWriter_WriteBytes() on Windows	2015-10-12 13:57:47 +02:00
Victor Stinner	6c2cdae9e6	Writer APIs: use empty string singletons. Modify _PyBytesWriter_Finish() and _PyUnicodeWriter_Finish() to return the empty bytes/Unicode string if the string is empty.	2015-10-12 13:29:43 +02:00
Victor Stinner	c29e29bed1	Relax _PyBytesWriter API. Don't require _PyBytesWriter pointer to be a "char *". Same change for _PyBytesWriter_WriteBytes() parameter. For example, binascii uses "unsigned char*".	2015-10-12 13:12:54 +02:00
Serhiy Storchaka	0d554d7ef1	Issue #24164: Objects that need calling ``__new__`` with keyword arguments can now be pickled using pickle protocols older than protocol version 4.	2015-10-10 22:42:18 +03:00
Victor Stinner	0cdad1e2bc	Issue #25349: Add fast path for b'%c' % int. Also optimize the %% formatter.	2015-10-09 22:50:36 +02:00
Victor Stinner	be75b8cf23	Issue #25349: Optimize bytes % int. Optimize bytes.__mod__(args) for integer formats: %d (%i, %u), %o, %x and %X. _PyBytesWriter is now used to format the integer directly into the writer buffer, instead of using a temporary bytes object. Formatting is between 30% and 50% faster on a microbenchmark.	2015-10-09 22:43:24 +02:00
Victor Stinner	6bd525b656	Optimize error handlers of ASCII and Latin1 encoders when the replacement string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual characters. Clean up unicode_encode_ucs1(): * Rename repunicode to rep * Clear rep object on error * Factorize code between bytes and unicode path	2015-10-09 13:10:05 +02:00
Victor Stinner	ce179bf6ba	Add _PyBytesWriter_WriteBytes() to factorize the code	2015-10-09 12:57:22 +02:00
Victor Stinner	ad7715891e	_PyBytesWriter: simplify code to avoid "prealloc" parameters. Subtract preallocated bytes from min_size before calling _PyBytesWriter_Prepare().	2015-10-09 12:38:53 +02:00
Victor Stinner	53926a1ce2	_PyBytesWriter: rename size attribute to min_size	2015-10-09 12:37:03 +02:00
Victor Stinner	fa7762ec06	Issue #25349: Optimize bytes % args using the new private _PyBytesWriter API. * Thanks to the _PyBytesWriter API, outputs smaller than 512 bytes are allocated on the stack, which avoids calling _PyBytes_Resize(). Because of that, change the default buffer size to fmtcnt instead of fmtcnt+100. * Rely on the _PyBytesWriter algorithm to overallocate the buffer instead of using custom code. For example, _PyBytesWriter uses a different overallocation factor (25% or 50%) depending on the platform to get the best performance. * Disable overallocation for the last write. * Replace C loops to fill characters with memset() * Also add many comments to _PyBytes_Format() * Remove unused FORMATBUFLEN constant * Avoid the creation of a temporary bytes object when formatting a floating point number (when no custom formatting option is used) * Also fix reference leaks on error handling * Use Py_MEMCPY() to copy bytes between two formatters (%)	2015-10-09 11:48:06 +02:00
Victor Stinner	b3653a3458	Issue #25318: Clean up _PyBytesWriter code. Rename "stack buffer" to "small buffer". Also add an assertion in _PyBytesWriter_GetPos().	2015-10-09 03:38:24 +02:00
Victor Stinner	3fa36ff5e4	Issue #25318: Fix backslashreplace(). Fix code to estimate the needed space.	2015-10-09 03:37:11 +02:00
Victor Stinner	797485e101	Issue #25318: Avoid sprintf() in backslashreplace(). Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors(). Also add unit tests for non-BMP characters.	2015-10-09 03:17:30 +02:00
Victor Stinner	b13b97d3b8	Issue #25318: Fix compilation error. Replace "#if Py_DEBUG" with "#ifdef Py_DEBUG".	2015-10-09 02:52:16 +02:00
Victor Stinner	0016507c16	Issue #25318: Move _PyBytesWriter to bytesobject.c. Also declare the private API in bytesobject.h.	2015-10-09 01:53:21 +02:00
Victor Stinner	e7bf86cd7d	Optimize backslashreplace error handler. Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in UTF-8 encoder. Also optimize backslashreplace error handler for ASCII and Latin1 encoders. Use the new _PyBytesWriter API to optimize these error handlers for the encoders. This avoids creating an exception and calling the slow implementation of the error handler.	2015-10-09 01:39:28 +02:00
Victor Stinner	fdfbf78114	Issue #25318: Add _PyBytesWriter API. Add a new private API to optimize Unicode encoders. It uses a small buffer allocated on the stack and supports overallocation. Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable overallocation for the UTF-8 encoder with error handlers. unicode_encode_ucs1(): initialize collend to collstart+1 so the current character is not checked twice; we already know that it is not ASCII.	2015-10-09 00:33:49 +02:00
Martin Panter	585a6acfef	Merge typo fixes from 3.5	2015-10-07 11:13:55 +00:00
Martin Panter	ec1aa5c2a1	More typos in 3.5 documentation and comments	2015-10-07 11:03:53 +00:00
Martin Panter	3f930dcd87	Merge typo fixes from 3.4 into 3.5	2015-10-07 11:01:47 +00:00
Martin Panter	9955a373a8	Various minor typos in documentation and comments	2015-10-07 10:26:23 +00:00
Benjamin Peterson	cdae2cb88a	merge 3.5 (closes #24806)	2015-10-06 19:42:46 -07:00
Benjamin Peterson	59dc696821	merge 3.4 (#24806)	2015-10-06 19:42:02 -07:00
Benjamin Peterson	bd6c41a185	prevent unacceptable bases from becoming bases through multiple inheritance (#24806)	2015-10-06 19:36:54 -07:00
Victor Stinner	74e8fac3c8	Issue #25301: Fix compatibility with ISO C90	2015-10-05 13:49:26 +02:00
Victor Stinner	1d65d9192d	Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error handlers: ``ignore``, ``replace`` and ``surrogateescape``.	2015-10-05 13:43:50 +02:00
Victor Stinner	eb36fdaad8	Fix _PyUnicodeWriter_PrepareKind(). Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that _PyUnicodeWriter_PrepareKind() correctly handles a read-only buffer: copy the buffer.	2015-10-03 01:55:51 +02:00
Serhiy Storchaka	29e68edbf4	Issue #24848: Fixed bugs in UTF-7 decoding of malformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate. 3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).	2015-10-02 13:14:03 +03:00
Serhiy Storchaka	58c8f2bb6d	Issue #24848: Fixed bugs in UTF-7 decoding of malformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate. 3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).	2015-10-02 13:13:14 +03:00
Serhiy Storchaka	28b21e50c8	Issue #24848: Fixed bugs in UTF-7 decoding of malformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate.	2015-10-02 13:07:28 +03:00
Serhiy Storchaka	5dbe245ef2	Issue #24483: C implementation of functools.lru_cache() now calculates key's hash only once.	2015-10-02 12:47:59 +03:00
Serhiy Storchaka	b9d98d532c	Issue #24483: C implementation of functools.lru_cache() now calculates key's hash only once.	2015-10-02 12:47:11 +03:00
Victor Stinner	3222da26fe	Make _PyUnicode_TranslateCharmap() symbol private. unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().	2015-10-01 22:07:32 +02:00
Victor Stinner	01ada3996b	Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``. Patch co-written with Serhiy Storchaka.	2015-10-01 21:54:51 +02:00
Victor Stinner	d69dd8bd5e	(Merge 3.5) Issue #25182: Fix compilation on Windows	2015-09-30 15:03:50 +02:00


5337 Commits
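
Commit 2bf8993db9 (Issue #25401) describes replacing a per-character hex_digit_to_int() call with an O(1) lookup in a precomputed table. The Python sketch below illustrates that idea only; it is not CPython's C implementation. DIGIT_VALUE, INVALID, digit_value and fromhex_sketch are hypothetical names chosen for this example, and the error message wording is an assumption.

```python
# Illustrative sketch: hex decoding with a precomputed digit-value table.
# All names here are hypothetical; this is not the CPython implementation.

INVALID = 37  # sentinel larger than any valid hexadecimal digit value

# Build the 256-entry table once; each later lookup is a single index operation.
DIGIT_VALUE = [INVALID] * 256
for value, ch in enumerate("0123456789"):
    DIGIT_VALUE[ord(ch)] = value
for value, ch in enumerate("abcdef", start=10):
    DIGIT_VALUE[ord(ch)] = value
    DIGIT_VALUE[ord(ch.upper())] = value


def digit_value(ch):
    """Return the digit value of ch, or INVALID for anything outside the table."""
    code = ord(ch)
    return DIGIT_VALUE[code] if code < 256 else INVALID


def fromhex_sketch(text):
    """Decode pairs of hex digits, skipping ASCII spaces between pairs."""
    out = bytearray()
    i, n = 0, len(text)
    while i < n:
        if text[i] == " ":
            i += 1
            continue
        hi = digit_value(text[i])
        if hi >= 16:
            raise ValueError(f"non-hexadecimal digit at position {i}")
        # A missing second digit is reported at the position where it was expected.
        lo = digit_value(text[i + 1]) if i + 1 < n else INVALID
        if lo >= 16:
            raise ValueError(f"non-hexadecimal digit at position {i + 1}")
        out.append((hi << 4) | lo)
        i += 2
    return bytes(out)
```

For valid input the sketch is expected to agree with the built-in bytes.fromhex(); for invalid input it reports the index of the offending character, in the spirit of the error-position unit tests mentioned in the same commit.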