Commit Graph

5393 Commits

Author SHA1 Message Date
Serhiy Storchaka d17427b7bd Issue #25410: Fixed a memory leak in OrderedDict in the case when key's hash
calculation fails.
2015-10-20 18:21:48 +03:00
Serhiy Storchaka 3e9f87782e Issue #25410: Cleaned up and fixed minor bugs in C implementation of OrderedDict. 2015-10-18 09:54:42 +03:00
Serhiy Storchaka 8003bafd7f Issue #25410: Cleaned up and fixed minor bugs in C implementation of OrderedDict. 2015-10-18 09:53:17 +03:00
Victor Stinner 91108f049f Issue #25210: Change error message of do_richcompare()
Don't add parenthesis to type names. Add also quotes around the type names.

Before:

  TypeError: unorderable types: int() < NoneType()

After:

  TypeError: '<' not supported between instances of 'int' and 'NoneType'
2015-10-14 18:25:31 +02:00
Serhiy Storchaka dbb98c1443 Issue #25406: Fixed a bug in C implementation of OrderedDict.move_to_end()
that caused segmentation fault or hang in iterating after moving several
items to the start of ordered dict.
2015-10-14 19:22:44 +03:00
Serhiy Storchaka 992ec46acc Issue #25406: Fixed a bug in C implementation of OrderedDict.move_to_end()
that caused segmentation fault or hang in iterating after moving several
items to the start of ordered dict.
2015-10-14 19:21:24 +03:00
Victor Stinner c3d2bc19e4 Use _PyBytesWriter in _PyBytes_FromIterator() 2015-10-14 14:15:49 +02:00
Victor Stinner c5c3ba4bec Add _PyBytesWriter_Resize() function
This function gives a control to the buffer size without using min_size.
2015-10-14 13:56:47 +02:00
Victor Stinner 3c50ce39bf Factorize _PyBytes_FromList() and _PyBytes_FromTuple() code using a C macro 2015-10-14 13:50:40 +02:00
Victor Stinner f2eafa323b Split PyBytes_FromObject() into subfunctions 2015-10-14 13:44:29 +02:00
Victor Stinner 2ec8063cc9 Modify _PyBytes_DecodeEscapeRecode() to use _PyBytesAPI
* Don't overallocate by 400% when recode is needed: only overallocate on demand
  using _PyBytesWriter.
* Use _PyLong_DigitValue to convert hexadecimal digit to int
* Create _PyBytes_DecodeEscapeRecode() subfunction
2015-10-14 13:32:13 +02:00
Victor Stinner 1285e5c805 Fix compiler warnings (uninitialized variables), false alarms in fact 2015-10-14 12:10:20 +02:00
Victor Stinner f6358a7e4c _PyBytesWriter_Alloc(): only use 10 bytes of the small buffer in debug mode to
enhance code to detect buffer under- and overflow.
2015-10-14 12:02:39 +02:00
Victor Stinner f091033b14 Issue #25401: Remove now unused hex_digit_to_int() function 2015-10-14 11:59:46 +02:00
Victor Stinner 2bf8993db9 Optimize bytes.fromhex() and bytearray.fromhex()
Issue #25401: Optimize bytes.fromhex() and bytearray.fromhex(): they are now
between 2x and 3.5x faster. Changes:

* Use a fast-path working on a char* string for ASCII string
* Use a slow-path for non-ASCII string
* Replace slow hex_digit_to_int() function with a O(1) lookup in
  _PyLong_DigitValue precomputed table
* Use _PyBytesWriter API to handle the buffer
* Add unit tests to check the error position in error messages
2015-10-14 11:25:33 +02:00
Victor Stinner 772b2b09f2 Optimize bytearray % args
Issue #25399: Don't create temporary bytes objects: modify _PyBytes_Format() to
create work directly on bytearray objects.

* Rename _PyBytes_Format() to _PyBytes_FormatEx() just in case if something
  outside CPython uses it
* _PyBytes_FormatEx() now uses (char*, Py_ssize_t) for the input string, so
  bytearray_format() doesn't need tot create a temporary input bytes object
* Add use_bytearray parameter to _PyBytes_FormatEx() which is passed to
  _PyBytesWriter, to create a bytearray buffer instead of a bytes buffer

Most formatting operations are now between 2.5 and 5 times faster.
2015-10-14 09:56:53 +02:00
Victor Stinner 661aaccf9d Add use_bytearray attribute to _PyBytesWriter
Issue #25399: Add a new use_bytearray attribute to _PyBytesWriter to use a
bytearray buffer, instead of using a bytes object.
2015-10-14 09:41:48 +02:00
Victor Stinner 199c9a6f4b Fix long_format_binary()
Issue #25399: Fix long_format_binary(), allocate bytes for the bytes writer.
2015-10-14 09:47:23 +02:00
Victor Stinner 03dab786b2 Rewrite PyBytes_FromFormatV() using _PyBytesWriter API
* Add much more unit tests on PyBytes_FromFormatV()
* Remove the first loop to compute the length of the output string
* Use _PyBytesWriter to handle the bytes buffer, use overallocation
* Cleanup the code to make simpler and easier to review
2015-10-14 00:21:35 +02:00
Victor Stinner 358af13526 Issue #25353: Optimize unicode escape and raw unicode escape encoders to use
the new _PyBytesWriter API.
2015-10-12 22:36:57 +02:00
Victor Stinner e9aa5950bb Fix compilation error in _PyBytesWriter_WriteBytes() on Windows 2015-10-12 13:57:47 +02:00
Victor Stinner 6c2cdae9e6 Writer APIs: use empty string singletons
Modify _PyBytesWriter_Finish() and _PyUnicodeWriter_Finish() to return the
empty bytes/Unicode string if the string is empty.
2015-10-12 13:29:43 +02:00
Victor Stinner c29e29bed1 Relax _PyBytesWriter API
Don't require _PyBytesWriter pointer to be a "char *". Same change for
_PyBytesWriter_WriteBytes() parameter.

For example, binascii uses "unsigned char*".
2015-10-12 13:12:54 +02:00
Serhiy Storchaka 0d554d7ef1 Issue #24164: Objects that need calling ``__new__`` with keyword arguments,
can now be pickled using pickle protocols older than protocol version 4.
2015-10-10 22:42:18 +03:00
Victor Stinner 0cdad1e2bc Issue #25349: Add fast path for b'%c' % int
Optimize also %% formater.
2015-10-09 22:50:36 +02:00
Victor Stinner be75b8cf23 Issue #25349: Optimize bytes % int
Optimize bytes.__mod__(args) for integere formats: %d (%i, %u), %o, %x and %X.
_PyBytesWriter is now used to format directly the integer into the writer
buffer, instead of using a temporary bytes object.

Formatting is between 30% and 50% faster on a microbenchmark.
2015-10-09 22:43:24 +02:00
Victor Stinner 6bd525b656 Optimize error handlers of ASCII and Latin1 encoders when the replacement
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.

Cleanup unicode_encode_ucs1():

* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
2015-10-09 13:10:05 +02:00
Victor Stinner ce179bf6ba Add _PyBytesWriter_WriteBytes() to factorize the code 2015-10-09 12:57:22 +02:00
Victor Stinner ad7715891e _PyBytesWriter: simplify code to avoid "prealloc" parameters
Substract preallocate bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner 53926a1ce2 _PyBytesWriter: rename size attribute to min_size 2015-10-09 12:37:03 +02:00
Victor Stinner fa7762ec06 Issue #25349: Optimize bytes % args using the new private _PyBytesWriter API
* Thanks to the _PyBytesWriter API, output smaller than 512 bytes are allocated
  on the stack and so avoid calling _PyBytes_Resize(). Because of that, change
  the default buffer size to fmtcnt instead of fmtcnt+100.
* Rely on _PyBytesWriter algorithm to overallocate the buffer instead of using
  a custom code. For example, _PyBytesWriter uses a different overallocation
  factor (25% or 50%) depending on the platform to get best performances.
* Disable overallocation for the last write.
* Replace C loops to fill characters with memset()
* Add also many comments to _PyBytes_Format()
* Remove unused FORMATBUFLEN constant
* Avoid the creation of a temporary bytes object when formatting a floating
  point number (when no custom formatting option is used)
* Fix also reference leaks on error handling
* Use Py_MEMCPY() to copy bytes between two formatters (%)
2015-10-09 11:48:06 +02:00
Victor Stinner b3653a3458 Issue #25318: cleanup code _PyBytesWriter
Rename "stack buffer" to "small buffer".

Add also an assertion in _PyBytesWriter_GetPos().
2015-10-09 03:38:24 +02:00
Victor Stinner 3fa36ff5e4 Issue #25318: Fix backslashreplace()
Fix code to estimate the needed space.
2015-10-09 03:37:11 +02:00
Victor Stinner 797485e101 Issue #25318: Avoid sprintf() in backslashreplace()
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().

Add also unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
Victor Stinner b13b97d3b8 Issue #25318: Fix compilation error
Replace "#if Py_DEBUG" with "#ifdef Py_DEBUG".
2015-10-09 02:52:16 +02:00
Victor Stinner 0016507c16 Issue #25318: Move _PyBytesWriter to bytesobject.c
Declare also the private API in bytesobject.h.
2015-10-09 01:53:21 +02:00
Victor Stinner e7bf86cd7d Optimize backslashreplace error handler
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.

Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
Victor Stinner fdfbf78114 Issue #25318: Add _PyBytesWriter API
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.

Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.

unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Martin Panter 585a6acfef Merge typo fixes from 3.5 2015-10-07 11:13:55 +00:00
Martin Panter ec1aa5c2a1 More typos in 3.5 documentation and comments 2015-10-07 11:03:53 +00:00
Martin Panter 3f930dcd87 Merge typo fixes from 3.4 into 3.5 2015-10-07 11:01:47 +00:00
Martin Panter 9955a373a8 Various minor typos in documentation and comments 2015-10-07 10:26:23 +00:00
Benjamin Peterson cdae2cb88a merge 3.5 (closes #24806) 2015-10-06 19:42:46 -07:00
Benjamin Peterson 59dc696821 merge 3.4 (#24806) 2015-10-06 19:42:02 -07:00
Benjamin Peterson bd6c41a185 prevent unacceptable bases from becoming bases through multiple inheritance (#24806) 2015-10-06 19:36:54 -07:00
Victor Stinner 74e8fac3c8 Issue #25301: Fix compatibility with ISO C90 2015-10-05 13:49:26 +02:00
Victor Stinner 1d65d9192d Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
Victor Stinner eb36fdaad8 Fix _PyUnicodeWriter_PrepareKind()
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() handles correctly read-only buffer: copy the
buffer.
2015-10-03 01:55:51 +02:00
Serhiy Storchaka 29e68edbf4 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka 58c8f2bb6d Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka 28b21e50c8 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Serhiy Storchaka 5dbe245ef2 Issue #24483: C implementation of functools.lru_cache() now calculates key's
hash only once.
2015-10-02 12:47:59 +03:00
Serhiy Storchaka b9d98d532c Issue #24483: C implementation of functools.lru_cache() now calculates key's
hash only once.
2015-10-02 12:47:11 +03:00
Victor Stinner 3222da26fe Make _PyUnicode_TranslateCharmap() symbol private
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
2015-10-01 22:07:32 +02:00
Victor Stinner 01ada3996b Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Victor Stinner d69dd8bd5e (Merge 3.5) Issue #25182: Fix compilation on Windows 2015-09-30 15:03:50 +02:00
Victor Stinner ae86da9b20 (Merge 3.4) Issue #25182: Fix compilation on Windows 2015-09-30 15:03:31 +02:00
Victor Stinner 89719e1daf Issue #25182: Fix compilation on Windows
Restore also errno value before calling PyErr_SetFromErrno().
2015-09-30 15:01:34 +02:00
Serhiy Storchaka 85c386dee4 Issue #25182: The stdprinter (used as sys.stderr before the io module is
imported at startup) now uses the backslashreplace error handler.
2015-09-30 15:51:01 +03:00
Serhiy Storchaka 008fc77e1e Issue #25182: The stdprinter (used as sys.stderr before the io module is
imported at startup) now uses the backslashreplace error handler.
2015-09-30 15:50:32 +03:00
Serhiy Storchaka a59018c7ab Issue #25182: The stdprinter (used as sys.stderr before the io module is
imported at startup) now uses the backslashreplace error handler.
2015-09-30 15:46:53 +03:00
Victor Stinner c3713e9706 Optimize ascii/latin1+surrogateescape encoders
Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.

Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00
Victor Stinner 0030cd52da Issue #25227: Cleanup unicode_encode_ucs1() error handler
* Change limit type from unsigned int to Py_UCS4, to use the same type than the
  "ch" variable (an Unicode character).
* Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE
* Add some newlines for readability
2015-09-24 14:45:00 +02:00
Victor Stinner 54385b206d Issue #24870: revert unwanted change
Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(
2015-09-22 10:46:52 +02:00
Victor Stinner 5ebae87628 Issue #25207, #14626: Fix my commit.
It doesn't work to use #define XXX defined(YYY)" and then "#ifdef XXX"
to check YYY.
2015-09-22 01:29:33 +02:00
Victor Stinner 6174474bea _PyUnicodeWriter_PrepareInternal(): make the assertion more strict 2015-09-22 01:01:17 +02:00
Victor Stinner ca9381ea01 Issue #24870: Add _PyUnicodeWriter_PrepareKind() macro
Add a macro which ensures that the writer has at least the requested kind.
2015-09-22 00:58:32 +02:00
Victor Stinner 5014920cb7 Issue #24870: Reuse the new _Py_error_handler enum
Factorize code with the new get_error_handler() function.

Add some empty lines for readability.
2015-09-22 00:26:54 +02:00
Victor Stinner f96418de05 Issue #24870: Optimize the ASCII decoder for error handlers: surrogateescape,
ignore and replace. Initial patch written by Naoki Inada.

The decoder is now up to 60 times as fast for these error handlers.

Add also unit tests for the ASCII decoder.
2015-09-21 23:06:27 +02:00
Victor Stinner 026977717e Merge 3.5 2015-09-19 13:39:16 +02:00
Victor Stinner 5783fd2c58 Issue #24999: In longobject.c, use two shifts instead of ">> 2*PyLong_SHIFT" to
avoid undefined behaviour when LONG_MAX type is smaller than 60 bits.

This change should fix a warning with the ICC compiler.
2015-09-19 13:39:03 +02:00
Victor Stinner 058258652a Merge 3.5 (pytime, odict) 2015-09-18 13:55:15 +02:00
Victor Stinner 4a0d1e7c36 odictobject.c: fix compiler warning
PyObject_Length() returns a P_ssize_t, not an int. Use a Py_ssize_t to avoid
overflow.
2015-09-18 13:44:11 +02:00
Serhiy Storchaka 56f6e76c68 Issue #15989: Fixed some scarcely probable integer overflows.
It is very unlikely that they can occur in real code for now.
2015-09-06 21:25:30 +03:00
Guido van Rossum ba5f59089a Issue #24912: Prevent __class__ assignment to immutable built-in objects. (Merge 3.5 -> 3.6) 2015-09-05 15:20:57 -07:00
Guido van Rossum 37fdcbc4c3 Issue #24912: Prevent __class__ assignment to immutable built-in objects. (Merge 3.5.0 -> 3.5) 2015-09-05 15:20:08 -07:00
Guido van Rossum 7d293ee97d Issue #24912: Prevent __class__ assignment to immutable built-in objects. 2015-09-04 20:54:07 -07:00
Victor Stinner fa9dfd4f82 Merge 3.5 (odict) 2015-09-03 17:50:30 +02:00
Victor Stinner ca30b02abe Issue #24992: Fix error handling and a race condition (related to garbage
collection) in collections.OrderedDict constructor.

Patch reviewed by Serhiy Storchaka.
2015-09-03 17:50:04 +02:00
Victor Stinner 99bb14bf0c type_call() now detect bugs in type new and init
* Call _Py_CheckFunctionResult() to check for bugs in type
  constructors (tp_new)
* Add assertions to ensure an exception was raised if tp_init failed
  or that no exception was raised if tp_init succeed

Refactor also the function to have less indentation.
2015-09-03 12:16:49 +02:00
Eric V. Smith ab2aa6dc91 Fixed an incorrect comment. 2015-08-26 14:10:32 -04:00
Stefan Krah 5f35725fb8 Merge #15944. 2015-08-08 13:38:59 +02:00
Stefan Krah 0c51595a78 Issue #15944: memoryview: Allow arbitrary formats when casting to bytes.
Original patch by Martin Panter.
2015-08-08 13:38:10 +02:00
Eric Snow 5060bc51ca Merge from 3.5 (issue #24667). 2015-08-07 17:47:35 -06:00
Eric Snow 8c7f9558eb Issue #24667: Resize odict in all cases that the underlying dict resizes. 2015-08-07 17:45:12 -06:00
Eric V. Smith 577d2069da Fix typo in comment. 2015-08-07 18:50:24 -04:00
Raymond Hettinger 4148195c45 Move the active entry multiplication to later in the hash calculation 2015-08-07 00:43:39 -07:00
Raymond Hettinger b501a27ad8 Restore frozenset hash caching removed in cf707dd190a9 2015-08-06 22:15:22 -07:00
Zachary Ware 070bd62cfa Closes #21279: Merge with 3.5 2015-08-06 00:05:13 -05:00
Zachary Ware d987a81d29 Issue #21279: Merge with 3.4 2015-08-06 00:04:23 -05:00
Zachary Ware 79b98df023 Issue #21279: Flesh out str.translate docs
Initial patch by Kinga Farkas, Martin Panter, and John Posner.
2015-08-05 23:54:15 -05:00
Raymond Hettinger a286a51ae1 Fix comment typo 2015-08-01 11:07:11 -07:00
Raymond Hettinger 36c0500990 Tweak the comments 2015-08-01 10:57:42 -07:00
Raymond Hettinger fbffdef47d Issue #24762: Speed-up frozenset_hash() and greatly beef-up the comments. 2015-08-01 09:53:00 -07:00
Raymond Hettinger daffc916aa Issue #24681: Move the most likely test first in set_add_entry(). 2015-07-31 07:58:56 -07:00
Raymond Hettinger 70559b5c20 Issue #24681: Move the store of so->table to the code block where it is used. 2015-07-23 07:42:23 -04:00
Serhiy Storchaka c8fe04484e Issue #23573: Restored optimization of bytes.rfind() and bytearray.rfind()
for single-byte argument on Linux.
2015-07-20 22:58:29 +03:00
Serhiy Storchaka d92d4efe3d Issue #23573: Restored optimization of bytes.rfind() and bytearray.rfind()
for single-byte argument on Linux.
2015-07-20 22:58:02 +03:00
Raymond Hettinger ff9e18a863 Issue #24583: Consolidate previous set object updates into a single function
with a single entry point, named exit points at the bottom, more self-evident
refcount adjustments, and a comment describing why the pre-increment was
necessary at all.
2015-07-20 07:34:05 -04:00
Raymond Hettinger 482c05cbb5 Issue #24583: Fix refcount leak. 2015-07-20 01:23:32 -04:00