Commit Graph

1599 Commits

Author SHA1 Message Date
Serhiy Storchaka a83a6a3275 Issue #28701: _PyUnicode_EqualToASCIIId and _PyUnicode_EqualToASCIIString now
require ASCII right argument and assert this condition in debug build.
2016-11-16 20:02:44 +02:00
Serhiy Storchaka e6d6131f78 Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701). 2016-11-16 16:13:13 +02:00
Serhiy Storchaka df66b9c425 Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701). 2016-11-16 16:12:56 +02:00
Serhiy Storchaka 292dd1b2ad Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701). 2016-11-16 16:12:34 +02:00
Serhiy Storchaka 503db266a5 Issue #21449: Removed private function _PyUnicode_CompareWithId. 2016-11-16 15:56:50 +02:00
Serhiy Storchaka dddec81b2d Issue #21449: Removed private function _PyUnicode_CompareWithId. 2016-11-16 15:56:27 +02:00
Serhiy Storchaka 29a5447360 Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
The latter function is more readable, faster and doesn't raise exceptions.

Based on patch by Xiang Zhang.
2016-11-16 15:41:31 +02:00
Serhiy Storchaka fab6acd9f5 Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
The latter function is more readable, faster and doesn't raise exceptions.

Based on patch by Xiang Zhang.
2016-11-16 15:41:11 +02:00
Serhiy Storchaka f5894dd646 Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
The latter function is more readable, faster and doesn't raise exceptions.

Based on patch by Xiang Zhang.
2016-11-16 15:40:39 +02:00
Serhiy Storchaka 1a73bf365e Issue #28701: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString.
The latter function is more readable, faster and doesn't raise exceptions.
2016-11-16 10:19:57 +02:00
Serhiy Storchaka 3b73ea1278 Issue #28701: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString.
The latter function is more readable, faster and doesn't raise exceptions.
2016-11-16 10:19:20 +02:00
Serhiy Storchaka f4934ea77d Issue #28701: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString.
The latter function is more readable, faster and doesn't raise exceptions.
2016-11-16 10:17:58 +02:00
Serhiy Storchaka 616034eb73 Issue #28648: Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:37:11 +02:00
Serhiy Storchaka babe4f8e5e Issue #28648: Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:36:02 +02:00
Serhiy Storchaka 6b4b6e956e Issue #28648: Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:35:46 +02:00
Serhiy Storchaka 84293aff9f Issue #28648: Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:29:48 +02:00
Serhiy Storchaka b626643734 Issue #28648: Fixed crash in Py_DecodeLocale() in debug build on Mac OS X
when decoding astral characters.
2016-11-12 14:28:06 +02:00
Steve Dower 257a4c1503 Closes #27781: Removes special cases for the experimental aspect of PEP 529 2016-11-06 19:35:24 -08:00
Steve Dower 78057b4159 Closes #27781: Removes special cases for the experimental aspect of PEP 529 2016-11-06 19:35:08 -08:00
Eric V. Smith 5646648678 Issue 28128: Print out better error/warning messages for invalid string escapes. Backport to 3.6. 2016-10-31 14:46:26 -04:00
Eric V. Smith 42454af094 Issue 28128: Print out better error/warning messages for invalid string escapes. 2016-10-31 09:22:08 -04:00
Serhiy Storchaka 2edcd1cba4 Issue #28426: Deprecated undocumented functions PyUnicode_AsEncodedObject(),
PyUnicode_AsDecodedObject(), PyUnicode_AsDecodedUnicode() and
PyUnicode_AsEncodedUnicode().
2016-10-27 21:08:00 +03:00
Serhiy Storchaka 0093907f0e Issue #28426: Deprecated undocumented functions PyUnicode_AsEncodedObject(),
PyUnicode_AsDecodedObject(), PyUnicode_AsDecodedUnicode() and
PyUnicode_AsEncodedUnicode().
2016-10-27 21:05:49 +03:00
Serhiy Storchaka a4f8823063 Issue #28408: Fixed a leak and removed redundant code in _PyUnicodeWriter_Finish().
Patch by Xiang Zhang.
2016-10-25 13:25:04 +03:00
Serhiy Storchaka c8bc3d1c07 Issue #28408: Fixed a leak and removed redundant code in _PyUnicodeWriter_Finish().
Patch by Xiang Zhang.
2016-10-25 13:23:56 +03:00
Serhiy Storchaka d7e5ff13bb Issue #28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build. 2016-10-25 10:18:16 +03:00
Serhiy Storchaka c4a3e90aa8 Issue #28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build. 2016-10-25 10:17:33 +03:00
Serhiy Storchaka 839023f12c Issue #28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build. 2016-10-25 10:13:43 +03:00
Serhiy Storchaka 77eede35fc Issue #28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build. 2016-10-25 10:07:51 +03:00
Serhiy Storchaka 2fbc019c8c Issue #28439: Remove redundant checks in PyUnicode_EncodeLocale and
PyUnicode_DecodeLocaleAndSize.  Patch by Xiang Zhang.
2016-10-23 15:41:36 +03:00
Serhiy Storchaka f8d7d41507 Issue #28511: Use the "U" format instead of "O!" in PyArg_Parse*. 2016-10-23 15:12:25 +03:00
Serhiy Storchaka 523c449ca0 Issue #28504: Cleanup unicode_decode_call_errorhandler_wchar/writer.
Patch by Xiang Zhang.
2016-10-22 23:18:31 +03:00
Serhiy Storchaka 14ab277632 Issue #28410: Added _PyErr_FormatFromCause() -- the helper for raising
new exception with setting current exception as __cause__.

_PyErr_FormatFromCause(exception, format, args...) is equivalent to Python

    raise exception(format % args) from sys.exc_info()[1]
2016-10-21 17:10:42 +03:00
Serhiy Storchaka 467ab194fc Issue #28410: Added _PyErr_FormatFromCause() -- the helper for raising
new exception with setting current exception as __cause__.

_PyErr_FormatFromCause(exception, format, args...) is equivalent to Python

    raise exception(format % args) from sys.exc_info()[1]
2016-10-21 17:09:17 +03:00
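The Python equivalent quoted in the message above can be tried directly. This is a minimal sketch: `format_from_cause` and `demo` are hypothetical names used only for illustration, and the function is a pure-Python analogue of the described behaviour, not the C helper itself.

```python
import sys


def format_from_cause(exception, fmt, *args):
    # Hypothetical pure-Python analogue of the C helper. It must be called
    # while an exception is being handled, so that sys.exc_info()[1] is set.
    raise exception(fmt % args) from sys.exc_info()[1]


def demo():
    try:
        try:
            int("not a number")  # raises ValueError
        except ValueError:
            format_from_cause(TypeError, "bad value for %s", "field")
    except TypeError as exc:
        return exc


chained = demo()
print(repr(chained), "caused by", repr(chained.__cause__))
```

The new exception carries the original one in `__cause__`, which is what makes the "direct cause" section appear in a traceback.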
Benjamin Peterson d6d49f16f4 merge 3.6 (#28454) 2016-10-16 15:42:33 -07:00
Benjamin Peterson 3aa75528a1 merge 3.5 (#28454) 2016-10-16 15:42:24 -07:00
Benjamin Peterson 8d761ff045 remove extra PyErr_Format arguments (closes #28454)
Patch from Xiang Zhang.
2016-10-16 15:41:46 -07:00
Victor Stinner 5a33759fba Merge 3.6 2016-10-12 13:59:13 +02:00
Victor Stinner ebe17e0347 Fix _Py_normalize_encoding() comment
It's not exactly the same as encodings.normalize_encoding(): the C function
also converts to lowercase.
2016-10-12 13:57:45 +02:00
Benjamin Peterson 8a3748290a merge 3.6 (#28417) 2016-10-11 23:01:12 -07:00
Benjamin Peterson b329e1bb5b va_end vargs2 once (closes #28417) 2016-10-11 23:00:58 -07:00
Serhiy Storchaka 2e58f1a52a Issue #28400: Removed unnecessary checks in unicode_char and resize_copy.
1. In resize_copy we don't need to PyUnicode_READY(unicode) since when
it's not PyUnicode_WCHAR_KIND it should be ready.
2. In unicode_char, PyUnicode_1BYTE_KIND is handled by get_latin1_char.

Patch by Xiang Zhang.
2016-10-09 23:44:48 +03:00
Serhiy Storchaka 21d9f10c94 Merge from 3.5. 2016-10-08 22:46:01 +03:00
Serhiy Storchaka 9c0e1f83af Issue #28379: Added sanity checks and tests for PyUnicode_CopyCharacters().
Patch by Xiang Zhang.
2016-10-08 22:45:38 +03:00
Victor Stinner 44f4874e68 Merge 3.5 2016-09-21 14:13:53 +02:00
Victor Stinner 1ddf53d496 Fix PyUnicode_FromFormatV() error handling
Issue #28233: Fix a memory leak if the format string contains a non-ASCII
character, destroy the unicode writer.
2016-09-21 14:13:14 +02:00
Christian Heimes 2f2fee19ec va_end() all va_copy()ed va_lists. 2016-09-21 11:37:27 +02:00
Benjamin Peterson 0c21214f3e replace usage of Py_VA_COPY with the (C99) standard va_copy 2016-09-20 20:39:33 -07:00
Christian Heimes f051e43b22 Issue #28126: Replace Py_MEMCPY with memcpy(). Visual Studio can properly optimize memcpy(). 2016-09-13 20:22:02 +02:00
Benjamin Peterson 621b430a14 remove all usage of Py_LOCAL 2016-09-09 13:54:34 -07:00
Benjamin Peterson 33d2a492d0 promote some shifts to unsigned, so as not to invoke undefined behavior 2016-09-06 20:40:04 -07:00
R David Murray 110b6fecbb #27364: Deprecate invalid escape strings in str/bytes.
Patch by Emanuel Barry, reviewed by Serhiy Storchaka and Martin Panter.
2016-09-08 15:34:08 -04:00
Steve Dower cc16be85c0 Issue #27781: Change file system encoding on Windows to UTF-8 (PEP 529) 2016-09-08 10:35:16 -07:00
Benjamin Peterson 47ff0734b8 more PY_LONG_LONG to long long 2016-09-08 09:15:54 -07:00
Benjamin Peterson 2e7c5e9c11 replace some Py_LOCAL_INLINE with the inline keyword 2016-09-07 15:33:32 -07:00
Benjamin Peterson 4b9abf3a27 merge 3.5 2016-09-06 20:42:17 -07:00
Brett Cannon a571120410 Issue #27182: Add support for path-like objects to PyUnicode_FSDecoder(). 2016-09-06 19:36:01 -07:00
Victor Stinner 62ec3317d2 Optimize unicode_escape and raw_unicode_escape
Issue #16334. Patch written by Serhiy Storchaka.
2016-09-06 17:04:34 -07:00
Victor Stinner 2740e46089 _PyUnicodeWriter: assert that max character <= MAX_UNICODE 2016-09-06 16:58:36 -07:00
Brett Cannon ec6ce879c7 Issue #26027: Support path-like objects in PyUnicode_FSConverter().
This is to add support for os.exec*() and os.spawn*() functions. Part
of PEP 519.
2016-09-06 15:50:29 -07:00
Benjamin Peterson 9b3d77052f replace Python aliases for standard integer types with the standard integer types (#17884) 2016-09-06 13:24:00 -07:00
Serhiy Storchaka ea525a2d1a Issue #27078: Added BUILD_STRING opcode. Optimized f-strings evaluation. 2016-09-06 22:07:53 +03:00
Benjamin Peterson af580dff4a replace PY_LONG_LONG with long long 2016-09-06 10:46:49 -07:00
Benjamin Peterson ed4aa83ff7 require a long long data type (closes #27961) 2016-09-05 17:44:18 -07:00
Victor Stinner 942889aae2 Issue #27938: Add a fast-path for us-ascii encoding
Other changes:

* Rewrite _Py_normalize_encoding() as a C implementation of
  encodings.normalize_encoding(). For example, " utf-8 " is now normalized to
  "utf_8". So the fast path is now used for more name variants of the same
  encoding.
* Avoid strcpy() when encoding is NULL: call directly the UTF-8 codec
2016-09-05 15:40:10 -07:00
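The normalization described in the message above can be sketched in Python. This is an illustration under assumptions, not the C implementation: `normalize_encoding_sketch` is a hypothetical name, keeping `.` as a name character follows the standard-library `encodings.normalize_encoding()`, and the lowercasing follows what a later entry in this log says about the C function.

```python
def normalize_encoding_sketch(name):
    """Collapse runs of non-alphanumeric characters (other than '.') into a
    single '_', drop leading and trailing runs, and lowercase the result."""
    chars = []
    pending_separator = False
    for ch in name:
        if ch.isalnum() or ch == ".":
            # Emit one underscore for the whole run of separators seen so
            # far, but never at the very start of the name.
            if pending_separator and chars:
                chars.append("_")
            chars.append(ch.lower())
            pending_separator = False
        else:
            pending_separator = True
    return "".join(chars)


for raw in (" utf-8 ", "UTF-8", "US-ASCII", "latin  1"):
    print(repr(raw), "->", repr(normalize_encoding_sketch(raw)))
```

Because several spellings collapse to the same key, a fast path keyed on the normalized name can cover more variants of one encoding.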
Victor Stinner 1a05d6c04d PEP 7 style for if/else in C
Add also a newline for readability in normalize_encoding().
2016-09-02 12:12:23 +02:00
Raymond Hettinger 15f44ab043 Issue #27895: Spelling fixes (Contributed by Ville Skyttä). 2016-08-30 10:47:49 -07:00
Serhiy Storchaka febc332056 Issue #26754: Undocumented support of general bytes-like objects
as path in compile() and similar functions is now deprecated.
2016-08-06 23:29:29 +03:00
Berker Peksag ced8d4c6eb Issue #27454: Use PyDict_SetDefault in PyUnicode_InternInPlace
Patch by INADA Naoki.
2016-07-25 04:40:39 +03:00
Serhiy Storchaka f95de0e8cc Issue #26754: PyUnicode_FSDecoder() accepted a filename argument encoded as
an iterable of integers. Now only strings and bytes-like objects are accepted.
2016-06-18 13:56:16 +03:00
Serhiy Storchaka 9305d83425 Issue #26754: PyUnicode_FSDecoder() accepted a filename argument encoded as
an iterable of integers. Now only strings and bytes-like objects are accepted.
2016-06-18 13:53:36 +03:00
Martin Panter 0b7d84de6b Issue #27171: Merge typo fixes from 3.5 2016-06-02 10:11:18 +00:00
Martin Panter e26da7c03a Issue #27171: Fix typos in documentation, comments, and test function names 2016-06-02 10:07:09 +00:00
Serhiy Storchaka dd40fc3e57 Issue #26765: Moved common code and docstrings for bytes and bytearray methods
to bytes_methods.c.
2016-05-04 22:23:26 +03:00
Martin Panter cda80940ed Issue #15984: Merge PyUnicode doc from 3.5 2016-04-15 02:27:11 +00:00
Martin Panter 6245cb3c01 Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc
This affects documentation, code comments, and debugging messages.
2016-04-15 02:14:19 +00:00
Serhiy Storchaka 21a663ea28 Issue #26057: Got rid of nonneeded use of PyUnicode_FromObject(). 2016-04-13 15:37:23 +03:00
Serhiy Storchaka f01e408c16 Issue #26200: Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
in places where Py_DECREF was used.
2016-04-10 18:12:01 +03:00
Serhiy Storchaka 57a01d3a0e Issue #26200: Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
in places where Py_DECREF was used.
2016-04-10 18:05:40 +03:00
Serhiy Storchaka ec39756960 Issue #22570: Renamed Py_SETREF to Py_XSETREF. 2016-04-06 09:50:03 +03:00
Serhiy Storchaka 48842714b9 Issue #22570: Renamed Py_SETREF to Py_XSETREF. 2016-04-06 09:45:48 +03:00
Serhiy Storchaka ab479c49d3 Issue #26494: Fixed crash on iterating exhausted iterators.
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:41:15 +03:00
Serhiy Storchaka fbb1c5ee06 Issue #26494: Fixed crash on iterating exhausted iterators.
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:40:02 +03:00
Victor Stinner f2192855dd Merge 3.5 2016-03-01 22:07:53 +01:00
Victor Stinner 337986740f Issue #26464: Fix unicode_fast_translate() again
Initialize i variable if the string is non-ASCII.
2016-03-01 21:59:58 +01:00
Victor Stinner 3d9d77a3dc Merge 3.5 2016-03-01 21:30:50 +01:00
Victor Stinner 6c9aa8f2bf Fix str.translate()
Issue #26464: Fix str.translate() when string is ASCII and first replacements
removes character, but next replacement uses a non-ASCII character or a string
longer than 1 character. Regression introduced in Python 3.5.0.
2016-03-01 21:30:30 +01:00
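The case described in the message above can be reproduced with a small translation table. This is a minimal sketch of the inputs in question (an ASCII string, a first mapping that deletes a character, and a later mapping to a non-ASCII character or to a longer string); the variable names are illustrative.

```python
# The first mapping deletes "a"; the second maps "b" to a non-ASCII character.
table_non_ascii = {ord("a"): None, ord("b"): "\u00e9"}
# Same deletion, but "b" is replaced by a string longer than one character.
table_longer = {ord("a"): None, ord("b"): "xy"}

result_non_ascii = "abc".translate(table_non_ascii)
result_longer = "abc".translate(table_longer)

print(ascii(result_non_ascii), ascii(result_longer))
```

On an interpreter that includes the fix, the results are `"\u00e9c"` and `"xyc"`.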
Victor Stinner 5b96f17b1c Merge 3.5 2016-01-27 17:01:13 +01:00
Victor Stinner 5bc03a6d4d Fix resize_compact()
Issue #26217: resize_compact() must set wstr_length to 0 after freeing the wstr
string. Otherwise, an assertion fails in _PyUnicode_CheckConsistency().
2016-01-27 16:56:53 +01:00
Serhiy Storchaka 726fc139a5 Issue #20440: More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:44:33 +02:00
Serhiy Storchaka 191321d11b Issue #20440: More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:41:34 +02:00
Serhiy Storchaka ef1585eb9a Issue #25923: Added more const qualifiers to signatures of static and private functions. 2015-12-25 20:01:53 +02:00
Serhiy Storchaka 2d06e84455 Issue #25923: Added the const qualifier to static constant arrays. 2015-12-25 19:53:18 +02:00
Serhiy Storchaka f006940351 Issue #20440: Massive replacing unsafe attribute setting code with special
macro Py_SETREF.
2015-12-24 10:39:57 +02:00
Serhiy Storchaka 5a57ade58e Issue #20440: Massive replacing unsafe attribute setting code with special
macro Py_SETREF.
2015-12-24 10:35:59 +02:00
Serhiy Storchaka 9b3a2eec1c Issues #25890, #25891, #25892: Removed unused variables in Windows code.
Reported by Alexander Riccio.
2015-12-18 10:03:13 +02:00
Serhiy Storchaka 7c088a9b5c Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:05:52 +02:00
Serhiy Storchaka 6648bf5661 Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:04:37 +02:00
Serhiy Storchaka 31b9410654 Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:02:03 +02:00
Serhiy Storchaka 7aa690860e Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:02:03 +02:00
Benjamin Peterson d798dc1034 merge 3.5 (#25630) 2015-11-15 21:57:50 -08:00
Benjamin Peterson a4d33b3428 make the PyUnicode_FSConverter cleanup set the decrefed argument to NULL (closes #25630) 2015-11-15 21:57:39 -08:00
Serhiy Storchaka 413fdcea21 Issue #24821: Refactor STRINGLIB(fastsearch_memchr_1char) and split it into
STRINGLIB(find_char) and STRINGLIB(rfind_char) that can be used independently
without special preconditions.
2015-11-14 15:42:17 +02:00
Serhiy Storchaka 4a7c03aab4 Issue #25523: Merge a-to-an corrections from 3.5. 2015-11-02 14:44:29 +02:00
Serhiy Storchaka a84f6c3dd3 Issue #25523: Merge a-to-an corrections from 3.4. 2015-11-02 14:39:05 +02:00
Serhiy Storchaka d65c9496da Issue #25523: Further a-to-an corrections. 2015-11-02 14:10:23 +02:00
Victor Stinner 358af13526 Issue #25353: Optimize unicode escape and raw unicode escape encoders to use
the new _PyBytesWriter API.
2015-10-12 22:36:57 +02:00
Victor Stinner 6c2cdae9e6 Writer APIs: use empty string singletons
Modify _PyBytesWriter_Finish() and _PyUnicodeWriter_Finish() to return the
empty bytes/Unicode string if the string is empty.
2015-10-12 13:29:43 +02:00
Victor Stinner 6bd525b656 Optimize error handlers of ASCII and Latin1 encoders when the replacement
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.

Cleanup unicode_encode_ucs1():

* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
2015-10-09 13:10:05 +02:00
Victor Stinner ce179bf6ba Add _PyBytesWriter_WriteBytes() to factorize the code 2015-10-09 12:57:22 +02:00
Victor Stinner ad7715891e _PyBytesWriter: simplify code to avoid "prealloc" parameters
Subtract preallocated bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner 3fa36ff5e4 Issue #25318: Fix backslashreplace()
Fix code to estimate the needed space.
2015-10-09 03:37:11 +02:00
Victor Stinner 797485e101 Issue #25318: Avoid sprintf() in backslashreplace()
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().

Add also unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
Victor Stinner 0016507c16 Issue #25318: Move _PyBytesWriter to bytesobject.c
Declare also the private API in bytesobject.h.
2015-10-09 01:53:21 +02:00
Victor Stinner e7bf86cd7d Optimize backslashreplace error handler
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.

Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. This avoids creating an exception and calling the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
Victor Stinner fdfbf78114 Issue #25318: Add _PyBytesWriter API
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.

Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.

unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Victor Stinner 74e8fac3c8 Issue #25301: Fix compatibility with ISO C90 2015-10-05 13:49:26 +02:00
Victor Stinner 1d65d9192d Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
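For reference, the three error handlers named above behave as follows on an invalid UTF-8 byte. This is a small illustration of their semantics only; it does not measure the speed-up the message describes.

```python
data = b"a\xffb"  # 0xff can never appear in valid UTF-8

ignored = data.decode("utf-8", "ignore")            # drops the bad byte
replaced = data.decode("utf-8", "replace")          # substitutes U+FFFD
escaped = data.decode("utf-8", "surrogateescape")   # maps 0xff to U+DCFF

# surrogateescape is reversible: encoding with the same handler
# restores the original bytes.
roundtrip = escaped.encode("utf-8", "surrogateescape")

print(ascii(ignored), ascii(replaced), ascii(escaped), roundtrip)
```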
Victor Stinner eb36fdaad8 Fix _PyUnicodeWriter_PrepareKind()
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() handles correctly read-only buffer: copy the
buffer.
2015-10-03 01:55:51 +02:00
Serhiy Storchaka 29e68edbf4 Issue #24848: Fixed bugs in UTF-7 decoding of malformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka 58c8f2bb6d Issue #24848: Fixed bugs in UTF-7 decoding of malformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka 28b21e50c8 Issue #24848: Fixed bugs in UTF-7 decoding of malformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Victor Stinner 3222da26fe Make _PyUnicode_TranslateCharmap() symbol private
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
2015-10-01 22:07:32 +02:00
Victor Stinner 01ada3996b Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Victor Stinner c3713e9706 Optimize ascii/latin1+surrogateescape encoders
Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.

Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00
Victor Stinner 0030cd52da Issue #25227: Cleanup unicode_encode_ucs1() error handler
* Change limit type from unsigned int to Py_UCS4, to use the same type as the
  "ch" variable (a Unicode character).
* Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE
* Add some newlines for readability
2015-09-24 14:45:00 +02:00
Victor Stinner 54385b206d Issue #24870: revert unwanted change
Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(
2015-09-22 10:46:52 +02:00
Victor Stinner 5ebae87628 Issue #25207, #14626: Fix my commit.
It doesn't work to use "#define XXX defined(YYY)" and then "#ifdef XXX"
to check YYY.
2015-09-22 01:29:33 +02:00
Victor Stinner 6174474bea _PyUnicodeWriter_PrepareInternal(): make the assertion more strict 2015-09-22 01:01:17 +02:00
Victor Stinner ca9381ea01 Issue #24870: Add _PyUnicodeWriter_PrepareKind() macro
Add a macro which ensures that the writer has at least the requested kind.
2015-09-22 00:58:32 +02:00
Victor Stinner 5014920cb7 Issue #24870: Reuse the new _Py_error_handler enum
Factorize code with the new get_error_handler() function.

Add some empty lines for readability.
2015-09-22 00:26:54 +02:00
Victor Stinner f96418de05 Issue #24870: Optimize the ASCII decoder for error handlers: surrogateescape,
ignore and replace. Initial patch written by Naoki Inada.

The decoder is now up to 60 times as fast for these error handlers.

Add also unit tests for the ASCII decoder.
2015-09-21 23:06:27 +02:00
Zachary Ware 070bd62cfa Closes #21279: Merge with 3.5 2015-08-06 00:05:13 -05:00
Zachary Ware d987a81d29 Issue #21279: Merge with 3.4 2015-08-06 00:04:23 -05:00
Zachary Ware 79b98df023 Issue #21279: Flesh out str.translate docs
Initial patch by Kinga Farkas, Martin Panter, and John Posner.
2015-08-05 23:54:15 -05:00
Raymond Hettinger ac2ef65c32 Make the unicode equality test an external function rather than in-lining it.
The real benefit of the unicode specialized function comes from
bypassing the overhead of PyObject_RichCompareBool() and not
from being in-lined (especially since there was almost no shared
data between the caller and callee).  Also, the in-lining was
having a negative effect on code generation for the callee.
2015-07-04 16:04:44 -07:00
Serhiy Storchaka d4ea03c785 Issue #24284: The startswith and endswith methods of the str class no longer
return True when finding the empty string and the indexes are completely out
of range.
2015-05-31 09:15:51 +03:00
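The behaviour described above can be checked with an empty prefix or suffix and a start index just inside and just outside the string. This is a minimal sketch; the expected values assume an interpreter that includes this change.

```python
text = "abc"

# A start index equal to len(text) is still in range: the empty string
# is found at the very end.
at_end = text.startswith("", 3)

# A start index past the end is completely out of range.
past_end_prefix = text.startswith("", 4)
past_end_suffix = text.endswith("", 4)

print(at_end, past_end_prefix, past_end_suffix)
```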
Antoine Pitrou 873e0df946 Fix some compilation warnings when using gcc (-Wmaybe-uninitialized). 2015-05-19 21:06:04 +02:00
Antoine Pitrou f6d1f1fa8a Fix some compilation warnings when using gcc (-Wmaybe-uninitialized). 2015-05-19 21:04:33 +02:00
Serhiy Storchaka 0d4df752ac Issue #15027: The UTF-32 encoder is now 3x to 7x faster. 2015-05-12 23:12:45 +03:00
Serhiy Storchaka 7e9d1d1a1b Issue #23908: os functions now reject paths with embedded null character
on Windows instead of silently truncating them.

Removed no longer used _PyUnicode_HasNULChars().
2015-04-20 10:12:28 +03:00
Serhiy Storchaka 1009bf18b3 Issue #23501: Argument Clinic now generates code into separate files by default. 2015-04-03 23:53:51 +03:00
Victor Stinner 1912b39def _PyUnicodeWriter_WriteStr() now checks that the input string is consistent
in debug mode to detect bugs earlier.

_PyUnicodeWriter_Finish() doesn't check if the read only string is consistent,
whereas it does check consistency for strings built by itself.
2015-03-26 09:37:23 +01:00
Serhiy Storchaka d9d769fcdd Issue #23573: Increased performance of string search operations (str.find,
str.index, str.count, the in operator, str.split, str.partition) with
arguments of different kinds (UCS1, UCS2, UCS4).
2015-03-24 21:55:47 +02:00
Victor Stinner f50e187724 Fix compiler warnings: comparison between signed and unsigned numbers 2015-03-20 11:32:24 +01:00
Victor Stinner 0c39b1b970 Initialize variables to prevent GCC warnings 2015-03-18 15:02:06 +01:00
Benjamin Peterson e5a853c390 use PyMem_NEW to detect overflow (closes #23362) 2015-03-02 13:23:25 -05:00
Steve Dower 3e96f324dc Issue #23451: Update pyconfig.h for Windows to require Vista headers and remove unnecessary version checks. 2015-03-02 08:01:10 -08:00
Serhiy Storchaka 78a8249127 Issue #23490: Fixed possible crashes related to interoperability between
old-style and new API for string with 2**30-1 characters.
2015-02-20 21:34:39 +02:00
Serhiy Storchaka e55181f517 Issue #23490: Fixed possible crashes related to interoperability between
old-style and new API for string with 2**30-1 characters.
2015-02-20 21:34:06 +02:00
Serhiy Storchaka 4d0d982985 Issue #23446: Use PyMem_New instead of PyMem_Malloc to avoid possible integer
overflows.  Added a few missing PyErr_NoMemory() calls.
2015-02-16 13:33:32 +02:00
Serhiy Storchaka 1a1ff29659 Issue #23446: Use PyMem_New instead of PyMem_Malloc to avoid possible integer
overflows.  Added a few missing PyErr_NoMemory() calls.
2015-02-16 13:28:22 +02:00
Serhiy Storchaka 4dbc305002 Issue #23055: Fixed a buffer overflow in PyUnicode_FromFormatV. Analysis
and fix by Guido Vranken.
2015-01-27 22:18:46 +02:00
Victor Stinner 29dacf2e97 Issue #15859: PyUnicode_EncodeFSDefault(), PyUnicode_EncodeMBCS() and
PyUnicode_EncodeCodePage() now raise an exception if the object is not a
Unicode object. For PyUnicode_EncodeFSDefault(), it was already the case on
platforms other than Windows. Patch written by Campbell Barton.
2015-01-26 16:41:32 +01:00
Serhiy Storchaka bbd3aa8ece Issue #23321: Fixed a crash in str.decode() when error handler returned
replacement string longer than malformed input data.
2015-01-26 01:24:31 +02:00
Serhiy Storchaka 7e4b9057b3 Issue #23321: Fixed a crash in str.decode() when error handler returned
replacement string longer than malformed input data.
2015-01-26 01:22:54 +02:00
Ethan Furman b95b56150f Issue20284: Implement PEP461 2015-01-23 20:05:18 -08:00
Serhiy Storchaka 82e07b92b3 Issue #23181: More "codepoint" -> "code point". 2015-01-18 11:33:31 +02:00
Serhiy Storchaka d3faf43f9b Issue #23181: More "codepoint" -> "code point". 2015-01-18 11:28:37 +02:00
Serhiy Storchaka b757c83ec6 Issue #22581: Use more "bytes-like object" throughout the docs and comments. 2014-12-05 22:25:22 +02:00
Serhiy Storchaka 133b11b566 Issue #22975: Close block at right place. 2014-12-01 18:56:28 +02:00
Serhiy Storchaka 92bf919ed0 Issue #22581: Use more "bytes-like object" throughout the docs and comments. 2014-12-05 22:26:10 +02:00
Serhiy Storchaka 407249c62b Issue #22975: Close block at right place. 2014-12-01 18:56:54 +02:00
Victor Stinner 3aa979e0cd Issue #20948: Inline makefmt() in unicode_fromformat_arg() 2014-11-18 21:40:51 +01:00
Antoine Pitrou b6dc9b7554 Fixed signed/unsigned comparison warning 2014-10-15 23:14:53 +02:00
Antoine Pitrou 4e334241b7 Fixed signed/unsigned comparison warning 2014-10-15 23:14:53 +02:00
Benjamin Peterson 736982d36d merge 3.4 (closes #22643) 2014-10-15 12:17:47 -04:00
Benjamin Peterson 9c422f3c3d merge 3.3 2014-10-15 12:17:33 -04:00
Benjamin Peterson 1e211ff10d it suffices to check for PY_SSIZE_T_MAX overflow (#22643) 2014-10-15 12:17:21 -04:00
Benjamin Peterson 315aa40403 Merge 3.4 2014-10-15 11:51:17 -04:00
Benjamin Peterson 60d7a73194 Merge 3.3 2014-10-15 11:51:12 -04:00
Benjamin Peterson c0e64f5027 make sure length is unsigned 2014-10-15 11:51:05 -04:00
Benjamin Peterson 6925264334 merge 3.4 (#22643) 2014-10-15 11:49:15 -04:00
Benjamin Peterson 1cbb3fe775 merge 3.3 (#22643) 2014-10-15 11:48:41 -04:00
Benjamin Peterson e1bd38c03c fix integer overflow in unicode case operations (closes #22643) 2014-10-15 11:47:36 -04:00
Gregory P. Smith 8486f9b134 Fix "warning: comparison between signed and unsigned integer expressions"
-Wsign-compare warnings in unicodeobject.c.  These were all a result
of sizeof() being unsigned and being compared to a Py_ssize_t.
Not actual problems.
2014-09-30 00:33:24 -07:00
Benjamin Peterson fd97a6fb2d merge 3.4 (#22520) 2014-09-29 23:02:56 -04:00
Benjamin Peterson 43030ee780 merge 3.3 (#22520) 2014-09-29 23:02:35 -04:00
Benjamin Peterson 736b8012b4 prevent overflow in unicode_repr (closes #22520) 2014-09-29 23:02:15 -04:00
Benjamin Peterson 10e4b2545e merge 3.4 (closes #22518) 2014-09-29 18:53:58 -04:00
Benjamin Peterson 2b76ce6d27 merge 3.3 (closes #22518) 2014-09-29 18:50:06 -04:00
Benjamin Peterson a1c1be4e03 cleanup overflowing handling in unicode_decode_call_errorhandler and unicode_encode_ucs1 (closes #22518) 2014-09-29 18:18:57 -04:00
Serhiy Storchaka 20b39b27d9 Removed redundant casts to `char *`.
Corresponding functions now accept `const char *` (issue #1772673).
2014-09-28 11:27:24 +03:00
Benjamin Peterson fa5021699a Merge 3.3 2014-10-15 23:58:32 -04:00
Serhiy Storchaka d8a1447c99 Issue #22215: Now ValueError is raised instead of TypeError when str or bytes
argument contains not permitted null character or byte.
2014-09-06 20:07:17 +03:00
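One place where this change is visible from Python is a path argument containing a null character. This is a sketch under the assumption that `os.stat()` goes through the affected argument conversion, which holds on current CPython; the exact message text differs between platforms.

```python
import os

try:
    os.stat("bad\0name")
except ValueError as exc:
    # Keep a reference: the name bound by "as" is cleared after the block.
    error = exc
else:
    error = None

print(type(error).__name__, error)
```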
Victor Stinner 12174a5dca Issue #22156: Fix "comparison between signed and unsigned integers" compiler
warnings in the Objects/ subdirectory.

PyType_FromSpecWithBases() and PyType_FromSpec() now reject explicitly negative
slot identifiers.
2014-08-15 23:17:38 +02:00
Victor Stinner f6a271ae98 Issue #18395: Rename ``_Py_char2wchar()`` to :c:func:`Py_DecodeLocale`, rename
``_Py_wchar2char()`` to :c:func:`Py_EncodeLocale`, and document these
functions.
2014-08-01 12:28:48 +02:00
Victor Stinner e1f17c6c0b unicodeobject.c: fix a compiler warning on Windows 64 bits 2014-07-25 14:03:03 +02:00
Victor Stinner c68b7fba86 (Merge 3.4) Issue #21892, #21893: Partial revert of changeset 4f55e802baf0,
PyErr_Format() uses "%zd" for Py_ssize_t, not PY_FORMAT_SIZE_T
2014-07-04 22:50:13 +02:00
Victor Stinner a33bce0945 Issue #21892, #21893: Partial revert of changeset 4f55e802baf0, PyErr_Format()
uses "%zd" for Py_ssize_t, not PY_FORMAT_SIZE_T
2014-07-04 22:47:46 +02:00
Victor Stinner 9f43505f3d (Merge 3.4) Closes #21892, #21893: Use PY_FORMAT_SIZE_T instead of %zi or %zu
to format C size_t, because %zi/%u is not supported on all platforms.
2014-07-01 08:57:54 +02:00
Victor Stinner 293f3f526d Closes #21892, #21893: Use PY_FORMAT_SIZE_T instead of %zi or %zu to format C
size_t, because %zi/%u is not supported on all platforms.
2014-07-01 08:57:10 +02:00
Serhiy Storchaka 48070c1248 Issue #23803: Fixed str.partition() and str.rpartition() when a separator
is wider than partitioned string.
2015-03-29 19:21:02 +03:00
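"Wider" here refers to the internal character width: for instance, a separator that needs two bytes per character applied to a string stored with one byte per character. A minimal sketch of the expected results on a fixed interpreter:

```python
text = "abc"          # ASCII, one byte per character internally
separator = "\u20ac"  # EURO SIGN, needs a wider representation

# The separator cannot occur in the string, so partition() returns the
# whole string first and rpartition() returns it last.
head = text.partition(separator)
tail = text.rpartition(separator)

print(head, tail)
```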
Benjamin Peterson 92ce1b4392 merge 3.3 (#23362) 2015-03-02 13:23:41 -05:00
Victor Stinner 4dd25256e2 Issue #21118: PyLong_AS_LONG() result type is long
Even if PyLong_AS_LONG() cannot fail, I prefer to use the right type.
2014-04-08 09:14:21 +02:00
Benjamin Peterson 1365de764e fix reference leaks in the translate fast path (closes #21175)
Patch by Josh Rosenberg.
2014-04-07 20:15:41 -04:00
Victor Stinner 872b291b96 Issue #21118: Optimize also str.translate() for ASCII => ASCII deletion 2014-04-05 14:27:07 +02:00
Victor Stinner 4ff33af257 Issue #21118: Add unit test for invalid character replacement (code point higher than U+10ffff) 2014-04-05 11:56:37 +02:00
Victor Stinner 89a76abf20 Issue #21118: Optimize str.translate() for ASCII => ASCII translation 2014-04-05 11:44:04 +02:00
Victor Stinner 8a4422e78d Issue #21118: Remove unused variable 2014-04-05 00:15:52 +02:00
Victor Stinner 1194ea020c Issue #21118: Use _PyUnicodeWriter API in str.translate() to simplify and
factorize the code
2014-04-04 19:37:40 +02:00
Ethan Furman 9ab748013b Issue19995: more informative error message; spelling corrections; use operator.mod instead of __mod__ 2014-03-21 06:38:46 -07:00
Ethan Furman 38d872ee5d Issue19995: passing a non-int to %o, %c, %x, or %X now raises an exception 2014-03-19 08:38:52 -07:00
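The behaviour described in this entry can be observed from Python. A small sketch, assuming Python 3.5 or later; the helper name `try_format` is hypothetical and exists only for this illustration.

```python
def try_format(spec, value):
    """Return the formatted string, or the exception type name on failure."""
    try:
        return spec % value
    except TypeError as exc:
        return type(exc).__name__

results = {
    "int_x": try_format("%x", 255),     # integers are still accepted
    "float_x": try_format("%x", 3.5),   # non-int is rejected
    "float_o": try_format("%o", 3.5),   # non-int is rejected
}
print(results)
```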
Victor Stinner 7d00cc1a64 Issue #20574: Implement incremental decoder for cp65001 code
(Windows code page 65001, Microsoft UTF-8).
2014-03-17 23:08:06 +01:00
Kristján Valur Jónsson 25dded041f Make the various iterators' "setstate" silently and consistently clip the
index.  This avoids the possibility of setting an iterator to an invalid
state.
2014-03-05 13:47:57 +00:00
Kristján Valur Jónsson c5cc5011ac Make the various iterators' "setstate" silently and consistently clip the
index.  This avoids the possibility of setting an iterator to an invalid
state.
2014-03-05 15:23:07 +00:00
Serhiy Storchaka 94ee389308 Issue #19619: Blacklist non-text codecs in method API
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.

The latter mechanism remains in place for third party non-text
encodings.

Backported changeset d68df99d7a57.
2014-02-24 14:43:03 +02:00
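The effect of this change is visible from Python. A minimal sketch, assuming Python 3.4 or later and using the standard `rot_13` codec as an example of a known non-text (str-to-str) encoding:

```python
import codecs

# str.encode() refuses a codec that is known not to be a text encoding.
try:
    "abc".encode("rot_13")
    outcome = "encoded"
except LookupError:
    outcome = "LookupError"

# The type-neutral function in the codecs module still handles such codecs.
rotated = codecs.encode("abc", "rot_13")
print(outcome, rotated)
```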
Benjamin Peterson 4267869ad8 merge 3.3 (#20507) 2014-02-15 13:03:20 -05:00
Benjamin Peterson 9743b2c2b5 give non-iterable TypeError a message (closes #20507) 2014-02-15 13:02:52 -05:00
Serhiy Storchaka dfe98a102e Issue #20437: Fixed 22 potential bugs when deleting objects references. 2014-02-09 13:46:20 +02:00
Serhiy Storchaka 505ff755d7 Issue #20437: Fixed 21 potential bugs when deleting objects references. 2014-02-09 13:33:53 +02:00
Larry Hastings 2623c8c23c Issue #20530: Argument Clinic's signature format has been revised again.
The new syntax is highly human readable while still preventing false
positives.  The syntax also extends Python syntax to denote "self" and
positional-only parameters, allowing inspect.Signature objects to be
totally accurate for all supported builtins in Python 3.4.
2014-02-08 22:15:29 -08:00
Serhiy Storchaka 6cbf151032 Issue #20538: UTF-7 incremental decoder produced inconsistent string when
input was truncated in BASE64 section.
2014-02-08 14:06:33 +02:00
Serhiy Storchaka 016a3f33a5 Issue #20538: UTF-7 incremental decoder produced inconsistent string when
input was truncated in BASE64 section.
2014-02-08 14:01:29 +02:00
Larry Hastings 581ee3618c Issue #20326: Argument Clinic now uses a simple, unique signature to
annotate text signatures in docstrings, resulting in fewer false
positives.  "self" parameters are also explicitly marked, allowing
inspect.Signature() to authoritatively detect (and skip) said parameters.

Issue #20326: Argument Clinic now generates separate checksums for the
input and output sections of the block, allowing external tools to verify
that the input has not changed (and thus the output is not out-of-date).
2014-01-28 05:00:08 -08:00
Larry Hastings c20472640c Issue #20390: Small fixes and improvements for Argument Clinic. 2014-01-25 20:43:29 -08:00
Larry Hastings 5c66189e88 Issue #20189: Four additional builtin types (PyTypeObject,
PyMethodDescr_Type, _PyMethodWrapper_Type, and PyWrapperDescr_Type)
have been modified to provide introspection information for builtins.
Also: many additional Lib, test suite, and Argument Clinic fixes.
2014-01-24 06:17:25 -08:00
Ethan Furman a70805e1fa Issue19995: fixed typo; switched from test.support.check_warnings to assertWarns 2014-01-12 08:42:35 -08:00
Ethan Furman f9bba9c67f Issue19995: issue deprecation warning for non-integer values to %c, %o, %x, %X 2014-01-11 23:20:58 -08:00
Larry Hastings 61272b77b0 Issue #19273: The marker comments Argument Clinic uses have been changed
to improve readability.
2014-01-07 12:41:53 -08:00
Ethan Furman df3ed242c0 Issue19995: %o, %x, %X now only accept ints 2014-01-05 06:50:30 -08:00
Serhiy Storchaka 3079328d29 Reverted changeset b72c5573c5e7 (issue #15027). 2014-01-04 22:44:01 +02:00
Serhiy Storchaka 583a93943c Issue #15027: Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster. 2014-01-04 19:25:37 +02:00
Victor Stinner fa4e68d425 Remove dead code (HASH macro is no longer defined) 2014-01-03 17:42:18 +01:00
Victor Stinner 92a419eea4 Remove now unused variables 2014-01-03 17:39:40 +01:00
Victor Stinner f3b46b4a66 unicode_char() uses get_latin1_char() to get latin1 singleton characters 2014-01-03 13:16:00 +01:00
Victor Stinner 985a82a6d2 add unicode_char() in unicodeobject.c to factorize code 2014-01-03 12:53:47 +01:00
Larry Hastings 44e2eaab54 Issue #19674: inspect.signature() now produces a correct signature
for some builtins.
2013-11-23 15:37:55 -08:00
Larry Hastings ebdcb50b8a Issue #19730: Argument Clinic now supports all the existing PyArg
"format units" as legacy converters, as well as two new features:
"self converters" and the "version" directive.
2013-11-23 14:54:00 -08:00
Nick Coghlan c72e4e6dcc Issue #19619: Blacklist non-text codecs in method API
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.

The latter mechanism remains in place for third party non-text
encodings.
2013-11-22 22:39:36 +10:00
Christian Heimes 985ecdcfc2 Issue #19183: Implement PEP 456 'secure and interchangeable hash algorithm'.
Python now uses SipHash24 on all major platforms.
2013-11-20 11:46:18 +01:00
Victor Stinner 4a58707a34 Add _PyUnicodeWriter_WriteASCIIString() function 2013-11-19 12:54:53 +01:00
Serhiy Storchaka 58cf607d13 Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.

Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
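The rules listed in this entry can be checked from Python. A short sketch, assuming Python 3.4 or later; the variable names are chosen only for this illustration.

```python
lone = "\ud800"  # a lone (unpaired) surrogate code point

# Strict encoding rejects the surrogate.
try:
    lone.encode("utf-16-le")
    strict_encode = "encoded"
except UnicodeEncodeError:
    strict_encode = "UnicodeEncodeError"

# The surrogatepass error handler lets it through in both directions.
passed = lone.encode("utf-16-le", "surrogatepass")
restored = passed.decode("utf-16-le", "surrogatepass")

# A UTF-32 byte sequence that corresponds to a surrogate is not decoded.
try:
    b"\x00\xd8\x00\x00".decode("utf-32-le")
    strict_decode = "decoded"
except UnicodeDecodeError:
    strict_decode = "UnicodeDecodeError"

print(strict_encode, passed, strict_decode)
```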
Victor Stinner 6989ba0174 Issue #19581: Change the overallocation factor of _PyUnicodeWriter on Windows
On Windows, a factor of 50% gives the best performance.
2013-11-18 21:08:39 +01:00
Larry Hastings ed4a1c5703 Argument Clinic: rename "self" to "module" for module-level functions. 2013-11-18 09:32:13 -08:00
Ezio Melotti 745d54d2fa #17806: Added keyword-argument support for "tabsize" to str/bytes.expandtabs(). 2013-11-16 19:10:57 +02:00
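The keyword form added here can be sketched as follows, assuming Python 3.4 or later; the sample strings are illustrative only.

```python
# "tabsize" may be passed by keyword for both str and bytes.
text_result = "a\tb".expandtabs(tabsize=4)
bytes_result = b"a\tb".expandtabs(tabsize=2)
print(repr(text_result), repr(bytes_result))
```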
Nick Coghlan 8b097b4ed7 Close #17828: better handling of codec errors
- output type errors now redirect users to the type-neutral
  convenience functions in the codecs module
- stateless errors that occur during encoding and decoding
  will now be automatically wrapped in exceptions that give
  the name of the codec involved
2013-11-13 23:49:21 +10:00
Victor Stinner 66b3270975 _Py_normalize_encoding(): explain how the value 6 was computed 2013-11-07 23:12:23 +01:00
Victor Stinner df23e30bea Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8"
if the input string is NULL
2013-11-07 13:33:36 +01:00
Victor Stinner ad14ccd047 Issue #19512: add _PyUnicode_CompareWithId() function
_PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString()
when both strings are equal and interned.

Add also _PyId_builtins identifier for "builtins" common string.
2013-11-07 00:46:04 +01:00
Victor Stinner 21ea21ef6d Issue #19424: PyUnicode_CompareWithASCIIString() normalizes memcmp() result
to -1, 0, 1
2013-11-04 11:28:26 +01:00
Victor Stinner f0c7b2af05 Issue #16286: remove duplicated identity check from unicode_compare()
Move the test to PyUnicode_Compare()
2013-11-04 11:27:14 +01:00
Victor Stinner fd9e44db37 Issue #16286: optimize PyUnicode_RichCompare() for identical strings (same
pointer) for any operator, not only Py_EQ and Py_NE.

Code of bytes_richcompare() and PyUnicode_RichCompare() is now closer.
2013-11-04 11:23:05 +01:00
Victor Stinner c8bc5377ac Issue #16286: write a new subfunction bytes_compare_eq()
* cleanup bytes_richcompare()
* PyUnicode_RichCompare(): replace a test with a XOR
2013-11-04 11:08:10 +01:00
Victor Stinner e1b1592fd4 Issue #19424: Fix a compiler warning on comparing signed/unsigned size_t
Patch written by Zachary Ware.
2013-11-03 13:53:12 +01:00
Victor Stinner a6b9b071a3 Issue #19424: Fix a compiler warning
memcmp() just takes raw pointers
2013-10-30 18:27:13 +01:00
Victor Stinner 602f7cf0b9 Issue #19424: Optimize PyUnicode_CompareWithASCIIString()
Use fast memcmp() instead of a loop using the slow PyUnicode_READ() macro.
strlen() is still necessary to check Unicode string containing null bytes.
2013-10-29 23:31:50 +01:00
Victor Stinner 68b674c9d4 Issue #19437: Fix _PyUnicode_New() (constructor of legacy string), set all
attributes before checking for error. The destructor expects all attributes to
be set. It is now safe to call Py_DECREF(unicode) in the constructor.
2013-10-29 19:31:43 +01:00
Victor Stinner fa3ba4c3bc Issue #18609: Add a fast-path for "iso8859-1" encoding
On AIX, the locale encoding may be "iso8859-1", which was not a known syntax of
the legacy ISO 8859-1 encoding.

Using a C codec instead of a Python codec is faster but also avoids tricky
issues during Python startup or complex code.
2013-10-29 11:34:05 +01:00
Victor Stinner a5afb58986 Issue #18408: Fix PyUnicode_AsUTF8AndSize(), raise MemoryError exception on
memory allocation failure
2013-10-29 01:28:23 +01:00
Serhiy Storchaka c679227e31 Issue #1772673: The type of `char*` arguments now changed to `const char*`. 2013-10-19 21:03:34 +03:00
Serhiy Storchaka 55e092f545 Issue #19279: UTF-7 decoder no longer produces illegal strings. 2013-10-19 20:39:28 +03:00
Serhiy Storchaka 35804e4c63 Issue #19279: UTF-7 decoder no longer produces illegal strings. 2013-10-19 20:38:19 +03:00
Larry Hastings 3182680210 Issue #16612: Add "Argument Clinic", a compile-time preprocessor
for C files to generate argument parsing code.  (See PEP 436.)
2013-10-19 00:09:25 -07:00
Ethan Furman fb13721b1b Close #18780: %-formatting now prints value for int subclasses with %d, %i, and %u codes. 2013-08-31 10:18:55 -07:00
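A minimal sketch of what this entry describes, assuming Python 3.4 or later; the class `Labeled` is hypothetical and only serves to give an int subclass a custom `__str__`.

```python
class Labeled(int):
    """Hypothetical int subclass whose str() is not its numeric value."""

    def __str__(self):
        return "label"

value = Labeled(7)

# %d formats the numeric value, ignoring the overridden __str__.
as_number = "%d" % value
# %s still goes through __str__.
as_text = "%s" % value
print(as_number, as_text)
```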
Antoine Pitrou 9ed5f27266 Issue #18722: Remove uses of the "register" keyword in C code. 2013-08-13 20:18:52 +02:00
Raymond Hettinger e56666d17f Silence compiler warning about an uninitialized variable 2013-08-04 11:51:03 -07:00
Raymond Hettinger 5ed1b38a7d merge 2013-08-04 11:51:35 -07:00
Christian Heimes b578735dff Check return value of PyType_Ready(&EncodingMapType)
CID 486654
2013-07-20 14:57:28 +02:00
Christian Heimes 26532f7519 Check return value of PyType_Ready(&EncodingMapType)
CID 486654
2013-07-20 14:57:16 +02:00
Victor Stinner e699e5a218 Issue #18408: Don't check unicode consistency in _PyUnicode_HAS_UTF8_MEMORY()
and _PyUnicode_HAS_WSTR_MEMORY() macros

These macros are called in unicode_dealloc(), whereas the unicode object can be
"inconsistent" if the creation of the object failed.

For example, when unicode_subtype_new() fails on a memory allocation,
_PyUnicode_CheckConsistency() fails with an assertion error because data is
NULL.
2013-07-15 18:22:47 +02:00
Victor Stinner 9e6b4d715c Issue #18408: _PyUnicodeWriter_Finish() now clears its buffer attribute in all
cases, so _PyUnicodeWriter_Dealloc() can be called after finish.
2013-07-09 00:37:24 +02:00
Victor Stinner 15a0bd3965 Issue #18408: Fix _PyUnicodeWriter_Finish(): clear writer->buffer,
so _PyUnicodeWriter_Dealloc() can be called on the writer after finish.
2013-07-08 22:29:55 +02:00
Victor Stinner 6f8eeee7b9 Issue #18203: Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar() 2013-07-07 22:57:45 +02:00
Victor Stinner 1a7425f67a Issue #18203: Replace malloc() with PyMem_RawMalloc() at Python initialization
* Replace malloc() with PyMem_RawMalloc()
* Replace PyMem_Malloc() with PyMem_RawMalloc() where the GIL is not held.
* _Py_char2wchar() now returns a buffer allocated by PyMem_RawMalloc(), instead
  of PyMem_Malloc()
2013-07-07 16:25:15 +02:00
Christian Heimes d47802eef7 Fix ref leak in error case of unicode find, count, formatlong
CID 983315: Resource leak (RESOURCE_LEAK)
CID 983316: Resource leak (RESOURCE_LEAK)
CID 983317: Resource leak (RESOURCE_LEAK)
2013-06-29 21:33:36 +02:00
Christian Heimes d47a0456b1 Fix ref leak in error case of unicode index
CID 983319 (#1 of 2): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:21:37 +02:00
Christian Heimes ea71a525c3 Fix ref leak in error case of unicode rindex and rfind
CID 983320: Resource leak (RESOURCE_LEAK)
CID 983321: Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:17:34 +02:00
Christian Heimes 305e49e17e Fix memory leak in endswith
CID 1040368 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 20:41:06 +02:00
Serhiy Storchaka c89533f72f Issue #18184: PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
OverflowError when an argument of %c format is out of range.
2013-06-23 20:21:16 +03:00
Serhiy Storchaka 8eeae2126c Issue #18184: PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
OverflowError when an argument of %c format is out of range.
2013-06-23 20:12:14 +03:00
Benjamin Peterson 3164f5d565 merge 3.3 (#18183) 2013-06-10 09:24:01 -07:00
Benjamin Peterson 7e30373126 remove MAX_MAXCHAR because it's unsafe for computing maximum code point value (see #18183) 2013-06-10 09:19:46 -07:00
Victor Stinner 9f067f490f Issue #9566: Fix compiler warning on Windows 64-bit 2013-06-05 00:21:31 +02:00
Antoine Pitrou 7ce35a1816 Issue #17237: Fix crash in the ASCII decoder on m68k. 2013-05-11 15:59:37 +02:00
Antoine Pitrou 8b0e98426d Issue #17237: Fix crash in the ASCII decoder on m68k. 2013-05-11 15:58:34 +02:00
Victor Stinner f4f24248dc Fix uninitialized value in charmap_decode_mapping() 2013-05-07 01:01:31 +02:00
Victor Stinner 8cecc8c262 Issue #7330: Implement width and precision (ex: "%5.3s") for the format string
of PyUnicode_FromFormat() function, original patch written by Ysj Ray.
2013-05-06 23:11:54 +02:00
Victor Stinner bb4503f61e Partial revert of changeset 9744b2df134c
PyUnicode_Append() cannot call directly resize_compact(): I forgot that a
string can be ready *and* not compact (a legacy string can also be ready).
2013-04-18 09:41:34 +02:00
Victor Stinner fb161b1b6d Split PyUnicode_DecodeCharmap() into subfunction for readability 2013-04-18 01:44:27 +02:00
Victor Stinner 170ca6f84b Fix bug in Unicode decoders related to _PyUnicodeWriter
Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.
2013-04-18 00:25:28 +02:00
Victor Stinner 376cfa122d Fix typo in unicode_decode_call_errorhandler_writer()
Bug introduced by changeset 7ed9993d53b4.
2013-04-17 23:58:16 +02:00
Victor Stinner 8f674ccd64 Close #17694: Add minimum length to _PyUnicodeWriter
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
 * _PyUnicodeWriter_Init() has no more argument (except the writer itself):
   min_length and overallocate must be set explicitly
 * In error handlers, only enable overallocation if the replacement string
   is longer than 1 character
 * CJK decoders don't use overallocation anymore
 * Set min_length, instead of preallocating memory using
   _PyUnicodeWriter_Prepare(), in many decoders
 * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner 77282cb4f8 Cleanup PyUnicode_Contains()
* No need to double-check that strings are ready: test already done by
   PyUnicode_FromObject()
 * Remove useless kind variable (use kind1 instead)
2013-04-14 19:22:47 +02:00
Victor Stinner d92e078c8d Minor change: fix character in do_strip() for the ASCII case 2013-04-14 19:17:42 +02:00
Victor Stinner f033510fee Cleanup PyUnicode_Append()
* Check also that right is a Unicode object
 * call directly resize_compact() instead of unicode_resize() for a more
   explicit error handling, and to avoid testing some properties twice
   (ex: unicode_modifiable())
2013-04-14 19:13:03 +02:00
Victor Stinner 4560f9c63f PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code 2013-04-14 18:56:46 +02:00
Victor Stinner 55c08781e8 Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped 2013-04-14 18:45:39 +02:00
Victor Stinner af03757d20 Optimize ascii(str): don't encode/decode repr if repr is already ASCII 2013-04-14 18:44:10 +02:00
Victor Stinner 8a1a6cffd6 Add _PyUnicodeWriter_WriteCharInline() 2013-04-14 02:35:33 +02:00
Serhiy Storchaka e2cef885a2 Issue #16061: Speed up str.replace() for replacing 1-character strings. 2013-04-13 22:45:04 +03:00
Victor Stinner a0dd0213cc Close #17693: Rewrite CJK decoders to use the _PyUnicodeWriter API instead of
the legacy Py_UNICODE API.

Add also a new _PyUnicodeWriter_WriteChar() function.
2013-04-11 22:09:04 +02:00
Victor Stinner 247109e74d Issue #17615: On Windows (VS2010), Performances of wmemcmp() to compare Unicode
strings are not convincing. For UCS2 (16-bit wchar_t type), use a dummy loop
instead of wmemcmp(). The dummy loop is as fast, or a little bit faster.

wchar_t is only 16-bit long on Windows. wmemcmp() is still used for 32-bit
wchar_t.
2013-04-09 23:53:26 +02:00
Victor Stinner 0cff4b16d9 replace(): only call PyUnicode_DATA(u) once 2013-04-09 22:52:48 +02:00
Victor Stinner cc7af72192 Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII 2013-04-09 22:39:24 +02:00
Victor Stinner f50a4e9bc9 Don't call macros in PyUnicode_WRITE() parameters
PyUnicode_WRITE() expands some parameters twice or more.
2013-04-09 22:38:52 +02:00
Victor Stinner 9c79e41fc5 Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call
it twice
2013-04-09 22:21:08 +02:00
Victor Stinner b3a6014504 Fix _PyUnicode_XStrip()
Inline the BLOOM_MEMBER() to call PyUnicode_READ() only once (per loop
iteration). Store also the length of the separator in a variable to avoid calls
to PyUnicode_GET_LENGTH().
2013-04-09 22:19:21 +02:00
Victor Stinner 63d5c1a14a Optimize PyUnicode_DecodeCharmap()
Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers
instead.
2013-04-09 22:13:33 +02:00
Victor Stinner a85af502a4 Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip()
Write specialized functions per Unicode kind to avoid the expensive
PyUnicode_READ() macro.
2013-04-09 21:53:54 +02:00