Brett Cannon
a571120410
Issue #27182 : Add support for path-like objects to PyUnicode_FSDecoder().
2016-09-06 19:36:01 -07:00
Victor Stinner
62ec3317d2
Optimize unicode_escape and raw_unicode_escape
...
Issue #16334 . Patch written by Serhiy Storchaka.
2016-09-06 17:04:34 -07:00
Victor Stinner
2740e46089
_PyUnicodeWriter: assert that max character <= MAX_UNICODE
2016-09-06 16:58:36 -07:00
Brett Cannon
ec6ce879c7
Issue #26027 : Support path-like objects in PyUnicode-FSConverter().
...
This is to add support for os.exec*() and os.spawn*() functions. Part
of PEP 519.
2016-09-06 15:50:29 -07:00
Benjamin Peterson
9b3d77052f
replace Python aliases for standard integer types with the standard integer types ( #17884 )
2016-09-06 13:24:00 -07:00
Serhiy Storchaka
ea525a2d1a
Issue #27078 : Added BUILD_STRING opcode. Optimized f-strings evaluation.
2016-09-06 22:07:53 +03:00
Benjamin Peterson
af580dff4a
replace PY_LONG_LONG with long long
2016-09-06 10:46:49 -07:00
Benjamin Peterson
ed4aa83ff7
require a long long data type ( closes #27961 )
2016-09-05 17:44:18 -07:00
Victor Stinner
942889aae2
Issue #27938 : Add a fast-path for us-ascii encoding
...
Other changes:
* Rewrite _Py_normalize_encoding() as a C implementation of
encodings.normalize_encoding(). For example, " utf-8 " is now normalized to
"utf_8". So the fast path is now used for more name variants of the same
encoding.
* Avoid strcpy() when encoding is NULL: call directly the UTF-8 codec
2016-09-05 15:40:10 -07:00
Victor Stinner
1a05d6c04d
PEP 7 style for if/else in C
...
Add also a newline for readability in normalize_encoding().
2016-09-02 12:12:23 +02:00
Raymond Hettinger
15f44ab043
Issue #27895 : Spelling fixes (Contributed by Ville Skyttä).
2016-08-30 10:47:49 -07:00
Serhiy Storchaka
febc332056
Issue #26754 : Undocumented support of general bytes-like objects
...
as path in compile() and similar functions is now deprecated.
2016-08-06 23:29:29 +03:00
Berker Peksag
ced8d4c6eb
Issue #27454 : Use PyDict_SetDefault in PyUnicode_InternInPlace
...
Patch by INADA Naoki.
2016-07-25 04:40:39 +03:00
Serhiy Storchaka
f95de0e8cc
Issue #26754 : PyUnicode_FSDecoder() accepted a filename argument encoded as
...
an iterable of integers. Now only strings and byte-like objects are accepted.
2016-06-18 13:56:16 +03:00
Serhiy Storchaka
9305d83425
Issue #26754 : PyUnicode_FSDecoder() accepted a filename argument encoded as
...
an iterable of integers. Now only strings and byte-like objects are accepted.
2016-06-18 13:53:36 +03:00
Martin Panter
0b7d84de6b
Issue #27171 : Merge typo fixes from 3.5
2016-06-02 10:11:18 +00:00
Martin Panter
e26da7c03a
Issue #27171 : Fix typos in documentation, comments, and test function names
2016-06-02 10:07:09 +00:00
Serhiy Storchaka
dd40fc3e57
Issue #26765 : Moved common code and docstrings for bytes and bytearray methods
...
to bytes_methods.c.
2016-05-04 22:23:26 +03:00
Martin Panter
cda80940ed
Issue #15984 : Merge PyUnicode doc from 3.5
2016-04-15 02:27:11 +00:00
Martin Panter
6245cb3c01
Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc
...
This affects documentation, code comments, and a debugging messages.
2016-04-15 02:14:19 +00:00
Serhiy Storchaka
21a663ea28
Issue #26057 : Got rid of nonneeded use of PyUnicode_FromObject().
2016-04-13 15:37:23 +03:00
Serhiy Storchaka
f01e408c16
Issue #26200 : Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
...
in places where Py_DECREF was used.
2016-04-10 18:12:01 +03:00
Serhiy Storchaka
57a01d3a0e
Issue #26200 : Added Py_SETREF and replaced Py_XSETREF with Py_SETREF
...
in places where Py_DECREF was used.
2016-04-10 18:05:40 +03:00
Serhiy Storchaka
ec39756960
Issue #22570 : Renamed Py_SETREF to Py_XSETREF.
2016-04-06 09:50:03 +03:00
Serhiy Storchaka
48842714b9
Issue #22570 : Renamed Py_SETREF to Py_XSETREF.
2016-04-06 09:45:48 +03:00
Serhiy Storchaka
ab479c49d3
Issue #26494 : Fixed crash on iterating exhausting iterators.
...
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:41:15 +03:00
Serhiy Storchaka
fbb1c5ee06
Issue #26494 : Fixed crash on iterating exhausting iterators.
...
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
2016-03-30 20:40:02 +03:00
Victor Stinner
f2192855dd
Merge 3.5
2016-03-01 22:07:53 +01:00
Victor Stinner
337986740f
Issue #26464 : Fix unicode_fast_translate() again
...
Initialize i variable if the string is non-ASCII.
2016-03-01 21:59:58 +01:00
Victor Stinner
3d9d77a3dc
Merge 3.5
2016-03-01 21:30:50 +01:00
Victor Stinner
6c9aa8f2bf
Fix str.translate()
...
Issue #26464 : Fix str.translate() when string is ASCII and first replacements
removes character, but next replacement uses a non-ASCII character or a string
longer than 1 character. Regression introduced in Python 3.5.0.
2016-03-01 21:30:30 +01:00
Victor Stinner
5b96f17b1c
Merge 3.5
2016-01-27 17:01:13 +01:00
Victor Stinner
5bc03a6d4d
Fix resize_compact()
...
Issue #26217 : resize_compact() must set wstr_length to 0 after freeing the wstr
string. Otherwise, an assertion fails in _PyUnicode_CheckConsistency().
2016-01-27 16:56:53 +01:00
Serhiy Storchaka
726fc139a5
Issue #20440 : More use of Py_SETREF.
...
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:44:33 +02:00
Serhiy Storchaka
191321d11b
Issue #20440 : More use of Py_SETREF.
...
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:41:34 +02:00
Serhiy Storchaka
ef1585eb9a
Issue #25923 : Added more const qualifiers to signatures of static and private functions.
2015-12-25 20:01:53 +02:00
Serhiy Storchaka
2d06e84455
Issue #25923 : Added the const qualifier to static constant arrays.
2015-12-25 19:53:18 +02:00
Serhiy Storchaka
f006940351
Issue #20440 : Massive replacing unsafe attribute setting code with special
...
macro Py_SETREF.
2015-12-24 10:39:57 +02:00
Serhiy Storchaka
5a57ade58e
Issue #20440 : Massive replacing unsafe attribute setting code with special
...
macro Py_SETREF.
2015-12-24 10:35:59 +02:00
Serhiy Storchaka
9b3a2eec1c
Issues #25890 , #25891 , #25892 : Removed unused variables in Windows code.
...
Reported by Alexander Riccio.
2015-12-18 10:03:13 +02:00
Serhiy Storchaka
7c088a9b5c
Issue #25709 : Fixed problem with in-place string concatenation and utf-8 cache.
2015-12-03 01:05:52 +02:00
Serhiy Storchaka
6648bf5661
Issue #25709 : Fixed problem with in-place string concatenation and utf-8 cache.
2015-12-03 01:04:37 +02:00
Serhiy Storchaka
31b9410654
Issue #25709 : Fixed problem with in-place string concatenation and utf-8 cache.
2015-12-03 01:02:03 +02:00
Serhiy Storchaka
7aa690860e
Issue #25709 : Fixed problem with in-place string concatenation and utf-8 cache.
2015-12-03 01:02:03 +02:00
Benjamin Peterson
d798dc1034
merge 3.5 ( #25630 )
2015-11-15 21:57:50 -08:00
Benjamin Peterson
a4d33b3428
make the PyUnicode_FSConverter cleanup set the decrefed argument to NULL ( closes #25630 )
2015-11-15 21:57:39 -08:00
Serhiy Storchaka
413fdcea21
Issue #24821 : Refactor STRINGLIB(fastsearch_memchr_1char) and split it on
...
STRINGLIB(find_char) and STRINGLIB(rfind_char) that can be used independedly
without special preconditions.
2015-11-14 15:42:17 +02:00
Serhiy Storchaka
4a7c03aab4
Issue #25523 : Merge a-to-an corrections from 3.5.
2015-11-02 14:44:29 +02:00
Serhiy Storchaka
a84f6c3dd3
Issue #25523 : Merge a-to-an corrections from 3.4.
2015-11-02 14:39:05 +02:00
Serhiy Storchaka
d65c9496da
Issue #25523 : Further a-to-an corrections.
2015-11-02 14:10:23 +02:00
Victor Stinner
358af13526
Issue #25353 : Optimize unicode escape and raw unicode escape encoders to use
...
the new _PyBytesWriter API.
2015-10-12 22:36:57 +02:00
Victor Stinner
6c2cdae9e6
Writer APIs: use empty string singletons
...
Modify _PyBytesWriter_Finish() and _PyUnicodeWriter_Finish() to return the
empty bytes/Unicode string if the string is empty.
2015-10-12 13:29:43 +02:00
Victor Stinner
6bd525b656
Optimize error handlers of ASCII and Latin1 encoders when the replacement
...
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.
Cleanup unicode_encode_ucs1():
* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
2015-10-09 13:10:05 +02:00
Victor Stinner
ce179bf6ba
Add _PyBytesWriter_WriteBytes() to factorize the code
2015-10-09 12:57:22 +02:00
Victor Stinner
ad7715891e
_PyBytesWriter: simplify code to avoid "prealloc" parameters
...
Substract preallocate bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner
3fa36ff5e4
Issue #25318 : Fix backslashreplace()
...
Fix code to estimate the needed space.
2015-10-09 03:37:11 +02:00
Victor Stinner
797485e101
Issue #25318 : Avoid sprintf() in backslashreplace()
...
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().
Add also unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
Victor Stinner
0016507c16
Issue #25318 : Move _PyBytesWriter to bytesobject.c
...
Declare also the private API in bytesobject.h.
2015-10-09 01:53:21 +02:00
Victor Stinner
e7bf86cd7d
Optimize backslashreplace error handler
...
Issue #25318 : Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.
Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
Victor Stinner
fdfbf78114
Issue #25318 : Add _PyBytesWriter API
...
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.
Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.
unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Victor Stinner
74e8fac3c8
Issue #25301 : Fix compatibility with ISO C90
2015-10-05 13:49:26 +02:00
Victor Stinner
1d65d9192d
Issue #25301 : The UTF-8 decoder is now up to 15 times as fast for error
...
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
Victor Stinner
eb36fdaad8
Fix _PyUnicodeWriter_PrepareKind()
...
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() handles correctly read-only buffer: copy the
buffer.
2015-10-03 01:55:51 +02:00
Serhiy Storchaka
29e68edbf4
Issue #24848 : Fixed bugs in UTF-7 decoding of misformed data:
...
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka
58c8f2bb6d
Issue #24848 : Fixed bugs in UTF-7 decoding of misformed data:
...
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka
28b21e50c8
Issue #24848 : Fixed bugs in UTF-7 decoding of misformed data:
...
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Victor Stinner
3222da26fe
Make _PyUnicode_TranslateCharmap() symbol private
...
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
2015-10-01 22:07:32 +02:00
Victor Stinner
01ada3996b
Issue #25267 : The UTF-8 encoder is now up to 75 times as fast for error
...
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Victor Stinner
c3713e9706
Optimize ascii/latin1+surrogateescape encoders
...
Issue #25227 : Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.
Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00
Victor Stinner
0030cd52da
Issue #25227 : Cleanup unicode_encode_ucs1() error handler
...
* Change limit type from unsigned int to Py_UCS4, to use the same type than the
"ch" variable (an Unicode character).
* Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE
* Add some newlines for readability
2015-09-24 14:45:00 +02:00
Victor Stinner
54385b206d
Issue #24870 : revert unwanted change
...
Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(
2015-09-22 10:46:52 +02:00
Victor Stinner
5ebae87628
Issue #25207 , #14626 : Fix my commit.
...
It doesn't work to use #define XXX defined(YYY)" and then "#ifdef XXX"
to check YYY.
2015-09-22 01:29:33 +02:00
Victor Stinner
6174474bea
_PyUnicodeWriter_PrepareInternal(): make the assertion more strict
2015-09-22 01:01:17 +02:00
Victor Stinner
ca9381ea01
Issue #24870 : Add _PyUnicodeWriter_PrepareKind() macro
...
Add a macro which ensures that the writer has at least the requested kind.
2015-09-22 00:58:32 +02:00
Victor Stinner
5014920cb7
Issue #24870 : Reuse the new _Py_error_handler enum
...
Factorize code with the new get_error_handler() function.
Add some empty lines for readability.
2015-09-22 00:26:54 +02:00
Victor Stinner
f96418de05
Issue #24870 : Optimize the ASCII decoder for error handlers: surrogateescape,
...
ignore and replace. Initial patch written by Naoki Inada.
The decoder is now up to 60 times as fast for these error handlers.
Add also unit tests for the ASCII decoder.
2015-09-21 23:06:27 +02:00
Zachary Ware
070bd62cfa
Closes #21279 : Merge with 3.5
2015-08-06 00:05:13 -05:00
Zachary Ware
d987a81d29
Issue #21279 : Merge with 3.4
2015-08-06 00:04:23 -05:00
Zachary Ware
79b98df023
Issue #21279 : Flesh out str.translate docs
...
Initial patch by Kinga Farkas, Martin Panter, and John Posner.
2015-08-05 23:54:15 -05:00
Raymond Hettinger
ac2ef65c32
Make the unicode equality test an external function rather than in-lining it.
...
The real benefit of the unicode specialized function comes from
bypassing the overhead of PyObject_RichCompareBool() and not
from being in-lined (especially since there was almost no shared
data between the caller and callee). Also, the in-lining was
having a negative effect on code generation for the callee.
2015-07-04 16:04:44 -07:00
Serhiy Storchaka
d4ea03c785
Issue #24284 : The startswith and endswith methods of the str class no longer
...
return True when finding the empty string and the indexes are completely out
of range.
2015-05-31 09:15:51 +03:00
Antoine Pitrou
873e0df946
Fix some compilation warnings when using gcc (-Wmaybe-uninitialized).
2015-05-19 21:06:04 +02:00
Antoine Pitrou
f6d1f1fa8a
Fix some compilation warnings when using gcc (-Wmaybe-uninitialized).
2015-05-19 21:04:33 +02:00
Serhiy Storchaka
0d4df752ac
Issue #15027 : The UTF-32 encoder is now 3x to 7x faster.
2015-05-12 23:12:45 +03:00
Serhiy Storchaka
7e9d1d1a1b
Issue #23908 : os functions now reject paths with embedded null character
...
on Windows instead of silently truncate them.
Removed no longer used _PyUnicode_HasNULChars().
2015-04-20 10:12:28 +03:00
Serhiy Storchaka
1009bf18b3
Issue #23501 : Argumen Clinic now generates code into separate files by default.
2015-04-03 23:53:51 +03:00
Victor Stinner
1912b39def
_PyUnicodeWriter_WriteStr() now checks that the input string is consistent
...
in debug mode to detect bugs earlier.
_PyUnicodeWriter_Finish() doesn't check if the read only string is consistent,
whereas it does check consistency for strings built by itself.
2015-03-26 09:37:23 +01:00
Serhiy Storchaka
d9d769fcdd
Issue #23573 : Increased performance of string search operations (str.find,
...
str.index, str.count, the in operator, str.split, str.partition) with
arguments of different kinds (UCS1, UCS2, UCS4).
2015-03-24 21:55:47 +02:00
Victor Stinner
f50e187724
Fix compiler warnings: comparison between signed and unsigned numbers
2015-03-20 11:32:24 +01:00
Victor Stinner
0c39b1b970
Initialize variables to prevent GCC warnings
2015-03-18 15:02:06 +01:00
Benjamin Peterson
e5a853c390
use PyMem_NEW to detect overflow ( closes #23362 )
2015-03-02 13:23:25 -05:00
Steve Dower
3e96f324dc
Issue #23451 : Update pyconfig.h for Windows to require Vista headers and remove unnecessary version checks.
2015-03-02 08:01:10 -08:00
Serhiy Storchaka
78a8249127
Issue #23490 : Fixed possible crashes related to interoperability between
...
old-style and new API for string with 2**30-1 characters.
2015-02-20 21:34:39 +02:00
Serhiy Storchaka
e55181f517
Issue #23490 : Fixed possible crashes related to interoperability between
...
old-style and new API for string with 2**30-1 characters.
2015-02-20 21:34:06 +02:00
Serhiy Storchaka
4d0d982985
Issue #23446 : Use PyMem_New instead of PyMem_Malloc to avoid possible integer
...
overflows. Added few missed PyErr_NoMemory().
2015-02-16 13:33:32 +02:00
Serhiy Storchaka
1a1ff29659
Issue #23446 : Use PyMem_New instead of PyMem_Malloc to avoid possible integer
...
overflows. Added few missed PyErr_NoMemory().
2015-02-16 13:28:22 +02:00
Serhiy Storchaka
4dbc305002
Issue #23055 : Fixed a buffer overflow in PyUnicode_FromFormatV. Analysis
...
and fix by Guido Vranken.
2015-01-27 22:18:46 +02:00
Victor Stinner
29dacf2e97
Issue #15859 : PyUnicode_EncodeFSDefault(), PyUnicode_EncodeMBCS() and
...
PyUnicode_EncodeCodePage() now raise an exception if the object is not an
Unicode object. For PyUnicode_EncodeFSDefault(), it was already the case on
platforms other than Windows. Patch written by Campbell Barton.
2015-01-26 16:41:32 +01:00
Serhiy Storchaka
bbd3aa8ece
Issue #23321 : Fixed a crash in str.decode() when error handler returned
...
replacment string longer than mailformed input data.
2015-01-26 01:24:31 +02:00
Serhiy Storchaka
7e4b9057b3
Issue #23321 : Fixed a crash in str.decode() when error handler returned
...
replacment string longer than mailformed input data.
2015-01-26 01:22:54 +02:00
Ethan Furman
b95b56150f
Issue20284: Implement PEP461
2015-01-23 20:05:18 -08:00
Serhiy Storchaka
82e07b92b3
Issue #23181 : More "codepoint" -> "code point".
2015-01-18 11:33:31 +02:00
Serhiy Storchaka
d3faf43f9b
Issue #23181 : More "codepoint" -> "code point".
2015-01-18 11:28:37 +02:00
Serhiy Storchaka
b757c83ec6
Issue #22581 : Use more "bytes-like object" throughout the docs and comments.
2014-12-05 22:25:22 +02:00
Serhiy Storchaka
133b11b566
Issue #22975 : Close block at right place.
2014-12-01 18:56:28 +02:00
Serhiy Storchaka
92bf919ed0
Issue #22581 : Use more "bytes-like object" throughout the docs and comments.
2014-12-05 22:26:10 +02:00
Serhiy Storchaka
407249c62b
Issue #22975 : Close block at right place.
2014-12-01 18:56:54 +02:00
Victor Stinner
3aa979e0cd
Issue #20948 : Inline makefmt() in unicode_fromformat_arg()
2014-11-18 21:40:51 +01:00
Antoine Pitrou
b6dc9b7554
Fixed signed/unsigned comparison warning
2014-10-15 23:14:53 +02:00
Antoine Pitrou
4e334241b7
Fixed signed/unsigned comparison warning
2014-10-15 23:14:53 +02:00
Benjamin Peterson
736982d36d
merge 3.4 ( closes #22643 )
2014-10-15 12:17:47 -04:00
Benjamin Peterson
9c422f3c3d
merge 3.3
2014-10-15 12:17:33 -04:00
Benjamin Peterson
1e211ff10d
it suffices to check for PY_SSIZE_T_MAX overflow ( #22643 )
2014-10-15 12:17:21 -04:00
Benjamin Peterson
315aa40403
Merge 3.4
2014-10-15 11:51:17 -04:00
Benjamin Peterson
60d7a73194
Merge 3.3
2014-10-15 11:51:12 -04:00
Benjamin Peterson
c0e64f5027
make sure length is unsigned
2014-10-15 11:51:05 -04:00
Benjamin Peterson
6925264334
merge 3.4 ( #22643 )
2014-10-15 11:49:15 -04:00
Benjamin Peterson
1cbb3fe775
merge 3.3 ( #22643 )
2014-10-15 11:48:41 -04:00
Benjamin Peterson
e1bd38c03c
fix integer overflow in unicode case operations ( closes #22643 )
2014-10-15 11:47:36 -04:00
Gregory P. Smith
8486f9b134
Fix "warning: comparison between signed and unsigned integer expressions"
...
-Wsign-compare warnings in unicodeobject.c. These were all a result
of sizeof() being unsigned and being compared to a Py_ssize_t.
Not actual problems.
2014-09-30 00:33:24 -07:00
Benjamin Peterson
fd97a6fb2d
merge 3.4 ( #22520 )
2014-09-29 23:02:56 -04:00
Benjamin Peterson
43030ee780
merge 3.3 ( #22520 )
2014-09-29 23:02:35 -04:00
Benjamin Peterson
736b8012b4
prevent overflow in unicode_repr ( closes #22520 )
2014-09-29 23:02:15 -04:00
Benjamin Peterson
10e4b2545e
merge 3.4 ( closes #22518 )
2014-09-29 18:53:58 -04:00
Benjamin Peterson
2b76ce6d27
merge 3.3 ( closes #22518 )
2014-09-29 18:50:06 -04:00
Benjamin Peterson
a1c1be4e03
cleanup overflowing handling in unicode_decode_call_errorhandler and unicode_encode_ucs1 ( closes #22518 )
2014-09-29 18:18:57 -04:00
Serhiy Storchaka
20b39b27d9
Removed redundant casts to `char *`.
...
Corresponding functions now accept `const char *` (issue #1772673 ).
2014-09-28 11:27:24 +03:00
Benjamin Peterson
fa5021699a
Merge 3.3
2014-10-15 23:58:32 -04:00
Serhiy Storchaka
d8a1447c99
Issue #22215 : Now ValueError is raised instead of TypeError when str or bytes
...
argument contains not permitted null character or byte.
2014-09-06 20:07:17 +03:00
Victor Stinner
12174a5dca
Issue #22156 : Fix "comparison between signed and unsigned integers" compiler
...
warnings in the Objects/ subdirectory.
PyType_FromSpecWithBases() and PyType_FromSpec() now reject explicitly negative
slot identifiers.
2014-08-15 23:17:38 +02:00
Victor Stinner
f6a271ae98
Issue #18395 : Rename ``_Py_char2wchar()`` to :c:func:`Py_DecodeLocale`, rename
...
``_Py_wchar2char()`` to :c:func:`Py_EncodeLocale`, and document these
functions.
2014-08-01 12:28:48 +02:00
Victor Stinner
e1f17c6c0b
unicodeobject.c: fix a compiler warning on Windows 64 bits
2014-07-25 14:03:03 +02:00
Victor Stinner
c68b7fba86
(Merge 3.4) Issue #21892 , #21893 : Partial revert of changeset 4f55e802baf0,
...
PyErr_Format() uses "%zd" for Py_ssize_t, not PY_FORMAT_SIZE_T
2014-07-04 22:50:13 +02:00
Victor Stinner
a33bce0945
Issue #21892 , #21893 : Partial revert of changeset 4f55e802baf0, PyErr_Format()
...
uses "%zd" for Py_ssize_t, not PY_FORMAT_SIZE_T
2014-07-04 22:47:46 +02:00
Victor Stinner
9f43505f3d
(Merge 3.4) Closes #21892 , #21893 : Use PY_FORMAT_SIZE_T instead of %zi or %zu
...
to format C size_t, because %zi/%u is not supported on all platforms.
2014-07-01 08:57:54 +02:00
Victor Stinner
293f3f526d
Closes #21892 , #21893 : Use PY_FORMAT_SIZE_T instead of %zi or %zu to format C
...
size_t, because %zi/%u is not supported on all platforms.
2014-07-01 08:57:10 +02:00
Serhiy Storchaka
48070c1248
Issue #23803 : Fixed str.partition() and str.rpartition() when a separator
...
is wider then partitioned string.
2015-03-29 19:21:02 +03:00
Benjamin Peterson
92ce1b4392
merge 3.3 ( #23362 )
2015-03-02 13:23:41 -05:00
Victor Stinner
4dd25256e2
Issue #21118 : PyLong_AS_LONG() result type is long
...
Even if PyLong_AS_LONG() cannot fail, I prefer to use the right type.
2014-04-08 09:14:21 +02:00
Benjamin Peterson
1365de764e
fix reference leaks in the translate fast path ( closes #21175 )
...
Patch by Josh Rosenberg.
2014-04-07 20:15:41 -04:00
Victor Stinner
872b291b96
Issue #21118 : Optimize also str.translate() for ASCII => ASCII deletion
2014-04-05 14:27:07 +02:00
Victor Stinner
4ff33af257
Issue #21118 : Add unit test for invalid character replacement (code point higher than U+10ffff)
2014-04-05 11:56:37 +02:00
Victor Stinner
89a76abf20
Issue #21118 : Optimize str.translate() for ASCII => ASCII translation
2014-04-05 11:44:04 +02:00
Victor Stinner
8a4422e78d
Issue #21118 : Remove unused variable
2014-04-05 00:15:52 +02:00
Victor Stinner
1194ea020c
Issue #21118 : Use _PyUnicodeWriter API in str.translate() to simplify and
...
factorize the code
2014-04-04 19:37:40 +02:00
Ethan Furman
9ab748013b
Issue19995: more informative error message; spelling corrections; use operator.mod instead of __mod__
2014-03-21 06:38:46 -07:00
Ethan Furman
38d872ee5d
Issue19995: passing a non-int to %o, %c, %x, or %X now raises an exception
2014-03-19 08:38:52 -07:00
Victor Stinner
7d00cc1a64
Issue #20574 : Implement incremental decoder for cp65001 code
...
(Windows code page 65001, Microsoft UTF-8).
2014-03-17 23:08:06 +01:00
Kristján Valur Jónsson
25dded041f
Make the various iterators' "setstate" sliently and consistently clip the
...
index. This avoids the possibility of setting an iterator to an invalid
state.
2014-03-05 13:47:57 +00:00
Kristján Valur Jónsson
c5cc5011ac
Make the various iterators' "setstate" sliently and consistently clip the
...
index. This avoids the possibility of setting an iterator to an invalid
state.
2014-03-05 15:23:07 +00:00
Serhiy Storchaka
94ee389308
Issue #19619 : Blacklist non-text codecs in method API
...
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.
The latter mechanism remains in place for third party non-text
encodings.
Backported changeset d68df99d7a57.
2014-02-24 14:43:03 +02:00
Benjamin Peterson
4267869ad8
merge 3.3 ( #20507 )
2014-02-15 13:03:20 -05:00
Benjamin Peterson
9743b2c2b5
give non-iterable TypeError a message ( closes #20507 )
2014-02-15 13:02:52 -05:00
Serhiy Storchaka
dfe98a102e
Issue #20437 : Fixed 22 potential bugs when deleting objects references.
2014-02-09 13:46:20 +02:00
Serhiy Storchaka
505ff755d7
Issue #20437 : Fixed 21 potential bugs when deleting objects references.
2014-02-09 13:33:53 +02:00
Larry Hastings
2623c8c23c
Issue #20530 : Argument Clinic's signature format has been revised again.
...
The new syntax is highly human readable while still preventing false
positives. The syntax also extends Python syntax to denote "self" and
positional-only parameters, allowing inspect.Signature objects to be
totally accurate for all supported builtins in Python 3.4.
2014-02-08 22:15:29 -08:00
Serhiy Storchaka
6cbf151032
Issue #20538 : UTF-7 incremental decoder produced inconsistant string when
...
input was truncated in BASE64 section.
2014-02-08 14:06:33 +02:00
Serhiy Storchaka
016a3f33a5
Issue #20538 : UTF-7 incremental decoder produced inconsistant string when
...
input was truncated in BASE64 section.
2014-02-08 14:01:29 +02:00
Larry Hastings
581ee3618c
Issue #20326 : Argument Clinic now uses a simple, unique signature to
...
annotate text signatures in docstrings, resulting in fewer false
positives. "self" parameters are also explicitly marked, allowing
inspect.Signature() to authoritatively detect (and skip) said parameters.
Issue #20326 : Argument Clinic now generates separate checksums for the
input and output sections of the block, allowing external tools to verify
that the input has not changed (and thus the output is not out-of-date).
2014-01-28 05:00:08 -08:00
Larry Hastings
c20472640c
Issue #20390 : Small fixes and improvements for Argument Clinic.
2014-01-25 20:43:29 -08:00
Larry Hastings
5c66189e88
Issue #20189 : Four additional builtin types (PyTypeObject,
...
PyMethodDescr_Type, _PyMethodWrapper_Type, and PyWrapperDescr_Type)
have been modified to provide introspection information for builtins.
Also: many additional Lib, test suite, and Argument Clinic fixes.
2014-01-24 06:17:25 -08:00
Ethan Furman
a70805e1fa
Issue19995: fixed typo; switched from test.support.check_warnings to assertWarns
2014-01-12 08:42:35 -08:00
Ethan Furman
f9bba9c67f
Issue19995: issue deprecation warning for non-integer values to %c, %o, %x, %X
2014-01-11 23:20:58 -08:00
Larry Hastings
61272b77b0
Issue #19273 : The marker comments Argument Clinic uses have been changed
...
to improve readability.
2014-01-07 12:41:53 -08:00
Ethan Furman
df3ed242c0
Issue19995: %o, %x, %X now only accept ints
2014-01-05 06:50:30 -08:00
Serhiy Storchaka
3079328d29
Reverted changeset b72c5573c5e7 (issue #15027 ).
2014-01-04 22:44:01 +02:00
Serhiy Storchaka
583a93943c
Issue #15027 : Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster.
2014-01-04 19:25:37 +02:00
Victor Stinner
fa4e68d425
Remove deadcode (HASH macro is no more defined)
2014-01-03 17:42:18 +01:00
Victor Stinner
92a419eea4
Remove now unused variables
2014-01-03 17:39:40 +01:00
Victor Stinner
f3b46b4a66
unicode_char() uses get_latin1_char() to get latin1 singleton characters
2014-01-03 13:16:00 +01:00
Victor Stinner
985a82a6d2
add unicode_char() in unicodeobject.c to factorize code
2014-01-03 12:53:47 +01:00
Larry Hastings
44e2eaab54
Issue #19674 : inspect.signature() now produces a correct signature
...
for some builtins.
2013-11-23 15:37:55 -08:00
Larry Hastings
ebdcb50b8a
Issue #19730 : Argument Clinic now supports all the existing PyArg
...
"format units" as legacy converters, as well as two new features:
"self converters" and the "version" directive.
2013-11-23 14:54:00 -08:00
Nick Coghlan
c72e4e6dcc
Issue #19619 : Blacklist non-text codecs in method API
...
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.
The latter mechanism remains in place for third party non-text
encodings.
2013-11-22 22:39:36 +10:00
Christian Heimes
985ecdcfc2
ssue #19183 : Implement PEP 456 'secure and interchangeable hash algorithm'.
...
Python now uses SipHash24 on all major platforms.
2013-11-20 11:46:18 +01:00
Victor Stinner
4a58707a34
Add _PyUnicodeWriter_WriteASCIIString() function
2013-11-19 12:54:53 +01:00
Serhiy Storchaka
58cf607d13
Issue #12892 : The utf-16* and utf-32* codecs now reject (lone) surrogates.
...
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.
Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
Victor Stinner
6989ba0174
Issue #19581 : Change the overallocation factor of _PyUnicodeWriter on Windows
...
On Windows, a factor of 50% gives best performances.
2013-11-18 21:08:39 +01:00
Larry Hastings
ed4a1c5703
Argument Clinic: rename "self" to "module" for module-level functions.
2013-11-18 09:32:13 -08:00
Ezio Melotti
745d54d2fa
#17806 : Added keyword-argument support for "tabsize" to str/bytes.expandtabs().
2013-11-16 19:10:57 +02:00
Nick Coghlan
8b097b4ed7
Close #17828 : better handling of codec errors
...
- output type errors now redirect users to the type-neutral
convenience functions in the codecs module
- stateless errors that occur during encoding and decoding
will now be automatically wrapped in exceptions that give
the name of the codec involved
2013-11-13 23:49:21 +10:00
Victor Stinner
66b3270975
_Py_normalize_encoding(): explain how the value 6 was computed
2013-11-07 23:12:23 +01:00
Victor Stinner
df23e30bea
Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8"
...
if the input string is NULL
2013-11-07 13:33:36 +01:00
Victor Stinner
ad14ccd047
Issue #19512 : add _PyUnicode_CompareWithId() function
...
_PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString()
when both strings are equal and interned.
Add also _PyId_builtins identifier for "builtins" common string.
2013-11-07 00:46:04 +01:00
Victor Stinner
21ea21ef6d
Issue #19424 : PyUnicode_CompareWithASCIIString() normalizes memcmp() result
...
to -1, 0, 1
2013-11-04 11:28:26 +01:00
Victor Stinner
f0c7b2af05
Issue #16286 : remove duplicated identity check from unicode_compare()
...
Move the test to PyUnicode_Compare()
2013-11-04 11:27:14 +01:00
Victor Stinner
fd9e44db37
Issue #16286 : optimize PyUnicode_RichCompare() for identical strings (same
...
pointer) for any operator, not only Py_EQ and Py_NE.
Code of bytes_richcompare() and PyUnicode_RichCompare() is now closer.
2013-11-04 11:23:05 +01:00
Victor Stinner
c8bc5377ac
Issue #16286 : write a new subfunction bytes_compare_eq()
...
* cleanup bytes_richcompare()
* PyUnicode_RichCompare(): replace a test with a XOR
2013-11-04 11:08:10 +01:00
Victor Stinner
e1b1592fd4
Issue #19424 : Fix a compiler warning on comparing signed/unsigned size_t
...
Patch written by Zachary Ware.
2013-11-03 13:53:12 +01:00
Victor Stinner
a6b9b071a3
Issue #19424 : Fix a compiler warning
...
memcmp() just takes raw pointers
2013-10-30 18:27:13 +01:00
Victor Stinner
602f7cf0b9
Issue #19424 : Optimize PyUnicode_CompareWithASCIIString()
...
Use fast memcmp() instead of a loop using the slow PyUnicode_READ() macro.
strlen() is still necessary to check Unicode string containing null bytes.
2013-10-29 23:31:50 +01:00
Victor Stinner
68b674c9d4
Issue #19437 : Fix _PyUnicode_New() (constructor of legacy string), set all
...
attributes before checking for error. The destructor expects all attributes to
be set. It is now safe to call Py_DECREF(unicode) in the constructor.
2013-10-29 19:31:43 +01:00
Victor Stinner
fa3ba4c3bc
Issue #18609 : Add a fast-path for "iso8859-1" encoding
...
On AIX, the locale encoding may be "iso8859-1", which was not a known syntax of
the legacy ISO 8859-1 encoding.
Using a C codec instead of a Python codec is faster but also avoids tricky
issues during Python startup or complex code.
2013-10-29 11:34:05 +01:00
Victor Stinner
a5afb58986
Issue #18408 : Fix PyUnicode_AsUTF8AndSize(), raise MemoryError exception on
...
memory allocation failure
2013-10-29 01:28:23 +01:00
Serhiy Storchaka
c679227e31
Issue #1772673 : The type of `char*` arguments now changed to `const char*`.
2013-10-19 21:03:34 +03:00
Serhiy Storchaka
55e092f545
Issue #19279 : UTF-7 decoder no more produces illegal strings.
2013-10-19 20:39:28 +03:00
Serhiy Storchaka
35804e4c63
Issue #19279 : UTF-7 decoder no more produces illegal strings.
2013-10-19 20:38:19 +03:00
Larry Hastings
3182680210
Issue #16612 : Add "Argument Clinic", a compile-time preprocessor
...
for C files to generate argument parsing code. (See PEP 436.)
2013-10-19 00:09:25 -07:00
Ethan Furman
fb13721b1b
Close #18780 : %-formatting now prints value for int subclasses with %d, %i, and %u codes.
2013-08-31 10:18:55 -07:00
Antoine Pitrou
9ed5f27266
Issue #18722 : Remove uses of the "register" keyword in C code.
2013-08-13 20:18:52 +02:00
Raymond Hettinger
e56666d17f
Silence compiler warning about an uninitialized variable
2013-08-04 11:51:03 -07:00
Raymond Hettinger
5ed1b38a7d
merge
2013-08-04 11:51:35 -07:00
Christian Heimes
b578735dff
Check return value of PyType_Ready(&EncodingMapType)
...
CID 486654
2013-07-20 14:57:28 +02:00
Christian Heimes
26532f7519
Check return value of PyType_Ready(&EncodingMapType)
...
CID 486654
2013-07-20 14:57:16 +02:00
Victor Stinner
e699e5a218
Issue #18408 : Don't check unicode consistency in _PyUnicode_HAS_UTF8_MEMORY()
...
and _PyUnicode_HAS_WSTR_MEMORY() macros
These macros are called in unicode_dealloc(), whereas the unicode object can be
"inconsistent" if the creation of the object failed.
For example, when unicode_subtype_new() fails on a memory allocation,
_PyUnicode_CheckConsistency() fails with an assertion error because data is
NULL.
2013-07-15 18:22:47 +02:00
Victor Stinner
9e6b4d715c
Issue #18408 : _PyUnicodeWriter_Finish() now clears its buffer attribute in all
...
cases, so _PyUnicodeWriter_Dealloc() can be called after finish.
2013-07-09 00:37:24 +02:00
Victor Stinner
15a0bd3965
Issue #18408 : Fix _PyUnicodeWriter_Finish(): clear writer->buffer,
...
so _PyUnicodeWriter_Dealloc() can be called on the writer after finish.
2013-07-08 22:29:55 +02:00
Victor Stinner
6f8eeee7b9
Issue #18203 : Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar()
2013-07-07 22:57:45 +02:00
Victor Stinner
1a7425f67a
Issue #18203 : Replace malloc() with PyMem_RawMalloc() at Python initialization
...
* Replace malloc() with PyMem_RawMalloc()
* Replace PyMem_Malloc() with PyMem_RawMalloc() where the GIL is not held.
* _Py_char2wchar() now returns a buffer allocated by PyMem_RawMalloc(), instead
of PyMem_Malloc()
2013-07-07 16:25:15 +02:00
Christian Heimes
d47802eef7
Fix ref leak in error case of unicode find, count, formatlong
...
CID 983315: Resource leak (RESOURCE_LEAK)
CID 983316: Resource leak (RESOURCE_LEAK)
CID 983317: Resource leak (RESOURCE_LEAK)
2013-06-29 21:33:36 +02:00
Christian Heimes
d47a0456b1
Fix ref leak in error case of unicode index
...
CID 983319 (#1 of 2): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:21:37 +02:00
Christian Heimes
ea71a525c3
Fix ref leak in error case of unicode rindex and rfind
...
CID 983320: Resource leak (RESOURCE_LEAK)
CID 983321: Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:17:34 +02:00
Christian Heimes
305e49e17e
Fix memory leak in endswith
...
CID 1040368 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 20:41:06 +02:00
Serhiy Storchaka
c89533f72f
Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
...
OverflowError when an argument of %c format is out of range.
2013-06-23 20:21:16 +03:00
Serhiy Storchaka
8eeae2126c
Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
...
OverflowError when an argument of %c format is out of range.
2013-06-23 20:12:14 +03:00
Benjamin Peterson
3164f5d565
merge 3.3 ( #18183 )
2013-06-10 09:24:01 -07:00
Benjamin Peterson
7e30373126
remove MAX_MAXCHAR because it's unsafe for computing maximum codepoitn value (see #18183 )
2013-06-10 09:19:46 -07:00
Victor Stinner
9f067f490f
Issue #9566 : Fix compiler warning on Windows 64-bit
2013-06-05 00:21:31 +02:00
Antoine Pitrou
7ce35a1816
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:59:37 +02:00
Antoine Pitrou
8b0e98426d
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:58:34 +02:00
Victor Stinner
f4f24248dc
Fix uninitialized value in charmap_decode_mapping()
2013-05-07 01:01:31 +02:00
Victor Stinner
8cecc8c262
Issue #7330 : Implement width and precision (ex: "%5.3s") for the format string
...
of PyUnicode_FromFormat() function, original patch written by Ysj Ray.
2013-05-06 23:11:54 +02:00
Victor Stinner
bb4503f61e
Partial revert of changeset 9744b2df134c
...
PyUnicode_Append() cannot call directly resize_compact(): I forgot that a
string can be ready *and* not compact (a legacy string can also be ready).
2013-04-18 09:41:34 +02:00
Victor Stinner
fb161b1b6d
Split PyUnicode_DecodeCharmap() into subfunction for readability
2013-04-18 01:44:27 +02:00
Victor Stinner
170ca6f84b
Fix bug in Unicode decoders related to _PyUnicodeWriter
...
Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.
2013-04-18 00:25:28 +02:00
Victor Stinner
376cfa122d
Fix typo in unicode_decode_call_errorhandler_writer()
...
Bug introduced by changeset 7ed9993d53b4.
2013-04-17 23:58:16 +02:00
Victor Stinner
8f674ccd64
Close #17694 : Add minimum length to _PyUnicodeWriter
...
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() has no more argument (except the writer itself):
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner
77282cb4f8
Cleanup PyUnicode_Contains()
...
* No need to double-check that strings are ready: test already done by
PyUnicode_FromObject()
* Remove useless kind variable (use kind1 instead)
2013-04-14 19:22:47 +02:00
Victor Stinner
d92e078c8d
Minor change: fix character in do_strip() for the ASCII case
2013-04-14 19:17:42 +02:00
Victor Stinner
f033510fee
Cleanup PyUnicode_Append()
...
* Check also that right is a Unicode object
* call directly resize_compact() instead of unicode_resize() for a more
explicit error handling, and to avoid testing some properties twice
(ex: unicode_modifiable())
2013-04-14 19:13:03 +02:00
Victor Stinner
4560f9c63f
PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code
2013-04-14 18:56:46 +02:00
Victor Stinner
55c08781e8
Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped
2013-04-14 18:45:39 +02:00
Victor Stinner
af03757d20
Optimize ascii(str): don't encode/decode repr if repr is already ASCII
2013-04-14 18:44:10 +02:00
Victor Stinner
8a1a6cffd6
Add _PyUnicodeWriter_WriteCharInline()
2013-04-14 02:35:33 +02:00
Serhiy Storchaka
e2cef885a2
Issue #16061 : Speed up str.replace() for replacing 1-character strings.
2013-04-13 22:45:04 +03:00
Victor Stinner
a0dd0213cc
Close #17693 : Rewrite CJK decoders to use the _PyUnicodeWriter API instead of
...
the legacy Py_UNICODE API.
Add also a new _PyUnicodeWriter_WriteChar() function.
2013-04-11 22:09:04 +02:00
Victor Stinner
247109e74d
Issue #17615 : On Windows (VS2010), Performances of wmemcmp() to compare Unicode
...
strings are not convincing. For UCS2 (16-bit wchar_t type), use a dummy loop
instead of wmemcmp(). The dummy loop is as fast, or a little bit faster.
wchar_t is only 16-bit long on Windows. wmemcmp() is still used for 32-bit
wchar_t.
2013-04-09 23:53:26 +02:00
Victor Stinner
0cff4b16d9
replace(): only call PyUnicode_DATA(u) once
2013-04-09 22:52:48 +02:00
Victor Stinner
cc7af72192
Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII
2013-04-09 22:39:24 +02:00
Victor Stinner
f50a4e9bc9
Don't calls macros in PyUnicode_WRITE() parameters
...
PyUnicode_WRITE() expands some parameters twice or more.
2013-04-09 22:38:52 +02:00
Victor Stinner
9c79e41fc5
Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call
...
it twice
2013-04-09 22:21:08 +02:00
Victor Stinner
b3a6014504
Fix _PyUnicode_XStrip()
...
Inline the BLOOM_MEMBER() to only call PyUnicode_READ() only once (per loop
iteration). Store also the length of the seperator in a variable to avoid calls
to PyUnicode_GET_LENGTH().
2013-04-09 22:19:21 +02:00
Victor Stinner
63d5c1a14a
Optimize PyUnicode_DecodeCharmap()
...
Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers
instead.
2013-04-09 22:13:33 +02:00
Victor Stinner
a85af502a4
Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip()
...
Write specialized functions per Unicode kind to avoid the expensive
PyUnicode_READ() macro.
2013-04-09 21:53:54 +02:00
Victor Stinner
69ed0f4c86
Use PyUnicode_READ() instead of PyUnicode_READ_CHAR()
...
"PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls
PyUnicode_KIND() and might call it twice." according to its documentation.
2013-04-09 21:48:24 +02:00
Victor Stinner
03c3e35d42
Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings:
...
cp037, cp500 and iso8859_1 codecs
2013-04-09 21:53:09 +02:00
Victor Stinner
cd777eaf53
Issue #17615 : Comparing two Unicode strings now uses wmemcmp() when possible
...
wmemcmp() is twice faster than a dummy loop (342 usec vs 744 usec) on Fedora
18/x86_64, GCC 4.7.2.
2013-04-08 22:43:44 +02:00
Victor Stinner
c1302bba4c
Issue #17615 : Expand expensive PyUnicode_READ() macro in unicode_compare():
...
write specialized functions for each combination of Unicode kinds.
2013-04-08 21:50:54 +02:00
Victor Stinner
207dd38726
fix unused variable
2013-04-03 03:14:58 +02:00
Victor Stinner
eb4b5ac8af
Close #16757 : Avoid calling the expensive _PyUnicode_FindMaxChar() function
...
when possible
2013-04-03 02:02:33 +02:00
Victor Stinner
cfc4c13b04
Add _PyUnicodeWriter_WriteSubstring() function
...
Write a function to enable more optimizations:
* If the substring is the whole string and overallocation is disabled, just
keep a reference to the string, don't copy characters
* Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
possible
2013-04-03 01:48:39 +02:00
Raymond Hettinger
51612fd803
merge
2013-03-23 08:21:52 -07:00
Raymond Hettinger
378170d5d9
Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords.
2013-03-23 08:21:12 -07:00
Victor Stinner
fb84b5d48d
(Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:29:09 +01:00
Victor Stinner
2cb16aa3cb
_PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:28:37 +01:00
Victor Stinner
cf77da9fb5
Backed out changeset b9f7b1bf36aa
2013-03-06 01:09:24 +01:00
Victor Stinner
313cac88c5
Issue #17223 : Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type)
...
to reject invalid UTF-16 surrogate.
2013-03-06 00:41:50 +01:00
Victor Stinner
36025478bf
(Merge 3.3) Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character
...
outside the range U+0000-U+10ffff.
2013-02-26 00:16:57 +01:00
Victor Stinner
d21b58c05d
Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character outside
...
the range U+0000-U+10ffff.
2013-02-26 00:15:54 +01:00
Victor Stinner
cfd2c1b4cc
(Merge 3.3) Issue #17137 : When an Unicode string is resized, the internal wide
...
character string (wstr) format is now cleared.
2013-02-07 23:17:34 +01:00
Victor Stinner
bbbac2ec34
Issue #17137 : When an Unicode string is resized, the internal wide character
...
string (wstr) format is now cleared.
2013-02-07 23:12:46 +01:00
Serhiy Storchaka
d0c79dcda5
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:26:55 +02:00
Serhiy Storchaka
03ee12ed72
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:25:25 +02:00
Serhiy Storchaka
3fd4ab356d
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:23:21 +02:00
Serhiy Storchaka
2aee6a6460
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:16:57 +02:00
Serhiy Storchaka
afb1cb5579
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:13:22 +02:00
Serhiy Storchaka
8fe5a9f9c3
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:37:39 +02:00
Serhiy Storchaka
24193debd4
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:28:07 +02:00
Serhiy Storchaka
d679377be7
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:20:44 +02:00
Serhiy Storchaka
ed3c4128c0
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:18:17 +02:00
Serhiy Storchaka
678db84b37
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:16:36 +02:00
Serhiy Storchaka
059972535f
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:14:02 +02:00
Serhiy Storchaka
570c5b2354
Issue #16980 : Fix processing of escaped non-ascii bytes in the
...
unicode-escape-decode decoder.
2013-01-25 23:53:29 +02:00
Serhiy Storchaka
73e38809e0
Issue #16980 : Fix processing of escaped non-ascii bytes in the
...
unicode-escape-decode decoder.
2013-01-25 23:52:21 +02:00
Serhiy Storchaka
6481bfb2b5
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:44:40 +02:00
Serhiy Storchaka
c35f3a9f61
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:42:57 +02:00
Serhiy Storchaka
4f5f0e54e0
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:38:00 +02:00
Serhiy Storchaka
441d30fac7
Issue #15989 : Fix several occurrences of integer overflow
...
when result of PyLong_AsLong() narrowed to int without checks.
This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:26:26 +02:00
Serhiy Storchaka
9101e23ff6
Issue #15989 : Fix several occurrences of integer overflow
...
when result of PyLong_AsLong() narrowed to int without checks.
This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:41:45 +02:00
Serhiy Storchaka
55e2cb497b
Issue #14850 : Now a chamap decoder treates U+FFFE as "undefined mapping"
...
in any mapping, not only in an unicode string.
2013-01-15 15:30:04 +02:00
Serhiy Storchaka
45d16d9924
Issue #14850 : Now a chamap decoder treates U+FFFE as "undefined mapping"
...
in any mapping, not only in an unicode string.
2013-01-15 15:01:20 +02:00
Serhiy Storchaka
4fb8caee87
Issue #14850 : Now a chamap decoder treates U+FFFE as "undefined mapping"
...
in any mapping, not only in an unicode string.
2013-01-15 14:43:21 +02:00
Serhiy Storchaka
7898043868
Issue #15989 : Fix several occurrences of integer overflow
...
when result of PyLong_AsLong() narrowed to int without checks.
2013-01-15 01:12:17 +02:00
Benjamin Peterson
0b32a480bd
merge 3.3 ( #16906 )
2013-01-09 09:52:22 -06:00
Benjamin Peterson
0c270a8bb7
correct static string clearing loop ( closes #16906 )
2013-01-09 09:52:01 -06:00
Serhiy Storchaka
24a3ef6999
Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by
...
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:41:55 +02:00
Serhiy Storchaka
ae3b32ad6b
Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by
...
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:40:52 +02:00
Serhiy Storchaka
48e188e573
Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by
...
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:14:24 +02:00
Serhiy Storchaka
dec798eb46
Fix out of bound read in UTF-32 decoder on "narrow Unicode" builds.
2013-01-08 22:45:42 +02:00
Serhiy Storchaka
4e02538bf3
Issue #16856 : Fix a segmentation fault from calling repr() on a dict with
...
a key whose repr raise an exception.
2013-01-04 12:40:35 +02:00
Serhiy Storchaka
6c83e739d7
Issue #16856 : Fix a segmentation fault from calling repr() on a dict with
...
a key whose repr raise an exception.
2013-01-04 12:39:34 +02:00
Victor Stinner
18aa4477d3
Close #16281 : handle tailmatch() failure and remove useless comment
...
"honor direction and do a forward or backwards search": the runtime speed may
be different, but I consider that it doesn't really matter in practice. The
direction was never honored before: Python 2.7 uses memcmp() for the str type
for example.
2013-01-03 03:18:09 +01:00
Victor Stinner
7ae320d667
(Merge 3.2) Issue #16455 : On FreeBSD and Solaris, if the locale is C, the
...
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:21:07 +01:00
Victor Stinner
20b654acb5
Issue #16455 : On FreeBSD and Solaris, if the locale is C, the
...
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:08:58 +01:00
Andrew Svetlov
2606a6f197
Issue #16719 : Get rid of WindowsError. Use OSError instead
...
Patch by Serhiy Storchaka.
2012-12-19 14:33:35 +02:00
Gregory P. Smith
27dc02e8c5
Fix the internals of our hash functions to used unsigned values during hash
...
computation as the overflow behavior of signed integers is undefined.
NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done. I added the comment that my change in 3.2 added so that the
code would match up. Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent. We could work to get rid of the -fwrapv requirement
in 3.4 but that requires more planning.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
2012-12-10 19:51:29 -08:00
Gregory P. Smith
c2176e46d7
Fix the internals of our hash functions to used unsigned values during hash
...
computation as the overflow behavior of signed integers is undefined.
NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done. I added the comment that my change in 3.2 added so that the
code would match up. Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
2012-12-10 18:32:53 -08:00
Gregory P. Smith
27cbcd6241
Fix the internals of our hash functions to used unsigned values during hash
...
computation as the overflow behavior of signed integers is undefined.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
2012-12-10 18:15:46 -08:00
Victor Stinner
8dbd421b4d
Cleanup unicodeobject.c
...
* Remove micro-optization:
(errors == "surrogateescape" || strcmp(errors, "surrogateescape") == 0).
Only use strcmp()
* Initialize 'arg' members in unicode_format_arg() to help the compiler to
diagnose real bugs and also make the code simpler to read
2012-12-04 09:30:24 +01:00
Victor Stinner
d45c7f8d74
Issue #16455 : On FreeBSD and Solaris, if the locale is C, the
...
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2012-12-04 01:34:47 +01:00