Victor Stinner
9e6b4d715c
Issue #18408 : _PyUnicodeWriter_Finish() now clears its buffer attribute in all
...
cases, so _PyUnicodeWriter_Dealloc() can be called after finish.
2013-07-09 00:37:24 +02:00
Victor Stinner
15a0bd3965
Issue #18408 : Fix _PyUnicodeWriter_Finish(): clear writer->buffer,
...
so _PyUnicodeWriter_Dealloc() can be called on the writer after finish.
2013-07-08 22:29:55 +02:00
Victor Stinner
6f8eeee7b9
Issue #18203 : Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar()
2013-07-07 22:57:45 +02:00
Victor Stinner
1a7425f67a
Issue #18203 : Replace malloc() with PyMem_RawMalloc() at Python initialization
...
* Replace malloc() with PyMem_RawMalloc()
* Replace PyMem_Malloc() with PyMem_RawMalloc() where the GIL is not held.
* _Py_char2wchar() now returns a buffer allocated by PyMem_RawMalloc(), instead
of PyMem_Malloc()
2013-07-07 16:25:15 +02:00
Christian Heimes
d47802eef7
Fix ref leak in error case of unicode find, count, formatlong
...
CID 983315: Resource leak (RESOURCE_LEAK)
CID 983316: Resource leak (RESOURCE_LEAK)
CID 983317: Resource leak (RESOURCE_LEAK)
2013-06-29 21:33:36 +02:00
Christian Heimes
d47a0456b1
Fix ref leak in error case of unicode index
...
CID 983319 (#1 of 2): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:21:37 +02:00
Christian Heimes
ea71a525c3
Fix ref leak in error case of unicode rindex and rfind
...
CID 983320: Resource leak (RESOURCE_LEAK)
CID 983321: Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:17:34 +02:00
Christian Heimes
305e49e17e
Fix memory leak in endswith
...
CID 1040368 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 20:41:06 +02:00
Serhiy Storchaka
c89533f72f
Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
...
OverflowError when an argument of %c format is out of range.
2013-06-23 20:21:16 +03:00
Serhiy Storchaka
8eeae2126c
Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
...
OverflowError when an argument of %c format is out of range.
2013-06-23 20:12:14 +03:00
Benjamin Peterson
3164f5d565
merge 3.3 ( #18183 )
2013-06-10 09:24:01 -07:00
Benjamin Peterson
7e30373126
remove MAX_MAXCHAR because it's unsafe for computing maximum codepoint value (see #18183 )
2013-06-10 09:19:46 -07:00
Victor Stinner
9f067f490f
Issue #9566 : Fix compiler warning on Windows 64-bit
2013-06-05 00:21:31 +02:00
Antoine Pitrou
7ce35a1816
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:59:37 +02:00
Antoine Pitrou
8b0e98426d
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:58:34 +02:00
Victor Stinner
f4f24248dc
Fix uninitialized value in charmap_decode_mapping()
2013-05-07 01:01:31 +02:00
Victor Stinner
8cecc8c262
Issue #7330 : Implement width and precision (ex: "%5.3s") for the format string
...
of PyUnicode_FromFormat() function, original patch written by Ysj Ray.
2013-05-06 23:11:54 +02:00
Victor Stinner
bb4503f61e
Partial revert of changeset 9744b2df134c
...
PyUnicode_Append() cannot call directly resize_compact(): I forgot that a
string can be ready *and* not compact (a legacy string can also be ready).
2013-04-18 09:41:34 +02:00
Victor Stinner
fb161b1b6d
Split PyUnicode_DecodeCharmap() into subfunction for readability
2013-04-18 01:44:27 +02:00
Victor Stinner
170ca6f84b
Fix bug in Unicode decoders related to _PyUnicodeWriter
...
Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.
2013-04-18 00:25:28 +02:00
Victor Stinner
376cfa122d
Fix typo in unicode_decode_call_errorhandler_writer()
...
Bug introduced by changeset 7ed9993d53b4.
2013-04-17 23:58:16 +02:00
Victor Stinner
8f674ccd64
Close #17694 : Add minimum length to _PyUnicodeWriter
...
* Also add a min_char attribute to the _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() no longer takes any argument (except the writer itself):
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner
77282cb4f8
Cleanup PyUnicode_Contains()
...
* No need to double-check that strings are ready: test already done by
PyUnicode_FromObject()
* Remove useless kind variable (use kind1 instead)
2013-04-14 19:22:47 +02:00
Victor Stinner
d92e078c8d
Minor change: fix character in do_strip() for the ASCII case
2013-04-14 19:17:42 +02:00
Victor Stinner
f033510fee
Cleanup PyUnicode_Append()
...
* Check also that right is a Unicode object
* call directly resize_compact() instead of unicode_resize() for a more
explicit error handling, and to avoid testing some properties twice
(ex: unicode_modifiable())
2013-04-14 19:13:03 +02:00
Victor Stinner
4560f9c63f
PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code
2013-04-14 18:56:46 +02:00
Victor Stinner
55c08781e8
Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped
2013-04-14 18:45:39 +02:00
Victor Stinner
af03757d20
Optimize ascii(str): don't encode/decode repr if repr is already ASCII
2013-04-14 18:44:10 +02:00
Victor Stinner
8a1a6cffd6
Add _PyUnicodeWriter_WriteCharInline()
2013-04-14 02:35:33 +02:00
Serhiy Storchaka
e2cef885a2
Issue #16061 : Speed up str.replace() for replacing 1-character strings.
2013-04-13 22:45:04 +03:00
Victor Stinner
a0dd0213cc
Close #17693 : Rewrite CJK decoders to use the _PyUnicodeWriter API instead of
...
the legacy Py_UNICODE API.
Add also a new _PyUnicodeWriter_WriteChar() function.
2013-04-11 22:09:04 +02:00
Victor Stinner
247109e74d
Issue #17615 : On Windows (VS2010), the performance of wmemcmp() when comparing Unicode
...
strings is not convincing. For UCS2 (16-bit wchar_t type), use a dummy loop
instead of wmemcmp(). The dummy loop is as fast, or a little bit faster.
wchar_t is only 16 bits wide on Windows. wmemcmp() is still used for 32-bit
wchar_t.
2013-04-09 23:53:26 +02:00
Victor Stinner
0cff4b16d9
replace(): only call PyUnicode_DATA(u) once
2013-04-09 22:52:48 +02:00
Victor Stinner
cc7af72192
Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII
2013-04-09 22:39:24 +02:00
Victor Stinner
f50a4e9bc9
Don't call macros in PyUnicode_WRITE() parameters
...
PyUnicode_WRITE() expands some parameters twice or more.
2013-04-09 22:38:52 +02:00
Victor Stinner
9c79e41fc5
Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call
...
it twice
2013-04-09 22:21:08 +02:00
Victor Stinner
b3a6014504
Fix _PyUnicode_XStrip()
...
Inline the BLOOM_MEMBER() macro to call PyUnicode_READ() only once (per loop
iteration). Also store the length of the separator in a variable to avoid calls
to PyUnicode_GET_LENGTH().
2013-04-09 22:19:21 +02:00
Victor Stinner
63d5c1a14a
Optimize PyUnicode_DecodeCharmap()
...
Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers
instead.
2013-04-09 22:13:33 +02:00
Victor Stinner
a85af502a4
Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip()
...
Write specialized functions per Unicode kind to avoid the expensive
PyUnicode_READ() macro.
2013-04-09 21:53:54 +02:00
Victor Stinner
69ed0f4c86
Use PyUnicode_READ() instead of PyUnicode_READ_CHAR()
...
"PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls
PyUnicode_KIND() and might call it twice." according to its documentation.
2013-04-09 21:48:24 +02:00
Victor Stinner
03c3e35d42
Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings:
...
cp037, cp500 and iso8859_1 codecs
2013-04-09 21:53:09 +02:00
Victor Stinner
cd777eaf53
Issue #17615 : Comparing two Unicode strings now uses wmemcmp() when possible
...
wmemcmp() is twice as fast as a dummy loop (342 usec vs 744 usec) on Fedora
18/x86_64, GCC 4.7.2.
2013-04-08 22:43:44 +02:00
Victor Stinner
c1302bba4c
Issue #17615 : Expand expensive PyUnicode_READ() macro in unicode_compare():
...
write specialized functions for each combination of Unicode kinds.
2013-04-08 21:50:54 +02:00
Victor Stinner
207dd38726
fix unused variable
2013-04-03 03:14:58 +02:00
Victor Stinner
eb4b5ac8af
Close #16757 : Avoid calling the expensive _PyUnicode_FindMaxChar() function
...
when possible
2013-04-03 02:02:33 +02:00
Victor Stinner
cfc4c13b04
Add _PyUnicodeWriter_WriteSubstring() function
...
Write a function to enable more optimizations:
* If the substring is the whole string and overallocation is disabled, just
keep a reference to the string, don't copy characters
* Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
possible
2013-04-03 01:48:39 +02:00
Raymond Hettinger
51612fd803
merge
2013-03-23 08:21:52 -07:00
Raymond Hettinger
378170d5d9
Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords.
2013-03-23 08:21:12 -07:00
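A minimal Python sketch of the behavior that the docstring clarification for issue 17447 describes: `str.isidentifier()` only checks the lexical form of a name, so reserved keywords pass. The helper `is_usable_identifier` is a hypothetical name introduced here for illustration; it combines the check with `keyword.iskeyword()` from the standard library.

```python
import keyword


def is_usable_identifier(name: str) -> bool:
    """Hypothetical helper: valid identifier that is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


# "def" is lexically a valid identifier, even though it is a keyword.
print("def".isidentifier())
print(is_usable_identifier("def"))
print(is_usable_identifier("spam"))
```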
Victor Stinner
fb84b5d48d
(Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:29:09 +01:00
Victor Stinner
2cb16aa3cb
_PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:28:37 +01:00
Victor Stinner
cf77da9fb5
Backed out changeset b9f7b1bf36aa
2013-03-06 01:09:24 +01:00
Victor Stinner
313cac88c5
Issue #17223 : Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type)
...
to reject invalid UTF-16 surrogate.
2013-03-06 00:41:50 +01:00
Victor Stinner
36025478bf
(Merge 3.3) Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character
...
outside the range U+0000-U+10ffff.
2013-02-26 00:16:57 +01:00
Victor Stinner
d21b58c05d
Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character outside
...
the range U+0000-U+10ffff.
2013-02-26 00:15:54 +01:00
Victor Stinner
cfd2c1b4cc
(Merge 3.3) Issue #17137 : When a Unicode string is resized, the internal wide
...
character string (wstr) format is now cleared.
2013-02-07 23:17:34 +01:00
Victor Stinner
bbbac2ec34
Issue #17137 : When a Unicode string is resized, the internal wide character
...
string (wstr) format is now cleared.
2013-02-07 23:12:46 +01:00
Serhiy Storchaka
d0c79dcda5
Issue #17043 : The unicode-internal decoder no longer reads past the end of
...
input buffer.
2013-02-07 16:26:55 +02:00
Serhiy Storchaka
03ee12ed72
Issue #17043 : The unicode-internal decoder no longer reads past the end of
...
input buffer.
2013-02-07 16:25:25 +02:00
Serhiy Storchaka
3fd4ab356d
Issue #17043 : The unicode-internal decoder no longer reads past the end of
...
input buffer.
2013-02-07 16:23:21 +02:00
Serhiy Storchaka
2aee6a6460
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:16:57 +02:00
Serhiy Storchaka
afb1cb5579
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:13:22 +02:00
Serhiy Storchaka
8fe5a9f9c3
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:37:39 +02:00
Serhiy Storchaka
24193debd4
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:28:07 +02:00
Serhiy Storchaka
d679377be7
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:20:44 +02:00
Serhiy Storchaka
ed3c4128c0
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:18:17 +02:00
Serhiy Storchaka
678db84b37
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:16:36 +02:00
Serhiy Storchaka
059972535f
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:14:02 +02:00
Serhiy Storchaka
570c5b2354
Issue #16980 : Fix processing of escaped non-ascii bytes in the
...
unicode-escape-decode decoder.
2013-01-25 23:53:29 +02:00
Serhiy Storchaka
73e38809e0
Issue #16980 : Fix processing of escaped non-ascii bytes in the
...
unicode-escape-decode decoder.
2013-01-25 23:52:21 +02:00
Serhiy Storchaka
6481bfb2b5
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:44:40 +02:00
Serhiy Storchaka
c35f3a9f61
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:42:57 +02:00
Serhiy Storchaka
4f5f0e54e0
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:38:00 +02:00
Serhiy Storchaka
441d30fac7
Issue #15989 : Fix several occurrences of integer overflow
...
when result of PyLong_AsLong() narrowed to int without checks.
This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:26:26 +02:00
Serhiy Storchaka
9101e23ff6
Issue #15989 : Fix several occurrences of integer overflow
...
when result of PyLong_AsLong() narrowed to int without checks.
This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:41:45 +02:00
Serhiy Storchaka
55e2cb497b
Issue #14850 : Now a charmap decoder treats U+FFFE as "undefined mapping"
...
in any mapping, not only in a unicode string.
2013-01-15 15:30:04 +02:00
Serhiy Storchaka
45d16d9924
Issue #14850 : Now a charmap decoder treats U+FFFE as "undefined mapping"
...
in any mapping, not only in a unicode string.
2013-01-15 15:01:20 +02:00
Serhiy Storchaka
4fb8caee87
Issue #14850 : Now a charmap decoder treats U+FFFE as "undefined mapping"
...
in any mapping, not only in a unicode string.
2013-01-15 14:43:21 +02:00
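The U+FFFE rule from issue #14850 can be illustrated from Python, assuming an interpreter that includes the fix. This sketch uses `codecs.charmap_decode()` with a dict mapping; the variable names are illustrative only. A byte mapped to U+FFFE is reported as undefined in strict mode and replaced with U+FFFD under the "replace" handler.

```python
import codecs

# Byte 2 is mapped to U+FFFE, which the charmap decoder treats as "undefined".
mapping = {0: "a", 1: "b", 2: "\ufffe"}
data = b"\x00\x01\x02"

try:
    codecs.charmap_decode(data, "strict", mapping)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With "replace", the undefined byte becomes U+FFFD instead of raising.
replaced, consumed = codecs.charmap_decode(data, "replace", mapping)
print(strict_failed, ascii(replaced), consumed)
```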
Serhiy Storchaka
7898043868
Issue #15989 : Fix several occurrences of integer overflow
...
when result of PyLong_AsLong() narrowed to int without checks.
2013-01-15 01:12:17 +02:00
Benjamin Peterson
0b32a480bd
merge 3.3 ( #16906 )
2013-01-09 09:52:22 -06:00
Benjamin Peterson
0c270a8bb7
correct static string clearing loop ( closes #16906 )
2013-01-09 09:52:01 -06:00
Serhiy Storchaka
24a3ef6999
Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by
...
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:41:55 +02:00
Serhiy Storchaka
ae3b32ad6b
Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by
...
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:40:52 +02:00
Serhiy Storchaka
48e188e573
Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by
...
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:14:24 +02:00
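A short sketch of the partial decoding case that issue #11461 covers, assuming an interpreter with the fix: a non-BMP character encoded as a UTF-16 surrogate pair is fed to the incremental decoder one byte at a time. Nothing should be emitted until the whole pair has arrived.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-16-le")()
encoded = "\U00010000".encode("utf-16-le")  # 4 bytes: one surrogate pair

# Feed a single byte per call and collect whatever the decoder emits.
pieces = [decoder.decode(bytes([byte])) for byte in encoded]
pieces.append(decoder.decode(b"", final=True))

print([ascii(piece) for piece in pieces])
```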
Serhiy Storchaka
dec798eb46
Fix out of bound read in UTF-32 decoder on "narrow Unicode" builds.
2013-01-08 22:45:42 +02:00
Serhiy Storchaka
4e02538bf3
Issue #16856 : Fix a segmentation fault from calling repr() on a dict with
...
a key whose repr raises an exception.
2013-01-04 12:40:35 +02:00
Serhiy Storchaka
6c83e739d7
Issue #16856 : Fix a segmentation fault from calling repr() on a dict with
...
a key whose repr raises an exception.
2013-01-04 12:39:34 +02:00
Victor Stinner
18aa4477d3
Close #16281 : handle tailmatch() failure and remove useless comment
...
"honor direction and do a forward or backwards search": the runtime speed may
be different, but I consider that it doesn't really matter in practice. The
direction was never honored before: Python 2.7 uses memcmp() for the str type
for example.
2013-01-03 03:18:09 +01:00
Victor Stinner
7ae320d667
(Merge 3.2) Issue #16455 : On FreeBSD and Solaris, if the locale is C, the
...
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announce an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:21:07 +01:00
Victor Stinner
20b654acb5
Issue #16455 : On FreeBSD and Solaris, if the locale is C, the
...
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announce an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:08:58 +01:00
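As a rough illustration of the ASCII/surrogateescape codec named in issue #16455 (not the C code path the commit changes), the sketch below decodes ISO-8859-1 bytes with the ASCII codec and the "surrogateescape" error handler. The undecodable byte is kept as a lone surrogate, so encoding with the same codec and handler restores the original bytes. The sample bytes are an assumption chosen for the example.

```python
raw = b"caf\xe9"  # ISO-8859-1 bytes for "café"; 0xE9 is not valid ASCII

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9.
text = raw.decode("ascii", "surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
round_trip = text.encode("ascii", "surrogateescape")

print(ascii(text), round_trip == raw)
```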
Andrew Svetlov
2606a6f197
Issue #16719 : Get rid of WindowsError. Use OSError instead
...
Patch by Serhiy Storchaka.
2012-12-19 14:33:35 +02:00
Gregory P. Smith
27dc02e8c5
Fix the internals of our hash functions to use unsigned values during hash
...
computation as the overflow behavior of signed integers is undefined.
NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done. I added the comment that my change in 3.2 added so that the
code would match up. Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as two's complement but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent. We could work to get rid of the -fwrapv requirement
in 3.4 but that requires more planning.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
2012-12-10 19:51:29 -08:00
Gregory P. Smith
c2176e46d7
Fix the internals of our hash functions to use unsigned values during hash
...
computation as the overflow behavior of signed integers is undefined.
NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done. I added the comment that my change in 3.2 added so that the
code would match up. Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as two's complement but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
2012-12-10 18:32:53 -08:00
Gregory P. Smith
27cbcd6241
Fix the internals of our hash functions to use unsigned values during hash
...
computation as the overflow behavior of signed integers is undefined.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as two's complement but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
2012-12-10 18:15:46 -08:00
Victor Stinner
8dbd421b4d
Cleanup unicodeobject.c
...
* Remove micro-optimization:
(errors == "surrogateescape" || strcmp(errors, "surrogateescape") == 0).
Only use strcmp()
* Initialize 'arg' members in unicode_format_arg() to help the compiler to
diagnose real bugs and also make the code simpler to read
2012-12-04 09:30:24 +01:00
Victor Stinner
d45c7f8d74
Issue #16455 : On FreeBSD and Solaris, if the locale is C, the
...
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announce an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2012-12-04 01:34:47 +01:00
Victor Stinner
2660e427d1
(Merge 3.2) Issue #16416 : On Mac OS X, operating system data are now always
...
encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding
(which may be ASCII if no locale environment variable is set), to avoid
inconsistencies with os.fsencode() and os.fsdecode() functions which are
already using UTF-8/surrogateescape.
2012-12-03 12:48:53 +01:00
Victor Stinner
27b1ca29cc
Issue #16416 : On Mac OS X, operating system data are now always
...
encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding
(which may be ASCII if no locale environment variable is set), to avoid
inconsistencies with os.fsencode() and os.fsdecode() functions which are
already using UTF-8/surrogateescape.
2012-12-03 12:47:59 +01:00
Antoine Pitrou
5439458a2a
Issue #16215 : Fix potential double memory free in str.replace().
...
Patch by Serhiy Storchaka.
2012-11-17 23:29:28 +01:00
Antoine Pitrou
6d5ad227a5
Issue #16215 : Fix potential double memory free in str.replace().
...
Patch by Serhiy Storchaka.
2012-11-17 23:28:17 +01:00
Victor Stinner
0d92c4f667
Issue #16416 : Fix error handling in _Py_wchar2char() _Py_char2wchar() functions
2012-11-12 23:32:21 +01:00
Victor Stinner
fc009eff9e
Close #16311 : Use the _PyUnicodeWriter API in text decoders
...
* Remove unicode_widen(): replaced with _PyUnicodeWriter_Prepare()
* Remove unicode_putchar(): replaced with
_PyUnicodeWriter_Prepare() + PyUnicode_WRITE()
* When handling a decoding error, only overallocate the buffer by +25%
instead of +100%
2012-11-07 00:36:38 +01:00
Ezio Melotti
cfa9636404
#8271 : merge with 3.3.
2012-11-04 23:23:09 +02:00
Ezio Melotti
f7ed5d111b
#8271 : the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti.
2012-11-04 23:21:38 +02:00
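A small sketch of what the "correct number of U+FFFD characters" from #8271 looks like in practice, assuming an interpreter that includes the fix. The helper `count_replacements` and the sample byte strings are illustrative names and inputs, not taken from the patch. A truncated multi-byte sequence yields a single replacement character, while each stray invalid byte yields its own.

```python
REPLACEMENT = "\ufffd"


def count_replacements(data: bytes) -> int:
    """Hypothetical helper: number of U+FFFD produced by the "replace" handler."""
    return data.decode("utf-8", "replace").count(REPLACEMENT)


truncated = b"\xe1\x80A"  # 3-byte sequence cut short, followed by ASCII "A"
two_invalid = b"\xff\xfe"  # two bytes that can never start a UTF-8 sequence

print(count_replacements(truncated), count_replacements(two_invalid))
```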
Benjamin Peterson
7ff2094bc7
merge 3.3 ( #16369 )
2012-10-30 23:31:12 -04:00
Benjamin Peterson
e8ea97fffb
merge 3.2 ( #16369 )
2012-10-30 23:27:52 -04:00
Benjamin Peterson
c43112823b
initialize more global type objects ( closes #16369 )
2012-10-30 23:21:10 -04:00
Victor Stinner
e64322e034
Close #14625 : Rewrite the UTF-32 decoder. It is now 3x to 4x faster
...
Patch written by Serhiy Storchaka.
2012-10-30 23:12:47 +01:00
Victor Stinner
76df43de30
Issue #16330 : Use surrogate-related macros
...
Patch written by Serhiy Storchaka.
2012-10-30 01:42:39 +01:00
Mark Dickinson
fb90c0934c
Issue #14700 : Fix buggy overflow checks for large precision and width in new-style and old-style formatting.
2012-10-28 10:18:03 +00:00
Victor Stinner
c6cf1ba29e
Replace usage of the deprecated Py_UNICODE_COPY() with Py_MEMCPY() in resize_copy()
2012-10-23 02:54:47 +02:00
Victor Stinner
fe75fb4b3e
Optimize _PyUnicode_HasNULChars(): use findchar() instead of PyUnicode_Contains()
2012-10-23 02:52:18 +02:00
Victor Stinner
6fa627578a
Inline raise_translate_exception(): it is only used once
2012-10-23 02:51:50 +02:00
Victor Stinner
e5567ad236
Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp()
2012-10-23 02:48:49 +02:00
Christian Heimes
743e0cd6b5
Issue #16166 : Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified
...
endianness detection and handling.
2012-10-17 23:52:17 +02:00
Chris Jerdonek
4a7df9aba9
Issue #14783 : Merge changes from 3.3.
2012-10-07 15:02:16 -07:00
Chris Jerdonek
042fa653ab
Issue #14783 : Merge changes from 3.2.
2012-10-07 14:56:27 -07:00
Chris Jerdonek
83fe2e1c22
Issue #14783 : Improve int() docstring and also str(), range(), and slice().
...
This commit rewrites the docstring for int() to incorporate the documentation
changes made in issue #16036 . It also switches the docstrings for int(),
str(), range(), and slice() to use multi-line signatures.
2012-10-07 14:48:36 -07:00
Victor Stinner
4c63a972d1
Cleanup PyUnicode_FromFormatV() for zero padding
...
Skip the "0" instead of parsing it twice: once to detect zero padding, and then
again as a digit of the width.
2012-10-06 23:55:33 +02:00
Victor Stinner
15a1136547
Issue #16147 : PyUnicode_FromFormatV() no longer needs to allocate a buffer
...
on the heap to format numbers.
2012-10-06 23:48:20 +02:00
Victor Stinner
ff5a848db5
Issue #16147 : PyUnicode_FromFormatV() now raises an error if the argument of
...
'%c' is not in the range(0x110000).
2012-10-06 23:05:45 +02:00
Victor Stinner
3921e90c5a
Issue #16147 : PyUnicode_FromFormatV() now detects integer overflow when parsing
...
width and precision
2012-10-06 23:05:00 +02:00
Victor Stinner
e215d960be
Issue #16147 : Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
...
* Simplify the code: replace 4 steps with a single step using the
_PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids
storing intermediate results, which requires allocating an array of pointers on
the heap.
* Use the _PyUnicodeWriter API for speed (and its convenient API):
overallocate the buffer to reduce the number of "realloc()"
* Implement "width" and "precision" in Python, don't rely on sprintf(). It
avoids the need for a temporary buffer allocated on the heap: only use a small
buffer allocated on the stack.
* Add _PyUnicodeWriter_WriteCstr() function
* Split PyUnicode_FromFormatV() into two functions: add
unicode_fromformat_arg().
* Inline parse_format_flags(): the format of an argument is now only parsed
once, so a subfunction is no longer needed.
* Optimize PyUnicode_FromFormatV() for characters between two "%" arguments:
search the next "%" and copy the substring in one chunk, instead of copying
character per character.
2012-10-06 23:03:36 +02:00
Mark Dickinson
ff9c54aca2
Issue #16096 : Merge fixes from 3.3.
2012-10-06 18:05:14 +01:00
Mark Dickinson
c04ddff290
Issue #16096 : Fix several occurrences of potential signed integer overflow. Thanks Serhiy Storchaka.
2012-10-06 18:04:49 +01:00
Victor Stinner
8c6db45d3e
In debug mode, unicode_write_cstr() now checks that non-ASCII characters are
...
not written into an ASCII string
2012-10-06 00:40:45 +02:00
Ezio Melotti
080a2c087e
#16127 : merge with 3.3.
2012-10-05 03:34:02 +03:00
Ezio Melotti
e7f90375b1
#16127 : remove outdated references to narrow builds. Patch by Serhiy Storchaka.
2012-10-05 03:33:31 +03:00
Victor Stinner
1929407406
Fix PyUnicode_Format(): return NULL if PyUnicode_READY(uformat) failed
...
This error cannot occur in practice: PyUnicode_FromObject() always returns
a "ready" string.
2012-10-05 00:09:33 +02:00
Victor Stinner
770e19e0cc
Optimize unicode_compare(): use memcmp() when comparing two UCS1 strings
2012-10-04 22:59:45 +02:00
Victor Stinner
90db9c47dc
Enable also ptr==ptr optimization in PyUnicode_Compare()
...
It was already implemented in PyUnicode_RichCompare()
2012-10-04 21:53:50 +02:00
Victor Stinner
aa7712711d
unicode_result_wchar(): move the assert() to the "#ifdef Py_DEBUG" block
2012-10-04 02:32:58 +02:00
Victor Stinner
a4708231e6
Split the huge PyUnicode_Format() function (+540 lines) into subfunctions
2012-10-04 02:19:54 +02:00
Victor Stinner
a049443fab
PyUnicode_Format(): disable overallocation when we are writing the last part
...
of the output string
2012-10-03 23:03:46 +02:00
Victor Stinner
afffce489b
Unicode: resize_compact() and resize_inplace() also fill the Unicode strings
...
with invalid bytes in debug mode, as done by PyUnicode_New()
2012-10-03 23:03:17 +02:00
Victor Stinner
c89d28fdfc
Issue #15609 : Fix refleak introduced by my last optimization
2012-10-02 12:54:07 +02:00
Victor Stinner
621ef3d84f
Issue #15609 : Optimize str%args for integer argument
...
- Use _PyLong_FormatWriter() instead of formatlong() when possible, to avoid
a temporary buffer
- Enable the fast path when width is smaller than or equal to the length,
and when the precision is bigger than or equal to the length
- Add unit tests!
- formatlong() uses PyUnicode_Resize() instead of _PyUnicode_FromASCII()
to resize the output string
2012-10-02 00:33:47 +02:00
Antoine Pitrou
a1f7655fa7
Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
...
Patch by Serhiy Storchaka.
2012-09-23 20:00:04 +02:00
Antoine Pitrou
6f80f5d444
Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
...
Patch by Serhiy Storchaka.
2012-09-23 19:55:21 +02:00
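A brief sketch of the case fixed by issue #15379, assuming an interpreter with the fix: a charmap mapping whose values are integers may point at a code point outside the BMP, and `codecs.charmap_decode()` should return the corresponding character just as it does for a string value. The mapping contents are illustrative only.

```python
import codecs

# Byte 0 maps to a non-BMP code point given as an integer, byte 1 to a string.
mapping = {0: 0x1F600, 1: "b"}

decoded, consumed = codecs.charmap_decode(b"\x00\x01", "strict", mapping)

print(ascii(decoded), consumed)
```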
Antoine Pitrou
ca8aa4acf6
Issue #15144 : Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
...
Patch by Serhiy Storchaka.
2012-09-20 20:56:47 +02:00
Christian Heimes
5f520f4fed
Issue #15900 : Fixed reference leak in PyUnicode_TranslateCharmap()
2012-09-11 14:03:25 +02:00
Christian Heimes
f4f9939a96
Fixed memory leak in error branch of formatfloat(). CID 719687
2012-09-10 11:48:41 +02:00
Antoine Pitrou
057119b0b7
Fix C++-style comment (xlc compilation failure)
2012-09-02 17:56:33 +02:00
Benjamin Peterson
59043f96ea
merge 3.2 ( #15801 )
2012-08-28 18:01:45 -04:00
Benjamin Peterson
28a6cfaefc
use the stricter PyMapping_Check ( closes #15801 )
2012-08-28 17:55:35 -04:00
Stefan Krah
8528c3145e
Issue #15728 : Fix leak in PyUnicode_AsWideCharString(). Found by Coverity.
2012-08-19 21:52:43 +02:00
Nick Coghlan
0e41628d35
Merge str docstring fix from 3.2
2012-08-16 14:14:30 +10:00
Nick Coghlan
573b1fd779
Fix str docstring
2012-08-16 14:13:07 +10:00
Antoine Pitrou
b4bbee25b1
Issue #14579 : Fix CVE-2012-2135: vulnerability in the utf-16 decoder after error handling.
...
Patch by Serhiy Storchaka.
2012-07-21 00:45:14 +02:00
Mark Dickinson
01ac8b6ab1
Use correct types for ASCII_CHAR_MASK integer constants.
2012-07-07 14:08:48 +02:00
Antoine Pitrou
aaefac76dd
Issue #14874 : Restore charmap decoding speed to pre-PEP 393 levels.
...
Patch by Serhiy Storchaka.
2012-06-16 22:48:21 +02:00
Victor Stinner
f185226244
_copy_characters(): move debug code at the top to avoid noisy #ifdef
...
And don't use assert() anymore if check_maxchar is set: return -1 on error
instead.
2012-06-16 16:38:26 +02:00
Victor Stinner
07621338fb
Fix PyUnicode_GetSize(): Don't replace _PyUnicode_Ready() exception
2012-06-16 04:53:46 +02:00
Victor Stinner
8a8b3eaabe
Fix a compiler warning in _copy_characters() and remove debug code
2012-06-16 04:53:25 +02:00
Victor Stinner
24e403bbee
Oops, fix my previous change on _copy_characters()
2012-06-16 04:53:00 +02:00
Victor Stinner
ca439eecea
Fix unicode_adjust_maxchar(): catch PyUnicode_New() failure
2012-06-16 03:17:34 +02:00
Victor Stinner
184252ad3f
Fix "%f" format of str%args if the result is not an ASCII or latin1 string
2012-06-16 02:57:41 +02:00
Victor Stinner
9a77770add
Remove debug code
2012-06-16 02:44:43 +02:00
Victor Stinner
c9d369f1bf
Optimize _PyUnicode_FastCopyCharacters() when maxchar(from) > maxchar(to)
2012-06-16 02:22:37 +02:00
Victor Stinner
f05e17ece9
unicodeobject.c: Remove debug code
2012-06-16 01:53:04 +02:00
Antoine Pitrou
27f6a3b0bf
Issue #15026 : utf-16 encoding is now significantly faster (up to 10x).
...
Patch by Serhiy Storchaka.
2012-06-15 22:15:23 +02:00
Kristján Valur Jónsson
55e5dc8371
Rearrange code to beat an optimizer bug affecting Release x64 on windows
...
with VS2010sp1
2012-06-06 21:58:08 +00:00
Victor Stinner
d7b7c7472b
Issue #14993 : Use standard "unsigned char" instead of an unsigned char bitfield
2012-06-04 22:52:12 +02:00
Kristjan Valur Jonsson
85634d7a2e
Issue #14909 : A number of places were using PyMem_Realloc() APIs and
...
PyObject_GC_Resize() with incorrect error handling. In case of errors,
the original object would be leaked. This checkin fixes those cases.
2012-05-31 09:37:31 +00:00
Victor Stinner
3a7d096f2f
Issue #14744 : Fix compilation on Windows (part 2)
2012-05-29 18:53:56 +02:00
Victor Stinner
d3f0882dfb
Issue #14744 : Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args)
...
* Formatting string, int, float and complex use the _PyUnicodeWriter API. It
avoids a temporary buffer in most cases.
* Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just
keep a reference to the string if the output is only composed of one string
* Disable overallocation when formatting the last argument of str%args and
str.format(args)
* Overallocation allocates at least 100 characters: add min_length attribute
to the _PyUnicodeWriter structure
* Add new private functions: _PyUnicode_FastCopyCharacters(),
_PyUnicode_FastFill() and _PyUnicode_FromASCII()
The speedup is around 20% on average.
2012-05-29 12:57:52 +02:00
Antoine Pitrou
63065d761e
Issue #14624 : UTF-16 decoding is now 3x to 4x faster on various inputs.
...
Patch by Serhiy Storchaka.
2012-05-15 23:48:04 +02:00
Martin v. Löwis
b05c0738d8
Silence VS 2010 signed/unsigned warnings.
2012-05-15 13:45:49 +02:00
Antoine Pitrou
758153badb
Fix refleaks introduced by 83da67651687.
2012-05-12 15:51:51 +02:00
Antoine Pitrou
e45c0c5cef
Fix logic error introduced by 83da67651687.
2012-05-12 15:49:07 +02:00
Benjamin Peterson
1ff2e35e84
simplify by shortcutting when the kind of the needle is larger than the haystack
2012-05-11 17:41:20 -05:00
Antoine Pitrou
ca5f91b888
Issue #14738 : Speed-up UTF-8 decoding on non-ASCII data. Patch by Serhiy Storchaka.
2012-05-10 16:36:02 +02:00
Victor Stinner
3b1a74a9c3
Rename unicode_write_t structure and its methods to "_PyUnicodeWriter"
2012-05-09 22:25:00 +02:00
Victor Stinner
ee4544c920
Issue #14744 : Inline unicode_writer_write_char() and unicode_write_str()
...
Also optimize PyUnicode_Format(): call unicode_writer_prepare() only once
per argument.
2012-05-09 22:24:08 +02:00
Victor Stinner
f59c28c930
unicode_writer_finish() checks string consistency
2012-05-09 03:24:14 +02:00
Victor Stinner
106802547c
Backout ab500b297900: the check for integer overflow is wrong
...
Issue #14716 : Change integer overflow check in unicode_writer_prepare()
to compute the limit at compile time instead of runtime. Patch written by Serhiy
Storchaka.
2012-05-07 23:50:05 +02:00
Victor Stinner
0576f9b4cf
Issue #14716 : Change integer overflow check in unicode_writer_prepare()
...
to compute the limit at compile time instead of runtime. Patch written by Serhiy
Storchaka.
2012-05-07 13:02:44 +02:00
Victor Stinner
202fdca133
Close #14716 : str.format() now uses the new "unicode writer" API instead of the
...
PyAccu API. For example, it makes str.format() from 25% to 30% faster on Linux.
2012-05-07 12:47:02 +02:00
Mark Dickinson
99e2e5552a
Issue #14700 : Fix two broken and undefined-behaviour-inducing overflow checks in old-style string formatting. Thanks Serhiy Storchaka for report and original patch.
2012-05-07 11:20:50 +01:00
Victor Stinner
d0dba6eee8
unicode_writer: don't force inline when it is not necessary
...
Keep inline for performance critical functions (functions used in loops)
2012-05-04 01:19:15 +02:00
Benjamin Peterson
b63f49f2b4
if the kind of the string to count is larger than the string to search, shortcut to 0
2012-05-03 18:31:07 -04:00
Victor Stinner
a7b654be30
unicode_writer: add finish() method and assertions to write_str() method
...
* The write_str() method does nothing if the length is zero.
* Replace "struct unicode_writer_t" with "unicode_writer_t"
2012-05-03 23:58:55 +02:00
Victor Stinner
bf4e266397
Issue #14687 : Remove redundant length attribute of unicode_write_t
...
The length can be read directly from the buffer
2012-05-03 19:27:14 +02:00
Victor Stinner
7989157e49
Issue #14687 : Cleanup unicode_writer_prepare()
...
"Inline" PyUnicode_Resize(): call directly resize_compact()
2012-05-03 13:43:07 +02:00
Victor Stinner
f2c76aa6cb
Issue #14687 : str%tuple now uses an optimistic "unicode writer" instead of an
...
accumulator. Directly write characters into the output (don't use a temporary
list): resize and widen the string on demand.
2012-05-03 13:10:40 +02:00
Victor Stinner
1b487b467b
Issue #14624 , #14687 : Optimize unicode_widen()
...
Don't convert uninitialized characters. Patch written by Serhiy Storchaka.
2012-05-03 12:29:04 +02:00
Victor Stinner
3a7f7977f1
Remove buggy assertion in PyUnicode_Substring()
...
Also use unicode_empty directly, instead of PyUnicode_New(0,0).
2012-05-03 03:36:40 +02:00
Victor Stinner
684d5fd420
Fix PyUnicode_Substring() for start >= length and start > end
...
Remove the fast-path for 1-character string: unicode_fromascii() and
_PyUnicode_FromUCS*() now have their own fast-path for 1-character strings.
2012-05-03 02:32:34 +02:00
Victor Stinner
b6cd014d75
Unicode: optimize creating of 1-character strings
2012-05-03 02:17:04 +02:00
Victor Stinner
bff7c96834
Issue #14687 : Optimize str%tuple for the "%(name)s" syntax
...
Avoid a useless and expensive call to PyUnicode_READ().
2012-05-03 01:44:59 +02:00
Victor Stinner
e6abb488c9
unicodeobject.c: Add MAX_MAXCHAR() macro to (micro-)optimize the computation
...
of the second argument of PyUnicode_New().
* Also create align_maxchar() function
* Optimize fix_decimal_and_space_to_ascii(): don't compute the maximum
character when ch <= 127 (it is ASCII)
2012-05-02 01:15:40 +02:00
Victor Stinner
438106b66e
Issue #14687 : Cleanup PyUnicode_Format()
2012-05-02 00:41:57 +02:00
Victor Stinner
b5c3ea3af3
Issue #14687 : Optimize str%args
...
* formatfloat() uses unicode_fromascii() instead of PyUnicode_DecodeASCII()
to not have to check characters, we know that it is really ASCII
* Use PyUnicode_FromOrdinal() instead of _PyUnicode_FromUCS4() to format
a character: it avoids a call to ucs4lib_find_max_char() to compute
the maximum character (whereas we already know it, it is just the character
itself)
2012-05-02 00:29:36 +02:00
Victor Stinner
b80e46eca4
Issue #14687 : Avoid a useless duplicated string in PyUnicode_Format()
2012-04-30 05:21:52 +02:00
Victor Stinner
aff3cc659b
Issue #14687 : Cleanup PyUnicode_Format()
2012-04-30 05:19:21 +02:00
Victor Stinner
b11d91d969
Fix my previous commit: bool is a long, restore the special case for bool
2012-04-28 00:25:34 +02:00
Victor Stinner
d0880d57b0
Simplify and optimize formatlong()
...
* Remove _PyBytes_FormatLong(): inline it into formatlong()
* the input type is always a long, so remove the code for bool
* don't duplicate the string if the length does not change
* Use PyUnicode_DATA() instead of _PyUnicode_AsString()
2012-04-27 23:40:13 +02:00
Victor Stinner
94d558b063
Optimize _PyUnicode_FindMaxChar() for pure ASCII strings
2012-04-27 22:26:58 +02:00
Victor Stinner
8f825060f1
Check consistency of newly created strings using _PyUnicode_CheckConsistency(str, 1)
...
* In debug mode, fill the string data with invalid characters
* Simplify also reference counting in PyCodec_BackslashReplaceErrors()
and PyCodec_XMLCharRefReplaceError()
2012-04-27 13:55:39 +02:00
Victor Stinner
718fbf078c
_PyUnicode_CheckConsistency() ensures that the unicode string ends with a
...
null character
2012-04-26 00:39:37 +02:00
Benjamin Peterson
b9f4c9daad
make pointer arith c89
2012-04-23 21:45:40 -04:00