Antoine Pitrou
7ce35a1816
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:59:37 +02:00
Antoine Pitrou
8b0e98426d
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:58:34 +02:00
Victor Stinner
f4f24248dc
Fix uninitialized value in charmap_decode_mapping()
2013-05-07 01:01:31 +02:00
Victor Stinner
8cecc8c262
Issue #7330 : Implement width and precision (ex: "%5.3s") for the format string
...
of PyUnicode_FromFormat() function, original patch written by Ysj Ray.
2013-05-06 23:11:54 +02:00
Victor Stinner
bb4503f61e
Partial revert of changeset 9744b2df134c
...
PyUnicode_Append() cannot call directly resize_compact(): I forgot that a
string can be ready *and* not compact (a legacy string can also be ready).
2013-04-18 09:41:34 +02:00
Victor Stinner
fb161b1b6d
Split PyUnicode_DecodeCharmap() into subfunction for readability
2013-04-18 01:44:27 +02:00
Victor Stinner
170ca6f84b
Fix bug in Unicode decoders related to _PyUnicodeWriter
...
Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.
2013-04-18 00:25:28 +02:00
Victor Stinner
376cfa122d
Fix typo in unicode_decode_call_errorhandler_writer()
...
Bug introduced by changeset 7ed9993d53b4.
2013-04-17 23:58:16 +02:00
Victor Stinner
8f674ccd64
Close #17694 : Add minimum length to _PyUnicodeWriter
...
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() has no more argument (except the writer itself):
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner
77282cb4f8
Cleanup PyUnicode_Contains()
...
* No need to double-check that strings are ready: test already done by
PyUnicode_FromObject()
* Remove useless kind variable (use kind1 instead)
2013-04-14 19:22:47 +02:00
Victor Stinner
d92e078c8d
Minor change: fix character in do_strip() for the ASCII case
2013-04-14 19:17:42 +02:00
Victor Stinner
f033510fee
Cleanup PyUnicode_Append()
...
* Check also that right is a Unicode object
* call directly resize_compact() instead of unicode_resize() for a more
explicit error handling, and to avoid testing some properties twice
(ex: unicode_modifiable())
2013-04-14 19:13:03 +02:00
Victor Stinner
4560f9c63f
PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code
2013-04-14 18:56:46 +02:00
Victor Stinner
55c08781e8
Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped
2013-04-14 18:45:39 +02:00
Victor Stinner
af03757d20
Optimize ascii(str): don't encode/decode repr if repr is already ASCII
2013-04-14 18:44:10 +02:00
Victor Stinner
8a1a6cffd6
Add _PyUnicodeWriter_WriteCharInline()
2013-04-14 02:35:33 +02:00
Serhiy Storchaka
e2cef885a2
Issue #16061 : Speed up str.replace() for replacing 1-character strings.
2013-04-13 22:45:04 +03:00
Victor Stinner
a0dd0213cc
Close #17693 : Rewrite CJK decoders to use the _PyUnicodeWriter API instead of
...
the legacy Py_UNICODE API.
Add also a new _PyUnicodeWriter_WriteChar() function.
2013-04-11 22:09:04 +02:00
Victor Stinner
247109e74d
Issue #17615 : On Windows (VS2010), Performances of wmemcmp() to compare Unicode
...
strings are not convincing. For UCS2 (16-bit wchar_t type), use a dummy loop
instead of wmemcmp(). The dummy loop is as fast, or a little bit faster.
wchar_t is only 16-bit long on Windows. wmemcmp() is still used for 32-bit
wchar_t.
2013-04-09 23:53:26 +02:00
Victor Stinner
0cff4b16d9
replace(): only call PyUnicode_DATA(u) once
2013-04-09 22:52:48 +02:00
Victor Stinner
cc7af72192
Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII
2013-04-09 22:39:24 +02:00
Victor Stinner
f50a4e9bc9
Don't calls macros in PyUnicode_WRITE() parameters
...
PyUnicode_WRITE() expands some parameters twice or more.
2013-04-09 22:38:52 +02:00
Victor Stinner
9c79e41fc5
Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call
...
it twice
2013-04-09 22:21:08 +02:00
Victor Stinner
b3a6014504
Fix _PyUnicode_XStrip()
...
Inline the BLOOM_MEMBER() to only call PyUnicode_READ() only once (per loop
iteration). Store also the length of the seperator in a variable to avoid calls
to PyUnicode_GET_LENGTH().
2013-04-09 22:19:21 +02:00
Victor Stinner
63d5c1a14a
Optimize PyUnicode_DecodeCharmap()
...
Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers
instead.
2013-04-09 22:13:33 +02:00
Victor Stinner
a85af502a4
Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip()
...
Write specialized functions per Unicode kind to avoid the expensive
PyUnicode_READ() macro.
2013-04-09 21:53:54 +02:00
Victor Stinner
69ed0f4c86
Use PyUnicode_READ() instead of PyUnicode_READ_CHAR()
...
"PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls
PyUnicode_KIND() and might call it twice." according to its documentation.
2013-04-09 21:48:24 +02:00
Victor Stinner
03c3e35d42
Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings:
...
cp037, cp500 and iso8859_1 codecs
2013-04-09 21:53:09 +02:00
Victor Stinner
cd777eaf53
Issue #17615 : Comparing two Unicode strings now uses wmemcmp() when possible
...
wmemcmp() is twice faster than a dummy loop (342 usec vs 744 usec) on Fedora
18/x86_64, GCC 4.7.2.
2013-04-08 22:43:44 +02:00
Victor Stinner
c1302bba4c
Issue #17615 : Expand expensive PyUnicode_READ() macro in unicode_compare():
...
write specialized functions for each combination of Unicode kinds.
2013-04-08 21:50:54 +02:00
Victor Stinner
207dd38726
fix unused variable
2013-04-03 03:14:58 +02:00
Victor Stinner
eb4b5ac8af
Close #16757 : Avoid calling the expensive _PyUnicode_FindMaxChar() function
...
when possible
2013-04-03 02:02:33 +02:00
Victor Stinner
cfc4c13b04
Add _PyUnicodeWriter_WriteSubstring() function
...
Write a function to enable more optimizations:
* If the substring is the whole string and overallocation is disabled, just
keep a reference to the string, don't copy characters
* Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
possible
2013-04-03 01:48:39 +02:00
Raymond Hettinger
51612fd803
merge
2013-03-23 08:21:52 -07:00
Raymond Hettinger
378170d5d9
Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords.
2013-03-23 08:21:12 -07:00
Victor Stinner
fb84b5d48d
(Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:29:09 +01:00
Victor Stinner
2cb16aa3cb
_PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:28:37 +01:00
Victor Stinner
cf77da9fb5
Backed out changeset b9f7b1bf36aa
2013-03-06 01:09:24 +01:00
Victor Stinner
313cac88c5
Issue #17223 : Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type)
...
to reject invalid UTF-16 surrogate.
2013-03-06 00:41:50 +01:00
Victor Stinner
36025478bf
(Merge 3.3) Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character
...
outside the range U+0000-U+10ffff.
2013-02-26 00:16:57 +01:00
Victor Stinner
d21b58c05d
Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character outside
...
the range U+0000-U+10ffff.
2013-02-26 00:15:54 +01:00
Victor Stinner
cfd2c1b4cc
(Merge 3.3) Issue #17137 : When an Unicode string is resized, the internal wide
...
character string (wstr) format is now cleared.
2013-02-07 23:17:34 +01:00
Victor Stinner
bbbac2ec34
Issue #17137 : When an Unicode string is resized, the internal wide character
...
string (wstr) format is now cleared.
2013-02-07 23:12:46 +01:00
Serhiy Storchaka
d0c79dcda5
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:26:55 +02:00
Serhiy Storchaka
03ee12ed72
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:25:25 +02:00
Serhiy Storchaka
3fd4ab356d
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:23:21 +02:00
Serhiy Storchaka
2aee6a6460
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:16:57 +02:00
Serhiy Storchaka
afb1cb5579
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:13:22 +02:00
Serhiy Storchaka
8fe5a9f9c3
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:37:39 +02:00
Serhiy Storchaka
24193debd4
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:28:07 +02:00