Victor Stinner
3a50e7056e
Issue #12281: Rewrite the MBCS codec to correctly handle the replace and ignore
...
error handlers on all Windows versions. The MBCS codec now supports all
error handlers, instead of only replace to encode and ignore to decode.
2011-10-18 21:21:00 +02:00
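To illustrate what "error handlers" means in the entry above, here is a minimal Python sketch. It uses the "ascii" codec as a stand-in, which is an assumption made for portability: the "mbcs" codec is only available on Windows and its output depends on the active code page. The handler names are the standard ones accepted by str.encode() and bytes.decode().

```python
# Stand-in codec: "ascii" is used instead of "mbcs" so the sketch runs on
# any platform (assumption: mbcs itself exists only on Windows).
CODEC = "ascii"

text = "caf\u00e9"      # 'é' cannot be encoded to ASCII
raw = b"caf\xe9"        # 0xE9 cannot be decoded from ASCII

# Encoding with different error handlers
encoded_replace = text.encode(CODEC, "replace")            # b'caf?'
encoded_ignore = text.encode(CODEC, "ignore")              # b'caf'
encoded_escape = text.encode(CODEC, "backslashreplace")    # b'caf\\xe9'
encoded_xml = text.encode(CODEC, "xmlcharrefreplace")      # b'caf&#233;'

# Decoding with different error handlers
decoded_replace = raw.decode(CODEC, "replace")             # 'caf\ufffd'
decoded_ignore = raw.decode(CODEC, "ignore")               # 'caf'

# The default handler, "strict", raises instead of substituting
try:
    text.encode(CODEC)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```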
Benjamin Peterson
7a6debe79c
remove some duplication
2011-10-15 09:25:28 -04:00
Victor Stinner
f5cff56a1b
Issue #13088: Add shared Py_hexdigits constant to format a number into base 16
2011-10-14 02:13:11 +02:00
Antoine Pitrou
f0b934b01a
Reuse the stringlib in findchar(), and make its signature more convenient
2011-10-13 18:55:09 +02:00
Victor Stinner
55c991197b
Optimize unicode_subscript() for step != 1 and ascii strings
2011-10-13 01:17:06 +02:00
Victor Stinner
127226ba69
Don't use PyUnicode_MAX_CHAR_VALUE() macro in Py_MAX()
2011-10-13 01:12:34 +02:00
Victor Stinner
9e7a1bcfd6
Optimize findchar() for PyUnicode_1BYTE_KIND: use memchr and memrchr
2011-10-13 00:18:12 +02:00
Antoine Pitrou
dd4e2f0153
Issue #13155: Optimize finding the optimal character width of a unicode string
2011-10-13 00:02:27 +02:00
Victor Stinner
49a0a21f37
Unicode replace() avoids calling unicode_adjust_maxchar() when it's useless
...
Add also a special case if the result is an empty string.
2011-10-12 23:46:10 +02:00
Victor Stinner
983b1434bd
Backed out changeset 952d91a7d376
...
If maxchar == PyUnicode_MAX_CHAR_VALUE(unicode), we do a useless copy.
2011-10-12 00:54:35 +02:00
Antoine Pitrou
e55ad2dff0
Relax condition
2011-10-12 00:36:51 +02:00
Victor Stinner
4e10100dee
Fix compiler warning in _PyUnicode_FromUCS2()
2011-10-11 23:27:52 +02:00
Antoine Pitrou
950468e553
Use _PyUnicode_CONVERT_BYTES() where applicable.
2011-10-11 22:45:48 +02:00
Victor Stinner
577db2c9f0
PyUnicode_AsUnicodeCopy() now checks if PyUnicode_AsUnicode() failed
2011-10-11 22:12:48 +02:00
Victor Stinner
c4f281eba3
Fix misuse of PyUnicode_GET_SIZE, use PyUnicode_GET_LENGTH instead
2011-10-11 22:11:42 +02:00
Antoine Pitrou
e459a0877e
Issue #13136: speed up conversion between different character widths.
2011-10-11 20:58:41 +02:00
Antoine Pitrou
2871698546
/* Remove unused code. It has been commented out since 2000 (!). */
2011-10-11 03:17:47 +02:00
Antoine Pitrou
53bb548f22
Avoid exporting private helpers
...
(thanks "make smelly")
2011-10-10 23:49:24 +02:00
Victor Stinner
794d567b17
any_find_slice() doesn't use callbacks anymore
...
* Call the right find/rfind method directly, which allows the functions to be inlined
* Remove Py_LOCAL_CALLBACK (added for any_find_slice)
2011-10-10 03:21:36 +02:00
Martin v. Löwis
afe55bba33
Add API for static strings, primarily good for identifiers.
...
Thanks to Konrad Schöbel and Jasper Schulz for helping with the mass-editing.
2011-10-09 10:38:36 +02:00
Antoine Pitrou
eaf139b3fc
Fix typo in the PyUnicode_Find() implementation
2011-10-09 00:33:09 +02:00
Martin v. Löwis
c47adb04b3
Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE.
2011-10-07 20:55:35 +02:00
Victor Stinner
dd07732af5
PyUnicode_Join() calls directly memcpy() if all strings are of the same kind
2011-10-07 17:02:31 +02:00
Antoine Pitrou
978b9d2a27
Fix formatting memory consumption with very large padding specifications
2011-10-07 12:35:48 +02:00
Victor Stinner
59de0ee9e0
str.replace(a, a) is now returning str unchanged if a is a
2011-10-07 10:01:28 +02:00
Antoine Pitrou
5c0ba36d5f
Fix massive slowdown in string formatting with the % operator
2011-10-07 01:54:09 +02:00
Antoine Pitrou
7c46da7993
Ensure that 1-char singletons get used
2011-10-06 22:07:51 +02:00
Victor Stinner
c6f0df7b20
Fix PyUnicode_Join() for len==1 and non-exact string
2011-10-06 15:58:54 +02:00
Antoine Pitrou
15a66cf134
Fix compilation under Windows
2011-10-06 15:25:32 +02:00
Victor Stinner
200f21340d
Fix assertion in unicode_adjust_maxchar()
2011-10-06 13:27:56 +02:00
Victor Stinner
acf47b807f
Fix my last change on PyUnicode_Join(): don't process separator if len==1
2011-10-06 12:32:37 +02:00
Victor Stinner
25a4b29c95
str.replace() avoids memory when it's possible
2011-10-06 12:31:55 +02:00
Victor Stinner
56c161ab00
_copy_characters() fails more quickly in debug mode on inconsistent state
2011-10-06 02:47:11 +02:00
Victor Stinner
c729b8e92f
Fix a compiler warning: don't define unicode_is_singleton() in release mode
2011-10-06 02:36:59 +02:00
Victor Stinner
fb9ea8c57e
Don't check for the maximum character when copying from unicodeobject.c
...
* Create copy_characters() function which doesn't check for the maximum
character in release mode
* _PyUnicode_CheckConsistency() is no longer static so that it can be used
in _PyUnicode_FormatAdvanced() (in formatter_unicode.c)
* _PyUnicode_CheckConsistency() checks the string hash
2011-10-06 01:45:57 +02:00
Victor Stinner
05d1189566
Fix post-condition in unicode_repr(): check the result, not the input
2011-10-06 01:13:58 +02:00
Victor Stinner
f48323e3b3
replace() uses unicode_fromascii() if the input and replace strings are ASCII
2011-10-05 23:27:08 +02:00
Victor Stinner
0617b6e18b
unicode_fromascii() checks that the input is ASCII in debug mode
2011-10-05 23:26:01 +02:00
Victor Stinner
c3cec7868b
Add asciilib: similar to ucs1, ucs2 and ucs4 library, but specialized to ASCII
...
The ucs1, ucs2 and ucs4 libraries have to scan each created substring to find the
maximum character, whereas this is not needed for ASCII strings. Because ASCII
strings are common, it is useful to optimize for ASCII.
2011-10-05 21:24:08 +02:00
Victor Stinner
14f8f02826
Fix PyUnicode_Partition(): str_in->str_obj
2011-10-05 20:58:25 +02:00
Victor Stinner
bb10a1f759
Ensure that newly created strings use the most efficient store in debug mode
2011-10-05 01:34:17 +02:00
Victor Stinner
9310abbf40
Replace PyUnicodeObject* with PyObject* where it was inappropriate
2011-10-05 00:59:23 +02:00
Victor Stinner
ce5faf673e
unicodeobject.c doesn't make output strings ready in debug mode
...
Try to only create non ready strings in debug mode to ensure that all functions
(not only in unicodeobject.c, everywhere) make input strings ready.
2011-10-05 00:42:43 +02:00
Georg Brandl
7597addbd4
More typos.
2011-10-05 16:36:47 +02:00
Victor Stinner
c80d6d20d5
Speedup str[a:b:step] for step != 1
...
Try to stop the scanner of the maximum character before the end using a limit
depending on the kind (e.g. 256 for PyUnicode_2BYTE_KIND).
2011-10-05 14:13:28 +02:00
Victor Stinner
ae86485517
Speedup find_maxchar_surrogates() for 32-bit wchar_t
...
If we have at least one character in U+10000-U+10FFFF, we know that we must use
PyUnicode_4BYTE_KIND kind.
2011-10-05 14:02:44 +02:00
Victor Stinner
b9275c104e
Speedup str[a:b] and PyUnicode_FromKindAndData
...
* str[a:b] doesn't scan the string for the maximum character if the string
is ascii only
* PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
shorter character type. For example, _PyUnicode_FromUCS1() stops if we
have at least one character in range U+0080-U+00FF
2011-10-05 14:01:42 +02:00
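The early-stop idea described in the entry above can be sketched in Python. This is only an illustration of the technique, not the C code in unicodeobject.c: the function name find_maxchar and the stop_at parameter are hypothetical. The scan ends as soon as a code point proves that no narrower storage kind is possible, for example 0x80 when scanning UCS1 data.

```python
def find_maxchar(code_points, stop_at):
    """Return the largest code point seen, stopping the scan early.

    Hypothetical helper: once a value >= stop_at is found, a narrower
    character type is already ruled out, so the rest is not inspected.
    """
    maxchar = 0
    for cp in code_points:
        if cp > maxchar:
            maxchar = cp
            if maxchar >= stop_at:
                break  # no need to look at the remaining characters
    return maxchar


# UCS1 input: any byte in 0x80-0xFF means the string is not pure ASCII,
# so the scan can stop at the first such byte.
latin1_data = b"ab\xe9cd"
is_ascii = find_maxchar(latin1_data, 0x80) < 0x80
```

Passing an iterator instead of a bytes object makes the early exit observable: the elements after the first non-ASCII byte are left unconsumed.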
Victor Stinner
702c734395
Speedup the ASCII decoder
...
It is faster for long strings and a little bit faster for short strings.
Benchmark on Linux 32 bits, Intel Core i5 @ 3.33GHz:
./python -m timeit 'x=b"a"' 'x.decode("ascii")'
./python -m timeit 'x=b"x"*80' 'x.decode("ascii")'
./python -m timeit 'x=b"abc"*4096' 'x.decode("ascii")'
length | before     | after
-------+------------+-----------
1      | 0.234 usec | 0.229 usec
80     | 0.381 usec | 0.357 usec
12,288 | 11.2 usec  | 3.01 usec
2011-10-05 13:50:52 +02:00
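The three timeit command lines in the entry above can also be run from a single script. The following is a rough sketch using the standard timeit module; the helper name bench, the CASES mapping and the repeat/number values are assumptions chosen for illustration, and the timings it prints will differ from the table above depending on the machine and Python version.

```python
import timeit

# Same three inputs as the command lines above, keyed by length in bytes.
CASES = {
    1: b"a",
    80: b"x" * 80,
    12288: b"abc" * 4096,
}


def bench(data, number=20_000, repeat=3):
    """Return the best time per x.decode("ascii") call, in microseconds."""
    timer = timeit.Timer('x.decode("ascii")', globals={"x": data})
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    for length, data in CASES.items():
        print(f"{length:>6} | {bench(data):.3f} usec")
```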
Victor Stinner
e1335c711c
Fix usage of PyUnicode_READY()
2011-10-04 20:53:03 +02:00
Victor Stinner
e06e145943
_PyUnicode_READY_REPLACE() cannot be used in unicode_subtype_new()
2011-10-04 20:52:31 +02:00