Commit Graph

690 Commits

Author SHA1 Message Date
Victor Stinner 9310abbf40 Replace PyUnicodeObject* with PyObject* where it was inappropriate 2011-10-05 00:59:23 +02:00
Victor Stinner ce5faf673e unicodeobject.c doesn't make output strings ready in debug mode
Try to only create non ready strings in debug mode to ensure that all functions
(not only in unicodeobject.c, everywhere) make input strings ready.
2011-10-05 00:42:43 +02:00
Georg Brandl 7597addbd4 More typoes. 2011-10-05 16:36:47 +02:00
Victor Stinner c80d6d20d5 Speedup str[a🅱️step] for step != 1
Try to stop the scanner of the maximum character before the end using a limit
depending on the kind (e.g. 256 for PyUnicode_2BYTE_KIND).
2011-10-05 14:13:28 +02:00
Victor Stinner ae86485517 Speedup find_maxchar_surrogates() for 32-bit wchar_t
If we have at least one character in U+10000-U+10FFFF, we know that we must use
PyUnicode_4BYTE_KIND kind.
2011-10-05 14:02:44 +02:00
Victor Stinner b9275c104e Speedup str[a:b] and PyUnicode_FromKindAndData
* str[a:b] doesn't scan the string for the maximum character if the string
   is ascii only
 * PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
   shorter character type. For example, _PyUnicode_FromUCS1() stops if we
   have at least one character in range U+0080-U+00FF
2011-10-05 14:01:42 +02:00
Victor Stinner 702c734395 Speedup the ASCII decoder
It is faster for long string and a little bit faster for short strings,
benchmark on Linux 32 bits, Intel Core i5 @ 3.33GHz:

./python -m timeit 'x=b"a"' 'x.decode("ascii")'
./python -m timeit 'x=b"x"*80' 'x.decode("ascii")'
./python -m timeit 'x=b"abc"*4096' 'x.decode("ascii")'

length |   before   | after
-------+------------+-----------
     1 | 0.234 usec | 0.229 usec
    80 | 0.381 usec | 0.357 usec
12,288 |  11.2 usec |  3.01 usec
2011-10-05 13:50:52 +02:00
Victor Stinner e1335c711c Fix usage og PyUnicode_READY() 2011-10-04 20:53:03 +02:00
Victor Stinner e06e145943 _PyUnicode_READY_REPLACE() cannot be used in unicode_subtype_new() 2011-10-04 20:52:31 +02:00
Victor Stinner 17efeed284 Add DONT_MAKE_RESULT_READY to unicodeobject.c to help detecting bugs
Use also _PyUnicode_READY_REPLACE() when it's applicable.
2011-10-04 20:05:46 +02:00
Victor Stinner 6b56a7fd3d Add assertion to _Py_ReleaseInternedUnicodeStrings() if READY fails 2011-10-04 20:04:52 +02:00
Antoine Pitrou 875f29bb95 Fix naïve heuristic in unicode slicing (followup to 1b4f886dc9e2) 2011-10-04 20:00:49 +02:00
Antoine Pitrou 2242522fde Add a necessary call to PyUnicode_READY() (followup to ab5086539ab9) 2011-10-04 19:10:51 +02:00
Antoine Pitrou 7aec401966 Optimize string slicing to use the new API 2011-10-04 19:08:01 +02:00
Antoine Pitrou e19aa388e8 When expandtabs() would be a no-op, don't create a duplicate string 2011-10-04 16:04:01 +02:00
Antoine Pitrou e71d574a39 Migrate str.expandtabs to the new API 2011-10-04 15:55:09 +02:00
Benjamin Peterson 7f3140ef80 fix parens 2011-10-03 19:37:29 -04:00
Benjamin Peterson 4bfce8f81f fix formatting 2011-10-03 19:35:07 -04:00
Benjamin Peterson ccc51c1fc6 fix compiler warnings 2011-10-03 19:34:12 -04:00
Victor Stinner b092365cc6 Move in-place Unicode append to its own subfunction 2011-10-04 01:17:31 +02:00
Victor Stinner a5f9163501 Reindent internal Unicode macros 2011-10-04 01:07:11 +02:00
Victor Stinner a41463c203 Document utf8_length and wstr_length states
Ensure these states with assertions in _PyUnicode_CheckConsistency().
2011-10-04 01:05:08 +02:00
Victor Stinner 9566311014 resize_inplace() sets utf8_length to zero if the utf8 is not shared8
Cleanup also the code.
2011-10-04 01:03:50 +02:00
Victor Stinner 9e9d689d85 PyUnicode_New() sets utf8_length to zero for latin1 2011-10-04 01:02:02 +02:00
Victor Stinner 016980454e Unicode: raise SystemError instead of ValueError or RuntimeError on invalid
state
2011-10-04 00:04:26 +02:00
Victor Stinner 7f11ad4594 Unicode: document when the wstr pointer is shared with data
Add also related assertions to _PyUnicode_CheckConsistency().
2011-10-04 00:00:20 +02:00
Victor Stinner 03490918b7 Add _PyUnicode_HAS_WSTR_MEMORY() macro 2011-10-03 23:45:12 +02:00
Victor Stinner 9ce5a835bb PyUnicode_Join() checks output length in debug mode
PyUnicode_CopyCharacters() may copies less character than requested size, if
the input string is smaller than the argument. (This is very unlikely, but who
knows!?)

Avoid also calling PyUnicode_CopyCharacters() if the string is empty.
2011-10-03 23:36:02 +02:00
Victor Stinner b803895355 Fix a compiler warning in PyUnicode_Append()
Don't check PyUnicode_CopyCharacters() in release mode. Rename also some
variables.
2011-10-03 23:27:56 +02:00
Victor Stinner 8cfcbed4e3 Improve string forms and PyUnicode_Resize() documentation
Remove also the FIXME for resize_copy(): as discussed with Martin, copy the
string on resize if the string is not resizable is just fine.
2011-10-03 23:19:21 +02:00
Victor Stinner 77bb47b312 Simplify unicode_resizable(): singletons reference count is at least 2 2011-10-03 20:06:05 +02:00
Victor Stinner 85041a54bd _PyUnicode_CheckConsistency() checks utf8 field consistency 2011-10-03 14:42:39 +02:00
Victor Stinner 3cf4637e4e unicode_subtype_new() copies also the ascii flag 2011-10-03 14:42:15 +02:00
Victor Stinner 42dfd71333 unicode_kind_name() doesn't check consistency anymore
It is is called from _PyUnicode_Dump() and so must not fail.
2011-10-03 14:41:45 +02:00
Victor Stinner a3b334da6d PyUnicode_Ready() now sets ascii=1 if maxchar < 128
ascii=1 is no more reserved to PyASCIIObject. Use
PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).
2011-10-03 13:53:37 +02:00
Victor Stinner 1b4f9ceca7 Create _PyUnicode_READY_REPLACE() to reuse singleton
Only use _PyUnicode_READY_REPLACE() on just created strings.
2011-10-03 13:28:14 +02:00
Victor Stinner c379ead9af Fix resize_compact() and resize_inplace(); reenable full resize optimizations
* resize_compact() updates also wstr_len for non-ascii strings sharing wstr
 * resize_inplace() updates also utf8_len/wstr_len for strings sharing
   utf8/wstr
2011-10-03 12:52:27 +02:00
Victor Stinner 34411e17b0 resize_inplace() has been fixed: reenable this optimization 2011-10-03 12:21:33 +02:00
Victor Stinner a849a4b6b4 _PyUnicode_Dump() indicates if wstr and/or utf8 are shared 2011-10-03 12:12:11 +02:00
Victor Stinner 1c8d0c76a1 Fix resize_inplace(): update shared utf8 pointer 2011-10-03 12:11:00 +02:00
Victor Stinner ca4f7a4298 Disable unicode_resize() optimization on Windows (16-bit wchar_t) 2011-10-03 04:18:04 +02:00
Victor Stinner 126c559d05 _PyUnicode_Ready() for 16-bit wchar_t 2011-10-03 04:17:10 +02:00
Victor Stinner 2fd82278cb Fix compilation error on Windows
Fix also a compiler warning.
2011-10-03 04:06:05 +02:00
Victor Stinner a3be613a56 Use PyUnicode_WCHAR_KIND to check if a string is a wstr string
Simplify the test in wstr pointer in unicode_sizeof().
2011-10-03 02:16:37 +02:00
Victor Stinner 910337b42e Add _PyUnicode_CheckConsistency() macro to help debugging
* Document Unicode string states
 * Use _PyUnicode_CheckConsistency() to ensure that objects are always
   consistent.
2011-10-03 03:20:16 +02:00
Victor Stinner 4fae54cb0e In release mode, PyUnicode_InternInPlace() does nothing if the input is NULL or
not a unicode, instead of failing with a fatal error.

Use assertions in debug mode (provide better error messages).
2011-10-03 02:01:52 +02:00
Victor Stinner 23e5668214 PyUnicode_Append() now works in-place when it's possible 2011-10-03 03:54:37 +02:00
Victor Stinner fe226c0d37 Rewrite PyUnicode_Resize()
* Rename _PyUnicode_Resize() to unicode_resize()
 * unicode_resize() creates a copy if the string cannot be resized instead
   of failing
 * Optimize resize_copy() for wstr strings
 * Disable temporary resize_inplace()
2011-10-03 03:52:20 +02:00
Victor Stinner 829c0adca9 Add _PyUnicode_HAS_UTF8_MEMORY() macro 2011-10-03 01:08:02 +02:00
Victor Stinner fe0c155c4f Write _PyUnicode_Dump() to help debugging 2011-10-03 02:59:31 +02:00