Victor Stinner
9310abbf40
Replace PyUnicodeObject* with PyObject* where it was inappropriate
2011-10-05 00:59:23 +02:00
Victor Stinner
ce5faf673e
unicodeobject.c doesn't make output strings ready in debug mode
...
Try to only create non ready strings in debug mode to ensure that all functions
(not only in unicodeobject.c, everywhere) make input strings ready.
2011-10-05 00:42:43 +02:00
Georg Brandl
7597addbd4
More typoes.
2011-10-05 16:36:47 +02:00
Victor Stinner
c80d6d20d5
Speedup str[a 🅱️ step] for step != 1
...
Try to stop the scanner of the maximum character before the end using a limit
depending on the kind (e.g. 256 for PyUnicode_2BYTE_KIND).
2011-10-05 14:13:28 +02:00
Victor Stinner
ae86485517
Speedup find_maxchar_surrogates() for 32-bit wchar_t
...
If we have at least one character in U+10000-U+10FFFF, we know that we must use
PyUnicode_4BYTE_KIND kind.
2011-10-05 14:02:44 +02:00
Victor Stinner
b9275c104e
Speedup str[a:b] and PyUnicode_FromKindAndData
...
* str[a:b] doesn't scan the string for the maximum character if the string
is ascii only
* PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
shorter character type. For example, _PyUnicode_FromUCS1() stops if we
have at least one character in range U+0080-U+00FF
2011-10-05 14:01:42 +02:00
Victor Stinner
702c734395
Speedup the ASCII decoder
...
It is faster for long string and a little bit faster for short strings,
benchmark on Linux 32 bits, Intel Core i5 @ 3.33GHz:
./python -m timeit 'x=b"a"' 'x.decode("ascii")'
./python -m timeit 'x=b"x"*80' 'x.decode("ascii")'
./python -m timeit 'x=b"abc"*4096' 'x.decode("ascii")'
length | before | after
-------+------------+-----------
1 | 0.234 usec | 0.229 usec
80 | 0.381 usec | 0.357 usec
12,288 | 11.2 usec | 3.01 usec
2011-10-05 13:50:52 +02:00
Victor Stinner
e1335c711c
Fix usage og PyUnicode_READY()
2011-10-04 20:53:03 +02:00
Victor Stinner
e06e145943
_PyUnicode_READY_REPLACE() cannot be used in unicode_subtype_new()
2011-10-04 20:52:31 +02:00
Victor Stinner
17efeed284
Add DONT_MAKE_RESULT_READY to unicodeobject.c to help detecting bugs
...
Use also _PyUnicode_READY_REPLACE() when it's applicable.
2011-10-04 20:05:46 +02:00
Victor Stinner
6b56a7fd3d
Add assertion to _Py_ReleaseInternedUnicodeStrings() if READY fails
2011-10-04 20:04:52 +02:00
Antoine Pitrou
875f29bb95
Fix naïve heuristic in unicode slicing (followup to 1b4f886dc9e2)
2011-10-04 20:00:49 +02:00
Antoine Pitrou
2242522fde
Add a necessary call to PyUnicode_READY() (followup to ab5086539ab9)
2011-10-04 19:10:51 +02:00
Antoine Pitrou
7aec401966
Optimize string slicing to use the new API
2011-10-04 19:08:01 +02:00
Antoine Pitrou
e19aa388e8
When expandtabs() would be a no-op, don't create a duplicate string
2011-10-04 16:04:01 +02:00
Antoine Pitrou
e71d574a39
Migrate str.expandtabs to the new API
2011-10-04 15:55:09 +02:00
Benjamin Peterson
7f3140ef80
fix parens
2011-10-03 19:37:29 -04:00
Benjamin Peterson
4bfce8f81f
fix formatting
2011-10-03 19:35:07 -04:00
Benjamin Peterson
ccc51c1fc6
fix compiler warnings
2011-10-03 19:34:12 -04:00
Victor Stinner
b092365cc6
Move in-place Unicode append to its own subfunction
2011-10-04 01:17:31 +02:00
Victor Stinner
a5f9163501
Reindent internal Unicode macros
2011-10-04 01:07:11 +02:00
Victor Stinner
a41463c203
Document utf8_length and wstr_length states
...
Ensure these states with assertions in _PyUnicode_CheckConsistency().
2011-10-04 01:05:08 +02:00
Victor Stinner
9566311014
resize_inplace() sets utf8_length to zero if the utf8 is not shared8
...
Cleanup also the code.
2011-10-04 01:03:50 +02:00
Victor Stinner
9e9d689d85
PyUnicode_New() sets utf8_length to zero for latin1
2011-10-04 01:02:02 +02:00
Victor Stinner
016980454e
Unicode: raise SystemError instead of ValueError or RuntimeError on invalid
...
state
2011-10-04 00:04:26 +02:00
Victor Stinner
7f11ad4594
Unicode: document when the wstr pointer is shared with data
...
Add also related assertions to _PyUnicode_CheckConsistency().
2011-10-04 00:00:20 +02:00
Victor Stinner
03490918b7
Add _PyUnicode_HAS_WSTR_MEMORY() macro
2011-10-03 23:45:12 +02:00
Victor Stinner
9ce5a835bb
PyUnicode_Join() checks output length in debug mode
...
PyUnicode_CopyCharacters() may copies less character than requested size, if
the input string is smaller than the argument. (This is very unlikely, but who
knows!?)
Avoid also calling PyUnicode_CopyCharacters() if the string is empty.
2011-10-03 23:36:02 +02:00
Victor Stinner
b803895355
Fix a compiler warning in PyUnicode_Append()
...
Don't check PyUnicode_CopyCharacters() in release mode. Rename also some
variables.
2011-10-03 23:27:56 +02:00
Victor Stinner
8cfcbed4e3
Improve string forms and PyUnicode_Resize() documentation
...
Remove also the FIXME for resize_copy(): as discussed with Martin, copy the
string on resize if the string is not resizable is just fine.
2011-10-03 23:19:21 +02:00
Victor Stinner
77bb47b312
Simplify unicode_resizable(): singletons reference count is at least 2
2011-10-03 20:06:05 +02:00
Victor Stinner
85041a54bd
_PyUnicode_CheckConsistency() checks utf8 field consistency
2011-10-03 14:42:39 +02:00
Victor Stinner
3cf4637e4e
unicode_subtype_new() copies also the ascii flag
2011-10-03 14:42:15 +02:00
Victor Stinner
42dfd71333
unicode_kind_name() doesn't check consistency anymore
...
It is is called from _PyUnicode_Dump() and so must not fail.
2011-10-03 14:41:45 +02:00
Victor Stinner
a3b334da6d
PyUnicode_Ready() now sets ascii=1 if maxchar < 128
...
ascii=1 is no more reserved to PyASCIIObject. Use
PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).
2011-10-03 13:53:37 +02:00
Victor Stinner
1b4f9ceca7
Create _PyUnicode_READY_REPLACE() to reuse singleton
...
Only use _PyUnicode_READY_REPLACE() on just created strings.
2011-10-03 13:28:14 +02:00
Victor Stinner
c379ead9af
Fix resize_compact() and resize_inplace(); reenable full resize optimizations
...
* resize_compact() updates also wstr_len for non-ascii strings sharing wstr
* resize_inplace() updates also utf8_len/wstr_len for strings sharing
utf8/wstr
2011-10-03 12:52:27 +02:00
Victor Stinner
34411e17b0
resize_inplace() has been fixed: reenable this optimization
2011-10-03 12:21:33 +02:00
Victor Stinner
a849a4b6b4
_PyUnicode_Dump() indicates if wstr and/or utf8 are shared
2011-10-03 12:12:11 +02:00
Victor Stinner
1c8d0c76a1
Fix resize_inplace(): update shared utf8 pointer
2011-10-03 12:11:00 +02:00
Victor Stinner
ca4f7a4298
Disable unicode_resize() optimization on Windows (16-bit wchar_t)
2011-10-03 04:18:04 +02:00
Victor Stinner
126c559d05
_PyUnicode_Ready() for 16-bit wchar_t
2011-10-03 04:17:10 +02:00
Victor Stinner
2fd82278cb
Fix compilation error on Windows
...
Fix also a compiler warning.
2011-10-03 04:06:05 +02:00
Victor Stinner
a3be613a56
Use PyUnicode_WCHAR_KIND to check if a string is a wstr string
...
Simplify the test in wstr pointer in unicode_sizeof().
2011-10-03 02:16:37 +02:00
Victor Stinner
910337b42e
Add _PyUnicode_CheckConsistency() macro to help debugging
...
* Document Unicode string states
* Use _PyUnicode_CheckConsistency() to ensure that objects are always
consistent.
2011-10-03 03:20:16 +02:00
Victor Stinner
4fae54cb0e
In release mode, PyUnicode_InternInPlace() does nothing if the input is NULL or
...
not a unicode, instead of failing with a fatal error.
Use assertions in debug mode (provide better error messages).
2011-10-03 02:01:52 +02:00
Victor Stinner
23e5668214
PyUnicode_Append() now works in-place when it's possible
2011-10-03 03:54:37 +02:00
Victor Stinner
fe226c0d37
Rewrite PyUnicode_Resize()
...
* Rename _PyUnicode_Resize() to unicode_resize()
* unicode_resize() creates a copy if the string cannot be resized instead
of failing
* Optimize resize_copy() for wstr strings
* Disable temporary resize_inplace()
2011-10-03 03:52:20 +02:00
Victor Stinner
829c0adca9
Add _PyUnicode_HAS_UTF8_MEMORY() macro
2011-10-03 01:08:02 +02:00
Victor Stinner
fe0c155c4f
Write _PyUnicode_Dump() to help debugging
2011-10-03 02:59:31 +02:00