Commit Graph

3936 Commits

Author SHA1 Message Date
Antoine Pitrou cf28eacafe Issue #13188: When called without an explicit traceback argument,
generator.throw() now gets the traceback from the passed exception's
``__traceback__`` attribute.  Patch by Petri Lehtinen.
2011-10-18 16:42:55 +02:00
Antoine Pitrou 5b9f4c1539 Fix typo 2011-10-17 19:21:04 +02:00
Benjamin Peterson 897d059221 merge 3.2 (#13199) 2011-10-17 13:10:24 -04:00
Benjamin Peterson 7a6debe79c remove some duplication 2011-10-15 09:25:28 -04:00
Martin v. Löwis 1c67dd9b15 Port SetAttrString/HasAttrString to SetAttrId/GetAttrId. 2011-10-14 15:16:45 +02:00
Martin v. Löwis bd928fef42 Rename _Py_identifier to _Py_IDENTIFIER. 2011-10-14 10:20:37 +02:00
Victor Stinner f5cff56a1b Issue #13088: Add shared Py_hexdigits constant to format a number into base 16 2011-10-14 02:13:11 +02:00
Victor Stinner d1a9cc29b9 dictviews_or() uses _Py_identifier 2011-10-13 22:51:17 +02:00
Martin v. Löwis bfc6d74b25 Use GetAttrId directly. Proposed by Amaury. 2011-10-13 20:03:57 +02:00
Antoine Pitrou f0b934b01a Reuse the stringlib in findchar(), and make its signature more convenient 2011-10-13 18:55:09 +02:00
Antoine Pitrou c198d0599b Add a comment explaining this heuristic. 2011-10-13 18:07:37 +02:00
Antoine Pitrou dda339e6d2 Simplify heuristic for when to use memchr 2011-10-13 17:58:11 +02:00
Victor Stinner 55c991197b Optimize unicode_subscript() for step != 1 and ascii strings 2011-10-13 01:17:06 +02:00
Victor Stinner 127226ba69 Don't use PyUnicode_MAX_CHAR_VALUE() macro in Py_MAX() 2011-10-13 01:12:34 +02:00
Victor Stinner 9e7a1bcfd6 Optimize findchar() for PyUnicode_1BYTE_KIND: use memchr and memrchr 2011-10-13 00:18:12 +02:00
Antoine Pitrou dd4e2f0153 Issue #13155: Optimize finding the optimal character width of an unicode string 2011-10-13 00:02:27 +02:00
Victor Stinner 49a0a21f37 Unicode replace() avoids calling unicode_adjust_maxchar() when it's useless
Add also a special case if the result is an empty string.
2011-10-12 23:46:10 +02:00
Antoine Pitrou 6b4883dec0 PEP 3151 / issue #12555: reworking the OS and IO exception hierarchy. 2011-10-12 02:54:14 +02:00
Victor Stinner 983b1434bd Backed out changeset 952d91a7d376
If maxchar == PyUnicode_MAX_CHAR_VALUE(unicode), we do an useless copy.
2011-10-12 00:54:35 +02:00
Antoine Pitrou e55ad2dff0 Relax condition 2011-10-12 00:36:51 +02:00
Victor Stinner d218bf14cc stringlib: Fix STRINGLIB_STR for UCS2/UCS4 2011-10-12 00:14:32 +02:00
Victor Stinner 4e10100dee Fix compiler warning in _PyUnicode_FromUCS2() 2011-10-11 23:27:52 +02:00
Victor Stinner 8cc70dcf70 Fix fastsearch for UCS2 and UCS4
* If needle is 0, try (p[0] >> 16) & 0xff for UCS4
 * Disable fastsearch_memchr_1char() if needle is zero for UCS2 and UCS4
2011-10-11 23:22:22 +02:00
Antoine Pitrou 950468e553 Use _PyUnicode_CONVERT_BYTES() where applicable. 2011-10-11 22:45:48 +02:00
Victor Stinner 577db2c9f0 PyUnicode_AsUnicodeCopy() now checks if PyUnicode_AsUnicode() failed 2011-10-11 22:12:48 +02:00
Victor Stinner c4f281eba3 Fix misuse of PyUnicode_GET_SIZE, use PyUnicode_GET_LENGTH instead 2011-10-11 22:11:42 +02:00
Victor Stinner ed2682be2f Reuse PyUnicode_Copy() in validate_and_copy_tuple() 2011-10-11 21:53:24 +02:00
Antoine Pitrou e459a0877e Issue #13136: speed up conversion between different character widths. 2011-10-11 20:58:41 +02:00
Antoine Pitrou 2c3b2302ad Issue #13134: optimize finding single-character strings using memchr 2011-10-11 20:29:21 +02:00
Antoine Pitrou 2871698546 /* Remove unused code. It has been committed out since 2000 (!). */ 2011-10-11 03:17:47 +02:00
Antoine Pitrou 53bb548f22 Avoid exporting private helpers
(thanks "make smelly")
2011-10-10 23:49:24 +02:00
Martin v. Löwis 1ee1b6fe0d Use identifier API for PyObject_GetAttrString. 2011-10-10 18:11:30 +02:00
Victor Stinner 794d567b17 any_find_slice() doesn't use callbacks anymore
* Call directly the right find/rfind method: allow inlining functions
 * Remove Py_LOCAL_CALLBACK (added for any_find_slice)
2011-10-10 03:21:36 +02:00
Martin v. Löwis afe55bba33 Add API for static strings, primarily good for identifiers.
Thanks to Konrad Schöbel and Jasper Schulz for helping with the mass-editing.
2011-10-09 10:38:36 +02:00
Antoine Pitrou eaf139b3fc Fix typo in the PyUnicode_Find() implementation 2011-10-09 00:33:09 +02:00
Georg Brandl 388349add2 Closes #12192: Document that mutating list methods do not return the instance (original patch by Mike Hoy). 2011-10-08 18:32:40 +02:00
Martin v. Löwis c47adb04b3 Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE. 2011-10-07 20:55:35 +02:00
Victor Stinner dd07732af5 PyUnicode_Join() calls directly memcpy() if all strings are of the same kind 2011-10-07 17:02:31 +02:00
Antoine Pitrou 978b9d2a27 Fix formatting memory consumption with very large padding specifications 2011-10-07 12:35:48 +02:00
Victor Stinner 59de0ee9e0 str.replace(a, a) is now returning str unchanged if a is a 2011-10-07 10:01:28 +02:00
Antoine Pitrou 4574e62c6e Fix massive slowdown in string formatting with str.format.
Example:
./python -m timeit -s "f='{}' + '-' * 1024 + '{}'; s='abcd' * 16384" "f.format(s, s)"

-> before: 547 usec per loop
-> after: 13 usec per loop
-> 3.2: 22.5 usec per loop
-> 2.7: 12.6 usec per loop
2011-10-07 02:26:47 +02:00
Antoine Pitrou 5c0ba36d5f Fix massive slowdown in string formatting with the % operator 2011-10-07 01:54:09 +02:00
Antoine Pitrou 7c46da7993 Ensure that 1-char singletons get used 2011-10-06 22:07:51 +02:00
Antoine Pitrou c61c8d7a5e Issue #12911: Fix memory consumption when calculating the repr() of huge tuples or lists.
This introduces a small private API for this common pattern.
The issue has been discovered thanks to Martin's huge-mem buildbot.
2011-10-06 19:04:12 +02:00
Antoine Pitrou eeb7eea1f9 Issue #12911: Fix memory consumption when calculating the repr() of huge tuples or lists.
This introduces a small private API for this common pattern.
The issue has been discovered thanks to Martin's huge-mem buildbot.
2011-10-06 18:57:27 +02:00
Victor Stinner c6f0df7b20 Fix PyUnicode_Join() for len==1 and non-exact string 2011-10-06 15:58:54 +02:00
Antoine Pitrou dbf697ae5c Fix compilation warnings under 64-bit Windows 2011-10-06 15:34:41 +02:00
Antoine Pitrou 15a66cf134 Fix compilation under Windows 2011-10-06 15:25:32 +02:00
Victor Stinner 200f21340d Fix assertion in unicode_adjust_maxchar() 2011-10-06 13:27:56 +02:00
Victor Stinner acf47b807f Fix my last change on PyUnicode_Join(): don't process separator if len==1 2011-10-06 12:32:37 +02:00
Victor Stinner 25a4b29c95 str.replace() avoids memory when it's possible 2011-10-06 12:31:55 +02:00
Victor Stinner 56c161ab00 _copy_characters() fails more quickly in debug mode on inconsistent state 2011-10-06 02:47:11 +02:00
Victor Stinner c729b8e92f Fix a compiler warning: don't define unicode_is_singleton() in release mode 2011-10-06 02:36:59 +02:00
Victor Stinner fb9ea8c57e Don't check for the maximum character when copying from unicodeobject.c
* Create copy_characters() function which doesn't check for the maximum
   character in release mode
 * _PyUnicode_CheckConsistency() is no more static to be able to use it
   in _PyUnicode_FormatAdvanced() (in formatter_unicode.c)
 * _PyUnicode_CheckConsistency() checks the string hash
2011-10-06 01:45:57 +02:00
Victor Stinner 05d1189566 Fix post-condition in unicode_repr(): check the result, not the input 2011-10-06 01:13:58 +02:00
Victor Stinner f48323e3b3 replace() uses unicode_fromascii() if the input and replace string is ASCII 2011-10-05 23:27:08 +02:00
Victor Stinner 0617b6e18b unicode_fromascii() checks that the input is ASCII in debug mode 2011-10-05 23:26:01 +02:00
Victor Stinner c3cec7868b Add asciilib: similar to ucs1, ucs2 and ucs4 library, but specialized to ASCII
ucs1, ucs2 and ucs4 libraries have to scan created substring to find the
maximum character, whereas it is not need to ASCII strings. Because ASCII
strings are common, it is useful to optimize ASCII.
2011-10-05 21:24:08 +02:00
Victor Stinner 14f8f02826 Fix PyUnicode_Partition(): str_in->str_obj 2011-10-05 20:58:25 +02:00
Victor Stinner 31392e741d Fix my_basename(): make the string ready 2011-10-05 20:14:23 +02:00
Victor Stinner bb10a1f759 Ensure that newly created strings use the most efficient store in debug mode 2011-10-05 01:34:17 +02:00
Victor Stinner 9310abbf40 Replace PyUnicodeObject* with PyObject* where it was inappropriate 2011-10-05 00:59:23 +02:00
Victor Stinner ce5faf673e unicodeobject.c doesn't make output strings ready in debug mode
Try to only create non ready strings in debug mode to ensure that all functions
(not only in unicodeobject.c, everywhere) make input strings ready.
2011-10-05 00:42:43 +02:00
Georg Brandl 7597addbd4 More typoes. 2011-10-05 16:36:47 +02:00
Victor Stinner c80d6d20d5 Speedup str[a🅱️step] for step != 1
Try to stop the scanner of the maximum character before the end using a limit
depending on the kind (e.g. 256 for PyUnicode_2BYTE_KIND).
2011-10-05 14:13:28 +02:00
Victor Stinner ae86485517 Speedup find_maxchar_surrogates() for 32-bit wchar_t
If we have at least one character in U+10000-U+10FFFF, we know that we must use
PyUnicode_4BYTE_KIND kind.
2011-10-05 14:02:44 +02:00
Victor Stinner b9275c104e Speedup str[a:b] and PyUnicode_FromKindAndData
* str[a:b] doesn't scan the string for the maximum character if the string
   is ascii only
 * PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
   shorter character type. For example, _PyUnicode_FromUCS1() stops if we
   have at least one character in range U+0080-U+00FF
2011-10-05 14:01:42 +02:00
Victor Stinner 702c734395 Speedup the ASCII decoder
It is faster for long string and a little bit faster for short strings,
benchmark on Linux 32 bits, Intel Core i5 @ 3.33GHz:

./python -m timeit 'x=b"a"' 'x.decode("ascii")'
./python -m timeit 'x=b"x"*80' 'x.decode("ascii")'
./python -m timeit 'x=b"abc"*4096' 'x.decode("ascii")'

length |   before   | after
-------+------------+-----------
     1 | 0.234 usec | 0.229 usec
    80 | 0.381 usec | 0.357 usec
12,288 |  11.2 usec |  3.01 usec
2011-10-05 13:50:52 +02:00
Victor Stinner e1335c711c Fix usage og PyUnicode_READY() 2011-10-04 20:53:03 +02:00
Victor Stinner e06e145943 _PyUnicode_READY_REPLACE() cannot be used in unicode_subtype_new() 2011-10-04 20:52:31 +02:00
Victor Stinner 17efeed284 Add DONT_MAKE_RESULT_READY to unicodeobject.c to help detecting bugs
Use also _PyUnicode_READY_REPLACE() when it's applicable.
2011-10-04 20:05:46 +02:00
Victor Stinner 6b56a7fd3d Add assertion to _Py_ReleaseInternedUnicodeStrings() if READY fails 2011-10-04 20:04:52 +02:00
Antoine Pitrou 875f29bb95 Fix naïve heuristic in unicode slicing (followup to 1b4f886dc9e2) 2011-10-04 20:00:49 +02:00
Antoine Pitrou 2242522fde Add a necessary call to PyUnicode_READY() (followup to ab5086539ab9) 2011-10-04 19:10:51 +02:00
Antoine Pitrou 7aec401966 Optimize string slicing to use the new API 2011-10-04 19:08:01 +02:00
Antoine Pitrou e19aa388e8 When expandtabs() would be a no-op, don't create a duplicate string 2011-10-04 16:04:01 +02:00
Antoine Pitrou e71d574a39 Migrate str.expandtabs to the new API 2011-10-04 15:55:09 +02:00
Benjamin Peterson 7f3140ef80 fix parens 2011-10-03 19:37:29 -04:00
Benjamin Peterson 4bfce8f81f fix formatting 2011-10-03 19:35:07 -04:00
Benjamin Peterson ccc51c1fc6 fix compiler warnings 2011-10-03 19:34:12 -04:00
Victor Stinner b092365cc6 Move in-place Unicode append to its own subfunction 2011-10-04 01:17:31 +02:00
Victor Stinner a5f9163501 Reindent internal Unicode macros 2011-10-04 01:07:11 +02:00
Victor Stinner a41463c203 Document utf8_length and wstr_length states
Ensure these states with assertions in _PyUnicode_CheckConsistency().
2011-10-04 01:05:08 +02:00
Victor Stinner 9566311014 resize_inplace() sets utf8_length to zero if the utf8 is not shared8
Cleanup also the code.
2011-10-04 01:03:50 +02:00
Victor Stinner 9e9d689d85 PyUnicode_New() sets utf8_length to zero for latin1 2011-10-04 01:02:02 +02:00
Victor Stinner 016980454e Unicode: raise SystemError instead of ValueError or RuntimeError on invalid
state
2011-10-04 00:04:26 +02:00
Victor Stinner 7f11ad4594 Unicode: document when the wstr pointer is shared with data
Add also related assertions to _PyUnicode_CheckConsistency().
2011-10-04 00:00:20 +02:00
Victor Stinner 03490918b7 Add _PyUnicode_HAS_WSTR_MEMORY() macro 2011-10-03 23:45:12 +02:00
Victor Stinner 9ce5a835bb PyUnicode_Join() checks output length in debug mode
PyUnicode_CopyCharacters() may copies less character than requested size, if
the input string is smaller than the argument. (This is very unlikely, but who
knows!?)

Avoid also calling PyUnicode_CopyCharacters() if the string is empty.
2011-10-03 23:36:02 +02:00
Victor Stinner b803895355 Fix a compiler warning in PyUnicode_Append()
Don't check PyUnicode_CopyCharacters() in release mode. Rename also some
variables.
2011-10-03 23:27:56 +02:00
Victor Stinner 8cfcbed4e3 Improve string forms and PyUnicode_Resize() documentation
Remove also the FIXME for resize_copy(): as discussed with Martin, copy the
string on resize if the string is not resizable is just fine.
2011-10-03 23:19:21 +02:00
Victor Stinner 77bb47b312 Simplify unicode_resizable(): singletons reference count is at least 2 2011-10-03 20:06:05 +02:00
Victor Stinner 85041a54bd _PyUnicode_CheckConsistency() checks utf8 field consistency 2011-10-03 14:42:39 +02:00
Victor Stinner 3cf4637e4e unicode_subtype_new() copies also the ascii flag 2011-10-03 14:42:15 +02:00
Victor Stinner 42dfd71333 unicode_kind_name() doesn't check consistency anymore
It is is called from _PyUnicode_Dump() and so must not fail.
2011-10-03 14:41:45 +02:00
Victor Stinner a3b334da6d PyUnicode_Ready() now sets ascii=1 if maxchar < 128
ascii=1 is no more reserved to PyASCIIObject. Use
PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).
2011-10-03 13:53:37 +02:00
Victor Stinner 1b4f9ceca7 Create _PyUnicode_READY_REPLACE() to reuse singleton
Only use _PyUnicode_READY_REPLACE() on just created strings.
2011-10-03 13:28:14 +02:00
Victor Stinner c379ead9af Fix resize_compact() and resize_inplace(); reenable full resize optimizations
* resize_compact() updates also wstr_len for non-ascii strings sharing wstr
 * resize_inplace() updates also utf8_len/wstr_len for strings sharing
   utf8/wstr
2011-10-03 12:52:27 +02:00
Victor Stinner 34411e17b0 resize_inplace() has been fixed: reenable this optimization 2011-10-03 12:21:33 +02:00
Victor Stinner a849a4b6b4 _PyUnicode_Dump() indicates if wstr and/or utf8 are shared 2011-10-03 12:12:11 +02:00