Commit Graph

1224 Commits

Author SHA1 Message Date
Antoine Pitrou 8b0e98426d Issue #17237: Fix crash in the ASCII decoder on m68k. 2013-05-11 15:58:34 +02:00
Victor Stinner f4f24248dc Fix uninitialized value in charmap_decode_mapping() 2013-05-07 01:01:31 +02:00
Victor Stinner 8cecc8c262 Issue #7330: Implement width and precision (ex: "%5.3s") for the format string
of PyUnicode_FromFormat() function, original patch written by Ysj Ray.
2013-05-06 23:11:54 +02:00
Victor Stinner bb4503f61e Partial revert of changeset 9744b2df134c
PyUnicode_Append() cannot call directly resize_compact(): I forgot that a
string can be ready *and* not compact (a legacy string can also be ready).
2013-04-18 09:41:34 +02:00
Victor Stinner fb161b1b6d Split PyUnicode_DecodeCharmap() into subfunction for readability 2013-04-18 01:44:27 +02:00
Victor Stinner 170ca6f84b Fix bug in Unicode decoders related to _PyUnicodeWriter
Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.
2013-04-18 00:25:28 +02:00
Victor Stinner 376cfa122d Fix typo in unicode_decode_call_errorhandler_writer()
Bug introduced by changeset 7ed9993d53b4.
2013-04-17 23:58:16 +02:00
Victor Stinner 8f674ccd64 Close #17694: Add minimum length to _PyUnicodeWriter
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
 * _PyUnicodeWriter_Init() has no more argument (except the writer itself):
   min_length and overallocate must be set explicitly
 * In error handlers, only enable overallocation if the replacement string
   is longer than 1 character
 * CJK decoders don't use overallocation anymore
 * Set min_length, instead of preallocating memory using
   _PyUnicodeWriter_Prepare(), in many decoders
 * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner 77282cb4f8 Cleanup PyUnicode_Contains()
* No need to double-check that strings are ready: test already done by
   PyUnicode_FromObject()
 * Remove useless kind variable (use kind1 instead)
2013-04-14 19:22:47 +02:00
Victor Stinner d92e078c8d Minor change: fix character in do_strip() for the ASCII case 2013-04-14 19:17:42 +02:00
Victor Stinner f033510fee Cleanup PyUnicode_Append()
* Check also that right is a Unicode object
 * call directly resize_compact() instead of unicode_resize() for a more
   explicit error handling, and to avoid testing some properties twice
   (ex: unicode_modifiable())
2013-04-14 19:13:03 +02:00
Victor Stinner 4560f9c63f PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code 2013-04-14 18:56:46 +02:00
Victor Stinner 55c08781e8 Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped 2013-04-14 18:45:39 +02:00
Victor Stinner af03757d20 Optimize ascii(str): don't encode/decode repr if repr is already ASCII 2013-04-14 18:44:10 +02:00
Victor Stinner 8a1a6cffd6 Add _PyUnicodeWriter_WriteCharInline() 2013-04-14 02:35:33 +02:00
Serhiy Storchaka e2cef885a2 Issue #16061: Speed up str.replace() for replacing 1-character strings. 2013-04-13 22:45:04 +03:00
Victor Stinner a0dd0213cc Close #17693: Rewrite CJK decoders to use the _PyUnicodeWriter API instead of
the legacy Py_UNICODE API.

Add also a new _PyUnicodeWriter_WriteChar() function.
2013-04-11 22:09:04 +02:00
Victor Stinner 247109e74d Issue #17615: On Windows (VS2010), Performances of wmemcmp() to compare Unicode
strings are not convincing. For UCS2 (16-bit wchar_t type), use a dummy loop
instead of wmemcmp(). The dummy loop is as fast, or a little bit faster.

wchar_t is only 16-bit long on Windows. wmemcmp() is still used for 32-bit
wchar_t.
2013-04-09 23:53:26 +02:00
Victor Stinner 0cff4b16d9 replace(): only call PyUnicode_DATA(u) once 2013-04-09 22:52:48 +02:00
Victor Stinner cc7af72192 Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII 2013-04-09 22:39:24 +02:00
Victor Stinner f50a4e9bc9 Don't calls macros in PyUnicode_WRITE() parameters
PyUnicode_WRITE() expands some parameters twice or more.
2013-04-09 22:38:52 +02:00
Victor Stinner 9c79e41fc5 Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call
it twice
2013-04-09 22:21:08 +02:00
Victor Stinner b3a6014504 Fix _PyUnicode_XStrip()
Inline the BLOOM_MEMBER() to only call PyUnicode_READ() only once (per loop
iteration). Store also the length of the seperator in a variable to avoid calls
to PyUnicode_GET_LENGTH().
2013-04-09 22:19:21 +02:00
Victor Stinner 63d5c1a14a Optimize PyUnicode_DecodeCharmap()
Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers
instead.
2013-04-09 22:13:33 +02:00
Victor Stinner a85af502a4 Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip()
Write specialized functions per Unicode kind to avoid the expensive
PyUnicode_READ() macro.
2013-04-09 21:53:54 +02:00
Victor Stinner 69ed0f4c86 Use PyUnicode_READ() instead of PyUnicode_READ_CHAR()
"PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls
PyUnicode_KIND() and might call it twice." according to its documentation.
2013-04-09 21:48:24 +02:00
Victor Stinner 03c3e35d42 Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings:
cp037, cp500 and iso8859_1 codecs
2013-04-09 21:53:09 +02:00
Victor Stinner cd777eaf53 Issue #17615: Comparing two Unicode strings now uses wmemcmp() when possible
wmemcmp() is twice faster than a dummy loop (342 usec vs 744 usec) on Fedora
18/x86_64, GCC 4.7.2.
2013-04-08 22:43:44 +02:00
Victor Stinner c1302bba4c Issue #17615: Expand expensive PyUnicode_READ() macro in unicode_compare():
write specialized functions for each combination of Unicode kinds.
2013-04-08 21:50:54 +02:00
Victor Stinner 207dd38726 fix unused variable 2013-04-03 03:14:58 +02:00
Victor Stinner eb4b5ac8af Close #16757: Avoid calling the expensive _PyUnicode_FindMaxChar() function
when possible
2013-04-03 02:02:33 +02:00
Victor Stinner cfc4c13b04 Add _PyUnicodeWriter_WriteSubstring() function
Write a function to enable more optimizations:

 * If the substring is the whole string and overallocation is disabled, just
   keep a reference to the string, don't copy characters
 * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
   possible
2013-04-03 01:48:39 +02:00
Raymond Hettinger 51612fd803 merge 2013-03-23 08:21:52 -07:00
Raymond Hettinger 378170d5d9 Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords. 2013-03-23 08:21:12 -07:00
Victor Stinner fb84b5d48d (Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons:
empty string and latin1 single character
2013-03-06 19:29:09 +01:00
Victor Stinner 2cb16aa3cb _PyUnicode_Writer() now also reuses Unicode singletons:
empty string and latin1 single character
2013-03-06 19:28:37 +01:00
Victor Stinner cf77da9fb5 Backed out changeset b9f7b1bf36aa 2013-03-06 01:09:24 +01:00
Victor Stinner 313cac88c5 Issue #17223: Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type)
to reject invalid UTF-16 surrogate.
2013-03-06 00:41:50 +01:00
Victor Stinner 36025478bf (Merge 3.3) Issue #17223: Fix PyUnicode_FromUnicode() for string of 1 character
outside the range U+0000-U+10ffff.
2013-02-26 00:16:57 +01:00
Victor Stinner d21b58c05d Issue #17223: Fix PyUnicode_FromUnicode() for string of 1 character outside
the range U+0000-U+10ffff.
2013-02-26 00:15:54 +01:00
Victor Stinner cfd2c1b4cc (Merge 3.3) Issue #17137: When an Unicode string is resized, the internal wide
character string (wstr) format is now cleared.
2013-02-07 23:17:34 +01:00
Victor Stinner bbbac2ec34 Issue #17137: When an Unicode string is resized, the internal wide character
string (wstr) format is now cleared.
2013-02-07 23:12:46 +01:00
Serhiy Storchaka d0c79dcda5 Issue #17043: The unicode-internal decoder no longer read past the end of
input buffer.
2013-02-07 16:26:55 +02:00
Serhiy Storchaka 03ee12ed72 Issue #17043: The unicode-internal decoder no longer read past the end of
input buffer.
2013-02-07 16:25:25 +02:00
Serhiy Storchaka 3fd4ab356d Issue #17043: The unicode-internal decoder no longer read past the end of
input buffer.
2013-02-07 16:23:21 +02:00
Serhiy Storchaka 2aee6a6460 Issue #16971: Fix a refleak in the charmap decoder. 2013-01-29 12:16:57 +02:00
Serhiy Storchaka afb1cb5579 Issue #16971: Fix a refleak in the charmap decoder. 2013-01-29 12:13:22 +02:00
Serhiy Storchaka 8fe5a9f9c3 Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder. 2013-01-29 10:37:39 +02:00
Serhiy Storchaka 24193debd4 Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder. 2013-01-29 10:28:07 +02:00
Serhiy Storchaka d679377be7 Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder. 2013-01-29 10:20:44 +02:00
Serhiy Storchaka ed3c4128c0 Issue #10156: In the interpreter's initialization phase, unicode globals
are now initialized dynamically as needed.
2013-01-26 12:18:17 +02:00
Serhiy Storchaka 678db84b37 Issue #10156: In the interpreter's initialization phase, unicode globals
are now initialized dynamically as needed.
2013-01-26 12:16:36 +02:00
Serhiy Storchaka 059972535f Issue #10156: In the interpreter's initialization phase, unicode globals
are now initialized dynamically as needed.
2013-01-26 12:14:02 +02:00
Serhiy Storchaka 570c5b2354 Issue #16980: Fix processing of escaped non-ascii bytes in the
unicode-escape-decode decoder.
2013-01-25 23:53:29 +02:00
Serhiy Storchaka 73e38809e0 Issue #16980: Fix processing of escaped non-ascii bytes in the
unicode-escape-decode decoder.
2013-01-25 23:52:21 +02:00
Serhiy Storchaka 6481bfb2b5 Issue #16335: Fix integer overflow in unicode-escape decoder. 2013-01-21 11:44:40 +02:00
Serhiy Storchaka c35f3a9f61 Issue #16335: Fix integer overflow in unicode-escape decoder. 2013-01-21 11:42:57 +02:00
Serhiy Storchaka 4f5f0e54e0 Issue #16335: Fix integer overflow in unicode-escape decoder. 2013-01-21 11:38:00 +02:00
Serhiy Storchaka 441d30fac7 Issue #15989: Fix several occurrences of integer overflow
when result of PyLong_AsLong() narrowed to int without checks.

This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:26:26 +02:00
Serhiy Storchaka 9101e23ff6 Issue #15989: Fix several occurrences of integer overflow
when result of PyLong_AsLong() narrowed to int without checks.

This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:41:45 +02:00
Serhiy Storchaka 55e2cb497b Issue #14850: Now a chamap decoder treates U+FFFE as "undefined mapping"
in any mapping, not only in an unicode string.
2013-01-15 15:30:04 +02:00
Serhiy Storchaka 45d16d9924 Issue #14850: Now a chamap decoder treates U+FFFE as "undefined mapping"
in any mapping, not only in an unicode string.
2013-01-15 15:01:20 +02:00
Serhiy Storchaka 4fb8caee87 Issue #14850: Now a chamap decoder treates U+FFFE as "undefined mapping"
in any mapping, not only in an unicode string.
2013-01-15 14:43:21 +02:00
Serhiy Storchaka 7898043868 Issue #15989: Fix several occurrences of integer overflow
when result of PyLong_AsLong() narrowed to int without checks.
2013-01-15 01:12:17 +02:00
Benjamin Peterson 0b32a480bd merge 3.3 (#16906) 2013-01-09 09:52:22 -06:00
Benjamin Peterson 0c270a8bb7 correct static string clearing loop (closes #16906) 2013-01-09 09:52:01 -06:00
Serhiy Storchaka 24a3ef6999 Issue #11461: Fix the incremental UTF-16 decoder. Original patch by
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:41:55 +02:00
Serhiy Storchaka ae3b32ad6b Issue #11461: Fix the incremental UTF-16 decoder. Original patch by
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:40:52 +02:00
Serhiy Storchaka 48e188e573 Issue #11461: Fix the incremental UTF-16 decoder. Original patch by
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:14:24 +02:00
Serhiy Storchaka dec798eb46 Fix out of bound read in UTF-32 decoder on "narrow Unicode" builds. 2013-01-08 22:45:42 +02:00
Serhiy Storchaka 4e02538bf3 Issue #16856: Fix a segmentation fault from calling repr() on a dict with
a key whose repr raise an exception.
2013-01-04 12:40:35 +02:00
Serhiy Storchaka 6c83e739d7 Issue #16856: Fix a segmentation fault from calling repr() on a dict with
a key whose repr raise an exception.
2013-01-04 12:39:34 +02:00
Victor Stinner 18aa4477d3 Close #16281: handle tailmatch() failure and remove useless comment
"honor direction and do a forward or backwards search": the runtime speed may
be different, but I consider that it doesn't really matter in practice. The
direction was never honored before: Python 2.7 uses memcmp() for the str type
for example.
2013-01-03 03:18:09 +01:00
Victor Stinner 7ae320d667 (Merge 3.2) Issue #16455: On FreeBSD and Solaris, if the locale is C, the
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:21:07 +01:00
Victor Stinner 20b654acb5 Issue #16455: On FreeBSD and Solaris, if the locale is C, the
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:08:58 +01:00
Andrew Svetlov 2606a6f197 Issue #16719: Get rid of WindowsError. Use OSError instead
Patch by Serhiy Storchaka.
2012-12-19 14:33:35 +02:00
Gregory P. Smith 27dc02e8c5 Fix the internals of our hash functions to used unsigned values during hash
computation as the overflow behavior of signed integers is undefined.

NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done.  I added the comment that my change in 3.2 added so that the
code would match up.  Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.

In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.  We could work to get rid of the -fwrapv requirement
in 3.4 but that requires more planning.

Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).

Cleanup only - no functionality or hash values change.
2012-12-10 19:51:29 -08:00
Gregory P. Smith c2176e46d7 Fix the internals of our hash functions to used unsigned values during hash
computation as the overflow behavior of signed integers is undefined.

NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done.  I added the comment that my change in 3.2 added so that the
code would match up.  Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.

In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.

Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).

Cleanup only - no functionality or hash values change.
2012-12-10 18:32:53 -08:00
Gregory P. Smith 27cbcd6241 Fix the internals of our hash functions to used unsigned values during hash
computation as the overflow behavior of signed integers is undefined.

In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.

Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).

Cleanup only - no functionality or hash values change.
2012-12-10 18:15:46 -08:00
Victor Stinner 8dbd421b4d Cleanup unicodeobject.c
* Remove micro-optization:
   (errors == "surrogateescape" || strcmp(errors, "surrogateescape") == 0).
   Only use strcmp()
 * Initialize 'arg' members in unicode_format_arg() to help the compiler to
   diagnose real bugs and also make the code simpler to read
2012-12-04 09:30:24 +01:00
Victor Stinner d45c7f8d74 Issue #16455: On FreeBSD and Solaris, if the locale is C, the
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2012-12-04 01:34:47 +01:00
Victor Stinner 2660e427d1 (Merge 3.2) Issue #16416: On Mac OS X, operating system data are now always
encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding
(which may be ASCII if no locale environment variable is set), to avoid
inconsistencies with os.fsencode() and os.fsdecode() functions which are
already using UTF-8/surrogateescape.
2012-12-03 12:48:53 +01:00
Victor Stinner 27b1ca29cc Issue #16416: On Mac OS X, operating system data are now always
encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding
(which may be ASCII if no locale environment variable is set), to avoid
inconsistencies with os.fsencode() and os.fsdecode() functions which are
already using UTF-8/surrogateescape.
2012-12-03 12:47:59 +01:00
Antoine Pitrou 5439458a2a Issue #16215: Fix potential double memory free in str.replace().
Patch by Serhiy Storchaka.
2012-11-17 23:29:28 +01:00
Antoine Pitrou 6d5ad227a5 Issue #16215: Fix potential double memory free in str.replace().
Patch by Serhiy Storchaka.
2012-11-17 23:28:17 +01:00
Victor Stinner 0d92c4f667 Issue #16416: Fix error handling in _Py_wchar2char() _Py_char2wchar() functions 2012-11-12 23:32:21 +01:00
Victor Stinner fc009eff9e Close #16311: Use the _PyUnicodeWriter API in text decoders
* Remove unicode_widen(): replaced with _PyUnicodeWriter_Prepare()
 * Remove unicode_putchar(): replaced with
   PyUnicodeWriter_Prepare() + PyUnicode_WRITER()
 * When handling an decoding error, only overallocate the buffer by +25%
   instead of +100%
2012-11-07 00:36:38 +01:00
Ezio Melotti cfa9636404 #8271: merge with 3.3. 2012-11-04 23:23:09 +02:00
Ezio Melotti f7ed5d111b #8271: the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti. 2012-11-04 23:21:38 +02:00
Benjamin Peterson 7ff2094bc7 merge 3.3 (#16369) 2012-10-30 23:31:12 -04:00
Benjamin Peterson e8ea97fffb merge 3.2 (#16369) 2012-10-30 23:27:52 -04:00
Benjamin Peterson c43112823b initialize more global type objects (closes #16369) 2012-10-30 23:21:10 -04:00
Victor Stinner e64322e034 Close #14625: Rewrite the UTF-32 decoder. It is now 3x to 4x faster
Patch written by Serhiy Storchaka.
2012-10-30 23:12:47 +01:00
Victor Stinner 76df43de30 Issue #16330: Use surrogate-related macros
Patch written by Serhiy Storchaka.
2012-10-30 01:42:39 +01:00
Mark Dickinson fb90c0934c Issue #14700: Fix buggy overflow checks for large precision and width in new-style and old-style formatting. 2012-10-28 10:18:03 +00:00
Victor Stinner c6cf1ba29e Replace usage of the deprecated Py_UNICODE_COPY() with Py_MEMCPY() in resize_copy() 2012-10-23 02:54:47 +02:00
Victor Stinner fe75fb4b3e Optimize _PyUnicode_HasNULChars(): use findchar() instead of PyUnicode_Contains() 2012-10-23 02:52:18 +02:00
Victor Stinner 6fa627578a Inline raise_translate_exception(): it is only used once 2012-10-23 02:51:50 +02:00
Victor Stinner e5567ad236 Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp() 2012-10-23 02:48:49 +02:00
Christian Heimes 743e0cd6b5 Issue #16166: Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified
endianess detection and handling.
2012-10-17 23:52:17 +02:00
Chris Jerdonek 4a7df9aba9 Issue #14783: Merge changes from 3.3. 2012-10-07 15:02:16 -07:00
Chris Jerdonek 042fa653ab Issue #14783: Merge changes from 3.2. 2012-10-07 14:56:27 -07:00
Chris Jerdonek 83fe2e1c22 Issue #14783: Improve int() docstring and also str(), range(), and slice().
This commit rewrites the docstring for int() to incorporate the documentation
changes made in issue #16036.  It also switches the docstrings for int(),
str(), range(), and slice() to use multi-line signatures.
2012-10-07 14:48:36 -07:00
Victor Stinner 4c63a972d1 Cleanup PyUnicode_FromFormatV() for zero padding
Skip the "0" instead of parsing it twice: detect zero padding and then parsed
as a digit of the width.
2012-10-06 23:55:33 +02:00
Victor Stinner 15a1136547 Issue #16147: PyUnicode_FromFormatV() doesn't need anymore to allocate a buffer
on the heap to format numbers.
2012-10-06 23:48:20 +02:00
Victor Stinner ff5a848db5 Issue #16147: PyUnicode_FromFormatV() now raises an error if the argument of
'%c' is not in the range(0x110000).
2012-10-06 23:05:45 +02:00
Victor Stinner 3921e90c5a Issue #16147: PyUnicode_FromFormatV() now detects integer overflow when parsing
width and precision
2012-10-06 23:05:00 +02:00
Victor Stinner e215d960be Issue #16147: Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
* Simplify the code: replace 4 steps with one unique step using the
   _PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids to
   store intermediate results which require to allocate an array of pointers on
   the heap.
 * Use the _PyUnicodeWriter API for speed (and its convinient API):
   overallocate the buffer to reduce the number of "realloc()"
 * Implement "width" and "precision" in Python, don't rely on sprintf(). It
   avoids to need of a temporary buffer allocated on the heap: only use a small
   buffer allocated in the stack.
 * Add _PyUnicodeWriter_WriteCstr() function
 * Split PyUnicode_FromFormatV() into two functions: add
   unicode_fromformat_arg().
 * Inline parse_format_flags(): the format of an argument is now only parsed
   once, it's no more needed to have a subfunction.
 * Optimize PyUnicode_FromFormatV() for characters between two "%" arguments:
   search the next "%" and copy the substring in one chunk, instead of copying
   character per character.
2012-10-06 23:03:36 +02:00
Mark Dickinson ff9c54aca2 Issue #16096: Merge fixes from 3.3. 2012-10-06 18:05:14 +01:00
Mark Dickinson c04ddff290 Issue #16096: Fix several occurrences of potential signed integer overflow. Thanks Serhiy Storchaka. 2012-10-06 18:04:49 +01:00
Victor Stinner 8c6db45d3e In debug mode, unicode_write_cstr() now checks that non-ASCII characters are
not written into an ASCII string
2012-10-06 00:40:45 +02:00
Ezio Melotti 080a2c087e #16127: merge with 3.3. 2012-10-05 03:34:02 +03:00
Ezio Melotti e7f90375b1 #16127: remove outdated references to narrow builds. Patch by Serhiy Storchaka. 2012-10-05 03:33:31 +03:00
Victor Stinner 1929407406 Fix PyUnicode_Format(): return NULL if PyUnicode_READY(uformat) failed
This error cannot occur in practice: PyUnicode_FromObject() always return
a "ready" string.
2012-10-05 00:09:33 +02:00
Victor Stinner 770e19e0cc Optimize unicode_compare(): use memcmp() when comparing two UCS1 strings 2012-10-04 22:59:45 +02:00
Victor Stinner 90db9c47dc Enable also ptr==ptr optimization in PyUnicode_Compare()
It was already implemented in PyUnicode_RichCompare()
2012-10-04 21:53:50 +02:00
Victor Stinner aa7712711d unicode_result_wchar(): move the assert() to the "#ifdef Py_DEBUG" block 2012-10-04 02:32:58 +02:00
Victor Stinner a4708231e6 Split the huge PyUnicode_Format() function (+540 lines) into subfunctions 2012-10-04 02:19:54 +02:00
Victor Stinner a049443fab PyUnicode_Format(): disable overallocation when we are writing the last part
of the output string
2012-10-03 23:03:46 +02:00
Victor Stinner afffce489b Unicode: resize_compact() and resize_inplace() fills also the Unicode strings
with invalid bytes in debug mode, as done by PyUnicode_New()
2012-10-03 23:03:17 +02:00
Victor Stinner c89d28fdfc Issue #15609: Fix refleak introduced by my last optimization 2012-10-02 12:54:07 +02:00
Victor Stinner 621ef3d84f Issue #15609: Optimize str%args for integer argument
- Use _PyLong_FormatWriter() instead of formatlong() when possible, to avoid
   a temporary buffer
 - Enable the fast path when width is smaller or equals to the length,
   and when the precision is bigger or equals to the length
 - Add unit tests!
 - formatlong() uses PyUnicode_Resize() instead of _PyUnicode_FromASCII()
   to resize the output string
2012-10-02 00:33:47 +02:00
Antoine Pitrou a1f7655fa7 Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-09-23 20:00:04 +02:00
Antoine Pitrou 6f80f5d444 Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-09-23 19:55:21 +02:00
Antoine Pitrou ca8aa4acf6 Issue #15144: Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
Patch by Serhiy Storchaka.
2012-09-20 20:56:47 +02:00
Christian Heimes 5f520f4fed Issue #15900: Fixed reference leak in PyUnicode_TranslateCharmap() 2012-09-11 14:03:25 +02:00
Christian Heimes f4f9939a96 Fixed memory leak in error branch of formatfloat(). CID 719687 2012-09-10 11:48:41 +02:00
Antoine Pitrou 057119b0b7 Fix C++-style comment (xlc compilation failure) 2012-09-02 17:56:33 +02:00
Benjamin Peterson 59043f96ea merge 3.2 (#15801) 2012-08-28 18:01:45 -04:00
Benjamin Peterson 28a6cfaefc use the stricter PyMapping_Check (closes #15801) 2012-08-28 17:55:35 -04:00
Stefan Krah 8528c3145e Issue #15728: Fix leak in PyUnicode_AsWideCharString(). Found by Coverity. 2012-08-19 21:52:43 +02:00
Nick Coghlan 0e41628d35 Merge str docstring fix from 3.2 2012-08-16 14:14:30 +10:00
Nick Coghlan 573b1fd779 Fix str docstring 2012-08-16 14:13:07 +10:00
Antoine Pitrou b4bbee25b1 Issue #14579: Fix CVE-2012-2135: vulnerability in the utf-16 decoder after error handling.
Patch by Serhiy Storchaka.
2012-07-21 00:45:14 +02:00
Mark Dickinson 01ac8b6ab1 Use correct types for ASCII_CHAR_MASK integer constants. 2012-07-07 14:08:48 +02:00
Antoine Pitrou aaefac76dd Issue #14874: Restore charmap decoding speed to pre-PEP 393 levels.
Patch by Serhiy Storchaka.
2012-06-16 22:48:21 +02:00
Victor Stinner f185226244 _copy_characters(): move debug code at the top to avoid noisy #ifdef
And don't use assert() anymore if check_maxchar is set: return -1 on error
instead.
2012-06-16 16:38:26 +02:00
Victor Stinner 07621338fb Fix PyUnicode_GetSize(): Don't replace _PyUnicode_Ready() exception 2012-06-16 04:53:46 +02:00
Victor Stinner 8a8b3eaabe Fix a compiler warning in _copy_characters() and remove debug code 2012-06-16 04:53:25 +02:00
Victor Stinner 24e403bbee Oops, fix my previous change on _copy_characters() 2012-06-16 04:53:00 +02:00
Victor Stinner ca439eecea Fix unicode_adjust_maxchar(): catch PyUnicode_New() failure 2012-06-16 03:17:34 +02:00
Victor Stinner 184252ad3f Fix "%f" format of str%args if the result is not an ASCII or latin1 string 2012-06-16 02:57:41 +02:00
Victor Stinner 9a77770add Remove debug code 2012-06-16 02:44:43 +02:00
Victor Stinner c9d369f1bf Optimize _PyUnicode_FastCopyCharacters() when maxchar(from) > maxchar(to) 2012-06-16 02:22:37 +02:00
Victor Stinner f05e17ece9 unicodeobject.c: Remove debug code 2012-06-16 01:53:04 +02:00
Antoine Pitrou 27f6a3b0bf Issue #15026: utf-16 encoding is now significantly faster (up to 10x).
Patch by Serhiy Storchaka.
2012-06-15 22:15:23 +02:00
Kristján Valur Jónsson 55e5dc8371 Rearrange code to beat an optimizer bug affecting Release x64 on windows
with VS2010sp1
2012-06-06 21:58:08 +00:00
Victor Stinner d7b7c7472b Issue #14993: Use standard "unsigned char" instead of a unsigned char bitfield 2012-06-04 22:52:12 +02:00
Kristjan Valur Jonsson 85634d7a2e Issue #14909: A number of places were using PyMem_Realloc() apis and
PyObject_GC_Resize() with incorrect error handling.  In case of errors,
the original object would be leaked.  This checkin fixes those cases.
2012-05-31 09:37:31 +00:00
Victor Stinner 3a7d096f2f Issue #14744: Fix compilation on Windows (part 2) 2012-05-29 18:53:56 +02:00
Victor Stinner d3f0882dfb Issue #14744: Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args)
* Formatting string, int, float and complex use the _PyUnicodeWriter API. It
   avoids a temporary buffer in most cases.
 * Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just
   keep a reference to the string if the output is only composed of one string
 * Disable overallocation when formatting the last argument of str%args and
   str.format(args)
 * Overallocation allocates at least 100 characters: add min_length attribute
   to the _PyUnicodeWriter structure
 * Add new private functions: _PyUnicode_FastCopyCharacters(),
   _PyUnicode_FastFill() and _PyUnicode_FromASCII()

The speed up is around 20% in average.
2012-05-29 12:57:52 +02:00
Antoine Pitrou 63065d761e Issue #14624: UTF-16 decoding is now 3x to 4x faster on various inputs.
Patch by Serhiy Storchaka.
2012-05-15 23:48:04 +02:00
Martin v. Löwis b05c0738d8 Silence VS 2010 signed/unsigned warnings. 2012-05-15 13:45:49 +02:00
Antoine Pitrou 758153badb Fix refleaks introduced by 83da67651687. 2012-05-12 15:51:51 +02:00
Antoine Pitrou e45c0c5cef Fix logic error introduced by 83da67651687. 2012-05-12 15:49:07 +02:00
Benjamin Peterson 1ff2e35e84 simplify by shortcutting when the kind of the needle is larger than the haystack 2012-05-11 17:41:20 -05:00
Antoine Pitrou ca5f91b888 Issue #14738: Speed-up UTF-8 decoding on non-ASCII data. Patch by Serhiy Storchaka. 2012-05-10 16:36:02 +02:00
Victor Stinner 3b1a74a9c3 Rename unicode_write_t structure and its methods to "_PyUnicodeWriter" 2012-05-09 22:25:00 +02:00
Victor Stinner ee4544c920 Issue #14744: Inline unicode_writer_write_char() and unicode_write_str()
Optimize also PyUnicode_Format(): call unicode_writer_prepare() only once
per argument.
2012-05-09 22:24:08 +02:00
Victor Stinner f59c28c930 unicode_writer_finish() checks string consistency 2012-05-09 03:24:14 +02:00
Victor Stinner 106802547c Backout ab500b297900: the check for integer overflow is wrong
Issue #14716: Change integer overflow check in unicode_writer_prepare()
to compute the limit at compile time instead of runtime. Patch writen by Serhiy
Storchaka.
2012-05-07 23:50:05 +02:00
Victor Stinner 0576f9b4cf Issue #14716: Change integer overflow check in unicode_writer_prepare()
to compute the limit at compile time instead of runtime. Patch writen by Serhiy
Storchaka.
2012-05-07 13:02:44 +02:00
Victor Stinner 202fdca133 Close #14716: str.format() now uses the new "unicode writer" API instead of the
PyAccu API. For example, it makes str.format() from 25% to 30% faster on Linux.
2012-05-07 12:47:02 +02:00
Mark Dickinson 99e2e5552a Issue #14700: Fix two broken and undefined-behaviour-inducing overflow checks in old-style string formatting. Thanks Serhiy Storchaka for report and original patch. 2012-05-07 11:20:50 +01:00
Victor Stinner d0dba6eee8 unicode_writer: don't force inline when it is not necessary
Keep inline for performance critical functions (functions used in loops)
2012-05-04 01:19:15 +02:00
Benjamin Peterson b63f49f2b4 if the kind of the string to count is larger than the string to search, shortcut to 0 2012-05-03 18:31:07 -04:00
Victor Stinner a7b654be30 unicode_writer: add finish() method and assertions to write_str() method
* The write_str() method does nothing if the length is zero.
 * Replace "struct unicode_writer_t" with "unicode_writer_t"
2012-05-03 23:58:55 +02:00
Victor Stinner bf4e266397 Issue #14687: Remove redundant length attribute of unicode_write_t
The length can be read directly from the buffer
2012-05-03 19:27:14 +02:00
Victor Stinner 7989157e49 Issue #14687: Cleanup unicode_writer_prepare()
"Inline" PyUnicode_Resize(): call directly resize_compact()
2012-05-03 13:43:07 +02:00
Victor Stinner f2c76aa6cb Issue #14687: str%tuple now uses an optimistic "unicode writer" instead of an
accumulator. Directly write characters into the output (don't use a temporary
list): resize and widen the string on demand.
2012-05-03 13:10:40 +02:00
Victor Stinner 1b487b467b Issue #14624, #14687: Optimize unicode_widen()
Don't convert uninitialized characters. Patch written by Serhiy Storchaka.
2012-05-03 12:29:04 +02:00
Victor Stinner 3a7f7977f1 Remove buggy assertion in PyUnicode_Substring()
Use also directly unicode_empty, instead of PyUnicode_New(0,0).
2012-05-03 03:36:40 +02:00
Victor Stinner 684d5fd420 Fix PyUnicode_Substring() for start >= length and start > end
Remove the fast-path for 1-character string: unicode_fromascii() and
_PyUnicode_FromUCS*() now have their own fast-path for 1-character strings.
2012-05-03 02:32:34 +02:00
Victor Stinner b6cd014d75 Unicode: optimize creating of 1-character strings 2012-05-03 02:17:04 +02:00
Victor Stinner bff7c96834 Issue #14687: Optimize str%tuple for the "%(name)s" syntax
Avoid an useless and expensive call to PyUnicode_READ().
2012-05-03 01:44:59 +02:00
Victor Stinner e6abb488c9 unicodeobject.c: Add MAX_MAXCHAR() macro to (micro-)optimize the computation
of the second argument of PyUnicode_New().

 * Create also align_maxchar() function
 * Optimize fix_decimal_and_space_to_ascii(): don't compute the maximum
   character when ch <= 127 (it is ASCII)
2012-05-02 01:15:40 +02:00
Victor Stinner 438106b66e Issue #14687: Cleanup PyUnicode_Format() 2012-05-02 00:41:57 +02:00
Victor Stinner b5c3ea3af3 Issue #14687: Optimize str%args
* formatfloat() uses unicode_fromascii() instead of PyUnicode_DecodeASCII()
   to not have to check characters, we know that it is really ASCII
 * Use PyUnicode_FromOrdinal() instead of _PyUnicode_FromUCS4() to format
   a character: if avoids a call to ucs4lib_find_max_char() to compute
   the maximum character (whereas we already know it, it is just the character
   itself)
2012-05-02 00:29:36 +02:00
Victor Stinner b80e46eca4 Issue #14687: Avoid an useless duplicated string in PyUnicode_Format() 2012-04-30 05:21:52 +02:00
Victor Stinner aff3cc659b Issue #14687: Cleanup PyUnicode_Format() 2012-04-30 05:19:21 +02:00
Victor Stinner b11d91d969 Fix my previous commit: bool is a long, restore the specical case for bool 2012-04-28 00:25:34 +02:00
Victor Stinner d0880d57b0 Simplify and optimize formatlong()
* Remove _PyBytes_FormatLong(): inline it into formatlong()
 * the input type is always a long, so remove the code for bool
 * don't duplicate the string if the length does not change
 * Use PyUnicode_DATA() instead of _PyUnicode_AsString()
2012-04-27 23:40:13 +02:00
Victor Stinner 94d558b063 Optimize _PyUnicode_FindMaxChar() find pure ASCII strings 2012-04-27 22:26:58 +02:00
Victor Stinner 8f825060f1 Check newly created consistency using _PyUnicode_CheckConsistency(str, 1)
* In debug mode, fill the string data with invalid characters
 * Simplify also reference counting in PyCodec_BackslashReplaceErrors()
   and PyCodec_XMLCharRefReplaceError()
2012-04-27 13:55:39 +02:00
Victor Stinner 718fbf078c _PyUnicode_CheckConsistency() ensures that the unicode string ends with a
null character
2012-04-26 00:39:37 +02:00
Benjamin Peterson b9f4c9daad make pointer arith c89 2012-04-23 21:45:40 -04:00
Benjamin Peterson f3b7d86e25 use correct base ptr 2012-04-23 18:07:01 -04:00
Benjamin Peterson 2844a7a6d3 simplify and reformat 2012-04-23 18:00:25 -04:00
Victor Stinner ece58deb9f Close #14648: Compute correctly maxchar in str.format() for substrin 2012-04-23 23:36:38 +02:00
Benjamin Peterson 64ed576de8 merge 3.2 (#14509) 2012-04-09 15:04:39 -04:00
Benjamin Peterson ca819c3c9d merge 3.1 (#14509) 2012-04-09 15:01:02 -04:00
Benjamin Peterson f6622c8a3e fix build without Py_DEBUG and DNDEBUG (closes #14509) 2012-04-09 14:53:07 -04:00
Victor Stinner afb5205c48 Close #14249: Use bit shifts instead of an union, it's more efficient.
Patch written by Serhiy Storchaka
2012-04-05 22:54:49 +02:00
Victor Stinner e7eee01f36 Close #14249: Use an union instead of a long to short pointer to avoid aliasing
issue. Speed up UTF-16 by 20%.
2012-04-05 13:44:34 +02:00
Antoine Pitrou a701388de1 Rename _PyIter_GetBuiltin to _PyObject_GetBuiltin, and do not include it in the stable ABI. 2012-04-05 00:04:20 +02:00
Kristján Valur Jónsson 31668b8f7a Issue #14288: Serialization support for builtin iterators. 2012-04-03 10:49:41 +00:00
Benjamin Peterson 0df542985a grammar 2012-03-26 14:50:32 -04:00
Benjamin Peterson c067d6661f merge 3.2 2012-03-25 22:41:16 -04:00
Benjamin Peterson a8755c586e kill this terribly outdated comment 2012-03-25 22:40:54 -04:00
Victor Stinner 0d03478b88 Remove an unused variable 2012-03-06 02:06:01 +01:00