Commit Graph

241 Commits

Author SHA1 Message Date
Serhiy Storchaka 4fdb68491e Issue #22896: Avoid to use PyObject_AsCharBuffer(), PyObject_AsReadBuffer()
and PyObject_AsWriteBuffer().
2015-02-03 01:21:08 +02:00
Serhiy Storchaka b757c83ec6 Issue #22581: Use more "bytes-like object" throughout the docs and comments. 2014-12-05 22:25:22 +02:00
Benjamin Peterson 1cc9520327 s/stringobject/bytesobject/ (closes #22036)
Patch by Martin Matusiak.
2014-07-23 21:39:37 -07:00
Benjamin Peterson d455ce4fd4 merge 3.3 2014-03-30 19:52:39 -04:00
Benjamin Peterson 0ad6098b67 merge 3.2 2014-03-30 19:52:22 -04:00
Benjamin Peterson 23cf403ca1 fix expandtabs overflow detection to be consistent and not rely on signed overflow 2014-03-30 19:47:57 -04:00
Serhiy Storchaka 3079328d29 Reverted changeset b72c5573c5e7 (issue #15027). 2014-01-04 22:44:01 +02:00
Serhiy Storchaka 583a93943c Issue #15027: Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster. 2014-01-04 19:25:37 +02:00
Benjamin Peterson 0ee22bf774 fix format spec recursive expansion (closes #19729) 2013-11-26 19:22:36 -06:00
Serhiy Storchaka dc2fd5101a Remove dead code committed in issue #12892. 2013-11-19 15:56:05 +02:00
Serhiy Storchaka 58cf607d13 Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.

Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
Ezio Melotti 745d54d2fa #17806: Added keyword-argument support for "tabsize" to str/bytes.expandtabs(). 2013-11-16 19:10:57 +02:00
Victor Stinner cc64eb5b9f Issue #18408: Fix bytearrayiter.partition()/rpartition(), handle
PyByteArray_FromStringAndSize() failure (ex: on memory allocation failure)
2013-10-29 03:15:37 +01:00
Serhiy Storchaka 8fa8ee3970 Issue #18701: Remove support of old CPython versions (<3.0) from C code. 2013-08-17 00:48:02 +03:00
Raymond Hettinger d06eeb4a24 merge 2013-08-13 18:20:55 -07:00
Raymond Hettinger b1b915c796 Issue 18719: Remove a false optimization
Remove an unused early-out test from the critical path for
dict and set lookups.

When the strings already have matching lengths, kinds, and hashes,
there is no additional information gained by checking the first
characters (the probability of a mismatch is already known to
be less than 1 in 2**64).
2013-08-13 18:16:34 -07:00
Antoine Pitrou 9ed5f27266 Issue #18722: Remove uses of the "register" keyword in C code. 2013-08-13 20:18:52 +02:00
Benjamin Peterson d2b58a9880 only recursively expand in the format spec (closes #17644) 2013-05-17 17:34:30 -05:00
Benjamin Peterson 4d94474ba3 rewrite the parsing of field names to be more consistent wrt recursive expansion 2013-05-17 18:22:31 -05:00
Benjamin Peterson 48953632df merge 3.3 2013-05-17 17:35:28 -05:00
Ezio Melotti 5263c13801 Merge removal of trailing whitespace from 3.3. 2013-04-21 04:08:18 +03:00
Ezio Melotti 6b02772c13 Remove trailing whitespace. 2013-04-21 04:07:51 +03:00
Victor Stinner 8f674ccd64 Close #17694: Add minimum length to _PyUnicodeWriter
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
 * _PyUnicodeWriter_Init() has no more argument (except the writer itself):
   min_length and overallocate must be set explicitly
 * In error handlers, only enable overallocation if the replacement string
   is longer than 1 character
 * CJK decoders don't use overallocation anymore
 * Set min_length, instead of preallocating memory using
   _PyUnicodeWriter_Prepare(), in many decoders
 * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner 76b3b2726c stringlib: remove unused STRINGLIB_RESIZE macro 2013-04-14 16:29:09 +02:00
Serhiy Storchaka e2cef885a2 Issue #16061: Speed up str.replace() for replacing 1-character strings. 2013-04-13 22:45:04 +03:00
Victor Stinner 7efa3b8242 Close #13126: "Simplify" FASTSEARCH() code to help the compiler to emit more
efficient machine code. Patch written by Antoine Pitrou.

Without this change, str.find() was 10% slower than str.rfind() in the worst
case.
2013-04-08 00:26:43 +02:00
Victor Stinner cfc4c13b04 Add _PyUnicodeWriter_WriteSubstring() function
Write a function to enable more optimizations:

 * If the substring is the whole string and overallocation is disabled, just
   keep a reference to the string, don't copy characters
 * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
   possible
2013-04-03 01:48:39 +02:00
Serhiy Storchaka 06b16f879f Remove unused defines. 2013-02-23 14:49:09 +02:00
Serhiy Storchaka 18809fa94e Remove unused defines. 2013-02-23 14:48:16 +02:00
Antoine Pitrou 4de7457009 Issue #17173: Remove uses of locale-dependent C functions (isalpha() etc.) in the interpreter.
I've left a couple of them in: zlib (third-party lib), getaddrinfo.c
(doesn't include Python.h, and probably obsolete), _sre.c (legitimate
use for the re.LOCALE flag).
2013-02-09 23:11:27 +01:00
Serhiy Storchaka b946af5897 Check for NULL before the pointer aligning in fastsearch_memchr_1char.
There is no guarantee that NULL is aligned.
2013-01-15 13:32:41 +02:00
Serhiy Storchaka 18ba40b945 Check for NULL before the pointer aligning in fastsearch_memchr_1char.
There is no guarantee that NULL is aligned.
2013-01-15 13:27:28 +02:00
Christian Heimes 5f7e8dab11 Issue #16592: stringlib_bytes_join doesn't raise MemoryError on allocation failure 2012-12-02 07:56:42 +01:00
Victor Stinner 6caa6fb535 (Merge 3.3) Issue #8271: Fix compilation on Windows 2012-11-05 00:00:50 +01:00
Victor Stinner ab60de478d Issue #8271: Fix compilation on Windows 2012-11-04 23:59:15 +01:00
Ezio Melotti cfa9636404 #8271: merge with 3.3. 2012-11-04 23:23:09 +02:00
Ezio Melotti f7ed5d111b #8271: the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti. 2012-11-04 23:21:38 +02:00
Antoine Pitrou 6f7b0da6bc Issue #12805: Make bytes.join and bytearray.join faster when the separator is empty.
Patch by Serhiy Storchaka.
2012-10-20 23:08:34 +02:00
Christian Heimes 743e0cd6b5 Issue #16166: Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified
endianess detection and handling.
2012-10-17 23:52:17 +02:00
Antoine Pitrou cfc22b4a9b Issue #15958: bytes.join and bytearray.join now accept arbitrary buffer objects. 2012-10-16 21:07:23 +02:00
Antoine Pitrou ca8aa4acf6 Issue #15144: Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
Patch by Serhiy Storchaka.
2012-09-20 20:56:47 +02:00
Victor Stinner b3f5501250 Close #15534: Fix a typo in the fast search function of the string library (_s => s)
Replace _s with ptr to avoid future confusion. Add also non regression tests.
2012-08-02 23:05:01 +02:00
Mark Dickinson fb90c0934c Issue #14700: Fix buggy overflow checks for large precision and width in new-style and old-style formatting. 2012-10-28 10:18:03 +00:00
Mark Dickinson 01ac8b6ab1 Use correct types for ASCII_CHAR_MASK integer constants. 2012-07-07 14:08:48 +02:00
Mark Dickinson 106c4145ff Issue #14923: Optimize continuation-byte check in UTF-8 decoding. Patch by Serhiy Storchaka. 2012-06-23 21:45:14 +01:00
Antoine Pitrou a759d4e9f4 Make private function static (from `make smelly`) 2012-06-21 17:26:28 +02:00
Antoine Pitrou 27f6a3b0bf Issue #15026: utf-16 encoding is now significantly faster (up to 10x).
Patch by Serhiy Storchaka.
2012-06-15 22:15:23 +02:00
Victor Stinner d7b7c7472b Issue #14993: Use standard "unsigned char" instead of a unsigned char bitfield 2012-06-04 22:52:12 +02:00
Victor Stinner d3f0882dfb Issue #14744: Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args)
* Formatting string, int, float and complex use the _PyUnicodeWriter API. It
   avoids a temporary buffer in most cases.
 * Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just
   keep a reference to the string if the output is only composed of one string
 * Disable overallocation when formatting the last argument of str%args and
   str.format(args)
 * Overallocation allocates at least 100 characters: add min_length attribute
   to the _PyUnicodeWriter structure
 * Add new private functions: _PyUnicode_FastCopyCharacters(),
   _PyUnicode_FastFill() and _PyUnicode_FromASCII()

The speed up is around 20% in average.
2012-05-29 12:57:52 +02:00
Antoine Pitrou 63065d761e Issue #14624: UTF-16 decoding is now 3x to 4x faster on various inputs.
Patch by Serhiy Storchaka.
2012-05-15 23:48:04 +02:00
Antoine Pitrou ca5f91b888 Issue #14738: Speed-up UTF-8 decoding on non-ASCII data. Patch by Serhiy Storchaka. 2012-05-10 16:36:02 +02:00
Victor Stinner 3b1a74a9c3 Rename unicode_write_t structure and its methods to "_PyUnicodeWriter" 2012-05-09 22:25:00 +02:00
Victor Stinner ee4544c920 Issue #14744: Inline unicode_writer_write_char() and unicode_write_str()
Optimize also PyUnicode_Format(): call unicode_writer_prepare() only once
per argument.
2012-05-09 22:24:08 +02:00
Victor Stinner 202fdca133 Close #14716: str.format() now uses the new "unicode writer" API instead of the
PyAccu API. For example, it makes str.format() from 25% to 30% faster on Linux.
2012-05-07 12:47:02 +02:00
Antoine Pitrou d0acb411ef Issue #14387: Do not include accu.h from Python.h. 2012-03-22 14:42:18 +01:00
Victor Stinner 41a863cb81 Issue #13706: Fix format(int, "n") for locale with non-ASCII thousands separator
* Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
   (from the locale encoding), instead of decoding them implicitly from latin1
 * Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
 * Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
   character if unicode is NULL
 * Replace MIN/MAX macros by Py_MIN/Py_MAX
 * stringlib/undef.h undefines STRINGLIB_IS_UNICODE
 * stringlib/localeutil.h only supports Unicode
2012-02-24 00:37:51 +01:00
Benjamin Peterson 21e0da228d remove some usage of Py_UNICODE_TOUPPER/LOWER 2012-01-11 21:00:42 -05:00
Victor Stinner 6099a03202 Issue #13624: Write a specialized UTF-8 encoder to allow more optimization
The main bottleneck was the PyUnicode_READ() macro.
2011-12-18 14:22:26 +01:00
Victor Stinner f8eac00779 Issue #13623: Fix a performance regression introduced by issue #12170 in
bytes.find() and handle correctly OverflowError (raise the same ValueError than
the error for -1).
2011-12-18 01:17:41 +01:00
Victor Stinner b37b17423b Replace PyUnicode_FromUnicode(NULL, 0) by PyUnicode_New(0, 0)
Create an empty string with the new Unicode API.
2011-12-01 03:18:59 +01:00
Antoine Pitrou 0a3229de6b Issue #13417: speed up utf-8 decoding by around 2x for the non-fully-ASCII case.
This almost catches up with pre-PEP 393 performance, when decoding needed
only one pass.
2011-11-21 20:39:13 +01:00
Victor Stinner 0fc35196bb stringlib: remove unused STRINGLIB_FILL 2011-11-20 19:30:15 +01:00
Victor Stinner 7931d9a951 Replace PyUnicodeObject type by PyObject
* _PyUnicode_CheckConsistency() now takes a PyObject* instead of void*
 * Remove now useless casts to PyObject*
2011-11-04 00:22:48 +01:00
Victor Stinner 9db1a8b69f Replace PyUnicodeObject* by PyObject* where it was irrevelant
A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or
PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to
PyUnicodeObject* is wrong
2011-10-23 20:04:37 +02:00
Antoine Pitrou ac65d96777 Issue #12170: The count(), find(), rfind(), index() and rindex() methods
of bytes and bytearray objects now accept an integer between 0 and 255
as their first argument.  Patch by Petri Lehtinen.
2011-10-20 23:54:17 +02:00
Antoine Pitrou 5b9f4c1539 Fix typo 2011-10-17 19:21:04 +02:00
Antoine Pitrou c198d0599b Add a comment explaining this heuristic. 2011-10-13 18:07:37 +02:00
Antoine Pitrou dda339e6d2 Simplify heuristic for when to use memchr 2011-10-13 17:58:11 +02:00
Antoine Pitrou dd4e2f0153 Issue #13155: Optimize finding the optimal character width of an unicode string 2011-10-13 00:02:27 +02:00
Victor Stinner d218bf14cc stringlib: Fix STRINGLIB_STR for UCS2/UCS4 2011-10-12 00:14:32 +02:00
Victor Stinner 8cc70dcf70 Fix fastsearch for UCS2 and UCS4
* If needle is 0, try (p[0] >> 16) & 0xff for UCS4
 * Disable fastsearch_memchr_1char() if needle is zero for UCS2 and UCS4
2011-10-11 23:22:22 +02:00
Antoine Pitrou 2c3b2302ad Issue #13134: optimize finding single-character strings using memchr 2011-10-11 20:29:21 +02:00
Martin v. Löwis c47adb04b3 Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE. 2011-10-07 20:55:35 +02:00
Antoine Pitrou 4574e62c6e Fix massive slowdown in string formatting with str.format.
Example:
./python -m timeit -s "f='{}' + '-' * 1024 + '{}'; s='abcd' * 16384" "f.format(s, s)"

-> before: 547 usec per loop
-> after: 13 usec per loop
-> 3.2: 22.5 usec per loop
-> 2.7: 12.6 usec per loop
2011-10-07 02:26:47 +02:00
Antoine Pitrou dbf697ae5c Fix compilation warnings under 64-bit Windows 2011-10-06 15:34:41 +02:00
Victor Stinner c3cec7868b Add asciilib: similar to ucs1, ucs2 and ucs4 library, but specialized to ASCII
ucs1, ucs2 and ucs4 libraries have to scan created substring to find the
maximum character, whereas it is not need to ASCII strings. Because ASCII
strings are common, it is useful to optimize ASCII.
2011-10-05 21:24:08 +02:00
Victor Stinner e57b1c0da1 Mark PyUnicode_FromUCS[124] as private 2011-09-28 22:20:48 +02:00
Martin v. Löwis d63a3b8beb Implement PEP 393. 2011-09-28 07:41:54 +02:00
Mark Dickinson c7d93b7614 Issue #1621: Fix undefined behaviour from signed overflow in datetime module hashes, array and list iterations, and get_integer (stringlib/string_format.h) 2011-09-25 15:34:32 +01:00
Mark Dickinson 36f27c995a Issue #1621: Fix undefined behaviour from signed overflow in get_integer (stringlib/formatter.h) 2011-09-24 19:11:53 +01:00
Eric V. Smith 12ebefc9d3 Closes #12579. Positional fields with str.format_map() now raise a ValueError instead of SystemError. 2011-07-18 14:03:41 -04:00
Jesus Cea 6159ee3cf5 MERGE: startswith and endswith don't accept None as slice index. Patch by Torsten Becker. (closes #11828) 2011-04-20 17:42:50 +02:00
Jesus Cea ac4515063c startswith and endswith don't accept None as slice index. Patch by Torsten Becker. (closes #11828) 2011-04-20 17:09:23 +02:00
Ezio Melotti 4969f709cc #11515: Merge with 3.1. 2011-03-15 05:59:46 +02:00
Ezio Melotti 42da663e6f #11515: fix several typos. Patch by Piotr Kasprzyk. 2011-03-15 05:18:48 +02:00
Eric Smith a1eac7218b Issue #11302: missing type check on _string.formatter_field_name_split and _string.formatter_parser caused crash.
Originial patch by haypo, reviewed by me, okayed by Georg.
2011-01-29 11:15:35 +00:00
Eric Smith 984bb58000 Issue #7094: Add alternate ('#') flag to __format__ methods for float, complex and Decimal. Allows greater control over when decimal points appear. Added to make transitioning from %-formatting easier. '#g' still has a problem with Decimal which I'll fix soon. 2010-11-25 16:08:06 +00:00
Antoine Pitrou a277ec4ad9 Followup to r86170: fix reference leak in str.format 2010-11-05 12:23:55 +00:00
Eric Smith 27bbca6f79 Issue #6081: Add str.format_map. str.format_map(mapping) is similar to str.format(**mapping), except mapping does not get converted to a dict. 2010-11-04 17:06:58 +00:00
Georg Brandl 66c221e993 #9418: first step of moving private string methods to _string module. 2010-10-14 07:04:07 +00:00
Florent Xicluna eb6f3ead00 Fix #8530: Prevent stringlib fastsearch from reading beyond the front of an array. 2010-08-08 22:07:16 +00:00
Mark Dickinson 388122d43b Issue #9337: Make float.__str__ identical to float.__repr__.
(And similarly for complex numbers.)
2010-08-04 20:56:28 +00:00
Mark Dickinson fc070313dd Merged revisions 83400 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r83400 | mark.dickinson | 2010-08-01 11:41:49 +0100 (Sun, 01 Aug 2010) | 7 lines

  Issue #9416: Fix some issues with complex formatting where the
  output with no type specifier failed to match the str output:

    - format(complex(-0.0, 2.0), '-') omitted the real part from the output,
    - format(complex(0.0, 2.0), '-') included a sign and parentheses.
........
2010-08-01 10:43:42 +00:00
Mark Dickinson 5b65df7ce2 Issue #9416: Fix some issues with complex formatting where the
output with no type specifier failed to match the str output:

  - format(complex(-0.0, 2.0), '-') omitted the real part from the output,
  - format(complex(0.0, 2.0), '-') included a sign and parentheses.
2010-08-01 10:41:49 +00:00
Benjamin Peterson 3b107f99c7 remove unneeded error check 2010-07-11 03:36:35 +00:00
Benjamin Peterson 99bcf5ce08 Merged revisions 81823,81835 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r81823 | benjamin.peterson | 2010-06-07 17:31:26 -0500 (Mon, 07 Jun 2010) | 9 lines

  Merged revisions 81820 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r81820 | benjamin.peterson | 2010-06-07 17:23:23 -0500 (Mon, 07 Jun 2010) | 1 line

    correctly overflow when indexes are too large
  ........
................
  r81835 | benjamin.peterson | 2010-06-08 09:57:22 -0500 (Tue, 08 Jun 2010) | 9 lines

  Merged revisions 81834 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r81834 | benjamin.peterson | 2010-06-08 09:53:29 -0500 (Tue, 08 Jun 2010) | 1 line

    kill extra word
  ........
................
2010-06-08 15:12:17 +00:00
Benjamin Peterson 504b6e8115 Merged revisions 81824 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81824 | benjamin.peterson | 2010-06-07 17:32:44 -0500 (Mon, 07 Jun 2010) | 1 line

  remove extra byte and fix comment
........
2010-06-07 22:35:08 +00:00
Benjamin Peterson 59a1b2f732 Merged revisions 81820 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81820 | benjamin.peterson | 2010-06-07 17:23:23 -0500 (Mon, 07 Jun 2010) | 1 line

  correctly overflow when indexes are too large
........
2010-06-07 22:31:26 +00:00
Benjamin Peterson d240071cd8 Merged revisions 81813 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81813 | benjamin.peterson | 2010-06-07 16:37:09 -0500 (Mon, 07 Jun 2010) | 2 lines

  locale grouping strings should end in '\0'
........
2010-06-07 21:41:35 +00:00
Antoine Pitrou 7f14f0d8a0 Recorded merge of revisions 81032 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r81032 | antoine.pitrou | 2010-05-09 17:52:27 +0200 (dim., 09 mai 2010) | 9 lines

  Recorded merge of revisions 81029 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r81029 | antoine.pitrou | 2010-05-09 16:46:46 +0200 (dim., 09 mai 2010) | 3 lines

    Untabify C files. Will watch buildbots.
  ........
................
2010-05-09 16:14:21 +00:00