Victor Stinner
ab60de478d
Issue #8271 : Fix compilation on Windows
2012-11-04 23:59:15 +01:00
Ezio Melotti
f7ed5d111b
#8271 : the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti.
2012-11-04 23:21:38 +02:00
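A minimal sketch of the behavior this entry describes, assuming a current Python 3 that includes the fix. The helper name `count_replacements` is hypothetical and exists only for illustration.

```python
# Illustrative only: count the U+FFFD characters produced by the
# "replace" error handler on malformed UTF-8 input.
REPLACEMENT = "\ufffd"


def count_replacements(data: bytes) -> int:
    """Decode with errors="replace" and count the replacement characters."""
    return data.decode("utf-8", "replace").count(REPLACEMENT)


# Two stray continuation bytes are two separate errors.
stray = count_replacements(b"\x80\x80")

# A truncated 4-byte sequence (F0 90 80) followed by an ASCII byte is
# reported as a single error, and the ASCII byte is preserved.
truncated = b"\xf0\x90\x80A".decode("utf-8", "replace")

print(stray, ascii(truncated))
```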
Antoine Pitrou
ca8aa4acf6
Issue #15144 : Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
Patch by Serhiy Storchaka.
2012-09-20 20:56:47 +02:00
Victor Stinner
b3f5501250
Close #15534 : Fix a typo in the fast search function of the string library (_s => s)
Replace _s with ptr to avoid future confusion. Also add non-regression tests.
2012-08-02 23:05:01 +02:00
Mark Dickinson
01ac8b6ab1
Use correct types for ASCII_CHAR_MASK integer constants.
2012-07-07 14:08:48 +02:00
Mark Dickinson
106c4145ff
Issue #14923 : Optimize continuation-byte check in UTF-8 decoding. Patch by Serhiy Storchaka.
2012-06-23 21:45:14 +01:00
Antoine Pitrou
a759d4e9f4
Make private function static (from `make smelly`)
2012-06-21 17:26:28 +02:00
Antoine Pitrou
27f6a3b0bf
Issue #15026 : utf-16 encoding is now significantly faster (up to 10x).
Patch by Serhiy Storchaka.
2012-06-15 22:15:23 +02:00
Victor Stinner
d7b7c7472b
Issue #14993 : Use standard "unsigned char" instead of an unsigned char bitfield
2012-06-04 22:52:12 +02:00
Victor Stinner
d3f0882dfb
Issue #14744 : Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args)
* Formatting string, int, float and complex use the _PyUnicodeWriter API. It
avoids a temporary buffer in most cases.
* Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just
keep a reference to the string if the output is only composed of one string
* Disable overallocation when formatting the last argument of str%args and
str.format(args)
* Overallocation allocates at least 100 characters: add min_length attribute
to the _PyUnicodeWriter structure
* Add new private functions: _PyUnicode_FastCopyCharacters(),
_PyUnicode_FastFill() and _PyUnicode_FromASCII()
The speedup is around 20% on average.
2012-05-29 12:57:52 +02:00
Antoine Pitrou
63065d761e
Issue #14624 : UTF-16 decoding is now 3x to 4x faster on various inputs.
Patch by Serhiy Storchaka.
2012-05-15 23:48:04 +02:00
Antoine Pitrou
ca5f91b888
Issue #14738 : Speed-up UTF-8 decoding on non-ASCII data. Patch by Serhiy Storchaka.
2012-05-10 16:36:02 +02:00
Victor Stinner
3b1a74a9c3
Rename unicode_write_t structure and its methods to "_PyUnicodeWriter"
2012-05-09 22:25:00 +02:00
Victor Stinner
ee4544c920
Issue #14744 : Inline unicode_writer_write_char() and unicode_write_str()
Also optimize PyUnicode_Format(): call unicode_writer_prepare() only once
per argument.
2012-05-09 22:24:08 +02:00
Victor Stinner
202fdca133
Close #14716 : str.format() now uses the new "unicode writer" API instead of the
PyAccu API. For example, it makes str.format() 25% to 30% faster on Linux.
2012-05-07 12:47:02 +02:00
Antoine Pitrou
d0acb411ef
Issue #14387 : Do not include accu.h from Python.h.
2012-03-22 14:42:18 +01:00
Victor Stinner
41a863cb81
Issue #13706 : Fix format(int, "n") for locale with non-ASCII thousands separator
* Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
(from the locale encoding), instead of decoding them implicitly from latin1
* Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
* Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
character if unicode is NULL
* Replace MIN/MAX macros by Py_MIN/Py_MAX
* stringlib/undef.h undefines STRINGLIB_IS_UNICODE
* stringlib/localeutil.h only supports Unicode
2012-02-24 00:37:51 +01:00
Benjamin Peterson
21e0da228d
remove some usage of Py_UNICODE_TOUPPER/LOWER
2012-01-11 21:00:42 -05:00
Victor Stinner
6099a03202
Issue #13624 : Write a specialized UTF-8 encoder to allow more optimization
The main bottleneck was the PyUnicode_READ() macro.
2011-12-18 14:22:26 +01:00
Victor Stinner
f8eac00779
Issue #13623 : Fix a performance regression introduced by issue #12170 in
bytes.find() and handle OverflowError correctly (raise the same ValueError as
the error for -1).
2011-12-18 01:17:41 +01:00
Victor Stinner
b37b17423b
Replace PyUnicode_FromUnicode(NULL, 0) by PyUnicode_New(0, 0)
Create an empty string with the new Unicode API.
2011-12-01 03:18:59 +01:00
Antoine Pitrou
0a3229de6b
Issue #13417 : speed up utf-8 decoding by around 2x for the non-fully-ASCII case.
This almost catches up with pre-PEP 393 performance, when decoding needed
only one pass.
2011-11-21 20:39:13 +01:00
Victor Stinner
0fc35196bb
stringlib: remove unused STRINGLIB_FILL
2011-11-20 19:30:15 +01:00
Victor Stinner
7931d9a951
Replace PyUnicodeObject type by PyObject
* _PyUnicode_CheckConsistency() now takes a PyObject* instead of void*
* Remove now useless casts to PyObject*
2011-11-04 00:22:48 +01:00
Victor Stinner
9db1a8b69f
Replace PyUnicodeObject* by PyObject* where it was irrelevant
A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or
PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to
PyUnicodeObject* is wrong.
2011-10-23 20:04:37 +02:00
Antoine Pitrou
ac65d96777
Issue #12170 : The count(), find(), rfind(), index() and rindex() methods
of bytes and bytearray objects now accept an integer between 0 and 255
as their first argument. Patch by Petri Lehtinen.
2011-10-20 23:54:17 +02:00
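A short sketch of what this entry enables, assuming a current Python 3. The sample data and the helper name `out_of_range_raises` are made up for illustration.

```python
# Illustrative only: search bytes/bytearray objects by integer byte value.
data = b"hello world"

first_o = data.find(111)            # 111 == ord("o")
last_o = data.rindex(111)
l_count = data.count(108)           # 108 == ord("l")
space_at = bytearray(data).index(32)  # 32 == ord(" ")


def out_of_range_raises(value: int) -> bool:
    """Return True if searching for `value` raises ValueError."""
    try:
        data.find(value)
    except ValueError:
        return True
    return False


print(first_o, last_o, l_count, space_at, out_of_range_raises(256))
```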
Antoine Pitrou
5b9f4c1539
Fix typo
2011-10-17 19:21:04 +02:00
Antoine Pitrou
c198d0599b
Add a comment explaining this heuristic.
2011-10-13 18:07:37 +02:00
Antoine Pitrou
dda339e6d2
Simplify heuristic for when to use memchr
2011-10-13 17:58:11 +02:00
Antoine Pitrou
dd4e2f0153
Issue #13155 : Optimize finding the optimal character width of a Unicode string
2011-10-13 00:02:27 +02:00
Victor Stinner
d218bf14cc
stringlib: Fix STRINGLIB_STR for UCS2/UCS4
2011-10-12 00:14:32 +02:00
Victor Stinner
8cc70dcf70
Fix fastsearch for UCS2 and UCS4
* If needle is 0, try (p[0] >> 16) & 0xff for UCS4
* Disable fastsearch_memchr_1char() if needle is zero for UCS2 and UCS4
2011-10-11 23:22:22 +02:00
Antoine Pitrou
2c3b2302ad
Issue #13134 : optimize finding single-character strings using memchr
2011-10-11 20:29:21 +02:00
Martin v. Löwis
c47adb04b3
Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE.
2011-10-07 20:55:35 +02:00
Antoine Pitrou
4574e62c6e
Fix massive slowdown in string formatting with str.format.
Example:
./python -m timeit -s "f='{}' + '-' * 1024 + '{}'; s='abcd' * 16384" "f.format(s, s)"
-> before: 547 usec per loop
-> after: 13 usec per loop
-> 3.2: 22.5 usec per loop
-> 2.7: 12.6 usec per loop
2011-10-07 02:26:47 +02:00
Antoine Pitrou
dbf697ae5c
Fix compilation warnings under 64-bit Windows
2011-10-06 15:34:41 +02:00
Victor Stinner
c3cec7868b
Add asciilib: similar to the ucs1, ucs2 and ucs4 libraries, but specialized for ASCII
The ucs1, ucs2 and ucs4 libraries have to scan the created substring to find the
maximum character, whereas this is not needed for ASCII strings. Because ASCII
strings are common, it is useful to optimize ASCII.
2011-10-05 21:24:08 +02:00
Victor Stinner
e57b1c0da1
Mark PyUnicode_FromUCS[124] as private
2011-09-28 22:20:48 +02:00
Martin v. Löwis
d63a3b8beb
Implement PEP 393.
2011-09-28 07:41:54 +02:00
Mark Dickinson
c7d93b7614
Issue #1621 : Fix undefined behaviour from signed overflow in datetime module hashes, array and list iterations, and get_integer (stringlib/string_format.h)
2011-09-25 15:34:32 +01:00
Mark Dickinson
36f27c995a
Issue #1621 : Fix undefined behaviour from signed overflow in get_integer (stringlib/formatter.h)
2011-09-24 19:11:53 +01:00
Eric V. Smith
12ebefc9d3
Closes #12579 . Positional fields with str.format_map() now raise a ValueError instead of SystemError.
2011-07-18 14:03:41 -04:00
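A minimal sketch of the error described in this entry, assuming a current Python 3. The helper name `positional_error` is hypothetical.

```python
# Illustrative only: str.format_map() has no positional arguments, so a
# positional field in the template is reported as a ValueError.
def positional_error(template: str):
    """Return the ValueError raised by format_map(), or None if it succeeds."""
    try:
        template.format_map({"name": "x"})
    except ValueError as exc:
        return exc
    return None


print(repr(positional_error("{} {name}")))  # a ValueError instance
print(repr(positional_error("{name}")))     # None: named fields are fine
```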
Jesus Cea
6159ee3cf5
MERGE: startswith and endswith don't accept None as slice index. Patch by Torsten Becker. ( closes #11828 )
2011-04-20 17:42:50 +02:00
Jesus Cea
ac4515063c
startswith and endswith don't accept None as slice index. Patch by Torsten Becker. ( closes #11828 )
2011-04-20 17:09:23 +02:00
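The subject line above names the bug rather than the fix. A brief sketch of the fixed behavior, assuming a current Python 3 and arbitrary sample strings: `None` is accepted as a slice index and means "no bound", as in ordinary slicing.

```python
# Illustrative only: None as start/end behaves like an omitted bound.
word = "python"

starts_plain = word.startswith("py", None, None)
starts_offset = word.startswith("th", 2, None)
ends_plain = word.endswith("on", None)
ends_bounded = word.endswith("th", None, 4)

print(starts_plain, starts_offset, ends_plain, ends_bounded)
```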
Ezio Melotti
4969f709cc
#11515 : Merge with 3.1.
2011-03-15 05:59:46 +02:00
Ezio Melotti
42da663e6f
#11515 : fix several typos. Patch by Piotr Kasprzyk.
2011-03-15 05:18:48 +02:00
Eric Smith
a1eac7218b
Issue #11302 : missing type check on _string.formatter_field_name_split and _string.formatter_parser caused crash.
Original patch by haypo, reviewed by me, okayed by Georg.
2011-01-29 11:15:35 +00:00
Eric Smith
984bb58000
Issue #7094 : Add alternate ('#') flag to __format__ methods for float, complex and Decimal. Allows greater control over when decimal points appear. Added to make transitioning from %-formatting easier. '#g' still has a problem with Decimal which I'll fix soon.
2010-11-25 16:08:06 +00:00
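A small sketch of the alternate ('#') flag on floats, assuming a current Python 3. The values are arbitrary, and Decimal is left out because the entry notes an open problem with '#g' there.

```python
# Illustrative only: '#' keeps the decimal point (and, for 'g', the
# trailing zeros) that would otherwise be dropped.
plain_fixed = format(1.0, ".0f")
alt_fixed = format(1.0, "#.0f")

plain_general = format(1.0, "g")
alt_general = format(1.0, "#g")

print(plain_fixed, alt_fixed, plain_general, alt_general)
```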
Antoine Pitrou
a277ec4ad9
Followup to r86170: fix reference leak in str.format
2010-11-05 12:23:55 +00:00
Eric Smith
27bbca6f79
Issue #6081 : Add str.format_map. str.format_map(mapping) is similar to str.format(**mapping), except mapping does not get converted to a dict.
2010-11-04 17:06:58 +00:00
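A minimal sketch of why the "not converted to a dict" detail matters, assuming a current Python 3. The class name `DefaultingMap` and the template are hypothetical.

```python
# Illustrative only: format_map() uses the mapping as-is, so a dict
# subclass with __missing__ can supply values for absent keys.
class DefaultingMap(dict):
    def __missing__(self, key):
        # Called by dict lookup when `key` is absent.
        return "<" + key + ">"


template = "{name} owes {amount}"
mapping = DefaultingMap(name="Ada")

via_format_map = template.format_map(mapping)

# format(**mapping) copies the items into a plain dict first, so the
# __missing__ hook is lost and the absent key raises KeyError.
try:
    template.format(**mapping)
    format_raised = False
except KeyError:
    format_raised = True

print(via_format_map, format_raised)
```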