Victor Stinner
684d5fd420
Fix PyUnicode_Substring() for start >= length and start > end
...
Remove the fast-path for 1-character string: unicode_fromascii() and
_PyUnicode_FromUCS*() now have their own fast-path for 1-character strings.
2012-05-03 02:32:34 +02:00
Victor Stinner
b6cd014d75
Unicode: optimize creating of 1-character strings
2012-05-03 02:17:04 +02:00
Victor Stinner
bff7c96834
Issue #14687 : Optimize str%tuple for the "%(name)s" syntax
...
Avoid an useless and expensive call to PyUnicode_READ().
2012-05-03 01:44:59 +02:00
Victor Stinner
e6abb488c9
unicodeobject.c: Add MAX_MAXCHAR() macro to (micro-)optimize the computation
...
of the second argument of PyUnicode_New().
* Create also align_maxchar() function
* Optimize fix_decimal_and_space_to_ascii(): don't compute the maximum
character when ch <= 127 (it is ASCII)
2012-05-02 01:15:40 +02:00
Victor Stinner
438106b66e
Issue #14687 : Cleanup PyUnicode_Format()
2012-05-02 00:41:57 +02:00
Victor Stinner
b5c3ea3af3
Issue #14687 : Optimize str%args
...
* formatfloat() uses unicode_fromascii() instead of PyUnicode_DecodeASCII()
to not have to check characters, we know that it is really ASCII
* Use PyUnicode_FromOrdinal() instead of _PyUnicode_FromUCS4() to format
a character: if avoids a call to ucs4lib_find_max_char() to compute
the maximum character (whereas we already know it, it is just the character
itself)
2012-05-02 00:29:36 +02:00
Victor Stinner
b80e46eca4
Issue #14687 : Avoid an useless duplicated string in PyUnicode_Format()
2012-04-30 05:21:52 +02:00
Victor Stinner
aff3cc659b
Issue #14687 : Cleanup PyUnicode_Format()
2012-04-30 05:19:21 +02:00
Victor Stinner
b11d91d969
Fix my previous commit: bool is a long, restore the specical case for bool
2012-04-28 00:25:34 +02:00
Victor Stinner
d0880d57b0
Simplify and optimize formatlong()
...
* Remove _PyBytes_FormatLong(): inline it into formatlong()
* the input type is always a long, so remove the code for bool
* don't duplicate the string if the length does not change
* Use PyUnicode_DATA() instead of _PyUnicode_AsString()
2012-04-27 23:40:13 +02:00
Victor Stinner
94d558b063
Optimize _PyUnicode_FindMaxChar() find pure ASCII strings
2012-04-27 22:26:58 +02:00
Victor Stinner
8f825060f1
Check newly created consistency using _PyUnicode_CheckConsistency(str, 1)
...
* In debug mode, fill the string data with invalid characters
* Simplify also reference counting in PyCodec_BackslashReplaceErrors()
and PyCodec_XMLCharRefReplaceError()
2012-04-27 13:55:39 +02:00
Victor Stinner
718fbf078c
_PyUnicode_CheckConsistency() ensures that the unicode string ends with a
...
null character
2012-04-26 00:39:37 +02:00
Benjamin Peterson
b9f4c9daad
make pointer arith c89
2012-04-23 21:45:40 -04:00
Benjamin Peterson
f3b7d86e25
use correct base ptr
2012-04-23 18:07:01 -04:00
Benjamin Peterson
2844a7a6d3
simplify and reformat
2012-04-23 18:00:25 -04:00
Victor Stinner
ece58deb9f
Close #14648 : Compute correctly maxchar in str.format() for substrin
2012-04-23 23:36:38 +02:00
Benjamin Peterson
64ed576de8
merge 3.2 ( #14509 )
2012-04-09 15:04:39 -04:00
Benjamin Peterson
ca819c3c9d
merge 3.1 ( #14509 )
2012-04-09 15:01:02 -04:00
Benjamin Peterson
f6622c8a3e
fix build without Py_DEBUG and DNDEBUG ( closes #14509 )
2012-04-09 14:53:07 -04:00
Victor Stinner
afb5205c48
Close #14249 : Use bit shifts instead of an union, it's more efficient.
...
Patch written by Serhiy Storchaka
2012-04-05 22:54:49 +02:00
Victor Stinner
e7eee01f36
Close #14249 : Use an union instead of a long to short pointer to avoid aliasing
...
issue. Speed up UTF-16 by 20%.
2012-04-05 13:44:34 +02:00
Antoine Pitrou
a701388de1
Rename _PyIter_GetBuiltin to _PyObject_GetBuiltin, and do not include it in the stable ABI.
2012-04-05 00:04:20 +02:00
Kristján Valur Jónsson
31668b8f7a
Issue #14288 : Serialization support for builtin iterators.
2012-04-03 10:49:41 +00:00
Benjamin Peterson
0df542985a
grammar
2012-03-26 14:50:32 -04:00
Benjamin Peterson
c067d6661f
merge 3.2
2012-03-25 22:41:16 -04:00
Benjamin Peterson
a8755c586e
kill this terribly outdated comment
2012-03-25 22:40:54 -04:00
Victor Stinner
0d03478b88
Remove an unused variable
2012-03-06 02:06:01 +01:00
Victor Stinner
c9590ad745
Close #14085 : remove assertions from PyUnicode_WRITE macro
...
Add checks in PyUnicode_WriteChar() and convert PyUnicode_New() assertion to a
test raising a Python exception.
2012-03-04 01:34:37 +01:00
Ezio Melotti
cda6b6d60d
#14081 : The sep and maxsplit parameter to str.split, bytes.split, and bytearray.split may now be passed as keyword arguments.
2012-02-26 09:39:55 +02:00
Victor Stinner
b0800dc53b
Oops, revert unwanted changes
2012-02-25 00:47:08 +01:00
Victor Stinner
abc649ddbe
Issue #14107 : fix bigmem tests on str.capitalize(), str.swapcase() and
...
str.title(). Compute correctly how much memory is required for the test
(memuse).
2012-02-25 00:43:27 +01:00
Antoine Pitrou
842c0f17eb
Fix compilation error under Windows (and warnings too).
2012-02-24 13:30:46 +01:00
Victor Stinner
90f50d4df9
Issue #13706 : Fix format(float, "n") for locale with non-ASCII decimal point (e.g. ps_aF)
2012-02-24 01:44:47 +01:00
Victor Stinner
41a863cb81
Issue #13706 : Fix format(int, "n") for locale with non-ASCII thousands separator
...
* Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
(from the locale encoding), instead of decoding them implicitly from latin1
* Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
* Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
character if unicode is NULL
* Replace MIN/MAX macros by Py_MIN/Py_MAX
* stringlib/undef.h undefines STRINGLIB_IS_UNICODE
* stringlib/localeutil.h only supports Unicode
2012-02-24 00:37:51 +01:00
Victor Stinner
b429d3b09c
Fix doc of an internal function: unicode_write_cstr()
2012-02-22 21:22:20 +01:00
Antoine Pitrou
ba6bafcfbe
Fix compile failure under Windows
2012-02-22 16:41:50 +01:00
Victor Stinner
c516610f0b
Optimize str%arg for number formats: %i, %d, %u, %x, %p
...
Write a specialized function to write an ASCII/latin1 C char* string into a
Python Unicode string.
2012-02-22 13:55:02 +01:00
Victor Stinner
99d7ad0bb0
Micro-optimize computation of maxchar in PyUnicode_TransformDecimalToASCII()
2012-02-22 13:37:39 +01:00
Victor Stinner
da79e632c4
Micro-optimize unicode_expandtabs(): use FILL() macro to write N spaces
2012-02-22 13:37:04 +01:00
Victor Stinner
15e9ed299c
PyUnicode_New() and unicode_putchar() check for MAX_UNICODE maximum (U+10FFFF)
2012-02-22 13:36:20 +01:00
Benjamin Peterson
d9a3591ed1
merge 3.2
2012-02-21 11:12:14 -05:00
Benjamin Peterson
e249dcab7a
merge 3.2
2012-02-21 11:09:13 -05:00
Benjamin Peterson
69e9727657
ensure no one tries to hash things before the random seed is found
2012-02-21 11:08:50 -05:00
Georg Brandl
16fa2a1097
Forgot the "empty string -> hash == 0" special case for strings.
2012-02-21 00:50:13 +01:00
Georg Brandl
2fb477c0f0
Merge 3.2: Issue #13703 plus some related test suite fixes.
2012-02-21 00:33:36 +01:00
Georg Brandl
09a7c72cad
Merge from 3.1: Issue #13703 : add a way to randomize the hash values of basic types (str, bytes, datetime)
...
in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated.
The environment variable PYTHONHASHSEED and the new command line flag -R control this
behavior.
2012-02-20 21:31:46 +01:00
Georg Brandl
2daf6ae249
Issue #13703 : add a way to randomize the hash values of basic types (str, bytes, datetime)
...
in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated.
The environment variable PYTHONHASHSEED and the new command line flag -R control this
behavior.
2012-02-20 19:54:16 +01:00
Victor Stinner
c3a6b02d70
(Merge 3.2) Issue #13913 : normalize utf-8 codec name in UTF-8 decoder
2012-02-14 01:18:10 +01:00
Victor Stinner
cbe01342bc
Issue #13913 : normalize utf-8 codec name in UTF-8 decoder
2012-02-14 01:17:45 +01:00
Victor Stinner
d1cd99b533
Backout d2c1521ad0a1: _Py_IDENTIFIER() uses UTF-8 again
2012-02-07 23:05:55 +01:00
Victor Stinner
d446d8e09a
_Py_Identifier are always ASCII strings
2012-02-05 01:45:45 +01:00
Antoine Pitrou
7ab4af0427
Issue #13848 : open() and the FileIO constructor now check for NUL characters in the file name.
...
Patch by Hynek Schlawack.
2012-01-29 18:43:36 +01:00
Antoine Pitrou
1334884ff2
Issue #13848 : open() and the FileIO constructor now check for NUL characters in the file name.
...
Patch by Hynek Schlawack.
2012-01-29 18:36:34 +01:00
Benjamin Peterson
eea4846d23
don't ready in case_operation, since most callers do it themselves
2012-01-16 14:28:50 -05:00
Gregory P. Smith
f5b62a9b31
Consolidate the occurrances of the prime used as the multiplier when hashing.
2012-01-14 15:45:13 -08:00
Gregory P. Smith
63e6c3222f
Consolidate the occurrances of the prime used as the multiplier when hashing
...
to a single #define instead of having several copies in several files.
This excludes the Modules/ tree (datetime and expat both have a copy
for their own purposes with no need for it to be the same).
2012-01-14 15:31:34 -08:00
Benjamin Peterson
c8d8b8861e
fix possible refleaks if PyUnicode_READY fails
2012-01-14 13:37:31 -05:00
Benjamin Peterson
bac79498c8
always explicitly check for -1 from PyUnicode_READY
2012-01-14 13:34:47 -05:00
Benjamin Peterson
d5890c8db5
add str.casefold() ( closes #13752 )
2012-01-14 13:23:30 -05:00
Benjamin Peterson
53aa1d7c57
fix possible if unlikely leak
2011-12-20 13:29:45 -06:00
Benjamin Peterson
e51757f6de
move do_title to a better place
2012-01-12 21:10:29 -05:00
Benjamin Peterson
821e4cfd01
make fix_decimal_and_space_to_ascii check if it modifies the string
2012-01-12 15:40:18 -05:00
Benjamin Peterson
0c91392fe6
kill capwords implementation which has been disabled since the begining
2012-01-12 15:25:41 -05:00
Benjamin Peterson
b2bf01d824
use full unicode mappings for upper/lower/title case ( #12736 )
...
Also broaden the category of characters that count as lowercase/uppercase.
2012-01-11 18:17:06 -05:00
Victor Stinner
3fe553160c
Add a new PyUnicode_Fill() function
...
It is faster than the unicode_fill() function which was implemented in
formatter_unicode.c.
2012-01-04 00:33:50 +01:00
Benjamin Peterson
5e458f520c
also decref the right thing
2012-01-02 10:12:13 -06:00
Benjamin Peterson
4c13a4a352
ready the correct string
2012-01-02 09:07:38 -06:00
Benjamin Peterson
22a29708fd
fix some possible refleaks from PyUnicode_READY error conditions
2012-01-02 09:00:30 -06:00
Benjamin Peterson
9ca3ffac94
== -1 is convention
2012-01-01 16:04:29 -06:00
Benjamin Peterson
e157cf1012
make switch more robust
2012-01-01 15:56:20 -06:00
Benjamin Peterson
c0b95d18fa
4 space indentation
2011-12-20 17:24:05 -06:00
Benjamin Peterson
ead6b53659
fix spacing around switch statements
2011-12-20 17:23:42 -06:00
Benjamin Peterson
822c790527
merge 3.2
2011-12-20 13:32:50 -06:00
Victor Stinner
6099a03202
Issue #13624 : Write a specialized UTF-8 encoder to allow more optimization
...
The main bottleneck was the PyUnicode_READ() macro.
2011-12-18 14:22:26 +01:00
Victor Stinner
73f53b57d1
Optimize str * n for len(str)==1 and UCS-2 or UCS-4
2011-12-18 03:26:31 +01:00
Victor Stinner
f644110816
Issue #13621 : Optimize str.replace(char1, char2)
...
Use findchar() which is more optimized than a dummy loop using
PyUnicode_READ(). PyUnicode_READ() is a complex and slow macro.
2011-12-18 02:43:08 +01:00
Victor Stinner
ab870218e3
Issue #10951 : Fix compiler warnings in timemodule.c and unicodeobject.c
...
Thanks Jérémy Anger for the fix.
2011-12-17 22:39:43 +01:00
Victor Stinner
2f197078fb
The locale decoder raises a UnicodeDecodeError instead of an OSError
...
Search the invalid character using mbrtowc().
2011-12-17 07:08:30 +01:00
Victor Stinner
1b57967b96
Issue #13560 : Locale codec functions use the classic "errors" parameter,
...
instead of surrogateescape
So it would be possible to support more error handlers later.
2011-12-17 05:47:23 +01:00
Victor Stinner
ab59594326
What's New in Python 3.3: complete the deprecation list
...
Add also FIXMEs in unicodeobject.c
2011-12-17 04:59:06 +01:00
Victor Stinner
1f33f2b0c3
Issue #13560 : os.strerror() now uses the current locale encoding instead of UTF-8
2011-12-17 04:45:09 +01:00
Victor Stinner
f2ea71fcc8
Issue #13560 : Add PyUnicode_EncodeLocale()
...
* Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not
available
* Document my last changes in Misc/NEWS
2011-12-17 04:13:41 +01:00
Victor Stinner
af02e1c85a
Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale()
...
* PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
from the current locale encoding
* _Py_char2wchar() writes an "error code" in the size argument to indicate
if the function failed because of memory allocation failure or because of a
decoding error. The function doesn't write the error message directly to
stderr.
* Fix time.strftime() (if wcsftime() is missing): decode strftime() result
from the current locale encoding, not from the filesystem encoding.
2011-12-16 23:56:01 +01:00
Victor Stinner
16e6a80923
PyUnicode_Resize(): warn about canonical representation
...
Call also directly unicode_resize() in unicodeobject.c
2011-12-12 13:24:15 +01:00
Victor Stinner
b0a82a6a7f
Fix PyUnicode_Resize() for compact string: leave the string unchanged on error
...
Fix also PyUnicode_Resize() doc
2011-12-12 13:08:33 +01:00
Victor Stinner
bf6e560d0c
Make PyUnicode_Copy() private => _PyUnicode_Copy()
...
Undocument the function.
Make also decode_utf8_errors() as private (static).
2011-12-12 01:53:47 +01:00
Victor Stinner
7a9105a380
resize_copy() now supports legacy ready strings
2011-12-12 00:13:42 +01:00
Victor Stinner
488fa49acf
Rewrite PyUnicode_Append(); unicode_modifiable() is more strict
...
* Rename unicode_resizable() to unicode_modifiable()
* Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear
that the function is private
* Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append()
to simplify the code
* unicode_modifiable() return 0 if the hash has been computed or if the string
is not an exact unicode string
* Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the
hash has already been computed, you cannot modify a string inplace anymore
* PyUnicode_Concat() checks for integer overflow
2011-12-12 00:01:39 +01:00
Victor Stinner
c4b495497a
Create unicode_result_unchanged() subfunction
2011-12-11 22:44:26 +01:00
Victor Stinner
eaab604829
Fix fixup() for unchanged unicode subtype
...
If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u.
2011-12-11 22:22:39 +01:00
Victor Stinner
e6b2d4407a
unicode_fromascii() doesn't check string content twice in debug mode
...
_PyUnicode_CheckConsistency() also checks string content.
2011-12-11 21:54:30 +01:00
Victor Stinner
a1d12bb119
Call directly PyUnicode_DecodeUTF8Stateful() instead of PyUnicode_DecodeUTF8()
...
* Remove micro-optimization from PyUnicode_FromStringAndSize():
PyUnicode_DecodeUTF8Stateful() has already these optimizations (for size=0
and one ascii char).
* Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove an
useless variable
2011-12-11 21:53:09 +01:00
Victor Stinner
382955ff4e
Use directly unicode_empty instead of PyUnicode_New(0, 0)
2011-12-11 21:44:00 +01:00
Victor Stinner
785938eebd
Move the slowest UTF-8 decoder to its own subfunction
...
* Create decode_utf8_errors()
* Reuse unicode_fromascii()
* decode_utf8_errors() doesn't refit at the beginning
* Remove refit_partial_string(), use unicode_adjust_maxchar() instead
2011-12-11 20:09:03 +01:00
Victor Stinner
84def3774d
Fix error handling in resize_compact()
2011-12-11 20:04:56 +01:00
Victor Stinner
8faf8216e4
PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a
...
character in not in range [U+0000; U+10ffff].
2011-12-08 22:14:11 +01:00
Victor Stinner
551ac95733
Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros
...
And use surrogates macros everywhere in unicodeobject.c
2011-11-29 22:58:13 +01:00
Victor Stinner
6345be9a14
Close #13093 : PyUnicode_EncodeDecimal() doesn't support error handlers
...
different than "strict" anymore. The caller was unable to compute the
size of the output buffer: it depends on the error handler.
2011-11-25 20:09:01 +01:00
Benjamin Peterson
1518e8713d
and back to the "magic" formula (with a comment) it is
2011-11-23 10:44:52 -06:00