Victor Stinner
fc009eff9e
Close #16311 : Use the _PyUnicodeWriter API in text decoders
...
* Remove unicode_widen(): replaced with _PyUnicodeWriter_Prepare()
* Remove unicode_putchar(): replaced with
PyUnicodeWriter_Prepare() + PyUnicode_WRITER()
* When handling an decoding error, only overallocate the buffer by +25%
instead of +100%
2012-11-07 00:36:38 +01:00
Ezio Melotti
cfa9636404
#8271 : merge with 3.3.
2012-11-04 23:23:09 +02:00
Ezio Melotti
f7ed5d111b
#8271 : the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti.
2012-11-04 23:21:38 +02:00
Benjamin Peterson
7ff2094bc7
merge 3.3 ( #16369 )
2012-10-30 23:31:12 -04:00
Benjamin Peterson
e8ea97fffb
merge 3.2 ( #16369 )
2012-10-30 23:27:52 -04:00
Benjamin Peterson
c43112823b
initialize more global type objects ( closes #16369 )
2012-10-30 23:21:10 -04:00
Victor Stinner
e64322e034
Close #14625 : Rewrite the UTF-32 decoder. It is now 3x to 4x faster
...
Patch written by Serhiy Storchaka.
2012-10-30 23:12:47 +01:00
Victor Stinner
76df43de30
Issue #16330 : Use surrogate-related macros
...
Patch written by Serhiy Storchaka.
2012-10-30 01:42:39 +01:00
Mark Dickinson
fb90c0934c
Issue #14700 : Fix buggy overflow checks for large precision and width in new-style and old-style formatting.
2012-10-28 10:18:03 +00:00
Victor Stinner
c6cf1ba29e
Replace usage of the deprecated Py_UNICODE_COPY() with Py_MEMCPY() in resize_copy()
2012-10-23 02:54:47 +02:00
Victor Stinner
fe75fb4b3e
Optimize _PyUnicode_HasNULChars(): use findchar() instead of PyUnicode_Contains()
2012-10-23 02:52:18 +02:00
Victor Stinner
6fa627578a
Inline raise_translate_exception(): it is only used once
2012-10-23 02:51:50 +02:00
Victor Stinner
e5567ad236
Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp()
2012-10-23 02:48:49 +02:00
Christian Heimes
743e0cd6b5
Issue #16166 : Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified
...
endianess detection and handling.
2012-10-17 23:52:17 +02:00
Chris Jerdonek
4a7df9aba9
Issue #14783 : Merge changes from 3.3.
2012-10-07 15:02:16 -07:00
Chris Jerdonek
042fa653ab
Issue #14783 : Merge changes from 3.2.
2012-10-07 14:56:27 -07:00
Chris Jerdonek
83fe2e1c22
Issue #14783 : Improve int() docstring and also str(), range(), and slice().
...
This commit rewrites the docstring for int() to incorporate the documentation
changes made in issue #16036 . It also switches the docstrings for int(),
str(), range(), and slice() to use multi-line signatures.
2012-10-07 14:48:36 -07:00
Victor Stinner
4c63a972d1
Cleanup PyUnicode_FromFormatV() for zero padding
...
Skip the "0" instead of parsing it twice: detect zero padding and then parsed
as a digit of the width.
2012-10-06 23:55:33 +02:00
Victor Stinner
15a1136547
Issue #16147 : PyUnicode_FromFormatV() doesn't need anymore to allocate a buffer
...
on the heap to format numbers.
2012-10-06 23:48:20 +02:00
Victor Stinner
ff5a848db5
Issue #16147 : PyUnicode_FromFormatV() now raises an error if the argument of
...
'%c' is not in the range(0x110000).
2012-10-06 23:05:45 +02:00
Victor Stinner
3921e90c5a
Issue #16147 : PyUnicode_FromFormatV() now detects integer overflow when parsing
...
width and precision
2012-10-06 23:05:00 +02:00
Victor Stinner
e215d960be
Issue #16147 : Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
...
* Simplify the code: replace 4 steps with one unique step using the
_PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids to
store intermediate results which require to allocate an array of pointers on
the heap.
* Use the _PyUnicodeWriter API for speed (and its convinient API):
overallocate the buffer to reduce the number of "realloc()"
* Implement "width" and "precision" in Python, don't rely on sprintf(). It
avoids to need of a temporary buffer allocated on the heap: only use a small
buffer allocated in the stack.
* Add _PyUnicodeWriter_WriteCstr() function
* Split PyUnicode_FromFormatV() into two functions: add
unicode_fromformat_arg().
* Inline parse_format_flags(): the format of an argument is now only parsed
once, it's no more needed to have a subfunction.
* Optimize PyUnicode_FromFormatV() for characters between two "%" arguments:
search the next "%" and copy the substring in one chunk, instead of copying
character per character.
2012-10-06 23:03:36 +02:00
Mark Dickinson
ff9c54aca2
Issue #16096 : Merge fixes from 3.3.
2012-10-06 18:05:14 +01:00
Mark Dickinson
c04ddff290
Issue #16096 : Fix several occurrences of potential signed integer overflow. Thanks Serhiy Storchaka.
2012-10-06 18:04:49 +01:00
Victor Stinner
8c6db45d3e
In debug mode, unicode_write_cstr() now checks that non-ASCII characters are
...
not written into an ASCII string
2012-10-06 00:40:45 +02:00
Ezio Melotti
080a2c087e
#16127 : merge with 3.3.
2012-10-05 03:34:02 +03:00
Ezio Melotti
e7f90375b1
#16127 : remove outdated references to narrow builds. Patch by Serhiy Storchaka.
2012-10-05 03:33:31 +03:00
Victor Stinner
1929407406
Fix PyUnicode_Format(): return NULL if PyUnicode_READY(uformat) failed
...
This error cannot occur in practice: PyUnicode_FromObject() always return
a "ready" string.
2012-10-05 00:09:33 +02:00
Victor Stinner
770e19e0cc
Optimize unicode_compare(): use memcmp() when comparing two UCS1 strings
2012-10-04 22:59:45 +02:00
Victor Stinner
90db9c47dc
Enable also ptr==ptr optimization in PyUnicode_Compare()
...
It was already implemented in PyUnicode_RichCompare()
2012-10-04 21:53:50 +02:00
Victor Stinner
aa7712711d
unicode_result_wchar(): move the assert() to the "#ifdef Py_DEBUG" block
2012-10-04 02:32:58 +02:00
Victor Stinner
a4708231e6
Split the huge PyUnicode_Format() function (+540 lines) into subfunctions
2012-10-04 02:19:54 +02:00
Victor Stinner
a049443fab
PyUnicode_Format(): disable overallocation when we are writing the last part
...
of the output string
2012-10-03 23:03:46 +02:00
Victor Stinner
afffce489b
Unicode: resize_compact() and resize_inplace() fills also the Unicode strings
...
with invalid bytes in debug mode, as done by PyUnicode_New()
2012-10-03 23:03:17 +02:00
Victor Stinner
c89d28fdfc
Issue #15609 : Fix refleak introduced by my last optimization
2012-10-02 12:54:07 +02:00
Victor Stinner
621ef3d84f
Issue #15609 : Optimize str%args for integer argument
...
- Use _PyLong_FormatWriter() instead of formatlong() when possible, to avoid
a temporary buffer
- Enable the fast path when width is smaller or equals to the length,
and when the precision is bigger or equals to the length
- Add unit tests!
- formatlong() uses PyUnicode_Resize() instead of _PyUnicode_FromASCII()
to resize the output string
2012-10-02 00:33:47 +02:00
Antoine Pitrou
a1f7655fa7
Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
...
Patch by Serhiy Storchaka.
2012-09-23 20:00:04 +02:00
Antoine Pitrou
6f80f5d444
Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
...
Patch by Serhiy Storchaka.
2012-09-23 19:55:21 +02:00
Antoine Pitrou
ca8aa4acf6
Issue #15144 : Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
...
Patch by Serhiy Storchaka.
2012-09-20 20:56:47 +02:00
Christian Heimes
5f520f4fed
Issue #15900 : Fixed reference leak in PyUnicode_TranslateCharmap()
2012-09-11 14:03:25 +02:00
Christian Heimes
f4f9939a96
Fixed memory leak in error branch of formatfloat(). CID 719687
2012-09-10 11:48:41 +02:00
Antoine Pitrou
057119b0b7
Fix C++-style comment (xlc compilation failure)
2012-09-02 17:56:33 +02:00
Benjamin Peterson
59043f96ea
merge 3.2 ( #15801 )
2012-08-28 18:01:45 -04:00
Benjamin Peterson
28a6cfaefc
use the stricter PyMapping_Check ( closes #15801 )
2012-08-28 17:55:35 -04:00
Stefan Krah
8528c3145e
Issue #15728 : Fix leak in PyUnicode_AsWideCharString(). Found by Coverity.
2012-08-19 21:52:43 +02:00
Nick Coghlan
0e41628d35
Merge str docstring fix from 3.2
2012-08-16 14:14:30 +10:00
Nick Coghlan
573b1fd779
Fix str docstring
2012-08-16 14:13:07 +10:00
Antoine Pitrou
b4bbee25b1
Issue #14579 : Fix CVE-2012-2135: vulnerability in the utf-16 decoder after error handling.
...
Patch by Serhiy Storchaka.
2012-07-21 00:45:14 +02:00
Mark Dickinson
01ac8b6ab1
Use correct types for ASCII_CHAR_MASK integer constants.
2012-07-07 14:08:48 +02:00
Antoine Pitrou
aaefac76dd
Issue #14874 : Restore charmap decoding speed to pre-PEP 393 levels.
...
Patch by Serhiy Storchaka.
2012-06-16 22:48:21 +02:00
Victor Stinner
f185226244
_copy_characters(): move debug code at the top to avoid noisy #ifdef
...
And don't use assert() anymore if check_maxchar is set: return -1 on error
instead.
2012-06-16 16:38:26 +02:00
Victor Stinner
07621338fb
Fix PyUnicode_GetSize(): Don't replace _PyUnicode_Ready() exception
2012-06-16 04:53:46 +02:00
Victor Stinner
8a8b3eaabe
Fix a compiler warning in _copy_characters() and remove debug code
2012-06-16 04:53:25 +02:00
Victor Stinner
24e403bbee
Oops, fix my previous change on _copy_characters()
2012-06-16 04:53:00 +02:00
Victor Stinner
ca439eecea
Fix unicode_adjust_maxchar(): catch PyUnicode_New() failure
2012-06-16 03:17:34 +02:00
Victor Stinner
184252ad3f
Fix "%f" format of str%args if the result is not an ASCII or latin1 string
2012-06-16 02:57:41 +02:00
Victor Stinner
9a77770add
Remove debug code
2012-06-16 02:44:43 +02:00
Victor Stinner
c9d369f1bf
Optimize _PyUnicode_FastCopyCharacters() when maxchar(from) > maxchar(to)
2012-06-16 02:22:37 +02:00
Victor Stinner
f05e17ece9
unicodeobject.c: Remove debug code
2012-06-16 01:53:04 +02:00
Antoine Pitrou
27f6a3b0bf
Issue #15026 : utf-16 encoding is now significantly faster (up to 10x).
...
Patch by Serhiy Storchaka.
2012-06-15 22:15:23 +02:00
Kristján Valur Jónsson
55e5dc8371
Rearrange code to beat an optimizer bug affecting Release x64 on windows
...
with VS2010sp1
2012-06-06 21:58:08 +00:00
Victor Stinner
d7b7c7472b
Issue #14993 : Use standard "unsigned char" instead of a unsigned char bitfield
2012-06-04 22:52:12 +02:00
Kristjan Valur Jonsson
85634d7a2e
Issue #14909 : A number of places were using PyMem_Realloc() apis and
...
PyObject_GC_Resize() with incorrect error handling. In case of errors,
the original object would be leaked. This checkin fixes those cases.
2012-05-31 09:37:31 +00:00
Victor Stinner
3a7d096f2f
Issue #14744 : Fix compilation on Windows (part 2)
2012-05-29 18:53:56 +02:00
Victor Stinner
d3f0882dfb
Issue #14744 : Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args)
...
* Formatting string, int, float and complex use the _PyUnicodeWriter API. It
avoids a temporary buffer in most cases.
* Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just
keep a reference to the string if the output is only composed of one string
* Disable overallocation when formatting the last argument of str%args and
str.format(args)
* Overallocation allocates at least 100 characters: add min_length attribute
to the _PyUnicodeWriter structure
* Add new private functions: _PyUnicode_FastCopyCharacters(),
_PyUnicode_FastFill() and _PyUnicode_FromASCII()
The speed up is around 20% in average.
2012-05-29 12:57:52 +02:00
Antoine Pitrou
63065d761e
Issue #14624 : UTF-16 decoding is now 3x to 4x faster on various inputs.
...
Patch by Serhiy Storchaka.
2012-05-15 23:48:04 +02:00
Martin v. Löwis
b05c0738d8
Silence VS 2010 signed/unsigned warnings.
2012-05-15 13:45:49 +02:00
Antoine Pitrou
758153badb
Fix refleaks introduced by 83da67651687.
2012-05-12 15:51:51 +02:00
Antoine Pitrou
e45c0c5cef
Fix logic error introduced by 83da67651687.
2012-05-12 15:49:07 +02:00
Benjamin Peterson
1ff2e35e84
simplify by shortcutting when the kind of the needle is larger than the haystack
2012-05-11 17:41:20 -05:00
Antoine Pitrou
ca5f91b888
Issue #14738 : Speed-up UTF-8 decoding on non-ASCII data. Patch by Serhiy Storchaka.
2012-05-10 16:36:02 +02:00
Victor Stinner
3b1a74a9c3
Rename unicode_write_t structure and its methods to "_PyUnicodeWriter"
2012-05-09 22:25:00 +02:00
Victor Stinner
ee4544c920
Issue #14744 : Inline unicode_writer_write_char() and unicode_write_str()
...
Optimize also PyUnicode_Format(): call unicode_writer_prepare() only once
per argument.
2012-05-09 22:24:08 +02:00
Victor Stinner
f59c28c930
unicode_writer_finish() checks string consistency
2012-05-09 03:24:14 +02:00
Victor Stinner
106802547c
Backout ab500b297900: the check for integer overflow is wrong
...
Issue #14716 : Change integer overflow check in unicode_writer_prepare()
to compute the limit at compile time instead of runtime. Patch writen by Serhiy
Storchaka.
2012-05-07 23:50:05 +02:00
Victor Stinner
0576f9b4cf
Issue #14716 : Change integer overflow check in unicode_writer_prepare()
...
to compute the limit at compile time instead of runtime. Patch writen by Serhiy
Storchaka.
2012-05-07 13:02:44 +02:00
Victor Stinner
202fdca133
Close #14716 : str.format() now uses the new "unicode writer" API instead of the
...
PyAccu API. For example, it makes str.format() from 25% to 30% faster on Linux.
2012-05-07 12:47:02 +02:00
Mark Dickinson
99e2e5552a
Issue #14700 : Fix two broken and undefined-behaviour-inducing overflow checks in old-style string formatting. Thanks Serhiy Storchaka for report and original patch.
2012-05-07 11:20:50 +01:00
Victor Stinner
d0dba6eee8
unicode_writer: don't force inline when it is not necessary
...
Keep inline for performance critical functions (functions used in loops)
2012-05-04 01:19:15 +02:00
Benjamin Peterson
b63f49f2b4
if the kind of the string to count is larger than the string to search, shortcut to 0
2012-05-03 18:31:07 -04:00
Victor Stinner
a7b654be30
unicode_writer: add finish() method and assertions to write_str() method
...
* The write_str() method does nothing if the length is zero.
* Replace "struct unicode_writer_t" with "unicode_writer_t"
2012-05-03 23:58:55 +02:00
Victor Stinner
bf4e266397
Issue #14687 : Remove redundant length attribute of unicode_write_t
...
The length can be read directly from the buffer
2012-05-03 19:27:14 +02:00
Victor Stinner
7989157e49
Issue #14687 : Cleanup unicode_writer_prepare()
...
"Inline" PyUnicode_Resize(): call directly resize_compact()
2012-05-03 13:43:07 +02:00
Victor Stinner
f2c76aa6cb
Issue #14687 : str%tuple now uses an optimistic "unicode writer" instead of an
...
accumulator. Directly write characters into the output (don't use a temporary
list): resize and widen the string on demand.
2012-05-03 13:10:40 +02:00
Victor Stinner
1b487b467b
Issue #14624 , #14687 : Optimize unicode_widen()
...
Don't convert uninitialized characters. Patch written by Serhiy Storchaka.
2012-05-03 12:29:04 +02:00
Victor Stinner
3a7f7977f1
Remove buggy assertion in PyUnicode_Substring()
...
Use also directly unicode_empty, instead of PyUnicode_New(0,0).
2012-05-03 03:36:40 +02:00
Victor Stinner
684d5fd420
Fix PyUnicode_Substring() for start >= length and start > end
...
Remove the fast-path for 1-character string: unicode_fromascii() and
_PyUnicode_FromUCS*() now have their own fast-path for 1-character strings.
2012-05-03 02:32:34 +02:00
Victor Stinner
b6cd014d75
Unicode: optimize creating of 1-character strings
2012-05-03 02:17:04 +02:00
Victor Stinner
bff7c96834
Issue #14687 : Optimize str%tuple for the "%(name)s" syntax
...
Avoid an useless and expensive call to PyUnicode_READ().
2012-05-03 01:44:59 +02:00
Victor Stinner
e6abb488c9
unicodeobject.c: Add MAX_MAXCHAR() macro to (micro-)optimize the computation
...
of the second argument of PyUnicode_New().
* Create also align_maxchar() function
* Optimize fix_decimal_and_space_to_ascii(): don't compute the maximum
character when ch <= 127 (it is ASCII)
2012-05-02 01:15:40 +02:00
Victor Stinner
438106b66e
Issue #14687 : Cleanup PyUnicode_Format()
2012-05-02 00:41:57 +02:00
Victor Stinner
b5c3ea3af3
Issue #14687 : Optimize str%args
...
* formatfloat() uses unicode_fromascii() instead of PyUnicode_DecodeASCII()
to not have to check characters, we know that it is really ASCII
* Use PyUnicode_FromOrdinal() instead of _PyUnicode_FromUCS4() to format
a character: if avoids a call to ucs4lib_find_max_char() to compute
the maximum character (whereas we already know it, it is just the character
itself)
2012-05-02 00:29:36 +02:00
Victor Stinner
b80e46eca4
Issue #14687 : Avoid an useless duplicated string in PyUnicode_Format()
2012-04-30 05:21:52 +02:00
Victor Stinner
aff3cc659b
Issue #14687 : Cleanup PyUnicode_Format()
2012-04-30 05:19:21 +02:00
Victor Stinner
b11d91d969
Fix my previous commit: bool is a long, restore the specical case for bool
2012-04-28 00:25:34 +02:00
Victor Stinner
d0880d57b0
Simplify and optimize formatlong()
...
* Remove _PyBytes_FormatLong(): inline it into formatlong()
* the input type is always a long, so remove the code for bool
* don't duplicate the string if the length does not change
* Use PyUnicode_DATA() instead of _PyUnicode_AsString()
2012-04-27 23:40:13 +02:00
Victor Stinner
94d558b063
Optimize _PyUnicode_FindMaxChar() find pure ASCII strings
2012-04-27 22:26:58 +02:00
Victor Stinner
8f825060f1
Check newly created consistency using _PyUnicode_CheckConsistency(str, 1)
...
* In debug mode, fill the string data with invalid characters
* Simplify also reference counting in PyCodec_BackslashReplaceErrors()
and PyCodec_XMLCharRefReplaceError()
2012-04-27 13:55:39 +02:00
Victor Stinner
718fbf078c
_PyUnicode_CheckConsistency() ensures that the unicode string ends with a
...
null character
2012-04-26 00:39:37 +02:00
Benjamin Peterson
b9f4c9daad
make pointer arith c89
2012-04-23 21:45:40 -04:00
Benjamin Peterson
f3b7d86e25
use correct base ptr
2012-04-23 18:07:01 -04:00
Benjamin Peterson
2844a7a6d3
simplify and reformat
2012-04-23 18:00:25 -04:00
Victor Stinner
ece58deb9f
Close #14648 : Compute correctly maxchar in str.format() for substrin
2012-04-23 23:36:38 +02:00
Benjamin Peterson
64ed576de8
merge 3.2 ( #14509 )
2012-04-09 15:04:39 -04:00
Benjamin Peterson
ca819c3c9d
merge 3.1 ( #14509 )
2012-04-09 15:01:02 -04:00
Benjamin Peterson
f6622c8a3e
fix build without Py_DEBUG and DNDEBUG ( closes #14509 )
2012-04-09 14:53:07 -04:00
Victor Stinner
afb5205c48
Close #14249 : Use bit shifts instead of an union, it's more efficient.
...
Patch written by Serhiy Storchaka
2012-04-05 22:54:49 +02:00
Victor Stinner
e7eee01f36
Close #14249 : Use an union instead of a long to short pointer to avoid aliasing
...
issue. Speed up UTF-16 by 20%.
2012-04-05 13:44:34 +02:00
Antoine Pitrou
a701388de1
Rename _PyIter_GetBuiltin to _PyObject_GetBuiltin, and do not include it in the stable ABI.
2012-04-05 00:04:20 +02:00
Kristján Valur Jónsson
31668b8f7a
Issue #14288 : Serialization support for builtin iterators.
2012-04-03 10:49:41 +00:00
Benjamin Peterson
0df542985a
grammar
2012-03-26 14:50:32 -04:00
Benjamin Peterson
c067d6661f
merge 3.2
2012-03-25 22:41:16 -04:00
Benjamin Peterson
a8755c586e
kill this terribly outdated comment
2012-03-25 22:40:54 -04:00
Victor Stinner
0d03478b88
Remove an unused variable
2012-03-06 02:06:01 +01:00
Victor Stinner
c9590ad745
Close #14085 : remove assertions from PyUnicode_WRITE macro
...
Add checks in PyUnicode_WriteChar() and convert PyUnicode_New() assertion to a
test raising a Python exception.
2012-03-04 01:34:37 +01:00
Ezio Melotti
cda6b6d60d
#14081 : The sep and maxsplit parameter to str.split, bytes.split, and bytearray.split may now be passed as keyword arguments.
2012-02-26 09:39:55 +02:00
Victor Stinner
b0800dc53b
Oops, revert unwanted changes
2012-02-25 00:47:08 +01:00
Victor Stinner
abc649ddbe
Issue #14107 : fix bigmem tests on str.capitalize(), str.swapcase() and
...
str.title(). Compute correctly how much memory is required for the test
(memuse).
2012-02-25 00:43:27 +01:00
Antoine Pitrou
842c0f17eb
Fix compilation error under Windows (and warnings too).
2012-02-24 13:30:46 +01:00
Victor Stinner
90f50d4df9
Issue #13706 : Fix format(float, "n") for locale with non-ASCII decimal point (e.g. ps_aF)
2012-02-24 01:44:47 +01:00
Victor Stinner
41a863cb81
Issue #13706 : Fix format(int, "n") for locale with non-ASCII thousands separator
...
* Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
(from the locale encoding), instead of decoding them implicitly from latin1
* Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
* Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
character if unicode is NULL
* Replace MIN/MAX macros by Py_MIN/Py_MAX
* stringlib/undef.h undefines STRINGLIB_IS_UNICODE
* stringlib/localeutil.h only supports Unicode
2012-02-24 00:37:51 +01:00
Victor Stinner
b429d3b09c
Fix doc of an internal function: unicode_write_cstr()
2012-02-22 21:22:20 +01:00
Antoine Pitrou
ba6bafcfbe
Fix compile failure under Windows
2012-02-22 16:41:50 +01:00
Victor Stinner
c516610f0b
Optimize str%arg for number formats: %i, %d, %u, %x, %p
...
Write a specialized function to write an ASCII/latin1 C char* string into a
Python Unicode string.
2012-02-22 13:55:02 +01:00
Victor Stinner
99d7ad0bb0
Micro-optimize computation of maxchar in PyUnicode_TransformDecimalToASCII()
2012-02-22 13:37:39 +01:00
Victor Stinner
da79e632c4
Micro-optimize unicode_expandtabs(): use FILL() macro to write N spaces
2012-02-22 13:37:04 +01:00
Victor Stinner
15e9ed299c
PyUnicode_New() and unicode_putchar() check for MAX_UNICODE maximum (U+10FFFF)
2012-02-22 13:36:20 +01:00
Benjamin Peterson
d9a3591ed1
merge 3.2
2012-02-21 11:12:14 -05:00
Benjamin Peterson
e249dcab7a
merge 3.2
2012-02-21 11:09:13 -05:00
Benjamin Peterson
69e9727657
ensure no one tries to hash things before the random seed is found
2012-02-21 11:08:50 -05:00
Georg Brandl
16fa2a1097
Forgot the "empty string -> hash == 0" special case for strings.
2012-02-21 00:50:13 +01:00
Georg Brandl
2fb477c0f0
Merge 3.2: Issue #13703 plus some related test suite fixes.
2012-02-21 00:33:36 +01:00
Georg Brandl
09a7c72cad
Merge from 3.1: Issue #13703 : add a way to randomize the hash values of basic types (str, bytes, datetime)
...
in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated.
The environment variable PYTHONHASHSEED and the new command line flag -R control this
behavior.
2012-02-20 21:31:46 +01:00
Georg Brandl
2daf6ae249
Issue #13703 : add a way to randomize the hash values of basic types (str, bytes, datetime)
...
in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated.
The environment variable PYTHONHASHSEED and the new command line flag -R control this
behavior.
2012-02-20 19:54:16 +01:00
Victor Stinner
c3a6b02d70
(Merge 3.2) Issue #13913 : normalize utf-8 codec name in UTF-8 decoder
2012-02-14 01:18:10 +01:00
Victor Stinner
cbe01342bc
Issue #13913 : normalize utf-8 codec name in UTF-8 decoder
2012-02-14 01:17:45 +01:00
Victor Stinner
d1cd99b533
Backout d2c1521ad0a1: _Py_IDENTIFIER() uses UTF-8 again
2012-02-07 23:05:55 +01:00
Victor Stinner
d446d8e09a
_Py_Identifier are always ASCII strings
2012-02-05 01:45:45 +01:00
Antoine Pitrou
7ab4af0427
Issue #13848 : open() and the FileIO constructor now check for NUL characters in the file name.
...
Patch by Hynek Schlawack.
2012-01-29 18:43:36 +01:00
Antoine Pitrou
1334884ff2
Issue #13848 : open() and the FileIO constructor now check for NUL characters in the file name.
...
Patch by Hynek Schlawack.
2012-01-29 18:36:34 +01:00
Benjamin Peterson
eea4846d23
don't ready in case_operation, since most callers do it themselves
2012-01-16 14:28:50 -05:00
Gregory P. Smith
f5b62a9b31
Consolidate the occurrances of the prime used as the multiplier when hashing.
2012-01-14 15:45:13 -08:00
Gregory P. Smith
63e6c3222f
Consolidate the occurrances of the prime used as the multiplier when hashing
...
to a single #define instead of having several copies in several files.
This excludes the Modules/ tree (datetime and expat both have a copy
for their own purposes with no need for it to be the same).
2012-01-14 15:31:34 -08:00
Benjamin Peterson
c8d8b8861e
fix possible refleaks if PyUnicode_READY fails
2012-01-14 13:37:31 -05:00
Benjamin Peterson
bac79498c8
always explicitly check for -1 from PyUnicode_READY
2012-01-14 13:34:47 -05:00
Benjamin Peterson
d5890c8db5
add str.casefold() ( closes #13752 )
2012-01-14 13:23:30 -05:00
Benjamin Peterson
53aa1d7c57
fix possible if unlikely leak
2011-12-20 13:29:45 -06:00
Benjamin Peterson
e51757f6de
move do_title to a better place
2012-01-12 21:10:29 -05:00
Benjamin Peterson
821e4cfd01
make fix_decimal_and_space_to_ascii check if it modifies the string
2012-01-12 15:40:18 -05:00
Benjamin Peterson
0c91392fe6
kill capwords implementation which has been disabled since the begining
2012-01-12 15:25:41 -05:00
Benjamin Peterson
b2bf01d824
use full unicode mappings for upper/lower/title case ( #12736 )
...
Also broaden the category of characters that count as lowercase/uppercase.
2012-01-11 18:17:06 -05:00
Victor Stinner
3fe553160c
Add a new PyUnicode_Fill() function
...
It is faster than the unicode_fill() function which was implemented in
formatter_unicode.c.
2012-01-04 00:33:50 +01:00
Benjamin Peterson
5e458f520c
also decref the right thing
2012-01-02 10:12:13 -06:00
Benjamin Peterson
4c13a4a352
ready the correct string
2012-01-02 09:07:38 -06:00
Benjamin Peterson
22a29708fd
fix some possible refleaks from PyUnicode_READY error conditions
2012-01-02 09:00:30 -06:00
Benjamin Peterson
9ca3ffac94
== -1 is convention
2012-01-01 16:04:29 -06:00
Benjamin Peterson
e157cf1012
make switch more robust
2012-01-01 15:56:20 -06:00
Benjamin Peterson
c0b95d18fa
4 space indentation
2011-12-20 17:24:05 -06:00
Benjamin Peterson
ead6b53659
fix spacing around switch statements
2011-12-20 17:23:42 -06:00
Benjamin Peterson
822c790527
merge 3.2
2011-12-20 13:32:50 -06:00
Victor Stinner
6099a03202
Issue #13624 : Write a specialized UTF-8 encoder to allow more optimization
...
The main bottleneck was the PyUnicode_READ() macro.
2011-12-18 14:22:26 +01:00
Victor Stinner
73f53b57d1
Optimize str * n for len(str)==1 and UCS-2 or UCS-4
2011-12-18 03:26:31 +01:00
Victor Stinner
f644110816
Issue #13621 : Optimize str.replace(char1, char2)
...
Use findchar() which is more optimized than a dummy loop using
PyUnicode_READ(). PyUnicode_READ() is a complex and slow macro.
2011-12-18 02:43:08 +01:00
Victor Stinner
ab870218e3
Issue #10951 : Fix compiler warnings in timemodule.c and unicodeobject.c
...
Thanks Jérémy Anger for the fix.
2011-12-17 22:39:43 +01:00
Victor Stinner
2f197078fb
The locale decoder raises a UnicodeDecodeError instead of an OSError
...
Search the invalid character using mbrtowc().
2011-12-17 07:08:30 +01:00
Victor Stinner
1b57967b96
Issue #13560 : Locale codec functions use the classic "errors" parameter,
...
instead of surrogateescape
So it would be possible to support more error handlers later.
2011-12-17 05:47:23 +01:00
Victor Stinner
ab59594326
What's New in Python 3.3: complete the deprecation list
...
Add also FIXMEs in unicodeobject.c
2011-12-17 04:59:06 +01:00
Victor Stinner
1f33f2b0c3
Issue #13560 : os.strerror() now uses the current locale encoding instead of UTF-8
2011-12-17 04:45:09 +01:00
Victor Stinner
f2ea71fcc8
Issue #13560 : Add PyUnicode_EncodeLocale()
...
* Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not
available
* Document my last changes in Misc/NEWS
2011-12-17 04:13:41 +01:00
Victor Stinner
af02e1c85a
Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale()
...
* PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
from the current locale encoding
* _Py_char2wchar() writes an "error code" in the size argument to indicate
if the function failed because of memory allocation failure or because of a
decoding error. The function doesn't write the error message directly to
stderr.
* Fix time.strftime() (if wcsftime() is missing): decode strftime() result
from the current locale encoding, not from the filesystem encoding.
2011-12-16 23:56:01 +01:00
Victor Stinner
16e6a80923
PyUnicode_Resize(): warn about canonical representation
...
Call also directly unicode_resize() in unicodeobject.c
2011-12-12 13:24:15 +01:00
Victor Stinner
b0a82a6a7f
Fix PyUnicode_Resize() for compact string: leave the string unchanged on error
...
Fix also PyUnicode_Resize() doc
2011-12-12 13:08:33 +01:00
Victor Stinner
bf6e560d0c
Make PyUnicode_Copy() private => _PyUnicode_Copy()
...
Undocument the function.
Make also decode_utf8_errors() as private (static).
2011-12-12 01:53:47 +01:00
Victor Stinner
7a9105a380
resize_copy() now supports legacy ready strings
2011-12-12 00:13:42 +01:00
Victor Stinner
488fa49acf
Rewrite PyUnicode_Append(); unicode_modifiable() is more strict
...
* Rename unicode_resizable() to unicode_modifiable()
* Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear
that the function is private
* Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append()
to simplify the code
* unicode_modifiable() return 0 if the hash has been computed or if the string
is not an exact unicode string
* Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the
hash has already been computed, you cannot modify a string inplace anymore
* PyUnicode_Concat() checks for integer overflow
2011-12-12 00:01:39 +01:00
Victor Stinner
c4b495497a
Create unicode_result_unchanged() subfunction
2011-12-11 22:44:26 +01:00
Victor Stinner
eaab604829
Fix fixup() for unchanged unicode subtype
...
If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u.
2011-12-11 22:22:39 +01:00
Victor Stinner
e6b2d4407a
unicode_fromascii() doesn't check string content twice in debug mode
...
_PyUnicode_CheckConsistency() also checks string content.
2011-12-11 21:54:30 +01:00
Victor Stinner
a1d12bb119
Call directly PyUnicode_DecodeUTF8Stateful() instead of PyUnicode_DecodeUTF8()
...
* Remove micro-optimization from PyUnicode_FromStringAndSize():
PyUnicode_DecodeUTF8Stateful() has already these optimizations (for size=0
and one ascii char).
* Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove an
useless variable
2011-12-11 21:53:09 +01:00
Victor Stinner
382955ff4e
Use directly unicode_empty instead of PyUnicode_New(0, 0)
2011-12-11 21:44:00 +01:00
Victor Stinner
785938eebd
Move the slowest UTF-8 decoder to its own subfunction
...
* Create decode_utf8_errors()
* Reuse unicode_fromascii()
* decode_utf8_errors() doesn't refit at the beginning
* Remove refit_partial_string(), use unicode_adjust_maxchar() instead
2011-12-11 20:09:03 +01:00
Victor Stinner
84def3774d
Fix error handling in resize_compact()
2011-12-11 20:04:56 +01:00
Victor Stinner
8faf8216e4
PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a
...
character in not in range [U+0000; U+10ffff].
2011-12-08 22:14:11 +01:00
Victor Stinner
551ac95733
Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros
...
And use surrogates macros everywhere in unicodeobject.c
2011-11-29 22:58:13 +01:00
Victor Stinner
6345be9a14
Close #13093 : PyUnicode_EncodeDecimal() doesn't support error handlers
...
different than "strict" anymore. The caller was unable to compute the
size of the output buffer: it depends on the error handler.
2011-11-25 20:09:01 +01:00
Benjamin Peterson
1518e8713d
and back to the "magic" formula (with a comment) it is
2011-11-23 10:44:52 -06:00
Benjamin Peterson
5944c36931
cave to those who like readable code
2011-11-22 19:05:49 -06:00
Benjamin Peterson
0268675193
fix compiler warning by implementing this more cleverly
2011-11-22 15:29:32 -05:00
Victor Stinner
ca4f20782e
find_maxchar_surrogates() reuses surrogate macros
2011-11-22 03:38:40 +01:00
Victor Stinner
0d3721d986
Issue #13441 : Disable temporary the check on the maximum character until
...
the Solaris issue is solved.
But add assertion on the maximum character in various encoders: UTF-7, UTF-8,
wide character (wchar_t*, Py_UNICODE*), unicode-escape, raw-unicode-escape.
Fix also unicode_encode_ucs1() for backslashreplace error handler: Python is
now always "wide".
2011-11-22 03:27:53 +01:00
Victor Stinner
f8facacf30
Fix compiler warnings
2011-11-22 02:30:47 +01:00
Victor Stinner
b84d723509
(Merge 3.2) Issue #13093 : Fix error handling on PyUnicode_EncodeDecimal()
2011-11-22 01:50:07 +01:00
Victor Stinner
cfed46e00a
PyUnicode_FromKindAndData() fails with a ValueError if size < 0
2011-11-22 01:29:14 +01:00
Victor Stinner
42885206ec
UTF-8 decoder: set consumed value in the latin1 fast-path
2011-11-22 01:23:02 +01:00
Victor Stinner
d3df8ab377
Replace _PyUnicode_READY_REPLACE() and _PyUnicode_ReadyReplace() with unicode_ready()
...
* unicode_ready() has a simpler API
* try to reuse unicode_empty and latin1_char singleton everywhere
* Fix a reference leak in _PyUnicode_TranslateCharmap()
* PyUnicode_InternInPlace() doesn't try to get a singleton anymore, to avoid
having to handle a failure
2011-11-22 01:22:34 +01:00
Victor Stinner
f01245067a
Rewrite PyUnicode_TransformDecimalToASCII() to use the new Unicode API
2011-11-21 23:12:56 +01:00
Victor Stinner
2d718f39a5
Remove an unused variable from PyUnicode_Copy()
2011-11-21 23:11:52 +01:00
Victor Stinner
87af4f2f3a
Simplify PyUnicode_Copy()
...
USe PyUnicode_Copy() in fixup()
2011-11-21 23:03:47 +01:00
Victor Stinner
5bbe5e7c85
Fix a compiler warning in _PyUnicode_CheckConsistency()
2011-11-21 22:54:05 +01:00
Victor Stinner
42bf77537e
Rewrite PyUnicode_EncodeDecimal() to use the new Unicode API
...
Add tests for PyUnicode_EncodeDecimal() and
PyUnicode_TransformDecimalToASCII().
2011-11-21 22:52:58 +01:00
Antoine Pitrou
0a3229de6b
Issue #13417 : speed up utf-8 decoding by around 2x for the non-fully-ASCII case.
...
This almost catches up with pre-PEP 393 performance, when decoding needed
only one pass.
2011-11-21 20:39:13 +01:00
Victor Stinner
da29cc36aa
Issue #13441 : _PyUnicode_CheckConsistency() dumps the string if the maximum
...
character is bigger than U+10FFFF and locale.localeconv() dumps the string
before decoding it.
Temporary hack to debug the issue #13441 .
2011-11-21 14:31:41 +01:00
Victor Stinner
9e30aa52fd
Fix misuse of PyUnicode_GET_SIZE() => PyUnicode_GET_LENGTH()
...
And PyUnicode_GetSize() => PyUnicode_GetLength()
2011-11-21 02:49:52 +01:00
Victor Stinner
4ead7c7be8
PyObject_Str() ensures that the result string is ready
...
and check the string consistency.
_PyUnicode_CheckConsistency() doesn't check the hash anymore. It should be
possible to call this function even if hash(str) was already called.
2011-11-20 19:48:36 +01:00
Victor Stinner
b960b34577
PyUnicode_AsUTF32String() calls directly _PyUnicode_EncodeUTF32(),
...
instead of calling the deprecated PyUnicode_EncodeUTF32() function
2011-11-20 19:12:52 +01:00
Victor Stinner
77faf69ca1
_PyUnicode_CheckConsistency() also checks maxchar maximum value,
...
not only its minimum value
2011-11-20 18:56:05 +01:00
Victor Stinner
d5c4022d2a
Remove the two ugly and unused WRITE_ASCII_OR_WSTR and WRITE_WSTR macros
2011-11-20 18:41:31 +01:00
Victor Stinner
2e9cfadd7c
Reuse surrogate macros in UTF-16 decoder
2011-11-20 18:40:27 +01:00
Victor Stinner
ae4f7c8e59
charmap_encoding_error() uses the new Unicode API
2011-11-20 18:28:55 +01:00
Victor Stinner
ac931b1e5b
Use PyUnicode_EncodeCodePage() instead of PyUnicode_EncodeMBCS() with
...
PyUnicode_AsUnicodeAndSize()
2011-11-20 18:27:03 +01:00
Victor Stinner
22168998f5
charmap encoders uses Py_UCS4, not Py_UNICODE
2011-11-20 17:09:18 +01:00
Victor Stinner
1f7951711c
Catch PyUnicode_AS_UNICODE() errors
2011-11-17 00:45:54 +01:00
Ezio Melotti
11060a4a48
#13406 : silence deprecation warnings in test_codecs.
2011-11-16 09:39:10 +02:00
Antoine Pitrou
78edf7576e
Issue #13333 : The UTF-7 decoder now accepts lone surrogates
...
(the encoder already accepts them).
2011-11-15 01:44:16 +01:00
Antoine Pitrou
5418ee0b9a
Issue #13333 : The UTF-7 decoder now accepts lone surrogates
...
(the encoder already accepts them).
2011-11-15 01:42:21 +01:00
Antoine Pitrou
31b92a534f
Sanitize reference management in the utf-8 encoder
2011-11-12 18:35:19 +01:00
Antoine Pitrou
0290c7a811
Fix regression on 2-byte wchar_t systems (Windows)
2011-11-11 13:29:12 +01:00
Antoine Pitrou
44c6affc79
Avoid crashing because of an unaligned word access
2011-11-11 02:59:42 +01:00
Antoine Pitrou
de20b0b50e
Issue #13149 : Speed up append-only StringIO objects.
...
This is very similar to the "lazy strings" idea.
2011-11-10 21:47:38 +01:00
Victor Stinner
9f4b1e9c50
Fix and deprecated the unicode_internal codec
...
unicode_internal codec uses Py_UNICODE instead of the real internal
representation (PEP 393: Py_UCS1, Py_UCS2 or Py_UCS4) for backward
compatibility.
2011-11-10 20:56:30 +01:00
Victor Stinner
24729f36bf
Prefer Py_UCS4 or wchar_t over Py_UNICODE
2011-11-10 20:31:37 +01:00
Victor Stinner
ebf3ba808e
PyUnicode_DecodeCharmap() uses the new Unicode API
2011-11-10 20:30:22 +01:00
Victor Stinner
a98b28c1bf
Avoid PyUnicode_AS_UNICODE in the UTF-8 encoder
2011-11-10 20:21:49 +01:00
Victor Stinner
3326cb6a36
Fix "unicode_escape" encoder
2011-11-10 20:15:25 +01:00
Victor Stinner
0e36826a04
Fix UTF-7 encoder on Windows
2011-11-10 20:12:49 +01:00
Martin v. Löwis
1db7c13be1
Port encoders from Py_UNICODE API to unicode object API.
2011-11-10 18:24:32 +01:00
Victor Stinner
62aa4d086a
Strip trailing spaces
2011-11-09 00:03:45 +01:00
Victor Stinner
0a045efb49
Fix a compiler warning: use unsiged for maxchar in unicode_widen()
2011-11-09 00:02:42 +01:00
Victor Stinner
596a6c4ffc
Fix the code page decoder
...
* unicode_decode_call_errorhandler() now supports the PyUnicode_WCHAR_KIND
kind
* unicode_decode_call_errorhandler() calls copy_characters() instead of
PyUnicode_CopyCharacters()
2011-11-09 00:02:18 +01:00
Antoine Pitrou
a8f63c02ef
Fix missing goto
2011-11-08 18:37:16 +01:00
Martin v. Löwis
d10759f6ed
Make _PyUnicode_FromId return borrowed references.
...
http://mail.python.org/pipermail/python-dev/2011-November/114347.html
2011-11-07 13:00:05 +01:00
Martin v. Löwis
e9b11c1cd8
Change decoders to use Unicode API instead of Py_UNICODE.
2011-11-08 17:35:34 +01:00
Victor Stinner
e30c0a1014
Fix gdb/libpython.py for not ready Unicode strings
...
_PyUnicode_CheckConsistency() checks also hash and length value for not ready
Unicode strings.
2011-11-04 20:54:05 +01:00
Victor Stinner
2fc507fe45
Replace tabs by spaces
2011-11-04 20:06:39 +01:00
Martin v. Löwis
12be46ca84
Drop Py_UNICODE based encode exceptions.
2011-11-04 19:04:15 +01:00
Martin v. Löwis
3d325191bf
Port code page codec to Unicode API.
2011-11-04 18:23:06 +01:00
Victor Stinner
fcd9653667
Fix a compiler warning in unicode_encode_ucs1()
2011-11-04 00:28:50 +01:00
Victor Stinner
fc026c98d8
Fix PyUnicode_EncodeCharmap()
2011-11-04 00:24:51 +01:00
Victor Stinner
7931d9a951
Replace PyUnicodeObject type by PyObject
...
* _PyUnicode_CheckConsistency() now takes a PyObject* instead of void*
* Remove now useless casts to PyObject*
2011-11-04 00:22:48 +01:00
Victor Stinner
76a31a6bff
Cleanup decode_code_page_stateful() and encode_code_page()
...
* Fix decode_code_page_errors() result
* Inline decode_code_page() and encode_code_page_chunk()
* Replace the PyUnicodeObject type by PyObject
2011-11-04 00:05:13 +01:00
Victor Stinner
7581cef699
Adapt the code page encoder to the new unicode_encode_call_errorhandler()
...
The code is not correct, but at least it doesn't crash anymore.
2011-11-03 22:32:33 +01:00
Brian Curtin
2787ea41fd
Fix a compile error (apparently Windows only) introduced in 295fdfd4f422
2011-11-02 15:09:37 -05:00
Martin v. Löwis
23e275b3ad
Port UCS1 and charmap codecs to new API.
2011-11-02 18:02:51 +01:00
Martin v. Löwis
9e8166843c
Introduce PyObject* API for raising encode errors.
2011-11-02 12:45:42 +01:00
Martin v. Löwis
0d3072e98d
Drop Py_UCS4_ functions. Closes #13246 .
2011-10-31 08:40:56 +01:00
Victor Stinner
57ffa9d4ff
PyUnicode_AsUnicodeCopy() uses PyUnicode_AsUnicodeAndSize() to get directly the length
2011-10-23 20:10:08 +02:00
Victor Stinner
af9e4b8c29
Fix PyUnicode_InternImmortal(): PyUnicode_InternInPlace() may changes *p
2011-10-23 20:07:00 +02:00
Victor Stinner
9faa384bed
Cast directly to unsigned char, instead of using Py_CHARMASK
...
We don't need "& 0xff" on an unsigned char.
2011-10-23 20:06:00 +02:00
Victor Stinner
9db1a8b69f
Replace PyUnicodeObject* by PyObject* where it was irrevelant
...
A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or
PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to
PyUnicodeObject* is wrong
2011-10-23 20:04:37 +02:00
Victor Stinner
0d60e87ad6
Fix data variable in _PyUnicode_Dump() for compact ASCII
2011-10-23 19:47:19 +02:00