Victor Stinner
4a58707a34
Add _PyUnicodeWriter_WriteASCIIString() function
2013-11-19 12:54:53 +01:00
Serhiy Storchaka
58cf607d13
Issue #12892 : The utf-16* and utf-32* codecs now reject (lone) surrogates.
...
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.
Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
Victor Stinner
6989ba0174
Issue #19581 : Change the overallocation factor of _PyUnicodeWriter on Windows
...
On Windows, a factor of 50% gives best performances.
2013-11-18 21:08:39 +01:00
Larry Hastings
ed4a1c5703
Argument Clinic: rename "self" to "module" for module-level functions.
2013-11-18 09:32:13 -08:00
Ezio Melotti
745d54d2fa
#17806 : Added keyword-argument support for "tabsize" to str/bytes.expandtabs().
2013-11-16 19:10:57 +02:00
Nick Coghlan
8b097b4ed7
Close #17828 : better handling of codec errors
...
- output type errors now redirect users to the type-neutral
convenience functions in the codecs module
- stateless errors that occur during encoding and decoding
will now be automatically wrapped in exceptions that give
the name of the codec involved
2013-11-13 23:49:21 +10:00
Victor Stinner
66b3270975
_Py_normalize_encoding(): explain how the value 6 was computed
2013-11-07 23:12:23 +01:00
Victor Stinner
df23e30bea
Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8"
...
if the input string is NULL
2013-11-07 13:33:36 +01:00
Victor Stinner
ad14ccd047
Issue #19512 : add _PyUnicode_CompareWithId() function
...
_PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString()
when both strings are equal and interned.
Add also _PyId_builtins identifier for "builtins" common string.
2013-11-07 00:46:04 +01:00
Victor Stinner
21ea21ef6d
Issue #19424 : PyUnicode_CompareWithASCIIString() normalizes memcmp() result
...
to -1, 0, 1
2013-11-04 11:28:26 +01:00
Victor Stinner
f0c7b2af05
Issue #16286 : remove duplicated identity check from unicode_compare()
...
Move the test to PyUnicode_Compare()
2013-11-04 11:27:14 +01:00
Victor Stinner
fd9e44db37
Issue #16286 : optimize PyUnicode_RichCompare() for identical strings (same
...
pointer) for any operator, not only Py_EQ and Py_NE.
Code of bytes_richcompare() and PyUnicode_RichCompare() is now closer.
2013-11-04 11:23:05 +01:00
Victor Stinner
c8bc5377ac
Issue #16286 : write a new subfunction bytes_compare_eq()
...
* cleanup bytes_richcompare()
* PyUnicode_RichCompare(): replace a test with a XOR
2013-11-04 11:08:10 +01:00
Victor Stinner
e1b1592fd4
Issue #19424 : Fix a compiler warning on comparing signed/unsigned size_t
...
Patch written by Zachary Ware.
2013-11-03 13:53:12 +01:00
Victor Stinner
a6b9b071a3
Issue #19424 : Fix a compiler warning
...
memcmp() just takes raw pointers
2013-10-30 18:27:13 +01:00
Victor Stinner
602f7cf0b9
Issue #19424 : Optimize PyUnicode_CompareWithASCIIString()
...
Use fast memcmp() instead of a loop using the slow PyUnicode_READ() macro.
strlen() is still necessary to check Unicode string containing null bytes.
2013-10-29 23:31:50 +01:00
Victor Stinner
68b674c9d4
Issue #19437 : Fix _PyUnicode_New() (constructor of legacy string), set all
...
attributes before checking for error. The destructor expects all attributes to
be set. It is now safe to call Py_DECREF(unicode) in the constructor.
2013-10-29 19:31:43 +01:00
Victor Stinner
fa3ba4c3bc
Issue #18609 : Add a fast-path for "iso8859-1" encoding
...
On AIX, the locale encoding may be "iso8859-1", which was not a known syntax of
the legacy ISO 8859-1 encoding.
Using a C codec instead of a Python codec is faster but also avoids tricky
issues during Python startup or complex code.
2013-10-29 11:34:05 +01:00
Victor Stinner
a5afb58986
Issue #18408 : Fix PyUnicode_AsUTF8AndSize(), raise MemoryError exception on
...
memory allocation failure
2013-10-29 01:28:23 +01:00
Serhiy Storchaka
c679227e31
Issue #1772673 : The type of `char*` arguments now changed to `const char*`.
2013-10-19 21:03:34 +03:00
Serhiy Storchaka
55e092f545
Issue #19279 : UTF-7 decoder no more produces illegal strings.
2013-10-19 20:39:28 +03:00
Serhiy Storchaka
35804e4c63
Issue #19279 : UTF-7 decoder no more produces illegal strings.
2013-10-19 20:38:19 +03:00
Larry Hastings
3182680210
Issue #16612 : Add "Argument Clinic", a compile-time preprocessor
...
for C files to generate argument parsing code. (See PEP 436.)
2013-10-19 00:09:25 -07:00
Ethan Furman
fb13721b1b
Close #18780 : %-formatting now prints value for int subclasses with %d, %i, and %u codes.
2013-08-31 10:18:55 -07:00
Antoine Pitrou
9ed5f27266
Issue #18722 : Remove uses of the "register" keyword in C code.
2013-08-13 20:18:52 +02:00
Raymond Hettinger
e56666d17f
Silence compiler warning about an uninitialized variable
2013-08-04 11:51:03 -07:00
Raymond Hettinger
5ed1b38a7d
merge
2013-08-04 11:51:35 -07:00
Christian Heimes
b578735dff
Check return value of PyType_Ready(&EncodingMapType)
...
CID 486654
2013-07-20 14:57:28 +02:00
Christian Heimes
26532f7519
Check return value of PyType_Ready(&EncodingMapType)
...
CID 486654
2013-07-20 14:57:16 +02:00
Victor Stinner
e699e5a218
Issue #18408 : Don't check unicode consistency in _PyUnicode_HAS_UTF8_MEMORY()
...
and _PyUnicode_HAS_WSTR_MEMORY() macros
These macros are called in unicode_dealloc(), whereas the unicode object can be
"inconsistent" if the creation of the object failed.
For example, when unicode_subtype_new() fails on a memory allocation,
_PyUnicode_CheckConsistency() fails with an assertion error because data is
NULL.
2013-07-15 18:22:47 +02:00
Victor Stinner
9e6b4d715c
Issue #18408 : _PyUnicodeWriter_Finish() now clears its buffer attribute in all
...
cases, so _PyUnicodeWriter_Dealloc() can be called after finish.
2013-07-09 00:37:24 +02:00
Victor Stinner
15a0bd3965
Issue #18408 : Fix _PyUnicodeWriter_Finish(): clear writer->buffer,
...
so _PyUnicodeWriter_Dealloc() can be called on the writer after finish.
2013-07-08 22:29:55 +02:00
Victor Stinner
6f8eeee7b9
Issue #18203 : Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar()
2013-07-07 22:57:45 +02:00
Victor Stinner
1a7425f67a
Issue #18203 : Replace malloc() with PyMem_RawMalloc() at Python initialization
...
* Replace malloc() with PyMem_RawMalloc()
* Replace PyMem_Malloc() with PyMem_RawMalloc() where the GIL is not held.
* _Py_char2wchar() now returns a buffer allocated by PyMem_RawMalloc(), instead
of PyMem_Malloc()
2013-07-07 16:25:15 +02:00
Christian Heimes
d47802eef7
Fix ref leak in error case of unicode find, count, formatlong
...
CID 983315: Resource leak (RESOURCE_LEAK)
CID 983316: Resource leak (RESOURCE_LEAK)
CID 983317: Resource leak (RESOURCE_LEAK)
2013-06-29 21:33:36 +02:00
Christian Heimes
d47a0456b1
Fix ref leak in error case of unicode index
...
CID 983319 (#1 of 2): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:21:37 +02:00
Christian Heimes
ea71a525c3
Fix ref leak in error case of unicode rindex and rfind
...
CID 983320: Resource leak (RESOURCE_LEAK)
CID 983321: Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:17:34 +02:00
Christian Heimes
305e49e17e
Fix memory leak in endswith
...
CID 1040368 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 20:41:06 +02:00
Serhiy Storchaka
c89533f72f
Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
...
OverflowError when an argument of %c format is out of range.
2013-06-23 20:21:16 +03:00
Serhiy Storchaka
8eeae2126c
Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
...
OverflowError when an argument of %c format is out of range.
2013-06-23 20:12:14 +03:00
Benjamin Peterson
3164f5d565
merge 3.3 ( #18183 )
2013-06-10 09:24:01 -07:00
Benjamin Peterson
7e30373126
remove MAX_MAXCHAR because it's unsafe for computing maximum codepoitn value (see #18183 )
2013-06-10 09:19:46 -07:00
Victor Stinner
9f067f490f
Issue #9566 : Fix compiler warning on Windows 64-bit
2013-06-05 00:21:31 +02:00
Antoine Pitrou
7ce35a1816
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:59:37 +02:00
Antoine Pitrou
8b0e98426d
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:58:34 +02:00
Victor Stinner
f4f24248dc
Fix uninitialized value in charmap_decode_mapping()
2013-05-07 01:01:31 +02:00
Victor Stinner
8cecc8c262
Issue #7330 : Implement width and precision (ex: "%5.3s") for the format string
...
of PyUnicode_FromFormat() function, original patch written by Ysj Ray.
2013-05-06 23:11:54 +02:00
Victor Stinner
bb4503f61e
Partial revert of changeset 9744b2df134c
...
PyUnicode_Append() cannot call directly resize_compact(): I forgot that a
string can be ready *and* not compact (a legacy string can also be ready).
2013-04-18 09:41:34 +02:00
Victor Stinner
fb161b1b6d
Split PyUnicode_DecodeCharmap() into subfunction for readability
2013-04-18 01:44:27 +02:00
Victor Stinner
170ca6f84b
Fix bug in Unicode decoders related to _PyUnicodeWriter
...
Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.
2013-04-18 00:25:28 +02:00
Victor Stinner
376cfa122d
Fix typo in unicode_decode_call_errorhandler_writer()
...
Bug introduced by changeset 7ed9993d53b4.
2013-04-17 23:58:16 +02:00
Victor Stinner
8f674ccd64
Close #17694 : Add minimum length to _PyUnicodeWriter
...
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() has no more argument (except the writer itself):
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner
77282cb4f8
Cleanup PyUnicode_Contains()
...
* No need to double-check that strings are ready: test already done by
PyUnicode_FromObject()
* Remove useless kind variable (use kind1 instead)
2013-04-14 19:22:47 +02:00
Victor Stinner
d92e078c8d
Minor change: fix character in do_strip() for the ASCII case
2013-04-14 19:17:42 +02:00
Victor Stinner
f033510fee
Cleanup PyUnicode_Append()
...
* Check also that right is a Unicode object
* call directly resize_compact() instead of unicode_resize() for a more
explicit error handling, and to avoid testing some properties twice
(ex: unicode_modifiable())
2013-04-14 19:13:03 +02:00
Victor Stinner
4560f9c63f
PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code
2013-04-14 18:56:46 +02:00
Victor Stinner
55c08781e8
Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped
2013-04-14 18:45:39 +02:00
Victor Stinner
af03757d20
Optimize ascii(str): don't encode/decode repr if repr is already ASCII
2013-04-14 18:44:10 +02:00
Victor Stinner
8a1a6cffd6
Add _PyUnicodeWriter_WriteCharInline()
2013-04-14 02:35:33 +02:00
Serhiy Storchaka
e2cef885a2
Issue #16061 : Speed up str.replace() for replacing 1-character strings.
2013-04-13 22:45:04 +03:00
Victor Stinner
a0dd0213cc
Close #17693 : Rewrite CJK decoders to use the _PyUnicodeWriter API instead of
...
the legacy Py_UNICODE API.
Add also a new _PyUnicodeWriter_WriteChar() function.
2013-04-11 22:09:04 +02:00
Victor Stinner
247109e74d
Issue #17615 : On Windows (VS2010), Performances of wmemcmp() to compare Unicode
...
strings are not convincing. For UCS2 (16-bit wchar_t type), use a dummy loop
instead of wmemcmp(). The dummy loop is as fast, or a little bit faster.
wchar_t is only 16-bit long on Windows. wmemcmp() is still used for 32-bit
wchar_t.
2013-04-09 23:53:26 +02:00
Victor Stinner
0cff4b16d9
replace(): only call PyUnicode_DATA(u) once
2013-04-09 22:52:48 +02:00
Victor Stinner
cc7af72192
Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII
2013-04-09 22:39:24 +02:00
Victor Stinner
f50a4e9bc9
Don't calls macros in PyUnicode_WRITE() parameters
...
PyUnicode_WRITE() expands some parameters twice or more.
2013-04-09 22:38:52 +02:00
Victor Stinner
9c79e41fc5
Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call
...
it twice
2013-04-09 22:21:08 +02:00
Victor Stinner
b3a6014504
Fix _PyUnicode_XStrip()
...
Inline the BLOOM_MEMBER() to only call PyUnicode_READ() only once (per loop
iteration). Store also the length of the seperator in a variable to avoid calls
to PyUnicode_GET_LENGTH().
2013-04-09 22:19:21 +02:00
Victor Stinner
63d5c1a14a
Optimize PyUnicode_DecodeCharmap()
...
Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers
instead.
2013-04-09 22:13:33 +02:00
Victor Stinner
a85af502a4
Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip()
...
Write specialized functions per Unicode kind to avoid the expensive
PyUnicode_READ() macro.
2013-04-09 21:53:54 +02:00
Victor Stinner
69ed0f4c86
Use PyUnicode_READ() instead of PyUnicode_READ_CHAR()
...
"PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls
PyUnicode_KIND() and might call it twice." according to its documentation.
2013-04-09 21:48:24 +02:00
Victor Stinner
03c3e35d42
Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings:
...
cp037, cp500 and iso8859_1 codecs
2013-04-09 21:53:09 +02:00
Victor Stinner
cd777eaf53
Issue #17615 : Comparing two Unicode strings now uses wmemcmp() when possible
...
wmemcmp() is twice faster than a dummy loop (342 usec vs 744 usec) on Fedora
18/x86_64, GCC 4.7.2.
2013-04-08 22:43:44 +02:00
Victor Stinner
c1302bba4c
Issue #17615 : Expand expensive PyUnicode_READ() macro in unicode_compare():
...
write specialized functions for each combination of Unicode kinds.
2013-04-08 21:50:54 +02:00
Victor Stinner
207dd38726
fix unused variable
2013-04-03 03:14:58 +02:00
Victor Stinner
eb4b5ac8af
Close #16757 : Avoid calling the expensive _PyUnicode_FindMaxChar() function
...
when possible
2013-04-03 02:02:33 +02:00
Victor Stinner
cfc4c13b04
Add _PyUnicodeWriter_WriteSubstring() function
...
Write a function to enable more optimizations:
* If the substring is the whole string and overallocation is disabled, just
keep a reference to the string, don't copy characters
* Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
possible
2013-04-03 01:48:39 +02:00
Raymond Hettinger
51612fd803
merge
2013-03-23 08:21:52 -07:00
Raymond Hettinger
378170d5d9
Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords.
2013-03-23 08:21:12 -07:00
Victor Stinner
fb84b5d48d
(Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:29:09 +01:00
Victor Stinner
2cb16aa3cb
_PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:28:37 +01:00
Victor Stinner
cf77da9fb5
Backed out changeset b9f7b1bf36aa
2013-03-06 01:09:24 +01:00
Victor Stinner
313cac88c5
Issue #17223 : Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type)
...
to reject invalid UTF-16 surrogate.
2013-03-06 00:41:50 +01:00
Victor Stinner
36025478bf
(Merge 3.3) Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character
...
outside the range U+0000-U+10ffff.
2013-02-26 00:16:57 +01:00
Victor Stinner
d21b58c05d
Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character outside
...
the range U+0000-U+10ffff.
2013-02-26 00:15:54 +01:00
Victor Stinner
cfd2c1b4cc
(Merge 3.3) Issue #17137 : When an Unicode string is resized, the internal wide
...
character string (wstr) format is now cleared.
2013-02-07 23:17:34 +01:00
Victor Stinner
bbbac2ec34
Issue #17137 : When an Unicode string is resized, the internal wide character
...
string (wstr) format is now cleared.
2013-02-07 23:12:46 +01:00
Serhiy Storchaka
d0c79dcda5
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:26:55 +02:00
Serhiy Storchaka
03ee12ed72
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:25:25 +02:00
Serhiy Storchaka
3fd4ab356d
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:23:21 +02:00
Serhiy Storchaka
2aee6a6460
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:16:57 +02:00
Serhiy Storchaka
afb1cb5579
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:13:22 +02:00
Serhiy Storchaka
8fe5a9f9c3
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:37:39 +02:00
Serhiy Storchaka
24193debd4
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:28:07 +02:00
Serhiy Storchaka
d679377be7
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:20:44 +02:00
Serhiy Storchaka
ed3c4128c0
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:18:17 +02:00
Serhiy Storchaka
678db84b37
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:16:36 +02:00
Serhiy Storchaka
059972535f
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:14:02 +02:00
Serhiy Storchaka
570c5b2354
Issue #16980 : Fix processing of escaped non-ascii bytes in the
...
unicode-escape-decode decoder.
2013-01-25 23:53:29 +02:00
Serhiy Storchaka
73e38809e0
Issue #16980 : Fix processing of escaped non-ascii bytes in the
...
unicode-escape-decode decoder.
2013-01-25 23:52:21 +02:00
Serhiy Storchaka
6481bfb2b5
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:44:40 +02:00