Serhiy Storchaka
35804e4c63
Issue #19279 : UTF-7 decoder no more produces illegal strings.
2013-10-19 20:38:19 +03:00
Larry Hastings
3182680210
Issue #16612 : Add "Argument Clinic", a compile-time preprocessor
...
for C files to generate argument parsing code. (See PEP 436.)
2013-10-19 00:09:25 -07:00
Ethan Furman
fb13721b1b
Close #18780 : %-formatting now prints value for int subclasses with %d, %i, and %u codes.
2013-08-31 10:18:55 -07:00
Antoine Pitrou
9ed5f27266
Issue #18722 : Remove uses of the "register" keyword in C code.
2013-08-13 20:18:52 +02:00
Raymond Hettinger
e56666d17f
Silence compiler warning about an uninitialized variable
2013-08-04 11:51:03 -07:00
Raymond Hettinger
5ed1b38a7d
merge
2013-08-04 11:51:35 -07:00
Christian Heimes
b578735dff
Check return value of PyType_Ready(&EncodingMapType)
...
CID 486654
2013-07-20 14:57:28 +02:00
Christian Heimes
26532f7519
Check return value of PyType_Ready(&EncodingMapType)
...
CID 486654
2013-07-20 14:57:16 +02:00
Victor Stinner
e699e5a218
Issue #18408 : Don't check unicode consistency in _PyUnicode_HAS_UTF8_MEMORY()
...
and _PyUnicode_HAS_WSTR_MEMORY() macros
These macros are called in unicode_dealloc(), whereas the unicode object can be
"inconsistent" if the creation of the object failed.
For example, when unicode_subtype_new() fails on a memory allocation,
_PyUnicode_CheckConsistency() fails with an assertion error because data is
NULL.
2013-07-15 18:22:47 +02:00
Victor Stinner
9e6b4d715c
Issue #18408 : _PyUnicodeWriter_Finish() now clears its buffer attribute in all
...
cases, so _PyUnicodeWriter_Dealloc() can be called after finish.
2013-07-09 00:37:24 +02:00
Victor Stinner
15a0bd3965
Issue #18408 : Fix _PyUnicodeWriter_Finish(): clear writer->buffer,
...
so _PyUnicodeWriter_Dealloc() can be called on the writer after finish.
2013-07-08 22:29:55 +02:00
Victor Stinner
6f8eeee7b9
Issue #18203 : Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar()
2013-07-07 22:57:45 +02:00
Victor Stinner
1a7425f67a
Issue #18203 : Replace malloc() with PyMem_RawMalloc() at Python initialization
...
* Replace malloc() with PyMem_RawMalloc()
* Replace PyMem_Malloc() with PyMem_RawMalloc() where the GIL is not held.
* _Py_char2wchar() now returns a buffer allocated by PyMem_RawMalloc(), instead
of PyMem_Malloc()
2013-07-07 16:25:15 +02:00
Christian Heimes
d47802eef7
Fix ref leak in error case of unicode find, count, formatlong
...
CID 983315: Resource leak (RESOURCE_LEAK)
CID 983316: Resource leak (RESOURCE_LEAK)
CID 983317: Resource leak (RESOURCE_LEAK)
2013-06-29 21:33:36 +02:00
Christian Heimes
d47a0456b1
Fix ref leak in error case of unicode index
...
CID 983319 (#1 of 2): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:21:37 +02:00
Christian Heimes
ea71a525c3
Fix ref leak in error case of unicode rindex and rfind
...
CID 983320: Resource leak (RESOURCE_LEAK)
CID 983321: Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:17:34 +02:00
Christian Heimes
305e49e17e
Fix memory leak in endswith
...
CID 1040368 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 20:41:06 +02:00
Serhiy Storchaka
c89533f72f
Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
...
OverflowError when an argument of %c format is out of range.
2013-06-23 20:21:16 +03:00
Serhiy Storchaka
8eeae2126c
Issue #18184 : PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
...
OverflowError when an argument of %c format is out of range.
2013-06-23 20:12:14 +03:00
Benjamin Peterson
3164f5d565
merge 3.3 ( #18183 )
2013-06-10 09:24:01 -07:00
Benjamin Peterson
7e30373126
remove MAX_MAXCHAR because it's unsafe for computing maximum codepoitn value (see #18183 )
2013-06-10 09:19:46 -07:00
Victor Stinner
9f067f490f
Issue #9566 : Fix compiler warning on Windows 64-bit
2013-06-05 00:21:31 +02:00
Antoine Pitrou
7ce35a1816
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:59:37 +02:00
Antoine Pitrou
8b0e98426d
Issue #17237 : Fix crash in the ASCII decoder on m68k.
2013-05-11 15:58:34 +02:00
Victor Stinner
f4f24248dc
Fix uninitialized value in charmap_decode_mapping()
2013-05-07 01:01:31 +02:00
Victor Stinner
8cecc8c262
Issue #7330 : Implement width and precision (ex: "%5.3s") for the format string
...
of PyUnicode_FromFormat() function, original patch written by Ysj Ray.
2013-05-06 23:11:54 +02:00
Victor Stinner
bb4503f61e
Partial revert of changeset 9744b2df134c
...
PyUnicode_Append() cannot call directly resize_compact(): I forgot that a
string can be ready *and* not compact (a legacy string can also be ready).
2013-04-18 09:41:34 +02:00
Victor Stinner
fb161b1b6d
Split PyUnicode_DecodeCharmap() into subfunction for readability
2013-04-18 01:44:27 +02:00
Victor Stinner
170ca6f84b
Fix bug in Unicode decoders related to _PyUnicodeWriter
...
Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.
2013-04-18 00:25:28 +02:00
Victor Stinner
376cfa122d
Fix typo in unicode_decode_call_errorhandler_writer()
...
Bug introduced by changeset 7ed9993d53b4.
2013-04-17 23:58:16 +02:00
Victor Stinner
8f674ccd64
Close #17694 : Add minimum length to _PyUnicodeWriter
...
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() has no more argument (except the writer itself):
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner
77282cb4f8
Cleanup PyUnicode_Contains()
...
* No need to double-check that strings are ready: test already done by
PyUnicode_FromObject()
* Remove useless kind variable (use kind1 instead)
2013-04-14 19:22:47 +02:00
Victor Stinner
d92e078c8d
Minor change: fix character in do_strip() for the ASCII case
2013-04-14 19:17:42 +02:00
Victor Stinner
f033510fee
Cleanup PyUnicode_Append()
...
* Check also that right is a Unicode object
* call directly resize_compact() instead of unicode_resize() for a more
explicit error handling, and to avoid testing some properties twice
(ex: unicode_modifiable())
2013-04-14 19:13:03 +02:00
Victor Stinner
4560f9c63f
PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code
2013-04-14 18:56:46 +02:00
Victor Stinner
55c08781e8
Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped
2013-04-14 18:45:39 +02:00
Victor Stinner
af03757d20
Optimize ascii(str): don't encode/decode repr if repr is already ASCII
2013-04-14 18:44:10 +02:00
Victor Stinner
8a1a6cffd6
Add _PyUnicodeWriter_WriteCharInline()
2013-04-14 02:35:33 +02:00
Serhiy Storchaka
e2cef885a2
Issue #16061 : Speed up str.replace() for replacing 1-character strings.
2013-04-13 22:45:04 +03:00
Victor Stinner
a0dd0213cc
Close #17693 : Rewrite CJK decoders to use the _PyUnicodeWriter API instead of
...
the legacy Py_UNICODE API.
Add also a new _PyUnicodeWriter_WriteChar() function.
2013-04-11 22:09:04 +02:00
Victor Stinner
247109e74d
Issue #17615 : On Windows (VS2010), Performances of wmemcmp() to compare Unicode
...
strings are not convincing. For UCS2 (16-bit wchar_t type), use a dummy loop
instead of wmemcmp(). The dummy loop is as fast, or a little bit faster.
wchar_t is only 16-bit long on Windows. wmemcmp() is still used for 32-bit
wchar_t.
2013-04-09 23:53:26 +02:00
Victor Stinner
0cff4b16d9
replace(): only call PyUnicode_DATA(u) once
2013-04-09 22:52:48 +02:00
Victor Stinner
cc7af72192
Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII
2013-04-09 22:39:24 +02:00
Victor Stinner
f50a4e9bc9
Don't calls macros in PyUnicode_WRITE() parameters
...
PyUnicode_WRITE() expands some parameters twice or more.
2013-04-09 22:38:52 +02:00
Victor Stinner
9c79e41fc5
Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call
...
it twice
2013-04-09 22:21:08 +02:00
Victor Stinner
b3a6014504
Fix _PyUnicode_XStrip()
...
Inline the BLOOM_MEMBER() to only call PyUnicode_READ() only once (per loop
iteration). Store also the length of the seperator in a variable to avoid calls
to PyUnicode_GET_LENGTH().
2013-04-09 22:19:21 +02:00
Victor Stinner
63d5c1a14a
Optimize PyUnicode_DecodeCharmap()
...
Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers
instead.
2013-04-09 22:13:33 +02:00
Victor Stinner
a85af502a4
Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip()
...
Write specialized functions per Unicode kind to avoid the expensive
PyUnicode_READ() macro.
2013-04-09 21:53:54 +02:00
Victor Stinner
69ed0f4c86
Use PyUnicode_READ() instead of PyUnicode_READ_CHAR()
...
"PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls
PyUnicode_KIND() and might call it twice." according to its documentation.
2013-04-09 21:48:24 +02:00
Victor Stinner
03c3e35d42
Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings:
...
cp037, cp500 and iso8859_1 codecs
2013-04-09 21:53:09 +02:00
Victor Stinner
cd777eaf53
Issue #17615 : Comparing two Unicode strings now uses wmemcmp() when possible
...
wmemcmp() is twice faster than a dummy loop (342 usec vs 744 usec) on Fedora
18/x86_64, GCC 4.7.2.
2013-04-08 22:43:44 +02:00
Victor Stinner
c1302bba4c
Issue #17615 : Expand expensive PyUnicode_READ() macro in unicode_compare():
...
write specialized functions for each combination of Unicode kinds.
2013-04-08 21:50:54 +02:00
Victor Stinner
207dd38726
fix unused variable
2013-04-03 03:14:58 +02:00
Victor Stinner
eb4b5ac8af
Close #16757 : Avoid calling the expensive _PyUnicode_FindMaxChar() function
...
when possible
2013-04-03 02:02:33 +02:00
Victor Stinner
cfc4c13b04
Add _PyUnicodeWriter_WriteSubstring() function
...
Write a function to enable more optimizations:
* If the substring is the whole string and overallocation is disabled, just
keep a reference to the string, don't copy characters
* Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
possible
2013-04-03 01:48:39 +02:00
Raymond Hettinger
51612fd803
merge
2013-03-23 08:21:52 -07:00
Raymond Hettinger
378170d5d9
Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords.
2013-03-23 08:21:12 -07:00
Victor Stinner
fb84b5d48d
(Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:29:09 +01:00
Victor Stinner
2cb16aa3cb
_PyUnicode_Writer() now also reuses Unicode singletons:
...
empty string and latin1 single character
2013-03-06 19:28:37 +01:00
Victor Stinner
cf77da9fb5
Backed out changeset b9f7b1bf36aa
2013-03-06 01:09:24 +01:00
Victor Stinner
313cac88c5
Issue #17223 : Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type)
...
to reject invalid UTF-16 surrogate.
2013-03-06 00:41:50 +01:00
Victor Stinner
36025478bf
(Merge 3.3) Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character
...
outside the range U+0000-U+10ffff.
2013-02-26 00:16:57 +01:00
Victor Stinner
d21b58c05d
Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character outside
...
the range U+0000-U+10ffff.
2013-02-26 00:15:54 +01:00
Victor Stinner
cfd2c1b4cc
(Merge 3.3) Issue #17137 : When an Unicode string is resized, the internal wide
...
character string (wstr) format is now cleared.
2013-02-07 23:17:34 +01:00
Victor Stinner
bbbac2ec34
Issue #17137 : When an Unicode string is resized, the internal wide character
...
string (wstr) format is now cleared.
2013-02-07 23:12:46 +01:00
Serhiy Storchaka
d0c79dcda5
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:26:55 +02:00
Serhiy Storchaka
03ee12ed72
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:25:25 +02:00
Serhiy Storchaka
3fd4ab356d
Issue #17043 : The unicode-internal decoder no longer read past the end of
...
input buffer.
2013-02-07 16:23:21 +02:00
Serhiy Storchaka
2aee6a6460
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:16:57 +02:00
Serhiy Storchaka
afb1cb5579
Issue #16971 : Fix a refleak in the charmap decoder.
2013-01-29 12:13:22 +02:00
Serhiy Storchaka
8fe5a9f9c3
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:37:39 +02:00
Serhiy Storchaka
24193debd4
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:28:07 +02:00
Serhiy Storchaka
d679377be7
Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:20:44 +02:00
Serhiy Storchaka
ed3c4128c0
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:18:17 +02:00
Serhiy Storchaka
678db84b37
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:16:36 +02:00
Serhiy Storchaka
059972535f
Issue #10156 : In the interpreter's initialization phase, unicode globals
...
are now initialized dynamically as needed.
2013-01-26 12:14:02 +02:00
Serhiy Storchaka
570c5b2354
Issue #16980 : Fix processing of escaped non-ascii bytes in the
...
unicode-escape-decode decoder.
2013-01-25 23:53:29 +02:00
Serhiy Storchaka
73e38809e0
Issue #16980 : Fix processing of escaped non-ascii bytes in the
...
unicode-escape-decode decoder.
2013-01-25 23:52:21 +02:00
Serhiy Storchaka
6481bfb2b5
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:44:40 +02:00
Serhiy Storchaka
c35f3a9f61
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:42:57 +02:00
Serhiy Storchaka
4f5f0e54e0
Issue #16335 : Fix integer overflow in unicode-escape decoder.
2013-01-21 11:38:00 +02:00
Serhiy Storchaka
441d30fac7
Issue #15989 : Fix several occurrences of integer overflow
...
when result of PyLong_AsLong() narrowed to int without checks.
This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:26:26 +02:00
Serhiy Storchaka
9101e23ff6
Issue #15989 : Fix several occurrences of integer overflow
...
when result of PyLong_AsLong() narrowed to int without checks.
This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:41:45 +02:00
Serhiy Storchaka
55e2cb497b
Issue #14850 : Now a chamap decoder treates U+FFFE as "undefined mapping"
...
in any mapping, not only in an unicode string.
2013-01-15 15:30:04 +02:00
Serhiy Storchaka
45d16d9924
Issue #14850 : Now a chamap decoder treates U+FFFE as "undefined mapping"
...
in any mapping, not only in an unicode string.
2013-01-15 15:01:20 +02:00
Serhiy Storchaka
4fb8caee87
Issue #14850 : Now a chamap decoder treates U+FFFE as "undefined mapping"
...
in any mapping, not only in an unicode string.
2013-01-15 14:43:21 +02:00
Serhiy Storchaka
7898043868
Issue #15989 : Fix several occurrences of integer overflow
...
when result of PyLong_AsLong() narrowed to int without checks.
2013-01-15 01:12:17 +02:00
Benjamin Peterson
0b32a480bd
merge 3.3 ( #16906 )
2013-01-09 09:52:22 -06:00
Benjamin Peterson
0c270a8bb7
correct static string clearing loop ( closes #16906 )
2013-01-09 09:52:01 -06:00
Serhiy Storchaka
24a3ef6999
Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by
...
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:41:55 +02:00
Serhiy Storchaka
ae3b32ad6b
Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by
...
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:40:52 +02:00
Serhiy Storchaka
48e188e573
Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by
...
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:14:24 +02:00
Serhiy Storchaka
dec798eb46
Fix out of bound read in UTF-32 decoder on "narrow Unicode" builds.
2013-01-08 22:45:42 +02:00
Serhiy Storchaka
4e02538bf3
Issue #16856 : Fix a segmentation fault from calling repr() on a dict with
...
a key whose repr raise an exception.
2013-01-04 12:40:35 +02:00
Serhiy Storchaka
6c83e739d7
Issue #16856 : Fix a segmentation fault from calling repr() on a dict with
...
a key whose repr raise an exception.
2013-01-04 12:39:34 +02:00
Victor Stinner
18aa4477d3
Close #16281 : handle tailmatch() failure and remove useless comment
...
"honor direction and do a forward or backwards search": the runtime speed may
be different, but I consider that it doesn't really matter in practice. The
direction was never honored before: Python 2.7 uses memcmp() for the str type
for example.
2013-01-03 03:18:09 +01:00
Victor Stinner
7ae320d667
(Merge 3.2) Issue #16455 : On FreeBSD and Solaris, if the locale is C, the
...
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:21:07 +01:00
Victor Stinner
20b654acb5
Issue #16455 : On FreeBSD and Solaris, if the locale is C, the
...
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:08:58 +01:00
Andrew Svetlov
2606a6f197
Issue #16719 : Get rid of WindowsError. Use OSError instead
...
Patch by Serhiy Storchaka.
2012-12-19 14:33:35 +02:00
Gregory P. Smith
27dc02e8c5
Fix the internals of our hash functions to used unsigned values during hash
...
computation as the overflow behavior of signed integers is undefined.
NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done. I added the comment that my change in 3.2 added so that the
code would match up. Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent. We could work to get rid of the -fwrapv requirement
in 3.4 but that requires more planning.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
2012-12-10 19:51:29 -08:00