cpython

Commit Graph

Author	SHA1	Message	Date
Victor Stinner	0cff4b16d9	replace(): only call PyUnicode_DATA(u) once	2013-04-09 22:52:48 +02:00
Victor Stinner	cc7af72192	Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII	2013-04-09 22:39:24 +02:00
Victor Stinner	f50a4e9bc9	Don't call macros in PyUnicode_WRITE() parameters: PyUnicode_WRITE() expands some parameters twice or more.	2013-04-09 22:38:52 +02:00
Victor Stinner	9c79e41fc5	Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE() to not call it twice	2013-04-09 22:21:08 +02:00
Victor Stinner	b3a6014504	Fix _PyUnicode_XStrip(): inline BLOOM_MEMBER() to call PyUnicode_READ() only once (per loop iteration). Also store the length of the separator in a variable to avoid calls to PyUnicode_GET_LENGTH().	2013-04-09 22:19:21 +02:00
Victor Stinner	63d5c1a14a	Optimize PyUnicode_DecodeCharmap() Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers instead.	2013-04-09 22:13:33 +02:00
Victor Stinner	a85af502a4	Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip() Write specialized functions per Unicode kind to avoid the expensive PyUnicode_READ() macro.	2013-04-09 21:53:54 +02:00
Victor Stinner	69ed0f4c86	Use PyUnicode_READ() instead of PyUnicode_READ_CHAR() "PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls PyUnicode_KIND() and might call it twice." according to its documentation.	2013-04-09 21:48:24 +02:00
Victor Stinner	03c3e35d42	Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings: cp037, cp500 and iso8859_1 codecs	2013-04-09 21:53:09 +02:00
Victor Stinner	cd777eaf53	Issue #17615 : Comparing two Unicode strings now uses wmemcmp() when possible wmemcmp() is twice faster than a dummy loop (342 usec vs 744 usec) on Fedora 18/x86_64, GCC 4.7.2.	2013-04-08 22:43:44 +02:00
Victor Stinner	c1302bba4c	Issue #17615 : Expand expensive PyUnicode_READ() macro in unicode_compare(): write specialized functions for each combination of Unicode kinds.	2013-04-08 21:50:54 +02:00
Victor Stinner	207dd38726	fix unused variable	2013-04-03 03:14:58 +02:00
Victor Stinner	eb4b5ac8af	Close #16757 : Avoid calling the expensive _PyUnicode_FindMaxChar() function when possible	2013-04-03 02:02:33 +02:00
Victor Stinner	cfc4c13b04	Add _PyUnicodeWriter_WriteSubstring() function Write a function to enable more optimizations: * If the substring is the whole string and overallocation is disabled, just keep a reference to the string, don't copy characters * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when possible	2013-04-03 01:48:39 +02:00
Raymond Hettinger	51612fd803	merge	2013-03-23 08:21:52 -07:00
Raymond Hettinger	378170d5d9	Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords.	2013-03-23 08:21:12 -07:00
Victor Stinner	fb84b5d48d	(Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character	2013-03-06 19:29:09 +01:00
Victor Stinner	2cb16aa3cb	_PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character	2013-03-06 19:28:37 +01:00
Victor Stinner	cf77da9fb5	Backed out changeset b9f7b1bf36aa	2013-03-06 01:09:24 +01:00
Victor Stinner	313cac88c5	Issue #17223 : Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type) to reject invalid UTF-16 surrogate.	2013-03-06 00:41:50 +01:00
Victor Stinner	36025478bf	(Merge 3.3) Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character outside the range U+0000-U+10ffff.	2013-02-26 00:16:57 +01:00
Victor Stinner	d21b58c05d	Issue #17223 : Fix PyUnicode_FromUnicode() for string of 1 character outside the range U+0000-U+10ffff.	2013-02-26 00:15:54 +01:00
Victor Stinner	cfd2c1b4cc	(Merge 3.3) Issue #17137 : When an Unicode string is resized, the internal wide character string (wstr) format is now cleared.	2013-02-07 23:17:34 +01:00
Victor Stinner	bbbac2ec34	Issue #17137 : When an Unicode string is resized, the internal wide character string (wstr) format is now cleared.	2013-02-07 23:12:46 +01:00
Serhiy Storchaka	d0c79dcda5	Issue #17043 : The unicode-internal decoder no longer read past the end of input buffer.	2013-02-07 16:26:55 +02:00
Serhiy Storchaka	03ee12ed72	Issue #17043 : The unicode-internal decoder no longer read past the end of input buffer.	2013-02-07 16:25:25 +02:00
Serhiy Storchaka	3fd4ab356d	Issue #17043 : The unicode-internal decoder no longer read past the end of input buffer.	2013-02-07 16:23:21 +02:00
Serhiy Storchaka	2aee6a6460	Issue #16971 : Fix a refleak in the charmap decoder.	2013-01-29 12:16:57 +02:00
Serhiy Storchaka	afb1cb5579	Issue #16971 : Fix a refleak in the charmap decoder.	2013-01-29 12:13:22 +02:00
Serhiy Storchaka	8fe5a9f9c3	Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.	2013-01-29 10:37:39 +02:00
Serhiy Storchaka	24193debd4	Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.	2013-01-29 10:28:07 +02:00
Serhiy Storchaka	d679377be7	Issue #16979 : Fix error handling bugs in the unicode-escape-decode decoder.	2013-01-29 10:20:44 +02:00
Serhiy Storchaka	ed3c4128c0	Issue #10156 : In the interpreter's initialization phase, unicode globals are now initialized dynamically as needed.	2013-01-26 12:18:17 +02:00
Serhiy Storchaka	678db84b37	Issue #10156 : In the interpreter's initialization phase, unicode globals are now initialized dynamically as needed.	2013-01-26 12:16:36 +02:00
Serhiy Storchaka	059972535f	Issue #10156 : In the interpreter's initialization phase, unicode globals are now initialized dynamically as needed.	2013-01-26 12:14:02 +02:00
Serhiy Storchaka	570c5b2354	Issue #16980 : Fix processing of escaped non-ascii bytes in the unicode-escape-decode decoder.	2013-01-25 23:53:29 +02:00
Serhiy Storchaka	73e38809e0	Issue #16980 : Fix processing of escaped non-ascii bytes in the unicode-escape-decode decoder.	2013-01-25 23:52:21 +02:00
Serhiy Storchaka	6481bfb2b5	Issue #16335 : Fix integer overflow in unicode-escape decoder.	2013-01-21 11:44:40 +02:00
Serhiy Storchaka	c35f3a9f61	Issue #16335 : Fix integer overflow in unicode-escape decoder.	2013-01-21 11:42:57 +02:00
Serhiy Storchaka	4f5f0e54e0	Issue #16335 : Fix integer overflow in unicode-escape decoder.	2013-01-21 11:38:00 +02:00
Serhiy Storchaka	441d30fac7	Issue #15989 : Fix several occurrences of integer overflow when result of PyLong_AsLong() narrowed to int without checks. This is a backport of changesets 13e2e44db99d and 525407d89277.	2013-01-19 12:26:26 +02:00
Serhiy Storchaka	9101e23ff6	Issue #15989 : Fix several occurrences of integer overflow when result of PyLong_AsLong() narrowed to int without checks. This is a backport of changesets 13e2e44db99d and 525407d89277.	2013-01-19 12:41:45 +02:00
Serhiy Storchaka	55e2cb497b	Issue #14850 : Now a charmap decoder treats U+FFFE as "undefined mapping" in any mapping, not only in an unicode string.	2013-01-15 15:30:04 +02:00
Serhiy Storchaka	45d16d9924	Issue #14850 : Now a charmap decoder treats U+FFFE as "undefined mapping" in any mapping, not only in an unicode string.	2013-01-15 15:01:20 +02:00
Serhiy Storchaka	4fb8caee87	Issue #14850 : Now a charmap decoder treats U+FFFE as "undefined mapping" in any mapping, not only in an unicode string.	2013-01-15 14:43:21 +02:00
Serhiy Storchaka	7898043868	Issue #15989 : Fix several occurrences of integer overflow when result of PyLong_AsLong() narrowed to int without checks.	2013-01-15 01:12:17 +02:00
Benjamin Peterson	0b32a480bd	merge 3.3 (#16906 )	2013-01-09 09:52:22 -06:00
Benjamin Peterson	0c270a8bb7	correct static string clearing loop (closes #16906 )	2013-01-09 09:52:01 -06:00
Serhiy Storchaka	24a3ef6999	Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP characters.	2013-01-08 23:41:55 +02:00
Serhiy Storchaka	ae3b32ad6b	Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP characters.	2013-01-08 23:40:52 +02:00
Serhiy Storchaka	48e188e573	Issue #11461 : Fix the incremental UTF-16 decoder. Original patch by Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP characters.	2013-01-08 23:14:24 +02:00
Serhiy Storchaka	dec798eb46	Fix out of bound read in UTF-32 decoder on "narrow Unicode" builds.	2013-01-08 22:45:42 +02:00
Serhiy Storchaka	4e02538bf3	Issue #16856 : Fix a segmentation fault from calling repr() on a dict with a key whose repr raise an exception.	2013-01-04 12:40:35 +02:00
Serhiy Storchaka	6c83e739d7	Issue #16856 : Fix a segmentation fault from calling repr() on a dict with a key whose repr raise an exception.	2013-01-04 12:39:34 +02:00
Victor Stinner	18aa4477d3	Close #16281 : handle tailmatch() failure and remove useless comment "honor direction and do a forward or backwards search": the runtime speed may be different, but I consider that it doesn't really matter in practice. The direction was never honored before: Python 2.7 uses memcmp() for the str type for example.	2013-01-03 03:18:09 +01:00
Victor Stinner	7ae320d667	(Merge 3.2) Issue #16455 : On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announces an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.	2013-01-03 01:21:07 +01:00
Victor Stinner	20b654acb5	Issue #16455 : On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announces an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.	2013-01-03 01:08:58 +01:00
Andrew Svetlov	2606a6f197	Issue #16719 : Get rid of WindowsError. Use OSError instead Patch by Serhiy Storchaka.	2012-12-19 14:33:35 +02:00
Gregory P. Smith	27dc02e8c5	Fix the internals of our hash functions to used unsigned values during hash computation as the overflow behavior of signed integers is undefined. NOTE: This change is smaller compared to 3.2 as much of this cleanup had already been done. I added the comment that my change in 3.2 added so that the code would match up. Otherwise this just adds or synchronizes appropriate UL designations on some constants to be pedantic. In practice we require compiling everything with -fwrapv which forces overflow to be defined as twos compliment but this keeps the code cleaner for checkers or in the case where someone has compiled it without -fwrapv or their compiler's equivalent. We could work to get rid of the -fwrapv requirement in 3.4 but that requires more planning. Found by Clang trunk's Undefined Behavior Sanitizer (UBSan). Cleanup only - no functionality or hash values change.	2012-12-10 19:51:29 -08:00
Gregory P. Smith	c2176e46d7	Fix the internals of our hash functions to used unsigned values during hash computation as the overflow behavior of signed integers is undefined. NOTE: This change is smaller compared to 3.2 as much of this cleanup had already been done. I added the comment that my change in 3.2 added so that the code would match up. Otherwise this just adds or synchronizes appropriate UL designations on some constants to be pedantic. In practice we require compiling everything with -fwrapv which forces overflow to be defined as twos compliment but this keeps the code cleaner for checkers or in the case where someone has compiled it without -fwrapv or their compiler's equivalent. Found by Clang trunk's Undefined Behavior Sanitizer (UBSan). Cleanup only - no functionality or hash values change.	2012-12-10 18:32:53 -08:00
Gregory P. Smith	27cbcd6241	Fix the internals of our hash functions to used unsigned values during hash computation as the overflow behavior of signed integers is undefined. In practice we require compiling everything with -fwrapv which forces overflow to be defined as twos compliment but this keeps the code cleaner for checkers or in the case where someone has compiled it without -fwrapv or their compiler's equivalent. Found by Clang trunk's Undefined Behavior Sanitizer (UBSan). Cleanup only - no functionality or hash values change.	2012-12-10 18:15:46 -08:00
Victor Stinner	8dbd421b4d	Cleanup unicodeobject.c * Remove micro-optimization: (errors == "surrogateescape" || strcmp(errors, "surrogateescape") == 0). Only use strcmp() * Initialize 'arg' members in unicode_format_arg() to help the compiler to diagnose real bugs and also make the code simpler to read	2012-12-04 09:30:24 +01:00
Victor Stinner	d45c7f8d74	Issue #16455 : On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announces an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.	2012-12-04 01:34:47 +01:00
Victor Stinner	2660e427d1	(Merge 3.2) Issue #16416 : On Mac OS X, operating system data are now always encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding (which may be ASCII if no locale environment variable is set), to avoid inconsistencies with os.fsencode() and os.fsdecode() functions which are already using UTF-8/surrogateescape.	2012-12-03 12:48:53 +01:00
Victor Stinner	27b1ca29cc	Issue #16416 : On Mac OS X, operating system data are now always encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding (which may be ASCII if no locale environment variable is set), to avoid inconsistencies with os.fsencode() and os.fsdecode() functions which are already using UTF-8/surrogateescape.	2012-12-03 12:47:59 +01:00
Antoine Pitrou	5439458a2a	Issue #16215 : Fix potential double memory free in str.replace(). Patch by Serhiy Storchaka.	2012-11-17 23:29:28 +01:00
Antoine Pitrou	6d5ad227a5	Issue #16215 : Fix potential double memory free in str.replace(). Patch by Serhiy Storchaka.	2012-11-17 23:28:17 +01:00
Victor Stinner	0d92c4f667	Issue #16416 : Fix error handling in _Py_wchar2char() _Py_char2wchar() functions	2012-11-12 23:32:21 +01:00
Victor Stinner	fc009eff9e	Close #16311 : Use the _PyUnicodeWriter API in text decoders * Remove unicode_widen(): replaced with _PyUnicodeWriter_Prepare() * Remove unicode_putchar(): replaced with PyUnicodeWriter_Prepare() + PyUnicode_WRITER() * When handling an decoding error, only overallocate the buffer by +25% instead of +100%	2012-11-07 00:36:38 +01:00
Ezio Melotti	cfa9636404	#8271 : merge with 3.3.	2012-11-04 23:23:09 +02:00
Ezio Melotti	f7ed5d111b	#8271 : the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti.	2012-11-04 23:21:38 +02:00
Benjamin Peterson	7ff2094bc7	merge 3.3 (#16369 )	2012-10-30 23:31:12 -04:00
Benjamin Peterson	e8ea97fffb	merge 3.2 (#16369 )	2012-10-30 23:27:52 -04:00
Benjamin Peterson	c43112823b	initialize more global type objects (closes #16369 )	2012-10-30 23:21:10 -04:00
Victor Stinner	e64322e034	Close #14625 : Rewrite the UTF-32 decoder. It is now 3x to 4x faster Patch written by Serhiy Storchaka.	2012-10-30 23:12:47 +01:00
Victor Stinner	76df43de30	Issue #16330 : Use surrogate-related macros Patch written by Serhiy Storchaka.	2012-10-30 01:42:39 +01:00
Mark Dickinson	fb90c0934c	Issue #14700 : Fix buggy overflow checks for large precision and width in new-style and old-style formatting.	2012-10-28 10:18:03 +00:00
Victor Stinner	c6cf1ba29e	Replace usage of the deprecated Py_UNICODE_COPY() with Py_MEMCPY() in resize_copy()	2012-10-23 02:54:47 +02:00
Victor Stinner	fe75fb4b3e	Optimize _PyUnicode_HasNULChars(): use findchar() instead of PyUnicode_Contains()	2012-10-23 02:52:18 +02:00
Victor Stinner	6fa627578a	Inline raise_translate_exception(): it is only used once	2012-10-23 02:51:50 +02:00
Victor Stinner	e5567ad236	Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp()	2012-10-23 02:48:49 +02:00
Christian Heimes	743e0cd6b5	Issue #16166 : Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified endianess detection and handling.	2012-10-17 23:52:17 +02:00
Chris Jerdonek	4a7df9aba9	Issue #14783 : Merge changes from 3.3.	2012-10-07 15:02:16 -07:00
Chris Jerdonek	042fa653ab	Issue #14783 : Merge changes from 3.2.	2012-10-07 14:56:27 -07:00
Chris Jerdonek	83fe2e1c22	Issue #14783 : Improve int() docstring and also str(), range(), and slice(). This commit rewrites the docstring for int() to incorporate the documentation changes made in issue #16036. It also switches the docstrings for int(), str(), range(), and slice() to use multi-line signatures.	2012-10-07 14:48:36 -07:00
Victor Stinner	4c63a972d1	Cleanup PyUnicode_FromFormatV() for zero padding Skip the "0" instead of parsing it twice: detect zero padding and then parsed as a digit of the width.	2012-10-06 23:55:33 +02:00
Victor Stinner	15a1136547	Issue #16147 : PyUnicode_FromFormatV() doesn't need anymore to allocate a buffer on the heap to format numbers.	2012-10-06 23:48:20 +02:00
Victor Stinner	ff5a848db5	Issue #16147 : PyUnicode_FromFormatV() now raises an error if the argument of '%c' is not in the range(0x110000).	2012-10-06 23:05:45 +02:00
Victor Stinner	3921e90c5a	Issue #16147 : PyUnicode_FromFormatV() now detects integer overflow when parsing width and precision	2012-10-06 23:05:00 +02:00
Victor Stinner	e215d960be	Issue #16147 : Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API * Simplify the code: replace 4 steps with one unique step using the _PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids to store intermediate results which require to allocate an array of pointers on the heap. * Use the _PyUnicodeWriter API for speed (and its convinient API): overallocate the buffer to reduce the number of "realloc()" * Implement "width" and "precision" in Python, don't rely on sprintf(). It avoids to need of a temporary buffer allocated on the heap: only use a small buffer allocated in the stack. * Add _PyUnicodeWriter_WriteCstr() function * Split PyUnicode_FromFormatV() into two functions: add unicode_fromformat_arg(). * Inline parse_format_flags(): the format of an argument is now only parsed once, it's no more needed to have a subfunction. * Optimize PyUnicode_FromFormatV() for characters between two "%" arguments: search the next "%" and copy the substring in one chunk, instead of copying character per character.	2012-10-06 23:03:36 +02:00
Mark Dickinson	ff9c54aca2	Issue #16096 : Merge fixes from 3.3.	2012-10-06 18:05:14 +01:00
Mark Dickinson	c04ddff290	Issue #16096 : Fix several occurrences of potential signed integer overflow. Thanks Serhiy Storchaka.	2012-10-06 18:04:49 +01:00
Victor Stinner	8c6db45d3e	In debug mode, unicode_write_cstr() now checks that non-ASCII characters are not written into an ASCII string	2012-10-06 00:40:45 +02:00
Ezio Melotti	080a2c087e	#16127 : merge with 3.3.	2012-10-05 03:34:02 +03:00
Ezio Melotti	e7f90375b1	#16127 : remove outdated references to narrow builds. Patch by Serhiy Storchaka.	2012-10-05 03:33:31 +03:00
Victor Stinner	1929407406	Fix PyUnicode_Format(): return NULL if PyUnicode_READY(uformat) failed This error cannot occur in practice: PyUnicode_FromObject() always return a "ready" string.	2012-10-05 00:09:33 +02:00
Victor Stinner	770e19e0cc	Optimize unicode_compare(): use memcmp() when comparing two UCS1 strings	2012-10-04 22:59:45 +02:00
Victor Stinner	90db9c47dc	Enable also ptr==ptr optimization in PyUnicode_Compare() It was already implemented in PyUnicode_RichCompare()	2012-10-04 21:53:50 +02:00
Victor Stinner	aa7712711d	unicode_result_wchar(): move the assert() to the "#ifdef Py_DEBUG" block	2012-10-04 02:32:58 +02:00
Victor Stinner	a4708231e6	Split the huge PyUnicode_Format() function (+540 lines) into subfunctions	2012-10-04 02:19:54 +02:00
Victor Stinner	a049443fab	PyUnicode_Format(): disable overallocation when we are writing the last part of the output string	2012-10-03 23:03:46 +02:00
Victor Stinner	afffce489b	Unicode: resize_compact() and resize_inplace() fills also the Unicode strings with invalid bytes in debug mode, as done by PyUnicode_New()	2012-10-03 23:03:17 +02:00
Victor Stinner	c89d28fdfc	Issue #15609 : Fix refleak introduced by my last optimization	2012-10-02 12:54:07 +02:00
Victor Stinner	621ef3d84f	Issue #15609 : Optimize str%args for integer argument - Use _PyLong_FormatWriter() instead of formatlong() when possible, to avoid a temporary buffer - Enable the fast path when width is smaller or equals to the length, and when the precision is bigger or equals to the length - Add unit tests! - formatlong() uses PyUnicode_Resize() instead of _PyUnicode_FromASCII() to resize the output string	2012-10-02 00:33:47 +02:00
Antoine Pitrou	a1f7655fa7	Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). Patch by Serhiy Storchaka.	2012-09-23 20:00:04 +02:00
Antoine Pitrou	6f80f5d444	Issue #15379 : Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). Patch by Serhiy Storchaka.	2012-09-23 19:55:21 +02:00
Antoine Pitrou	ca8aa4acf6	Issue #15144 : Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t. Patch by Serhiy Storchaka.	2012-09-20 20:56:47 +02:00
Christian Heimes	5f520f4fed	Issue #15900 : Fixed reference leak in PyUnicode_TranslateCharmap()	2012-09-11 14:03:25 +02:00
Christian Heimes	f4f9939a96	Fixed memory leak in error branch of formatfloat(). CID 719687	2012-09-10 11:48:41 +02:00
Antoine Pitrou	057119b0b7	Fix C++-style comment (xlc compilation failure)	2012-09-02 17:56:33 +02:00
Benjamin Peterson	59043f96ea	merge 3.2 (#15801 )	2012-08-28 18:01:45 -04:00
Benjamin Peterson	28a6cfaefc	use the stricter PyMapping_Check (closes #15801 )	2012-08-28 17:55:35 -04:00
Stefan Krah	8528c3145e	Issue #15728 : Fix leak in PyUnicode_AsWideCharString(). Found by Coverity.	2012-08-19 21:52:43 +02:00
Nick Coghlan	0e41628d35	Merge str docstring fix from 3.2	2012-08-16 14:14:30 +10:00
Nick Coghlan	573b1fd779	Fix str docstring	2012-08-16 14:13:07 +10:00
Antoine Pitrou	b4bbee25b1	Issue #14579 : Fix CVE-2012-2135: vulnerability in the utf-16 decoder after error handling. Patch by Serhiy Storchaka.	2012-07-21 00:45:14 +02:00
Mark Dickinson	01ac8b6ab1	Use correct types for ASCII_CHAR_MASK integer constants.	2012-07-07 14:08:48 +02:00
Antoine Pitrou	aaefac76dd	Issue #14874 : Restore charmap decoding speed to pre-PEP 393 levels. Patch by Serhiy Storchaka.	2012-06-16 22:48:21 +02:00
Victor Stinner	f185226244	_copy_characters(): move debug code at the top to avoid noisy #ifdef And don't use assert() anymore if check_maxchar is set: return -1 on error instead.	2012-06-16 16:38:26 +02:00
Victor Stinner	07621338fb	Fix PyUnicode_GetSize(): Don't replace _PyUnicode_Ready() exception	2012-06-16 04:53:46 +02:00
Victor Stinner	8a8b3eaabe	Fix a compiler warning in _copy_characters() and remove debug code	2012-06-16 04:53:25 +02:00
Victor Stinner	24e403bbee	Oops, fix my previous change on _copy_characters()	2012-06-16 04:53:00 +02:00
Victor Stinner	ca439eecea	Fix unicode_adjust_maxchar(): catch PyUnicode_New() failure	2012-06-16 03:17:34 +02:00
Victor Stinner	184252ad3f	Fix "%f" format of str%args if the result is not an ASCII or latin1 string	2012-06-16 02:57:41 +02:00
Victor Stinner	9a77770add	Remove debug code	2012-06-16 02:44:43 +02:00
Victor Stinner	c9d369f1bf	Optimize _PyUnicode_FastCopyCharacters() when maxchar(from) > maxchar(to)	2012-06-16 02:22:37 +02:00
Victor Stinner	f05e17ece9	unicodeobject.c: Remove debug code	2012-06-16 01:53:04 +02:00
Antoine Pitrou	27f6a3b0bf	Issue #15026 : utf-16 encoding is now significantly faster (up to 10x). Patch by Serhiy Storchaka.	2012-06-15 22:15:23 +02:00
Kristján Valur Jónsson	55e5dc8371	Rearrange code to beat an optimizer bug affecting Release x64 on windows with VS2010sp1	2012-06-06 21:58:08 +00:00
Victor Stinner	d7b7c7472b	Issue #14993 : Use standard "unsigned char" instead of a unsigned char bitfield	2012-06-04 22:52:12 +02:00
Kristjan Valur Jonsson	85634d7a2e	Issue #14909 : A number of places were using PyMem_Realloc() apis and PyObject_GC_Resize() with incorrect error handling. In case of errors, the original object would be leaked. This checkin fixes those cases.	2012-05-31 09:37:31 +00:00
Victor Stinner	3a7d096f2f	Issue #14744 : Fix compilation on Windows (part 2)	2012-05-29 18:53:56 +02:00
Victor Stinner	d3f0882dfb	Issue #14744 : Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args) * Formatting string, int, float and complex use the _PyUnicodeWriter API. It avoids a temporary buffer in most cases. * Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just keep a reference to the string if the output is only composed of one string * Disable overallocation when formatting the last argument of str%args and str.format(args) * Overallocation allocates at least 100 characters: add min_length attribute to the _PyUnicodeWriter structure * Add new private functions: _PyUnicode_FastCopyCharacters(), _PyUnicode_FastFill() and _PyUnicode_FromASCII() The speed up is around 20% in average.	2012-05-29 12:57:52 +02:00
Antoine Pitrou	63065d761e	Issue #14624 : UTF-16 decoding is now 3x to 4x faster on various inputs. Patch by Serhiy Storchaka.	2012-05-15 23:48:04 +02:00
Martin v. Löwis	b05c0738d8	Silence VS 2010 signed/unsigned warnings.	2012-05-15 13:45:49 +02:00
Antoine Pitrou	758153badb	Fix refleaks introduced by 83da67651687.	2012-05-12 15:51:51 +02:00
Antoine Pitrou	e45c0c5cef	Fix logic error introduced by 83da67651687.	2012-05-12 15:49:07 +02:00
Benjamin Peterson	1ff2e35e84	simplify by shortcutting when the kind of the needle is larger than the haystack	2012-05-11 17:41:20 -05:00
Antoine Pitrou	ca5f91b888	Issue #14738 : Speed-up UTF-8 decoding on non-ASCII data. Patch by Serhiy Storchaka.	2012-05-10 16:36:02 +02:00
Victor Stinner	3b1a74a9c3	Rename unicode_write_t structure and its methods to "_PyUnicodeWriter"	2012-05-09 22:25:00 +02:00
Victor Stinner	ee4544c920	Issue #14744 : Inline unicode_writer_write_char() and unicode_write_str() Optimize also PyUnicode_Format(): call unicode_writer_prepare() only once per argument.	2012-05-09 22:24:08 +02:00
Victor Stinner	f59c28c930	unicode_writer_finish() checks string consistency	2012-05-09 03:24:14 +02:00
Victor Stinner	106802547c	Backout ab500b297900: the check for integer overflow is wrong Issue #14716: Change integer overflow check in unicode_writer_prepare() to compute the limit at compile time instead of runtime. Patch written by Serhiy Storchaka.	2012-05-07 23:50:05 +02:00
Victor Stinner	0576f9b4cf	Issue #14716 : Change integer overflow check in unicode_writer_prepare() to compute the limit at compile time instead of runtime. Patch written by Serhiy Storchaka.	2012-05-07 13:02:44 +02:00
Victor Stinner	202fdca133	Close #14716 : str.format() now uses the new "unicode writer" API instead of the PyAccu API. For example, it makes str.format() from 25% to 30% faster on Linux.	2012-05-07 12:47:02 +02:00
Mark Dickinson	99e2e5552a	Issue #14700 : Fix two broken and undefined-behaviour-inducing overflow checks in old-style string formatting. Thanks Serhiy Storchaka for report and original patch.	2012-05-07 11:20:50 +01:00
Victor Stinner	d0dba6eee8	unicode_writer: don't force inline when it is not necessary Keep inline for performance critical functions (functions used in loops)	2012-05-04 01:19:15 +02:00
Benjamin Peterson	b63f49f2b4	if the kind of the string to count is larger than the string to search, shortcut to 0	2012-05-03 18:31:07 -04:00
Victor Stinner	a7b654be30	unicode_writer: add finish() method and assertions to write_str() method * The write_str() method does nothing if the length is zero. * Replace "struct unicode_writer_t" with "unicode_writer_t"	2012-05-03 23:58:55 +02:00
Victor Stinner	bf4e266397	Issue #14687 : Remove redundant length attribute of unicode_write_t The length can be read directly from the buffer	2012-05-03 19:27:14 +02:00
Victor Stinner	7989157e49	Issue #14687 : Cleanup unicode_writer_prepare() "Inline" PyUnicode_Resize(): call directly resize_compact()	2012-05-03 13:43:07 +02:00
Victor Stinner	f2c76aa6cb	Issue #14687 : str%tuple now uses an optimistic "unicode writer" instead of an accumulator. Directly write characters into the output (don't use a temporary list): resize and widen the string on demand.	2012-05-03 13:10:40 +02:00
Victor Stinner	1b487b467b	Issue #14624 , #14687 : Optimize unicode_widen() Don't convert uninitialized characters. Patch written by Serhiy Storchaka.	2012-05-03 12:29:04 +02:00
Victor Stinner	3a7f7977f1	Remove buggy assertion in PyUnicode_Substring() Use also directly unicode_empty, instead of PyUnicode_New(0,0).	2012-05-03 03:36:40 +02:00
Victor Stinner	684d5fd420	Fix PyUnicode_Substring() for start >= length and start > end Remove the fast-path for 1-character string: unicode_fromascii() and _PyUnicode_FromUCS*() now have their own fast-path for 1-character strings.	2012-05-03 02:32:34 +02:00
Victor Stinner	b6cd014d75	Unicode: optimize creating of 1-character strings	2012-05-03 02:17:04 +02:00
Victor Stinner	bff7c96834	Issue #14687 : Optimize str%tuple for the "%(name)s" syntax Avoid an useless and expensive call to PyUnicode_READ().	2012-05-03 01:44:59 +02:00
Victor Stinner	e6abb488c9	unicodeobject.c: Add MAX_MAXCHAR() macro to (micro-)optimize the computation of the second argument of PyUnicode_New(). * Create also align_maxchar() function * Optimize fix_decimal_and_space_to_ascii(): don't compute the maximum character when ch <= 127 (it is ASCII)	2012-05-02 01:15:40 +02:00
Victor Stinner	438106b66e	Issue #14687 : Cleanup PyUnicode_Format()	2012-05-02 00:41:57 +02:00
Victor Stinner	b5c3ea3af3	Issue #14687 : Optimize str%args * formatfloat() uses unicode_fromascii() instead of PyUnicode_DecodeASCII() to not have to check characters, we know that it is really ASCII * Use PyUnicode_FromOrdinal() instead of _PyUnicode_FromUCS4() to format a character: if avoids a call to ucs4lib_find_max_char() to compute the maximum character (whereas we already know it, it is just the character itself)	2012-05-02 00:29:36 +02:00
Victor Stinner	b80e46eca4	Issue #14687 : Avoid an useless duplicated string in PyUnicode_Format()	2012-04-30 05:21:52 +02:00
Victor Stinner	aff3cc659b	Issue #14687 : Cleanup PyUnicode_Format()	2012-04-30 05:19:21 +02:00
Victor Stinner	b11d91d969	Fix my previous commit: bool is a long, restore the special case for bool	2012-04-28 00:25:34 +02:00
Victor Stinner	d0880d57b0	Simplify and optimize formatlong() * Remove _PyBytes_FormatLong(): inline it into formatlong() * the input type is always a long, so remove the code for bool * don't duplicate the string if the length does not change * Use PyUnicode_DATA() instead of _PyUnicode_AsString()	2012-04-27 23:40:13 +02:00
Victor Stinner	94d558b063	Optimize _PyUnicode_FindMaxChar() find pure ASCII strings	2012-04-27 22:26:58 +02:00
Victor Stinner	8f825060f1	Check newly created consistency using _PyUnicode_CheckConsistency(str, 1) * In debug mode, fill the string data with invalid characters * Simplify also reference counting in PyCodec_BackslashReplaceErrors() and PyCodec_XMLCharRefReplaceError()	2012-04-27 13:55:39 +02:00
Victor Stinner	718fbf078c	_PyUnicode_CheckConsistency() ensures that the unicode string ends with a null character	2012-04-26 00:39:37 +02:00
Benjamin Peterson	b9f4c9daad	make pointer arith c89	2012-04-23 21:45:40 -04:00
Benjamin Peterson	f3b7d86e25	use correct base ptr	2012-04-23 18:07:01 -04:00
Benjamin Peterson	2844a7a6d3	simplify and reformat	2012-04-23 18:00:25 -04:00
Victor Stinner	ece58deb9f	Close #14648 : Compute correctly maxchar in str.format() for substrin	2012-04-23 23:36:38 +02:00
Benjamin Peterson	64ed576de8	merge 3.2 (#14509 )	2012-04-09 15:04:39 -04:00
Benjamin Peterson	ca819c3c9d	merge 3.1 (#14509 )	2012-04-09 15:01:02 -04:00
Benjamin Peterson	f6622c8a3e	fix build without Py_DEBUG and DNDEBUG (closes #14509 )	2012-04-09 14:53:07 -04:00
Victor Stinner	afb5205c48	Close #14249 : Use bit shifts instead of an union, it's more efficient. Patch written by Serhiy Storchaka	2012-04-05 22:54:49 +02:00
Victor Stinner	e7eee01f36	Close #14249 : Use an union instead of a long to short pointer to avoid aliasing issue. Speed up UTF-16 by 20%.	2012-04-05 13:44:34 +02:00
Antoine Pitrou	a701388de1	Rename _PyIter_GetBuiltin to _PyObject_GetBuiltin, and do not include it in the stable ABI.	2012-04-05 00:04:20 +02:00
Kristján Valur Jónsson	31668b8f7a	Issue #14288 : Serialization support for builtin iterators.	2012-04-03 10:49:41 +00:00
Benjamin Peterson	0df542985a	grammar	2012-03-26 14:50:32 -04:00
Benjamin Peterson	c067d6661f	merge 3.2	2012-03-25 22:41:16 -04:00
Benjamin Peterson	a8755c586e	kill this terribly outdated comment	2012-03-25 22:40:54 -04:00
Victor Stinner	0d03478b88	Remove an unused variable	2012-03-06 02:06:01 +01:00
Victor Stinner	c9590ad745	Close #14085 : remove assertions from PyUnicode_WRITE macro Add checks in PyUnicode_WriteChar() and convert PyUnicode_New() assertion to a test raising a Python exception.	2012-03-04 01:34:37 +01:00
Ezio Melotti	cda6b6d60d	#14081 : The sep and maxsplit parameter to str.split, bytes.split, and bytearray.split may now be passed as keyword arguments.	2012-02-26 09:39:55 +02:00
Victor Stinner	b0800dc53b	Oops, revert unwanted changes	2012-02-25 00:47:08 +01:00
Victor Stinner	abc649ddbe	Issue #14107 : fix bigmem tests on str.capitalize(), str.swapcase() and str.title(). Compute correctly how much memory is required for the test (memuse).	2012-02-25 00:43:27 +01:00
Antoine Pitrou	842c0f17eb	Fix compilation error under Windows (and warnings too).	2012-02-24 13:30:46 +01:00
Victor Stinner	90f50d4df9	Issue #13706 : Fix format(float, "n") for locale with non-ASCII decimal point (e.g. ps_aF)	2012-02-24 01:44:47 +01:00
Victor Stinner	41a863cb81	Issue #13706 : Fix format(int, "n") for locale with non-ASCII thousands separator * Decode thousands separator and decimal point using PyUnicode_DecodeLocale() (from the locale encoding), instead of decoding them implicitly from latin1 * Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used * Change _PyUnicode_InsertThousandsGrouping() API to return the maximum character if unicode is NULL * Replace MIN/MAX macros by Py_MIN/Py_MAX * stringlib/undef.h undefines STRINGLIB_IS_UNICODE * stringlib/localeutil.h only supports Unicode	2012-02-24 00:37:51 +01:00
Victor Stinner	b429d3b09c	Fix doc of an internal function: unicode_write_cstr()	2012-02-22 21:22:20 +01:00
Antoine Pitrou	ba6bafcfbe	Fix compile failure under Windows	2012-02-22 16:41:50 +01:00
Victor Stinner	c516610f0b	Optimize str%arg for number formats: %i, %d, %u, %x, %p Write a specialized function to write an ASCII/latin1 C char* string into a Python Unicode string.	2012-02-22 13:55:02 +01:00
Victor Stinner	99d7ad0bb0	Micro-optimize computation of maxchar in PyUnicode_TransformDecimalToASCII()	2012-02-22 13:37:39 +01:00
Victor Stinner	da79e632c4	Micro-optimize unicode_expandtabs(): use FILL() macro to write N spaces	2012-02-22 13:37:04 +01:00
Victor Stinner	15e9ed299c	PyUnicode_New() and unicode_putchar() check for MAX_UNICODE maximum (U+10FFFF)	2012-02-22 13:36:20 +01:00
Benjamin Peterson	d9a3591ed1	merge 3.2	2012-02-21 11:12:14 -05:00
Benjamin Peterson	e249dcab7a	merge 3.2	2012-02-21 11:09:13 -05:00
Benjamin Peterson	69e9727657	ensure no one tries to hash things before the random seed is found	2012-02-21 11:08:50 -05:00
Georg Brandl	16fa2a1097	Forgot the "empty string -> hash == 0" special case for strings.	2012-02-21 00:50:13 +01:00
Georg Brandl	2fb477c0f0	Merge 3.2: Issue #13703 plus some related test suite fixes.	2012-02-21 00:33:36 +01:00
Georg Brandl	09a7c72cad	Merge from 3.1: Issue #13703 : add a way to randomize the hash values of basic types (str, bytes, datetime) in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated. The environment variable PYTHONHASHSEED and the new command line flag -R control this behavior.	2012-02-20 21:31:46 +01:00
Georg Brandl	2daf6ae249	Issue #13703 : add a way to randomize the hash values of basic types (str, bytes, datetime) in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated. The environment variable PYTHONHASHSEED and the new command line flag -R control this behavior.	2012-02-20 19:54:16 +01:00
Victor Stinner	c3a6b02d70	(Merge 3.2) Issue #13913 : normalize utf-8 codec name in UTF-8 decoder	2012-02-14 01:18:10 +01:00
Victor Stinner	cbe01342bc	Issue #13913 : normalize utf-8 codec name in UTF-8 decoder	2012-02-14 01:17:45 +01:00
Victor Stinner	d1cd99b533	Backout d2c1521ad0a1: _Py_IDENTIFIER() uses UTF-8 again	2012-02-07 23:05:55 +01:00
Victor Stinner	d446d8e09a	_Py_Identifier are always ASCII strings	2012-02-05 01:45:45 +01:00
Antoine Pitrou	7ab4af0427	Issue #13848 : open() and the FileIO constructor now check for NUL characters in the file name. Patch by Hynek Schlawack.	2012-01-29 18:43:36 +01:00
Antoine Pitrou	1334884ff2	Issue #13848 : open() and the FileIO constructor now check for NUL characters in the file name. Patch by Hynek Schlawack.	2012-01-29 18:36:34 +01:00
Benjamin Peterson	eea4846d23	don't ready in case_operation, since most callers do it themselves	2012-01-16 14:28:50 -05:00
Gregory P. Smith	f5b62a9b31	Consolidate the occurrences of the prime used as the multiplier when hashing.	2012-01-14 15:45:13 -08:00
Gregory P. Smith	63e6c3222f	Consolidate the occurrences of the prime used as the multiplier when hashing to a single #define instead of having several copies in several files. This excludes the Modules/ tree (datetime and expat both have a copy for their own purposes with no need for it to be the same).	2012-01-14 15:31:34 -08:00
Benjamin Peterson	c8d8b8861e	fix possible refleaks if PyUnicode_READY fails	2012-01-14 13:37:31 -05:00
Benjamin Peterson	bac79498c8	always explicitly check for -1 from PyUnicode_READY	2012-01-14 13:34:47 -05:00
Benjamin Peterson	d5890c8db5	add str.casefold() (closes #13752 )	2012-01-14 13:23:30 -05:00
Benjamin Peterson	53aa1d7c57	fix possible if unlikely leak	2011-12-20 13:29:45 -06:00
Benjamin Peterson	e51757f6de	move do_title to a better place	2012-01-12 21:10:29 -05:00
Benjamin Peterson	821e4cfd01	make fix_decimal_and_space_to_ascii check if it modifies the string	2012-01-12 15:40:18 -05:00
Benjamin Peterson	0c91392fe6	kill capwords implementation which has been disabled since the beginning	2012-01-12 15:25:41 -05:00
Benjamin Peterson	b2bf01d824	use full unicode mappings for upper/lower/title case (#12736 ) Also broaden the category of characters that count as lowercase/uppercase.	2012-01-11 18:17:06 -05:00
Victor Stinner	3fe553160c	Add a new PyUnicode_Fill() function It is faster than the unicode_fill() function which was implemented in formatter_unicode.c.	2012-01-04 00:33:50 +01:00
Benjamin Peterson	5e458f520c	also decref the right thing	2012-01-02 10:12:13 -06:00
Benjamin Peterson	4c13a4a352	ready the correct string	2012-01-02 09:07:38 -06:00
Benjamin Peterson	22a29708fd	fix some possible refleaks from PyUnicode_READY error conditions	2012-01-02 09:00:30 -06:00
Benjamin Peterson	9ca3ffac94	== -1 is convention	2012-01-01 16:04:29 -06:00
Benjamin Peterson	e157cf1012	make switch more robust	2012-01-01 15:56:20 -06:00
Benjamin Peterson	c0b95d18fa	4 space indentation	2011-12-20 17:24:05 -06:00
Benjamin Peterson	ead6b53659	fix spacing around switch statements	2011-12-20 17:23:42 -06:00
Benjamin Peterson	822c790527	merge 3.2	2011-12-20 13:32:50 -06:00
Victor Stinner	6099a03202	Issue #13624 : Write a specialized UTF-8 encoder to allow more optimization The main bottleneck was the PyUnicode_READ() macro.	2011-12-18 14:22:26 +01:00
Victor Stinner	73f53b57d1	Optimize str * n for len(str)==1 and UCS-2 or UCS-4	2011-12-18 03:26:31 +01:00
Victor Stinner	f644110816	Issue #13621 : Optimize str.replace(char1, char2) Use findchar() which is more optimized than a dummy loop using PyUnicode_READ(). PyUnicode_READ() is a complex and slow macro.	2011-12-18 02:43:08 +01:00
Victor Stinner	ab870218e3	Issue #10951 : Fix compiler warnings in timemodule.c and unicodeobject.c Thanks Jérémy Anger for the fix.	2011-12-17 22:39:43 +01:00
Victor Stinner	2f197078fb	The locale decoder raises a UnicodeDecodeError instead of an OSError Search the invalid character using mbrtowc().	2011-12-17 07:08:30 +01:00
Victor Stinner	1b57967b96	Issue #13560 : Locale codec functions use the classic "errors" parameter, instead of surrogateescape So it would be possible to support more error handlers later.	2011-12-17 05:47:23 +01:00
Victor Stinner	ab59594326	What's New in Python 3.3: complete the deprecation list Add also FIXMEs in unicodeobject.c	2011-12-17 04:59:06 +01:00
Victor Stinner	1f33f2b0c3	Issue #13560 : os.strerror() now uses the current locale encoding instead of UTF-8	2011-12-17 04:45:09 +01:00
Victor Stinner	f2ea71fcc8	Issue #13560 : Add PyUnicode_EncodeLocale() * Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not available * Document my last changes in Misc/NEWS	2011-12-17 04:13:41 +01:00
Victor Stinner	af02e1c85a	Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() * PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string from the current locale encoding * _Py_char2wchar() writes an "error code" in the size argument to indicate if the function failed because of memory allocation failure or because of a decoding error. The function doesn't write the error message directly to stderr. * Fix time.strftime() (if wcsftime() is missing): decode strftime() result from the current locale encoding, not from the filesystem encoding.	2011-12-16 23:56:01 +01:00
Victor Stinner	16e6a80923	PyUnicode_Resize(): warn about canonical representation Call also directly unicode_resize() in unicodeobject.c	2011-12-12 13:24:15 +01:00
Victor Stinner	b0a82a6a7f	Fix PyUnicode_Resize() for compact string: leave the string unchanged on error Fix also PyUnicode_Resize() doc	2011-12-12 13:08:33 +01:00
Victor Stinner	bf6e560d0c	Make PyUnicode_Copy() private => _PyUnicode_Copy() Undocument the function. Make also decode_utf8_errors() as private (static).	2011-12-12 01:53:47 +01:00
Victor Stinner	7a9105a380	resize_copy() now supports legacy ready strings	2011-12-12 00:13:42 +01:00
Victor Stinner	488fa49acf	Rewrite PyUnicode_Append(); unicode_modifiable() is more strict * Rename unicode_resizable() to unicode_modifiable() * Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear that the function is private * Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append() to simplify the code * unicode_modifiable() return 0 if the hash has been computed or if the string is not an exact unicode string * Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the hash has already been computed, you cannot modify a string inplace anymore * PyUnicode_Concat() checks for integer overflow	2011-12-12 00:01:39 +01:00
Victor Stinner	c4b495497a	Create unicode_result_unchanged() subfunction	2011-12-11 22:44:26 +01:00
Victor Stinner	eaab604829	Fix fixup() for unchanged unicode subtype If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u.	2011-12-11 22:22:39 +01:00
Victor Stinner	e6b2d4407a	unicode_fromascii() doesn't check string content twice in debug mode _PyUnicode_CheckConsistency() also checks string content.	2011-12-11 21:54:30 +01:00
Victor Stinner	a1d12bb119	Call directly PyUnicode_DecodeUTF8Stateful() instead of PyUnicode_DecodeUTF8() * Remove micro-optimization from PyUnicode_FromStringAndSize(): PyUnicode_DecodeUTF8Stateful() has already these optimizations (for size=0 and one ascii char). * Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove an useless variable	2011-12-11 21:53:09 +01:00
Victor Stinner	382955ff4e	Use directly unicode_empty instead of PyUnicode_New(0, 0)	2011-12-11 21:44:00 +01:00
Victor Stinner	785938eebd	Move the slowest UTF-8 decoder to its own subfunction * Create decode_utf8_errors() * Reuse unicode_fromascii() * decode_utf8_errors() doesn't refit at the beginning * Remove refit_partial_string(), use unicode_adjust_maxchar() instead	2011-12-11 20:09:03 +01:00
Victor Stinner	84def3774d	Fix error handling in resize_compact()	2011-12-11 20:04:56 +01:00
Victor Stinner	8faf8216e4	PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a character is not in range [U+0000; U+10ffff].	2011-12-08 22:14:11 +01:00
Victor Stinner	551ac95733	Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros And use surrogates macros everywhere in unicodeobject.c	2011-11-29 22:58:13 +01:00
Victor Stinner	6345be9a14	Close #13093 : PyUnicode_EncodeDecimal() doesn't support error handlers different than "strict" anymore. The caller was unable to compute the size of the output buffer: it depends on the error handler.	2011-11-25 20:09:01 +01:00
Benjamin Peterson	1518e8713d	and back to the "magic" formula (with a comment) it is	2011-11-23 10:44:52 -06:00
Benjamin Peterson	5944c36931	cave to those who like readable code	2011-11-22 19:05:49 -06:00
Benjamin Peterson	0268675193	fix compiler warning by implementing this more cleverly	2011-11-22 15:29:32 -05:00
Victor Stinner	ca4f20782e	find_maxchar_surrogates() reuses surrogate macros	2011-11-22 03:38:40 +01:00
Victor Stinner	0d3721d986	Issue #13441 : Disable temporary the check on the maximum character until the Solaris issue is solved. But add assertion on the maximum character in various encoders: UTF-7, UTF-8, wide character (wchar_t, Py_UNICODE), unicode-escape, raw-unicode-escape. Fix also unicode_encode_ucs1() for backslashreplace error handler: Python is now always "wide".	2011-11-22 03:27:53 +01:00
Victor Stinner	f8facacf30	Fix compiler warnings	2011-11-22 02:30:47 +01:00
Victor Stinner	b84d723509	(Merge 3.2) Issue #13093 : Fix error handling on PyUnicode_EncodeDecimal()	2011-11-22 01:50:07 +01:00
Victor Stinner	cfed46e00a	PyUnicode_FromKindAndData() fails with a ValueError if size < 0	2011-11-22 01:29:14 +01:00
Victor Stinner	42885206ec	UTF-8 decoder: set consumed value in the latin1 fast-path	2011-11-22 01:23:02 +01:00
Victor Stinner	d3df8ab377	Replace _PyUnicode_READY_REPLACE() and _PyUnicode_ReadyReplace() with unicode_ready() * unicode_ready() has a simpler API * try to reuse unicode_empty and latin1_char singleton everywhere * Fix a reference leak in _PyUnicode_TranslateCharmap() * PyUnicode_InternInPlace() doesn't try to get a singleton anymore, to avoid having to handle a failure	2011-11-22 01:22:34 +01:00
Victor Stinner	f01245067a	Rewrite PyUnicode_TransformDecimalToASCII() to use the new Unicode API	2011-11-21 23:12:56 +01:00
Victor Stinner	2d718f39a5	Remove an unused variable from PyUnicode_Copy()	2011-11-21 23:11:52 +01:00
Victor Stinner	87af4f2f3a	Simplify PyUnicode_Copy() Use PyUnicode_Copy() in fixup()	2011-11-21 23:03:47 +01:00
Victor Stinner	5bbe5e7c85	Fix a compiler warning in _PyUnicode_CheckConsistency()	2011-11-21 22:54:05 +01:00
Victor Stinner	42bf77537e	Rewrite PyUnicode_EncodeDecimal() to use the new Unicode API Add tests for PyUnicode_EncodeDecimal() and PyUnicode_TransformDecimalToASCII().	2011-11-21 22:52:58 +01:00
Antoine Pitrou	0a3229de6b	Issue #13417 : speed up utf-8 decoding by around 2x for the non-fully-ASCII case. This almost catches up with pre-PEP 393 performance, when decoding needed only one pass.	2011-11-21 20:39:13 +01:00
Victor Stinner	da29cc36aa	Issue #13441 : _PyUnicode_CheckConsistency() dumps the string if the maximum character is bigger than U+10FFFF and locale.localeconv() dumps the string before decoding it. Temporary hack to debug the issue #13441.	2011-11-21 14:31:41 +01:00
Victor Stinner	9e30aa52fd	Fix misuse of PyUnicode_GET_SIZE() => PyUnicode_GET_LENGTH() And PyUnicode_GetSize() => PyUnicode_GetLength()	2011-11-21 02:49:52 +01:00
Victor Stinner	4ead7c7be8	PyObject_Str() ensures that the result string is ready and check the string consistency. _PyUnicode_CheckConsistency() doesn't check the hash anymore. It should be possible to call this function even if hash(str) was already called.	2011-11-20 19:48:36 +01:00
Victor Stinner	b960b34577	PyUnicode_AsUTF32String() calls directly _PyUnicode_EncodeUTF32(), instead of calling the deprecated PyUnicode_EncodeUTF32() function	2011-11-20 19:12:52 +01:00
Victor Stinner	77faf69ca1	_PyUnicode_CheckConsistency() also checks maxchar maximum value, not only its minimum value	2011-11-20 18:56:05 +01:00
Victor Stinner	d5c4022d2a	Remove the two ugly and unused WRITE_ASCII_OR_WSTR and WRITE_WSTR macros	2011-11-20 18:41:31 +01:00
Victor Stinner	2e9cfadd7c	Reuse surrogate macros in UTF-16 decoder	2011-11-20 18:40:27 +01:00
Victor Stinner	ae4f7c8e59	charmap_encoding_error() uses the new Unicode API	2011-11-20 18:28:55 +01:00
Victor Stinner	ac931b1e5b	Use PyUnicode_EncodeCodePage() instead of PyUnicode_EncodeMBCS() with PyUnicode_AsUnicodeAndSize()	2011-11-20 18:27:03 +01:00
Victor Stinner	22168998f5	charmap encoders uses Py_UCS4, not Py_UNICODE	2011-11-20 17:09:18 +01:00
Victor Stinner	1f7951711c	Catch PyUnicode_AS_UNICODE() errors	2011-11-17 00:45:54 +01:00
Ezio Melotti	11060a4a48	#13406 : silence deprecation warnings in test_codecs.	2011-11-16 09:39:10 +02:00
Antoine Pitrou	78edf7576e	Issue #13333 : The UTF-7 decoder now accepts lone surrogates (the encoder already accepts them).	2011-11-15 01:44:16 +01:00
Antoine Pitrou	5418ee0b9a	Issue #13333 : The UTF-7 decoder now accepts lone surrogates (the encoder already accepts them).	2011-11-15 01:42:21 +01:00
Antoine Pitrou	31b92a534f	Sanitize reference management in the utf-8 encoder	2011-11-12 18:35:19 +01:00
Antoine Pitrou	0290c7a811	Fix regression on 2-byte wchar_t systems (Windows)	2011-11-11 13:29:12 +01:00
Antoine Pitrou	44c6affc79	Avoid crashing because of an unaligned word access	2011-11-11 02:59:42 +01:00
Antoine Pitrou	de20b0b50e	Issue #13149 : Speed up append-only StringIO objects. This is very similar to the "lazy strings" idea.	2011-11-10 21:47:38 +01:00
Victor Stinner	9f4b1e9c50	Fix and deprecated the unicode_internal codec unicode_internal codec uses Py_UNICODE instead of the real internal representation (PEP 393: Py_UCS1, Py_UCS2 or Py_UCS4) for backward compatibility.	2011-11-10 20:56:30 +01:00
Victor Stinner	24729f36bf	Prefer Py_UCS4 or wchar_t over Py_UNICODE	2011-11-10 20:31:37 +01:00
Victor Stinner	ebf3ba808e	PyUnicode_DecodeCharmap() uses the new Unicode API	2011-11-10 20:30:22 +01:00
Victor Stinner	a98b28c1bf	Avoid PyUnicode_AS_UNICODE in the UTF-8 encoder	2011-11-10 20:21:49 +01:00
Victor Stinner	3326cb6a36	Fix "unicode_escape" encoder	2011-11-10 20:15:25 +01:00
Victor Stinner	0e36826a04	Fix UTF-7 encoder on Windows	2011-11-10 20:12:49 +01:00
Martin v. Löwis	1db7c13be1	Port encoders from Py_UNICODE API to unicode object API.	2011-11-10 18:24:32 +01:00
Victor Stinner	62aa4d086a	Strip trailing spaces	2011-11-09 00:03:45 +01:00
Victor Stinner	0a045efb49	Fix a compiler warning: use unsigned for maxchar in unicode_widen()	2011-11-09 00:02:42 +01:00
Victor Stinner	596a6c4ffc	Fix the code page decoder * unicode_decode_call_errorhandler() now supports the PyUnicode_WCHAR_KIND kind * unicode_decode_call_errorhandler() calls copy_characters() instead of PyUnicode_CopyCharacters()	2011-11-09 00:02:18 +01:00
Antoine Pitrou	a8f63c02ef	Fix missing goto	2011-11-08 18:37:16 +01:00
Martin v. Löwis	d10759f6ed	Make _PyUnicode_FromId return borrowed references. http://mail.python.org/pipermail/python-dev/2011-November/114347.html	2011-11-07 13:00:05 +01:00
Martin v. Löwis	e9b11c1cd8	Change decoders to use Unicode API instead of Py_UNICODE.	2011-11-08 17:35:34 +01:00

1306 Commits