cpython

Commit Graph

Author	SHA1	Message	Date
Benjamin Peterson	9ca3ffac94	== -1 is convention	2012-01-01 16:04:29 -06:00
Benjamin Peterson	e157cf1012	make switch more robust	2012-01-01 15:56:20 -06:00
Benjamin Peterson	c0b95d18fa	4 space indentation	2011-12-20 17:24:05 -06:00
Benjamin Peterson	ead6b53659	fix spacing around switch statements	2011-12-20 17:23:42 -06:00
Benjamin Peterson	822c790527	merge 3.2	2011-12-20 13:32:50 -06:00
Victor Stinner	6099a03202	Issue #13624: Write a specialized UTF-8 encoder to allow more optimization The main bottleneck was the PyUnicode_READ() macro.	2011-12-18 14:22:26 +01:00
Victor Stinner	73f53b57d1	Optimize str * n for len(str)==1 and UCS-2 or UCS-4	2011-12-18 03:26:31 +01:00
Victor Stinner	f644110816	Issue #13621: Optimize str.replace(char1, char2) Use findchar() which is more optimized than a dummy loop using PyUnicode_READ(). PyUnicode_READ() is a complex and slow macro.	2011-12-18 02:43:08 +01:00
Victor Stinner	ab870218e3	Issue #10951: Fix compiler warnings in timemodule.c and unicodeobject.c Thanks Jérémy Anger for the fix.	2011-12-17 22:39:43 +01:00
Victor Stinner	2f197078fb	The locale decoder raises a UnicodeDecodeError instead of an OSError Search the invalid character using mbrtowc().	2011-12-17 07:08:30 +01:00
Victor Stinner	1b57967b96	Issue #13560: Locale codec functions use the classic "errors" parameter, instead of surrogateescape So it would be possible to support more error handlers later.	2011-12-17 05:47:23 +01:00
Victor Stinner	ab59594326	What's New in Python 3.3: complete the deprecation list Add also FIXMEs in unicodeobject.c	2011-12-17 04:59:06 +01:00
Victor Stinner	1f33f2b0c3	Issue #13560: os.strerror() now uses the current locale encoding instead of UTF-8	2011-12-17 04:45:09 +01:00
Victor Stinner	f2ea71fcc8	Issue #13560: Add PyUnicode_EncodeLocale() * Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not available * Document my last changes in Misc/NEWS	2011-12-17 04:13:41 +01:00
Victor Stinner	af02e1c85a	Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() * PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string from the current locale encoding * _Py_char2wchar() writes an "error code" in the size argument to indicate if the function failed because of memory allocation failure or because of a decoding error. The function doesn't write the error message directly to stderr. * Fix time.strftime() (if wcsftime() is missing): decode strftime() result from the current locale encoding, not from the filesystem encoding.	2011-12-16 23:56:01 +01:00
Victor Stinner	16e6a80923	PyUnicode_Resize(): warn about canonical representation Call also directly unicode_resize() in unicodeobject.c	2011-12-12 13:24:15 +01:00
Victor Stinner	b0a82a6a7f	Fix PyUnicode_Resize() for compact string: leave the string unchanged on error Fix also PyUnicode_Resize() doc	2011-12-12 13:08:33 +01:00
Victor Stinner	bf6e560d0c	Make PyUnicode_Copy() private => _PyUnicode_Copy() Undocument the function. Make also decode_utf8_errors() as private (static).	2011-12-12 01:53:47 +01:00
Victor Stinner	7a9105a380	resize_copy() now supports legacy ready strings	2011-12-12 00:13:42 +01:00
Victor Stinner	488fa49acf	Rewrite PyUnicode_Append(); unicode_modifiable() is more strict * Rename unicode_resizable() to unicode_modifiable() * Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear that the function is private * Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append() to simplify the code * unicode_modifiable() return 0 if the hash has been computed or if the string is not an exact unicode string * Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the hash has already been computed, you cannot modify a string inplace anymore * PyUnicode_Concat() checks for integer overflow	2011-12-12 00:01:39 +01:00
Victor Stinner	c4b495497a	Create unicode_result_unchanged() subfunction	2011-12-11 22:44:26 +01:00
Victor Stinner	eaab604829	Fix fixup() for unchanged unicode subtype If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u.	2011-12-11 22:22:39 +01:00
Victor Stinner	e6b2d4407a	unicode_fromascii() doesn't check string content twice in debug mode _PyUnicode_CheckConsistency() also checks string content.	2011-12-11 21:54:30 +01:00
Victor Stinner	a1d12bb119	Call directly PyUnicode_DecodeUTF8Stateful() instead of PyUnicode_DecodeUTF8() * Remove micro-optimization from PyUnicode_FromStringAndSize(): PyUnicode_DecodeUTF8Stateful() has already these optimizations (for size=0 and one ascii char). * Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove an useless variable	2011-12-11 21:53:09 +01:00
Victor Stinner	382955ff4e	Use directly unicode_empty instead of PyUnicode_New(0, 0)	2011-12-11 21:44:00 +01:00
Victor Stinner	785938eebd	Move the slowest UTF-8 decoder to its own subfunction * Create decode_utf8_errors() * Reuse unicode_fromascii() * decode_utf8_errors() doesn't refit at the beginning * Remove refit_partial_string(), use unicode_adjust_maxchar() instead	2011-12-11 20:09:03 +01:00
Victor Stinner	84def3774d	Fix error handling in resize_compact()	2011-12-11 20:04:56 +01:00
Victor Stinner	8faf8216e4	PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a character in not in range [U+0000; U+10ffff].	2011-12-08 22:14:11 +01:00
Victor Stinner	551ac95733	Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros And use surrogates macros everywhere in unicodeobject.c	2011-11-29 22:58:13 +01:00
Victor Stinner	6345be9a14	Close #13093: PyUnicode_EncodeDecimal() doesn't support error handlers different than "strict" anymore. The caller was unable to compute the size of the output buffer: it depends on the error handler.	2011-11-25 20:09:01 +01:00
Benjamin Peterson	1518e8713d	and back to the "magic" formula (with a comment) it is	2011-11-23 10:44:52 -06:00
Benjamin Peterson	5944c36931	cave to those who like readable code	2011-11-22 19:05:49 -06:00
Benjamin Peterson	0268675193	fix compiler warning by implementing this more cleverly	2011-11-22 15:29:32 -05:00
Victor Stinner	ca4f20782e	find_maxchar_surrogates() reuses surrogate macros	2011-11-22 03:38:40 +01:00
Victor Stinner	0d3721d986	Issue #13441: Disable temporary the check on the maximum character until the Solaris issue is solved. But add assertion on the maximum character in various encoders: UTF-7, UTF-8, wide character (wchar_t, Py_UNICODE), unicode-escape, raw-unicode-escape. Fix also unicode_encode_ucs1() for backslashreplace error handler: Python is now always "wide".	2011-11-22 03:27:53 +01:00
Victor Stinner	f8facacf30	Fix compiler warnings	2011-11-22 02:30:47 +01:00
Victor Stinner	b84d723509	(Merge 3.2) Issue #13093: Fix error handling on PyUnicode_EncodeDecimal()	2011-11-22 01:50:07 +01:00
Victor Stinner	cfed46e00a	PyUnicode_FromKindAndData() fails with a ValueError if size < 0	2011-11-22 01:29:14 +01:00
Victor Stinner	42885206ec	UTF-8 decoder: set consumed value in the latin1 fast-path	2011-11-22 01:23:02 +01:00
Victor Stinner	d3df8ab377	Replace _PyUnicode_READY_REPLACE() and _PyUnicode_ReadyReplace() with unicode_ready() * unicode_ready() has a simpler API * try to reuse unicode_empty and latin1_char singleton everywhere * Fix a reference leak in _PyUnicode_TranslateCharmap() * PyUnicode_InternInPlace() doesn't try to get a singleton anymore, to avoid having to handle a failure	2011-11-22 01:22:34 +01:00
Victor Stinner	f01245067a	Rewrite PyUnicode_TransformDecimalToASCII() to use the new Unicode API	2011-11-21 23:12:56 +01:00
Victor Stinner	2d718f39a5	Remove an unused variable from PyUnicode_Copy()	2011-11-21 23:11:52 +01:00
Victor Stinner	87af4f2f3a	Simplify PyUnicode_Copy() USe PyUnicode_Copy() in fixup()	2011-11-21 23:03:47 +01:00
Victor Stinner	5bbe5e7c85	Fix a compiler warning in _PyUnicode_CheckConsistency()	2011-11-21 22:54:05 +01:00
Victor Stinner	42bf77537e	Rewrite PyUnicode_EncodeDecimal() to use the new Unicode API Add tests for PyUnicode_EncodeDecimal() and PyUnicode_TransformDecimalToASCII().	2011-11-21 22:52:58 +01:00
Antoine Pitrou	0a3229de6b	Issue #13417: speed up utf-8 decoding by around 2x for the non-fully-ASCII case. This almost catches up with pre-PEP 393 performance, when decoding needed only one pass.	2011-11-21 20:39:13 +01:00
Victor Stinner	da29cc36aa	Issue #13441: _PyUnicode_CheckConsistency() dumps the string if the maximum character is bigger than U+10FFFF and locale.localeconv() dumps the string before decoding it. Temporary hack to debug the issue #13441.	2011-11-21 14:31:41 +01:00
Victor Stinner	9e30aa52fd	Fix misuse of PyUnicode_GET_SIZE() => PyUnicode_GET_LENGTH() And PyUnicode_GetSize() => PyUnicode_GetLength()	2011-11-21 02:49:52 +01:00
Victor Stinner	4ead7c7be8	PyObject_Str() ensures that the result string is ready and check the string consistency. _PyUnicode_CheckConsistency() doesn't check the hash anymore. It should be possible to call this function even if hash(str) was already called.	2011-11-20 19:48:36 +01:00
Victor Stinner	b960b34577	PyUnicode_AsUTF32String() calls directly _PyUnicode_EncodeUTF32(), instead of calling the deprecated PyUnicode_EncodeUTF32() function	2011-11-20 19:12:52 +01:00
Victor Stinner	77faf69ca1	_PyUnicode_CheckConsistency() also checks maxchar maximum value, not only its minimum value	2011-11-20 18:56:05 +01:00
Victor Stinner	d5c4022d2a	Remove the two ugly and unused WRITE_ASCII_OR_WSTR and WRITE_WSTR macros	2011-11-20 18:41:31 +01:00
Victor Stinner	2e9cfadd7c	Reuse surrogate macros in UTF-16 decoder	2011-11-20 18:40:27 +01:00
Victor Stinner	ae4f7c8e59	charmap_encoding_error() uses the new Unicode API	2011-11-20 18:28:55 +01:00
Victor Stinner	ac931b1e5b	Use PyUnicode_EncodeCodePage() instead of PyUnicode_EncodeMBCS() with PyUnicode_AsUnicodeAndSize()	2011-11-20 18:27:03 +01:00
Victor Stinner	22168998f5	charmap encoders uses Py_UCS4, not Py_UNICODE	2011-11-20 17:09:18 +01:00
Victor Stinner	1f7951711c	Catch PyUnicode_AS_UNICODE() errors	2011-11-17 00:45:54 +01:00
Ezio Melotti	11060a4a48	#13406: silence deprecation warnings in test_codecs.	2011-11-16 09:39:10 +02:00
Antoine Pitrou	78edf7576e	Issue #13333: The UTF-7 decoder now accepts lone surrogates (the encoder already accepts them).	2011-11-15 01:44:16 +01:00
Antoine Pitrou	5418ee0b9a	Issue #13333: The UTF-7 decoder now accepts lone surrogates (the encoder already accepts them).	2011-11-15 01:42:21 +01:00
Antoine Pitrou	31b92a534f	Sanitize reference management in the utf-8 encoder	2011-11-12 18:35:19 +01:00
Antoine Pitrou	0290c7a811	Fix regression on 2-byte wchar_t systems (Windows)	2011-11-11 13:29:12 +01:00
Antoine Pitrou	44c6affc79	Avoid crashing because of an unaligned word access	2011-11-11 02:59:42 +01:00
Antoine Pitrou	de20b0b50e	Issue #13149: Speed up append-only StringIO objects. This is very similar to the "lazy strings" idea.	2011-11-10 21:47:38 +01:00
Victor Stinner	9f4b1e9c50	Fix and deprecated the unicode_internal codec unicode_internal codec uses Py_UNICODE instead of the real internal representation (PEP 393: Py_UCS1, Py_UCS2 or Py_UCS4) for backward compatibility.	2011-11-10 20:56:30 +01:00
Victor Stinner	24729f36bf	Prefer Py_UCS4 or wchar_t over Py_UNICODE	2011-11-10 20:31:37 +01:00
Victor Stinner	ebf3ba808e	PyUnicode_DecodeCharmap() uses the new Unicode API	2011-11-10 20:30:22 +01:00
Victor Stinner	a98b28c1bf	Avoid PyUnicode_AS_UNICODE in the UTF-8 encoder	2011-11-10 20:21:49 +01:00
Victor Stinner	3326cb6a36	Fix "unicode_escape" encoder	2011-11-10 20:15:25 +01:00
Victor Stinner	0e36826a04	Fix UTF-7 encoder on Windows	2011-11-10 20:12:49 +01:00
Martin v. Löwis	1db7c13be1	Port encoders from Py_UNICODE API to unicode object API.	2011-11-10 18:24:32 +01:00
Victor Stinner	62aa4d086a	Strip trailing spaces	2011-11-09 00:03:45 +01:00
Victor Stinner	0a045efb49	Fix a compiler warning: use unsiged for maxchar in unicode_widen()	2011-11-09 00:02:42 +01:00
Victor Stinner	596a6c4ffc	Fix the code page decoder * unicode_decode_call_errorhandler() now supports the PyUnicode_WCHAR_KIND kind * unicode_decode_call_errorhandler() calls copy_characters() instead of PyUnicode_CopyCharacters()	2011-11-09 00:02:18 +01:00
Antoine Pitrou	a8f63c02ef	Fix missing goto	2011-11-08 18:37:16 +01:00
Martin v. Löwis	d10759f6ed	Make _PyUnicode_FromId return borrowed references. http://mail.python.org/pipermail/python-dev/2011-November/114347.html	2011-11-07 13:00:05 +01:00
Martin v. Löwis	e9b11c1cd8	Change decoders to use Unicode API instead of Py_UNICODE.	2011-11-08 17:35:34 +01:00
Victor Stinner	e30c0a1014	Fix gdb/libpython.py for not ready Unicode strings _PyUnicode_CheckConsistency() checks also hash and length value for not ready Unicode strings.	2011-11-04 20:54:05 +01:00
Victor Stinner	2fc507fe45	Replace tabs by spaces	2011-11-04 20:06:39 +01:00
Martin v. Löwis	12be46ca84	Drop Py_UNICODE based encode exceptions.	2011-11-04 19:04:15 +01:00
Martin v. Löwis	3d325191bf	Port code page codec to Unicode API.	2011-11-04 18:23:06 +01:00
Victor Stinner	fcd9653667	Fix a compiler warning in unicode_encode_ucs1()	2011-11-04 00:28:50 +01:00
Victor Stinner	fc026c98d8	Fix PyUnicode_EncodeCharmap()	2011-11-04 00:24:51 +01:00
Victor Stinner	7931d9a951	Replace PyUnicodeObject type by PyObject * _PyUnicode_CheckConsistency() now takes a PyObject* instead of void* * Remove now useless casts to PyObject*	2011-11-04 00:22:48 +01:00
Victor Stinner	76a31a6bff	Cleanup decode_code_page_stateful() and encode_code_page() * Fix decode_code_page_errors() result * Inline decode_code_page() and encode_code_page_chunk() * Replace the PyUnicodeObject type by PyObject	2011-11-04 00:05:13 +01:00
Victor Stinner	7581cef699	Adapt the code page encoder to the new unicode_encode_call_errorhandler() The code is not correct, but at least it doesn't crash anymore.	2011-11-03 22:32:33 +01:00
Brian Curtin	2787ea41fd	Fix a compile error (apparently Windows only) introduced in 295fdfd4f422	2011-11-02 15:09:37 -05:00
Martin v. Löwis	23e275b3ad	Port UCS1 and charmap codecs to new API.	2011-11-02 18:02:51 +01:00
Martin v. Löwis	9e8166843c	Introduce PyObject* API for raising encode errors.	2011-11-02 12:45:42 +01:00
Martin v. Löwis	0d3072e98d	Drop Py_UCS4_ functions. Closes #13246.	2011-10-31 08:40:56 +01:00
Victor Stinner	57ffa9d4ff	PyUnicode_AsUnicodeCopy() uses PyUnicode_AsUnicodeAndSize() to get directly the length	2011-10-23 20:10:08 +02:00
Victor Stinner	af9e4b8c29	Fix PyUnicode_InternImmortal(): PyUnicode_InternInPlace() may changes *p	2011-10-23 20:07:00 +02:00
Victor Stinner	9faa384bed	Cast directly to unsigned char, instead of using Py_CHARMASK We don't need "& 0xff" on an unsigned char.	2011-10-23 20:06:00 +02:00
Victor Stinner	9db1a8b69f	Replace PyUnicodeObject* by PyObject* where it was irrevelant A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to PyUnicodeObject* is wrong	2011-10-23 20:04:37 +02:00
Victor Stinner	0d60e87ad6	Fix data variable in _PyUnicode_Dump() for compact ASCII	2011-10-23 19:47:19 +02:00
Victor Stinner	d8e61c348e	Remove last references to the removed Unicode free list	2011-10-23 19:43:33 +02:00
Victor Stinner	065836ec9c	PyUnicode_FSDecoder() ensures that the decoded string is ready	2011-10-27 01:56:33 +02:00
Victor Stinner	dd18d3ad9e	Fix unicode_subtype_new() on debug build Patch written by Stefan Behnel.	2011-10-22 11:08:10 +02:00
Ezio Melotti	f881751ded	Remove unused variable.	2011-10-22 01:01:32 +03:00
Ezio Melotti	931b8aac80	#12753: Add support for Unicode name aliases and named sequences.	2011-10-21 21:57:36 +03:00
Victor Stinner	6707293e75	Add consistency check to _PyUnicode_New()	2011-10-18 22:10:14 +02:00
Victor Stinner	3a50e7056e	Issue #12281: Rewrite the MBCS codec to handle correctly replace and ignore error handlers on all Windows versions. The MBCS codec is now supporting all error handlers, instead of only replace to encode and ignore to decode.	2011-10-18 21:21:00 +02:00
Benjamin Peterson	7a6debe79c	remove some duplication	2011-10-15 09:25:28 -04:00
Victor Stinner	f5cff56a1b	Issue #13088: Add shared Py_hexdigits constant to format a number into base 16	2011-10-14 02:13:11 +02:00
Antoine Pitrou	f0b934b01a	Reuse the stringlib in findchar(), and make its signature more convenient	2011-10-13 18:55:09 +02:00
Victor Stinner	55c991197b	Optimize unicode_subscript() for step != 1 and ascii strings	2011-10-13 01:17:06 +02:00
Victor Stinner	127226ba69	Don't use PyUnicode_MAX_CHAR_VALUE() macro in Py_MAX()	2011-10-13 01:12:34 +02:00
Victor Stinner	9e7a1bcfd6	Optimize findchar() for PyUnicode_1BYTE_KIND: use memchr and memrchr	2011-10-13 00:18:12 +02:00
Antoine Pitrou	dd4e2f0153	Issue #13155: Optimize finding the optimal character width of an unicode string	2011-10-13 00:02:27 +02:00
Victor Stinner	49a0a21f37	Unicode replace() avoids calling unicode_adjust_maxchar() when it's useless Add also a special case if the result is an empty string.	2011-10-12 23:46:10 +02:00
Victor Stinner	983b1434bd	Backed out changeset 952d91a7d376 If maxchar == PyUnicode_MAX_CHAR_VALUE(unicode), we do an useless copy.	2011-10-12 00:54:35 +02:00
Antoine Pitrou	e55ad2dff0	Relax condition	2011-10-12 00:36:51 +02:00
Victor Stinner	4e10100dee	Fix compiler warning in _PyUnicode_FromUCS2()	2011-10-11 23:27:52 +02:00
Antoine Pitrou	950468e553	Use _PyUnicode_CONVERT_BYTES() where applicable.	2011-10-11 22:45:48 +02:00
Victor Stinner	577db2c9f0	PyUnicode_AsUnicodeCopy() now checks if PyUnicode_AsUnicode() failed	2011-10-11 22:12:48 +02:00
Victor Stinner	c4f281eba3	Fix misuse of PyUnicode_GET_SIZE, use PyUnicode_GET_LENGTH instead	2011-10-11 22:11:42 +02:00
Antoine Pitrou	e459a0877e	Issue #13136: speed up conversion between different character widths.	2011-10-11 20:58:41 +02:00
Antoine Pitrou	2871698546	/* Remove unused code. It has been committed out since 2000 (!). */	2011-10-11 03:17:47 +02:00
Antoine Pitrou	53bb548f22	Avoid exporting private helpers (thanks "make smelly")	2011-10-10 23:49:24 +02:00
Victor Stinner	794d567b17	any_find_slice() doesn't use callbacks anymore * Call directly the right find/rfind method: allow inlining functions * Remove Py_LOCAL_CALLBACK (added for any_find_slice)	2011-10-10 03:21:36 +02:00
Martin v. Löwis	afe55bba33	Add API for static strings, primarily good for identifiers. Thanks to Konrad Schöbel and Jasper Schulz for helping with the mass-editing.	2011-10-09 10:38:36 +02:00
Antoine Pitrou	eaf139b3fc	Fix typo in the PyUnicode_Find() implementation	2011-10-09 00:33:09 +02:00
Martin v. Löwis	c47adb04b3	Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE.	2011-10-07 20:55:35 +02:00
Victor Stinner	dd07732af5	PyUnicode_Join() calls directly memcpy() if all strings are of the same kind	2011-10-07 17:02:31 +02:00
Antoine Pitrou	978b9d2a27	Fix formatting memory consumption with very large padding specifications	2011-10-07 12:35:48 +02:00
Victor Stinner	59de0ee9e0	str.replace(a, a) is now returning str unchanged if a is a	2011-10-07 10:01:28 +02:00
Antoine Pitrou	5c0ba36d5f	Fix massive slowdown in string formatting with the % operator	2011-10-07 01:54:09 +02:00
Antoine Pitrou	7c46da7993	Ensure that 1-char singletons get used	2011-10-06 22:07:51 +02:00
Victor Stinner	c6f0df7b20	Fix PyUnicode_Join() for len==1 and non-exact string	2011-10-06 15:58:54 +02:00
Antoine Pitrou	15a66cf134	Fix compilation under Windows	2011-10-06 15:25:32 +02:00
Victor Stinner	200f21340d	Fix assertion in unicode_adjust_maxchar()	2011-10-06 13:27:56 +02:00
Victor Stinner	acf47b807f	Fix my last change on PyUnicode_Join(): don't process separator if len==1	2011-10-06 12:32:37 +02:00
Victor Stinner	25a4b29c95	str.replace() avoids memory when it's possible	2011-10-06 12:31:55 +02:00
Victor Stinner	56c161ab00	_copy_characters() fails more quickly in debug mode on inconsistent state	2011-10-06 02:47:11 +02:00
Victor Stinner	c729b8e92f	Fix a compiler warning: don't define unicode_is_singleton() in release mode	2011-10-06 02:36:59 +02:00
Victor Stinner	fb9ea8c57e	Don't check for the maximum character when copying from unicodeobject.c * Create copy_characters() function which doesn't check for the maximum character in release mode * _PyUnicode_CheckConsistency() is no more static to be able to use it in _PyUnicode_FormatAdvanced() (in formatter_unicode.c) * _PyUnicode_CheckConsistency() checks the string hash	2011-10-06 01:45:57 +02:00
Victor Stinner	05d1189566	Fix post-condition in unicode_repr(): check the result, not the input	2011-10-06 01:13:58 +02:00
Victor Stinner	f48323e3b3	replace() uses unicode_fromascii() if the input and replace string is ASCII	2011-10-05 23:27:08 +02:00
Victor Stinner	0617b6e18b	unicode_fromascii() checks that the input is ASCII in debug mode	2011-10-05 23:26:01 +02:00
Victor Stinner	c3cec7868b	Add asciilib: similar to ucs1, ucs2 and ucs4 library, but specialized to ASCII ucs1, ucs2 and ucs4 libraries have to scan created substring to find the maximum character, whereas it is not need to ASCII strings. Because ASCII strings are common, it is useful to optimize ASCII.	2011-10-05 21:24:08 +02:00
Victor Stinner	14f8f02826	Fix PyUnicode_Partition(): str_in->str_obj	2011-10-05 20:58:25 +02:00
Victor Stinner	bb10a1f759	Ensure that newly created strings use the most efficient store in debug mode	2011-10-05 01:34:17 +02:00
Victor Stinner	9310abbf40	Replace PyUnicodeObject* with PyObject* where it was inappropriate	2011-10-05 00:59:23 +02:00
Victor Stinner	ce5faf673e	unicodeobject.c doesn't make output strings ready in debug mode Try to only create non ready strings in debug mode to ensure that all functions (not only in unicodeobject.c, everywhere) make input strings ready.	2011-10-05 00:42:43 +02:00
Georg Brandl	7597addbd4	More typoes.	2011-10-05 16:36:47 +02:00
Victor Stinner	c80d6d20d5	Speedup str[a:b:step] for step != 1 Try to stop the scanner of the maximum character before the end using a limit depending on the kind (e.g. 256 for PyUnicode_2BYTE_KIND).	2011-10-05 14:13:28 +02:00
Victor Stinner	ae86485517	Speedup find_maxchar_surrogates() for 32-bit wchar_t If we have at least one character in U+10000-U+10FFFF, we know that we must use PyUnicode_4BYTE_KIND kind.	2011-10-05 14:02:44 +02:00
Victor Stinner	b9275c104e	Speedup str[a:b] and PyUnicode_FromKindAndData * str[a:b] doesn't scan the string for the maximum character if the string is ascii only * PyUnicode_FromKindAndData() stops if we are sure that we cannot use a shorter character type. For example, _PyUnicode_FromUCS1() stops if we have at least one character in range U+0080-U+00FF	2011-10-05 14:01:42 +02:00
Victor Stinner	702c734395	Speedup the ASCII decoder It is faster for long string and a little bit faster for short strings, benchmark on Linux 32 bits, Intel Core i5 @ 3.33GHz: ./python -m timeit 'x=b"a"' 'x.decode("ascii")' ./python -m timeit 'x=b"x"*80' 'x.decode("ascii")' ./python -m timeit 'x=b"abc"*4096' 'x.decode("ascii")' length \| before \| after -------+------------+----------- 1 \| 0.234 usec \| 0.229 usec 80 \| 0.381 usec \| 0.357 usec 12,288 \| 11.2 usec \| 3.01 usec	2011-10-05 13:50:52 +02:00
Victor Stinner	e1335c711c	Fix usage og PyUnicode_READY()	2011-10-04 20:53:03 +02:00
Victor Stinner	e06e145943	_PyUnicode_READY_REPLACE() cannot be used in unicode_subtype_new()	2011-10-04 20:52:31 +02:00
Victor Stinner	17efeed284	Add DONT_MAKE_RESULT_READY to unicodeobject.c to help detecting bugs Use also _PyUnicode_READY_REPLACE() when it's applicable.	2011-10-04 20:05:46 +02:00
Victor Stinner	6b56a7fd3d	Add assertion to _Py_ReleaseInternedUnicodeStrings() if READY fails	2011-10-04 20:04:52 +02:00
Antoine Pitrou	875f29bb95	Fix naïve heuristic in unicode slicing (followup to 1b4f886dc9e2)	2011-10-04 20:00:49 +02:00
Antoine Pitrou	2242522fde	Add a necessary call to PyUnicode_READY() (followup to ab5086539ab9)	2011-10-04 19:10:51 +02:00
Antoine Pitrou	7aec401966	Optimize string slicing to use the new API	2011-10-04 19:08:01 +02:00
Antoine Pitrou	e19aa388e8	When expandtabs() would be a no-op, don't create a duplicate string	2011-10-04 16:04:01 +02:00
Antoine Pitrou	e71d574a39	Migrate str.expandtabs to the new API	2011-10-04 15:55:09 +02:00
Benjamin Peterson	7f3140ef80	fix parens	2011-10-03 19:37:29 -04:00
Benjamin Peterson	4bfce8f81f	fix formatting	2011-10-03 19:35:07 -04:00
Benjamin Peterson	ccc51c1fc6	fix compiler warnings	2011-10-03 19:34:12 -04:00
Victor Stinner	b092365cc6	Move in-place Unicode append to its own subfunction	2011-10-04 01:17:31 +02:00
Victor Stinner	a5f9163501	Reindent internal Unicode macros	2011-10-04 01:07:11 +02:00
Victor Stinner	a41463c203	Document utf8_length and wstr_length states Ensure these states with assertions in _PyUnicode_CheckConsistency().	2011-10-04 01:05:08 +02:00
Victor Stinner	9566311014	resize_inplace() sets utf8_length to zero if the utf8 is not shared8 Cleanup also the code.	2011-10-04 01:03:50 +02:00
Victor Stinner	9e9d689d85	PyUnicode_New() sets utf8_length to zero for latin1	2011-10-04 01:02:02 +02:00
Victor Stinner	016980454e	Unicode: raise SystemError instead of ValueError or RuntimeError on invalid state	2011-10-04 00:04:26 +02:00
Victor Stinner	7f11ad4594	Unicode: document when the wstr pointer is shared with data Add also related assertions to _PyUnicode_CheckConsistency().	2011-10-04 00:00:20 +02:00
Victor Stinner	03490918b7	Add _PyUnicode_HAS_WSTR_MEMORY() macro	2011-10-03 23:45:12 +02:00
Victor Stinner	9ce5a835bb	PyUnicode_Join() checks output length in debug mode PyUnicode_CopyCharacters() may copies less character than requested size, if the input string is smaller than the argument. (This is very unlikely, but who knows!?) Avoid also calling PyUnicode_CopyCharacters() if the string is empty.	2011-10-03 23:36:02 +02:00
Victor Stinner	b803895355	Fix a compiler warning in PyUnicode_Append() Don't check PyUnicode_CopyCharacters() in release mode. Rename also some variables.	2011-10-03 23:27:56 +02:00
Victor Stinner	8cfcbed4e3	Improve string forms and PyUnicode_Resize() documentation Remove also the FIXME for resize_copy(): as discussed with Martin, copy the string on resize if the string is not resizable is just fine.	2011-10-03 23:19:21 +02:00
Victor Stinner	77bb47b312	Simplify unicode_resizable(): singletons reference count is at least 2	2011-10-03 20:06:05 +02:00
Victor Stinner	85041a54bd	_PyUnicode_CheckConsistency() checks utf8 field consistency	2011-10-03 14:42:39 +02:00
Victor Stinner	3cf4637e4e	unicode_subtype_new() copies also the ascii flag	2011-10-03 14:42:15 +02:00
Victor Stinner	42dfd71333	unicode_kind_name() doesn't check consistency anymore It is is called from _PyUnicode_Dump() and so must not fail.	2011-10-03 14:41:45 +02:00
Victor Stinner	a3b334da6d	PyUnicode_Ready() now sets ascii=1 if maxchar < 128 ascii=1 is no more reserved to PyASCIIObject. Use PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).	2011-10-03 13:53:37 +02:00
Victor Stinner	1b4f9ceca7	Create _PyUnicode_READY_REPLACE() to reuse singleton Only use _PyUnicode_READY_REPLACE() on just created strings.	2011-10-03 13:28:14 +02:00
Victor Stinner	c379ead9af	Fix resize_compact() and resize_inplace(); reenable full resize optimizations * resize_compact() updates also wstr_len for non-ascii strings sharing wstr * resize_inplace() updates also utf8_len/wstr_len for strings sharing utf8/wstr	2011-10-03 12:52:27 +02:00
Victor Stinner	34411e17b0	resize_inplace() has been fixed: reenable this optimization	2011-10-03 12:21:33 +02:00
Victor Stinner	a849a4b6b4	_PyUnicode_Dump() indicates if wstr and/or utf8 are shared	2011-10-03 12:12:11 +02:00
Victor Stinner	1c8d0c76a1	Fix resize_inplace(): update shared utf8 pointer	2011-10-03 12:11:00 +02:00
Victor Stinner	ca4f7a4298	Disable unicode_resize() optimization on Windows (16-bit wchar_t)	2011-10-03 04:18:04 +02:00
Victor Stinner	126c559d05	_PyUnicode_Ready() for 16-bit wchar_t	2011-10-03 04:17:10 +02:00
Victor Stinner	2fd82278cb	Fix compilation error on Windows Fix also a compiler warning.	2011-10-03 04:06:05 +02:00
Victor Stinner	a3be613a56	Use PyUnicode_WCHAR_KIND to check if a string is a wstr string Simplify the test in wstr pointer in unicode_sizeof().	2011-10-03 02:16:37 +02:00
Victor Stinner	910337b42e	Add _PyUnicode_CheckConsistency() macro to help debugging * Document Unicode string states * Use _PyUnicode_CheckConsistency() to ensure that objects are always consistent.	2011-10-03 03:20:16 +02:00
Victor Stinner	4fae54cb0e	In release mode, PyUnicode_InternInPlace() does nothing if the input is NULL or not a unicode, instead of failing with a fatal error. Use assertions in debug mode (provide better error messages).	2011-10-03 02:01:52 +02:00
Victor Stinner	23e5668214	PyUnicode_Append() now works in-place when it's possible	2011-10-03 03:54:37 +02:00
Victor Stinner	fe226c0d37	Rewrite PyUnicode_Resize() * Rename _PyUnicode_Resize() to unicode_resize() * unicode_resize() creates a copy if the string cannot be resized instead of failing * Optimize resize_copy() for wstr strings * Disable temporary resize_inplace()	2011-10-03 03:52:20 +02:00
Victor Stinner	829c0adca9	Add _PyUnicode_HAS_UTF8_MEMORY() macro	2011-10-03 01:08:02 +02:00
Victor Stinner	fe0c155c4f	Write _PyUnicode_Dump() to help debugging	2011-10-03 02:59:31 +02:00
Victor Stinner	f42dc448e0	PyUnicode_CopyCharacters() fails when copying latin1 into ascii	2011-10-02 23:33:16 +02:00
Victor Stinner	c53be96c54	unicode_convert_wchar_to_ucs4() cannot fail	2011-10-02 21:33:54 +02:00
Victor Stinner	c3c7415639	Add _PyUnicode_DATA_ANY(op) private macro	2011-10-02 20:39:55 +02:00
Victor Stinner	a464fc141d	unicode_empty and unicode_latin1 are PyObject* objects, not PyUnicodeObject*	2011-10-02 20:39:30 +02:00
Victor Stinner	267aa24365	PyUnicode_FindChar() raises a IndexError on invalid index	2011-10-02 01:08:37 +02:00
Victor Stinner	bc603d12b7	Optimize _PyUnicode_AsKind() for UCS1->UCS4 and UCS2->UCS4 * Ensure that the input string is ready * Raise a ValueError instead of of a fatal error	2011-10-02 01:00:40 +02:00
Victor Stinner	5a706cf8c0	Fix usage of PyUnicode_READY() in PyUnicode_GetLength()	2011-10-02 00:36:53 +02:00
Victor Stinner	cd9950fd09	PyUnicode_WriteChar() raises IndexError on invalid index PyUnicode_WriteChar() raises also a ValueError if the string has more than 1 reference.	2011-10-02 00:34:53 +02:00
Victor Stinner	2fe5ced752	PyUnicode_ReadChar() raises a IndexError if the index in invalid unicode_getitem() reuses PyUnicode_ReadChar()	2011-10-02 00:25:40 +02:00
Victor Stinner	202b62bd90	PyUnicode_FromKindAndData() raises a ValueError if the kind is unknown	2011-10-01 23:48:37 +02:00
Victor Stinner	07ac3ebd7b	Optimize unicode_subtype_new(): don't encode to wchar_t and decode from wchar_t Rewrite unicode_subtype_new(): allocate directly the right type.	2011-10-01 16:16:43 +02:00
Victor Stinner	e90fe6a8f4	Add _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() macros * Rename existing _PyUnicode_UTF8() macro to PyUnicode_UTF8() * Rename existing _PyUnicode_UTF8_LENGTH() macro to PyUnicode_UTF8_LENGTH() * PyUnicode_UTF8() and PyUnicode_UTF8_LENGTH() are more strict	2011-10-01 16:48:13 +02:00
Martin v. Löwis	0b1d348990	Issue 13085: Fix some memory leaks. Patch by Stefan Krah.	2011-10-01 16:35:40 +02:00
Benjamin Peterson	5c0fb00ad8	merge heads	2011-10-01 00:12:20 -04:00
Benjamin Peterson	31616ea2ff	remove reference to non-existent file	2011-10-01 00:11:09 -04:00
Victor Stinner	de636f3c34	PyUnicode_Substring() now accepts end bigger than string length Fix also a bug: call PyUnicode_READY() before reading string length.	2011-10-01 03:55:54 +02:00
Victor Stinner	c759f3e7ec	Ooops, avoid a division by zero in unicode_repeat()	2011-10-01 03:09:58 +02:00
Victor Stinner	d3a83d5eb3	PyUnicode_FromObject() ensures that its output is a ready string	2011-10-01 03:09:33 +02:00
Victor Stinner	67ca64ce54	I want a super fast 'a' * n! * Optimize unicode_repeat() for a special case with memset() * Simplify integer overflow checking; remove the second check because PyUnicode_New() already does it and uses a smaller limit (Py_ssize_t vs size_t)	2011-10-01 02:47:29 +02:00
Victor Stinner	e9a2935c1f	Fix usage of PyUnicode_READY in unicodeobject.c	2011-10-01 02:14:59 +02:00
Victor Stinner	12bab6dace	Remove private substring() function, reuse public PyUnicode_Substring() * PyUnicode_Substring() now fails if start or end is invalid * PyUnicode_Substring() reuses PyUnicode_Copy() for non-exact strings	2011-10-01 01:53:49 +02:00
Victor Stinner	c841e7db1f	Optimize PyUnicode_Copy(): don't recompute maximum character	2011-10-01 01:34:32 +02:00
Victor Stinner	2219e0a37e	PyUnicode_FromObject() reuses PyUnicode_Copy() * PyUnicode_Copy() is faster than substring() * Fix also a compiler warning	2011-10-01 01:16:59 +02:00
Victor Stinner	034f6cf10c	Add PyUnicode_Copy() function, include it to the public API	2011-09-30 02:26:44 +02:00
Victor Stinner	b153615008	PyUnicode_CopyCharacters() uses exceptions instead of assertions Call PyErr_BadInternalCall() if inputs are not unicode strings.	2011-09-30 02:26:10 +02:00
Victor Stinner	d8f6510acc	_PyUnicode_Ready() cannot be used on ready strings anymore * Change its prototype: PyObject* instead of PyUnicodeoObject. Remove an old assertion, the result of PyUnicode_READY (_PyUnicode_Ready) must be checked instead	2011-09-29 19:43:17 +02:00
Victor Stinner	bc8b81bc4e	Move _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() outside unicodeobject.h Move these macros to unicodeobject.c	2011-09-29 19:31:34 +02:00
Victor Stinner	a0702ab1fe	Add a note in PyUnicode_CopyCharacters() doc: it doesn't write null character Cleanup also the code (avoid the goto).	2011-09-29 14:14:38 +02:00
Victor Stinner	639418812f	Use the new Py_ARRAY_LENGTH macro	2011-09-29 00:42:28 +02:00
Victor Stinner	b9dcffb51e	Fix 'c' format of PyUnicode_Format() formatbuf is now an array of Py_UCS4, not of Py_UNICODE	2011-09-29 00:39:24 +02:00
Victor Stinner	c17f540b7a	Oops, fix my previous commit: unicode => to	2011-09-29 00:16:58 +02:00
Victor Stinner	b15d4d899c	PyUnicode_CopyCharacters() marks the string as dirty (reset the hash)	2011-09-28 23:59:20 +02:00
Victor Stinner	f5ca1a21a5	PyUnicode_CopyCharacters() fails if 'to' has more than 1 reference	2011-09-28 23:54:59 +02:00
Ezio Melotti	2aa2b3b4d5	Clean up a few tabs that went in with PEP393.	2011-09-29 00:58:57 +03:00
Ezio Melotti	48a2f8fd97	#13054: sys.maxunicode is now always 0x10FFFF.	2011-09-29 00:18:19 +03:00
Victor Stinner	506f592769	Check size of wchar_t using the preprocessor	2011-09-28 22:34:18 +02:00
Victor Stinner	73f01c65c8	PyUnicode_CopyCharacters() initializes overflow	2011-09-28 22:28:04 +02:00
Victor Stinner	e57b1c0da1	Mark PyUnicode_FromUCS[124] as private	2011-09-28 22:20:48 +02:00
Victor Stinner	ff9e50fd04	Oops, fix Py_MIN/Py_MAX case	2011-09-28 22:17:19 +02:00
Victor Stinner	17222160e7	Mark _PyUnicode_FindMaxCharAndNumSurrogatePairs() as private	2011-09-28 22:15:37 +02:00
Victor Stinner	157f83fcfc	Strip trailing spaces in unicodeobject.[ch]	2011-09-28 21:41:31 +02:00
Victor Stinner	6c7a52a46f	Check for PyUnicode_CopyCharacters() failure	2011-09-28 21:39:17 +02:00
Victor Stinner	be78eaf2de	PyUnicode_CopyCharacters() checks for buffer and character overflow It now returns the number of written characters on success.	2011-09-28 21:37:03 +02:00
Victor Stinner	fb5f5f2420	Mark PyUnicode_CONVERT_BYTES as private	2011-09-28 21:39:49 +02:00
Georg Brandl	4cb0de246c	Rename new macros to conform to naming rules (function macros have "Py" prefix, not "PY").	2011-09-28 21:49:49 +02:00
Benjamin Peterson	9c6e6a0c7f	don't check that the first character is XID_Continue Current, XID_Continue is a superset of XID_Start, but that may sometime change.	2011-09-28 08:09:05 -04:00
Martin v. Löwis	d63a3b8beb	Implement PEP 393.	2011-09-28 07:41:54 +02:00
Mark Dickinson	57e683e53e	Issue #1621: Fix undefined behaviour in bytes.__hash__, str.__hash__, tuple.__hash__, frozenset.__hash__ and set indexing operations.	2011-09-24 18:18:40 +01:00
Mark Dickinson	0d5f6adbb3	Issue #13012: Allow 'keepends' to be passed as a keyword argument in str.splitlines, bytes.splitlines and bytearray.splitlines.	2011-09-24 09:14:39 +01:00
Victor Stinner	f955eb210f	Merge 3.2: Fix PyUnicode_AsWideCharString() doc - Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character - Fix spelling of the null character	2011-09-06 02:01:29 +02:00
Victor Stinner	d88d9836c5	Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character Fix also spelling of the null character.	2011-09-06 02:00:05 +02:00
Ezio Melotti	6f2a683a0c	#9200: merge with 3.2.	2011-08-22 20:31:11 +03:00
Ezio Melotti	93e7afc5d9	#9200: The str.is* methods now work with strings that contain non-BMP characters even in narrow Unicode builds.	2011-08-22 14:08:38 +03:00
Benjamin Peterson	e518d4c18a	merge 3.2	2011-08-18 13:52:19 -05:00
Benjamin Peterson	7a6b44ab62	the named of the character is actually NUL	2011-08-18 13:51:47 -05:00
Benjamin Peterson	020340f284	merge 3.2	2011-08-18 10:49:16 -05:00
Benjamin Peterson	5ad517a7d9	NUL -> NULL	2011-08-18 10:48:50 -05:00
Ezio Melotti	269e3ee3db	#12266: merge with 3.2.	2011-08-15 09:26:28 +03:00
Ezio Melotti	ee8d998ecf	#12266: Fix str.capitalize() to correctly uppercase/lowercase titlecased and cased non-letter characters.	2011-08-15 09:09:57 +03:00
Benjamin Peterson	f8e7543df9	merge 3.2 (#12732)	2011-08-12 22:18:19 -05:00
Benjamin Peterson	f413b80806	in narrow builds, make sure to test codepoints as identifier characters (closes #12732) This fixes the use of Unicode identifiers outside the BMP in narrow builds.	2011-08-12 22:17:18 -05:00
Brian Curtin	dfc80e3d97	Replace Py_NotImplemented returns with the macro form Py_RETURN_NOTIMPLEMENTED. The macro was introduced in #12724.	2011-08-10 20:28:54 -05:00
Victor Stinner	ab1d16b456	Issue #13093: Fix error handling on PyUnicode_EncodeDecimal() * Add tests for PyUnicode_EncodeDecimal() and PyUnicode_TransformDecimalToASCII() * Remove the unused "e" variable in replace()	2011-11-22 01:45:37 +01:00
Senthil Kumaran	fcdaaa9011	merge from 3.2 - Fix closes Issue12621 - Fix docstrings of find and rfind methods of bytes/bytearry/unicodeobject.	2011-07-27 23:34:29 +08:00
Senthil Kumaran	53516a82df	Fix closes Issue12621 - Fix docstrings of find and rfind methods of bytes/bytearry/unicodeobject.	2011-07-27 23:33:54 +08:00
Victor Stinner	99b9538636	Issue #9642: Uniformize the tests on the availability of the mbcs codec Add a new HAVE_MBCS define.	2011-07-04 14:23:54 +02:00
Senthil Kumaran	bc9d8f838b	merge from 3.2	2011-07-03 21:05:25 -07:00
Senthil Kumaran	9ebe08d2f6	Fix closes issue12471 - wrong TypeError message when '%i' format spec was used.	2011-07-03 21:03:16 -07:00
Victor Stinner	3cbf14bfb1	Issue #10914: Initialize correctly the filesystem codec when creating a new subinterpreter to fix a bootstrap issue with codecs implemented in Python, as the ISO-8859-15 codec. Add fscodec_initialized attribute to the PyInterpreterState structure.	2011-04-27 00:24:21 +02:00
Victor Stinner	793b531756	Issue #10914: Initialize correctly the filesystem codec when creating a new subinterpreter to fix a bootstrap issue with codecs implemented in Python, as the ISO-8859-15 codec. Add fscodec_initialized attribute to the PyInterpreterState structure.	2011-04-27 00:24:21 +02:00
Ezio Melotti	bf1253b25a	#6780: merge with 3.2.	2011-04-26 06:45:24 +03:00
Ezio Melotti	f2b3f780a1	#6780: merge with 3.1.	2011-04-26 06:40:59 +03:00
Ezio Melotti	ba42fd5801	#6780: fix starts/endswith error message to mention that tuples are accepted too.	2011-04-26 06:09:45 +03:00
Jesus Cea	c1ceb64e41	MERGE: startswith and endswith don't accept None as slice index. Patch by Torsten Becker. (closes #11828)	2011-04-20 17:59:29 +02:00
Jesus Cea	6159ee3cf5	MERGE: startswith and endswith don't accept None as slice index. Patch by Torsten Becker. (closes #11828)	2011-04-20 17:42:50 +02:00
Jesus Cea	ac4515063c	startswith and endswith don't accept None as slice index. Patch by Torsten Becker. (closes #11828)	2011-04-20 17:09:23 +02:00
Benjamin Peterson	5fd4bd3796	avoid casting with this nice macro	2011-03-06 09:06:34 -06:00
Victor Stinner	2f283c2c19	Fix my previous commit (r88709) for str.encode(errors=...)	2011-03-02 01:21:46 +00:00
Victor Stinner	a5c68c3cb7	Issue #8923: cache str.encode() result When a string is encoded to UTF-8 in strict mode, the result is cached into the object. Examples: str.encode(), str.encode('utf-8'), PyUnicode_AsUTF8String() and PyUnicode_AsEncodedString(unicode, "utf-8", NULL).	2011-03-02 01:03:14 +00:00
Victor Stinner	f3fd733f92	Remove useless argument of _PyUnicode_AsDefaultEncodedString()	2011-03-02 01:03:11 +00:00
Victor Stinner	6d970f4713	Issue #10831: PyUnicode_FromFormat() supports %li, %lli and %zi formats	2011-03-02 00:04:25 +00:00
Victor Stinner	e7faec1aa9	Fix my previous commit (r88702): initialize size_tflag in parse_format_flags()	2011-03-02 00:01:53 +00:00
Victor Stinner	968654515f	Issue #10829: Refactor PyUnicode_FromFormat() * Use the same function to parse the format string in the 3 steps * Fix crashs on invalid format strings	2011-03-01 23:44:09 +00:00
Victor Stinner	2b574a2332	Merged revisions 88697 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r88697 | victor.stinner | 2011-03-01 23:46:52 +0100 (mar., 01 mars 2011) | 4 lines Issue #11246: Fix PyUnicode_FromFormat("%V") Decode the byte string from UTF-8 (with replace error handler) instead of ISO-8859-1 (in strict mode). Patch written by Ray Allen. ........	2011-03-01 22:48:49 +00:00
Victor Stinner	2512a8b62e	Issue #11246: Fix PyUnicode_FromFormat("%V") Decode the byte string from UTF-8 (with replace error handler) instead of ISO-8859-1 (in strict mode). Patch written by Ray Allen.	2011-03-01 22:46:52 +00:00
Alexander Belopolsky	4001847a98	PEP 7 conformance changes (whitespace only).	2011-02-26 01:02:56 +00:00
Alexander Belopolsky	1d52146a25	Issue #11303: Added shortcuts for utf8 and latin1 encodings. Documented the list of optimized encodings as CPython implementation detail.	2011-02-25 19:19:57 +00:00
Victor Stinner	659eb84457	Merged revisions 88481 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r88481 | victor.stinner | 2011-02-21 22:13:44 +0100 (lun., 21 févr. 2011) | 4 lines Fix PyUnicode_FromFormatV("%c") for non-BMP char Issue #10830: Fix PyUnicode_FromFormatV("%c") for non-BMP characters on narrow build. ........	2011-02-23 12:14:22 +00:00
Brett Cannon	b94767ff44	Issue #8914: fix various warnings from the Clang static analyzer v254.	2011-02-22 20:15:44 +00:00
Victor Stinner	5ed8b2c737	Fix PyUnicode_FromFormatV("%c") for non-BMP char Issue #10830: Fix PyUnicode_FromFormatV("%c") for non-BMP characters on narrow build.	2011-02-21 21:13:44 +00:00
Victor Stinner	fd34b3788f	Remove bootstrap code of PyUnicode_AsEncodedString() Issue #11187: Remove bootstrap code (use ASCII) of PyUnicode_AsEncodedString(), it was replaced by a better fallback (use the locale encoding) in PyUnicode_EncodeFSDefault(). Prepare also empty sections in NEWS.	2011-02-21 20:51:28 +00:00
Alexander Belopolsky	b9cc00caab	Removed unneeded #include	2010-12-22 02:35:20 +00:00
Benjamin Peterson	28a4dce6a8	remove (un)transform methods	2010-12-12 01:33:04 +00:00
Alexander Belopolsky	942af5a9a4	Issue #10557: Fixed error messages from float() and other numeric types. Added a new API function, PyUnicode_TransformDecimalToASCII(), which transforms non-ASCII decimal digits in a Unicode string to their ASCII equivalents.	2010-12-04 03:38:46 +00:00
Martin v. Löwis	4d0d471a80	Merge branches/pep-0384.	2010-12-03 20:14:31 +00:00
Georg Brandl	3b9406b08a	Remove redundant check for PyBytes in unicode_encode.	2010-12-03 07:54:09 +00:00
Georg Brandl	02524629f3	#7475: add (un)transform method to bytes/bytearray and str, add back codecs that can be used with them from Python 2.	2010-12-02 18:06:51 +00:00
Georg Brandl	e5b99f0fb3	Remove redundant includes of headers that are already included by Python.h.	2010-11-30 09:41:01 +00:00
Victor Stinner	d5af0a5df0	PyUnicode_DecodeFSDefaultAndSize() raises MemoryError if _Py_char2wchar() fails	2010-11-08 23:34:29 +00:00
Victor Stinner	2f02a51135	PyUnicode_EncodeFS() raises an exception if _Py_wchar2char() fails * Add error_pos optional argument to _Py_wchar2char() * PyUnicode_EncodeFS() raises a UnicodeEncodeError or MemoryError if _Py_wchar2char() fails	2010-11-08 22:43:46 +00:00
Victor Stinner	c911bbfd5d	str, bytes, bytearray docstring: remove unnecessary [...]	2010-11-07 19:04:46 +00:00
Victor Stinner	e14e212221	Fix encode/decode method doc of str, bytes, bytearray types * Specify the default encoding: write 'utf-8' instead of sys.getdefaultencoding(), because the default encoding is now constant * Specify the default errors value	2010-11-07 18:41:46 +00:00
Eric Smith	16562f41b0	Merged revisions 86277 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r86277 | eric.smith | 2010-11-06 15:27:37 -0400 (Sat, 06 Nov 2010) | 1 line Added more to docstrings for str.format, format_map, and __format__. ........	2010-11-06 19:29:45 +00:00
Eric Smith	51d2fd983b	Added more to docstrings for str.format, format_map, and __format__.	2010-11-06 19:27:37 +00:00
David Malcolm	9696088b6d	Issue #10288: The deprecated family of "char"-handling macros (ISLOWER()/ISUPPER()/etc) have now been removed: use Py_ISLOWER() etc instead.	2010-11-05 17:23:41 +00:00
Eric Smith	27bbca6f79	Issue #6081: Add str.format_map. str.format_map(mapping) is similar to str.format(**mapping), except mapping does not get converted to a dict.	2010-11-04 17:06:58 +00:00
Victor Stinner	ad15872854	Simplify PyUnicode_Encode/DecodeFSDefault on Windows/Mac OS X * Windows always uses mbcs * Mac OS X always uses utf-8	2010-10-27 00:25:46 +00:00
Victor Stinner	f933e1ab6f	Issue #4388: On Mac OS X, decode command line arguments from UTF-8, instead of the locale encoding. If the LANG (and LC_ALL and LC_CTYPE) environment variable is not set, the locale encoding is ISO-8859-1, whereas most programs (including Python) expect UTF-8. Python already uses UTF-8 for the filesystem encoding and to encode command line arguments on this OS.	2010-10-20 22:58:25 +00:00
Victor Stinner	9a90900da5	PyUnicode_FromFormatV(): Fix %A format It was not completly implemented. Add a test.	2010-10-18 20:59:24 +00:00
Benjamin Peterson	8f67d0893f	make hashes always the size of pointers; introduce Py_hash_t #9778	2010-10-17 20:54:53 +00:00
Georg Brandl	ded5acf34a	Merged revisions 81936 via svnmerge from svn+ssh://svn.python.org/python/branches/py3k ........ r81936 | mark.dickinson | 2010-06-12 11:10:14 +0200 (Sa, 12 Jun 2010) | 2 lines Silence 'unused variable' gcc warning. Patch by Éric Araujo. ........	2010-10-17 11:48:07 +00:00
Victor Stinner	168e117e0a	Add an optional size argument to _Py_char2wchar() _Py_char2wchar() callers usually need the result size in characters. Since it's trivial to compute it in _Py_char2wchar() (O(1) whereas wcslen() is O(n)), add an option to get it.	2010-10-16 23:16:16 +00:00
Victor Stinner	f3170ccef8	Use locale encoding if Py_FileSystemDefaultEncoding is not set * PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if Py_FileSystemDefaultEncoding is NULL * redecode_filenames() functions and _Py_code_object_list (issue #9630) are no more needed: remove them	2010-10-15 12:04:23 +00:00
Georg Brandl	66c221e993	#9418: first step of moving private string methods to _string module.	2010-10-14 07:04:07 +00:00
Victor Stinner	beb4135b8c	PyUnicode_AsWideCharString() takes a PyObject, not a PyUnicodeObject All unicode functions uses PyObject* except PyUnicode_AsWideChar(). Fix the prototype for the new function PyUnicode_AsWideCharString().	2010-10-07 01:02:42 +00:00
Victor Stinner	5593d8aeb4	Issue #8670: PyUnicode_AsWideChar() and PyUnicode_AsWideCharString() replace UTF-16 surrogate pairs by single non-BMP characters for 16 bits Py_UNICODE and 32 bits wchar_t (eg. Linux in narrow build).	2010-10-02 11:11:27 +00:00
Victor Stinner	1c24bd0252	Issue #8870: PyUnicode_AsWideCharString() doesn't count the trailing nul character And write unit tests for PyUnicode_AsWideChar() and PyUnicode_AsWideCharString().	2010-10-02 11:03:13 +00:00
Victor Stinner	71e91a358b	Fix PyUnicode_AsWideCharString(): set *size if size is not NULL	2010-09-29 17:55:12 +00:00
Victor Stinner	c39211f51e	Issue #9630: Redecode filenames when setting the filesystem encoding Redecode the filenames of: - all modules: __file__ and __path__ attributes - all code objects: co_filename attribute - sys.path - sys.meta_path - sys.executable - sys.path_importer_cache (keys) Keep weak references to all code objects until initfsencoding() is called, to be able to redecode co_filename attribute of all code objects.	2010-09-29 16:35:47 +00:00
Victor Stinner	137c34c027	Issue #9979: Create function PyUnicode_AsWideCharString().	2010-09-29 10:25:54 +00:00
Benjamin Peterson	d4ac96a336	use return NULL; it's just as correct	2010-09-12 16:40:53 +00:00
Victor Stinner	4c7db315df	Issue #9738, #9836: Fix refleak introduced by r84704	2010-09-12 07:51:18 +00:00
Benjamin Peterson	9be0b2e312	detect non-ascii characters much earlier (plugs ref leak)	2010-09-12 03:40:54 +00:00
Victor Stinner	1205f2774e	Issue #9738: PyUnicode_FromFormat() and PyErr_Format() raise an error on a non-ASCII byte in the format string. Document also the encoding.	2010-09-11 00:54:47 +00:00
Victor Stinner	46408606d8	Rename PyUnicode_strdup() to PyUnicode_AsUnicodeCopy()	2010-09-03 16:18:00 +00:00
Victor Stinner	71133ff368	Create PyUnicode_strdup() function	2010-09-01 23:43:53 +00:00
Victor Stinner	c4eb765fc1	Create Py_UNICODE_strcat() function	2010-09-01 23:43:50 +00:00
Victor Stinner	42cb462682	Remove unicode_default_encoding constant Inline its value in PyUnicode_GetDefaultEncoding(). The comment is now outdated (we will not change its value anymore).	2010-09-01 19:39:01 +00:00
Antoine Pitrou	fce7fd6426	Issue #9549: sys.setdefaultencoding() and PyUnicode_SetDefaultEncoding() are now removed, since their effect was inexistent in 3.x (the default encoding is hardcoded to utf-8 and cannot be changed).	2010-09-01 18:54:56 +00:00
Antoine Pitrou	a2983c6734	Merged revisions 84394 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r84394 | antoine.pitrou | 2010-09-01 17:10:12 +0200 (mer., 01 sept. 2010) | 4 lines Issue #7415: PyUnicode_FromEncodedObject() now uses the new buffer API properly. Patch by Stefan Behnel. ........	2010-09-01 15:16:41 +00:00
Antoine Pitrou	b0fa831d1e	Issue #7415: PyUnicode_FromEncodedObject() now uses the new buffer API properly. Patch by Stefan Behnel.	2010-09-01 15:10:12 +00:00
Daniel Stutzbach	8515eaefda	Issue 8781: On systems a signed 4-byte wchar_t and a 4-byte Py_UNICODE, use memcpy to convert between the two (as already done when wchar_t is unsigned)	2010-08-24 21:57:33 +00:00
Victor Stinner	3119ed73aa	Fix PyUnicode_EncodeFSDefault() indentation	2010-08-18 22:26:50 +00:00
Victor Stinner	ef8d95c498	Issue #9425: Create Py_UNICODE_strncmp() function The code is based on strncmp() of the libiberty library, function in the public domain.	2010-08-16 22:03:11 +00:00
Victor Stinner	47fcb5b4c3	Issue #9542: Create PyUnicode_FSDecoder() function It's a ParseTuple converter: decode bytes objects to unicode using PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is. * Don't specify surrogateescape error handler in the comments nor the documentation, but PyUnicode_DecodeFSDefaultAndSize() and PyUnicode_EncodeFSDefault() because these functions use strict error handler for the mbcs encoding (on Windows). * Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid inconsistency with unicodeobject.h.	2010-08-13 23:59:58 +00:00
Victor Stinner	4a2b7a1b14	Issue #9425: Create PyErr_WarnFormat() function Similar to PyErr_WarnEx() but use PyUnicode_FromFormatV() to format the warning message. Strip also some trailing spaces.	2010-08-13 14:03:48 +00:00
Alexander Belopolsky	f0f45142d5	Issue #2443: Added a new macro, Py_VA_COPY, which is equivalent to C99 va_copy, but available on all python platforms. Untabified a few unrelated files.	2010-08-11 17:31:17 +00:00
Victor Stinner	331ea92ade	Issue #9425: create Py_UNICODE_strrchr() function	2010-08-10 16:37:20 +00:00
Georg Brandl	1fa11af7aa	Merged revisions 83226-83227,83229-83232 via svnmerge from svn+ssh://svn.python.org/python/branches/py3k ........ r83226 | georg.brandl | 2010-07-29 16:17:12 +0200 (Do, 29 Jul 2010) | 1 line #1090076: explain the behavior of vars in get() better. ........ r83227 | georg.brandl | 2010-07-29 16:23:06 +0200 (Do, 29 Jul 2010) | 1 line Use Py_CLEAR(). ........ r83229 | georg.brandl | 2010-07-29 16:32:22 +0200 (Do, 29 Jul 2010) | 1 line #9407: document configparser.Error. ........ r83230 | georg.brandl | 2010-07-29 16:36:11 +0200 (Do, 29 Jul 2010) | 1 line Use correct directive and name. ........ r83231 | georg.brandl | 2010-07-29 16:46:07 +0200 (Do, 29 Jul 2010) | 1 line #9397: remove mention of dbm.bsd which does not exist anymore. ........ r83232 | georg.brandl | 2010-07-29 16:49:08 +0200 (Do, 29 Jul 2010) | 1 line #9388: remove ERA_YEAR which is never defined in the source code. ........	2010-08-01 21:03:01 +00:00
Georg Brandl	0f1470960c	Recorded merge of revisions 83444 via svnmerge from svn+ssh://svn.python.org/python/branches/py3k ........ r83444 | georg.brandl | 2010-08-01 22:51:02 +0200 (So, 01 Aug 2010) | 1 line Revert r83395, it introduces test failures and is not necessary anyway since we now have to nul-terminate the string anyway. ........	2010-08-01 20:54:22 +00:00
Georg Brandl	78eef3de88	Revert r83395, it introduces test failures and is not necessary anyway since we now have to nul-terminate the string anyway.	2010-08-01 20:51:02 +00:00
Georg Brandl	a70070c9e5	Merged revisions 83395,83417 via svnmerge from svn+ssh://svn.python.org/python/branches/py3k ........ r83395 | georg.brandl | 2010-08-01 10:49:18 +0200 (So, 01 Aug 2010) | 1 line #8821: do not rely on Unicode strings being terminated with a \u0000, rather explicitly check range before looking for a second surrogate character. ........ r83417 | georg.brandl | 2010-08-01 20:38:26 +0200 (So, 01 Aug 2010) | 1 line #5776: fix mistakes in python specfile. (Nobody probably uses it anyway.) ........	2010-08-01 18:59:44 +00:00
Georg Brandl	bd534f0349	#8821: do not rely on Unicode strings being terminated with a \u0000, rather explicitly check range before looking for a second surrogate character.	2010-08-01 08:49:18 +00:00
Georg Brandl	8ee604b989	Use Py_CLEAR().	2010-07-29 14:23:06 +00:00
Stefan Krah	aebd6f4c29	Merged revisions 82978 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r82978 | stefan.krah | 2010-07-19 19:58:26 +0200 (Mon, 19 Jul 2010) | 3 lines Sub-issue of #9036: Fix incorrect use of Py_CHARMASK. ........	2010-07-19 18:01:13 +00:00
Stefan Krah	99212f61db	Sub-issue of #9036: Fix incorrect use of Py_CHARMASK.	2010-07-19 17:58:26 +00:00
Senthil Kumaran	74ceac2306	Merged revisions 82573 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r82573 | senthil.kumaran | 2010-07-05 17:30:56 +0530 (Mon, 05 Jul 2010) | 3 lines Fix the docstrings of the capitalize method. ........	2010-07-05 12:04:23 +00:00
Senthil Kumaran	e51ee8a5bc	Fix the docstrings of the capitalize method.	2010-07-05 12:00:56 +00:00
Ezio Melotti	25bc019d46	Merged revisions 82413,82468 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r82413 | ezio.melotti | 2010-07-01 10:32:02 +0300 (Thu, 01 Jul 2010) | 13 lines Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629. 1) #8271: when a byte sequence is invalid, only the start byte and all the valid continuation bytes are now replaced by U+FFFD, instead of replacing the number of bytes specified by the start byte. See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95); 2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes in behavior); 3) Change the error messages "unexpected code byte" to "invalid start byte" and "invalid data" to "invalid continuation byte"; 4) Add an extensive set of tests in test_unicode; 5) Fix test_codeccallbacks because it was failing after this change. ........ r82468 | ezio.melotti | 2010-07-03 07:52:19 +0300 (Sat, 03 Jul 2010) | 1 line Update comment about surrogates. ........	2010-07-03 05:18:50 +00:00
Ezio Melotti	9bf2b3ae6a	Update comment about surrogates.	2010-07-03 04:52:19 +00:00
Ezio Melotti	57221d02ba	Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629. 1) #8271: when a byte sequence is invalid, only the start byte and all the valid continuation bytes are now replaced by U+FFFD, instead of replacing the number of bytes specified by the start byte. See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95); 2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes in behavior); 3) Change the error messages "unexpected code byte" to "invalid start byte" and "invalid data" to "invalid continuation byte"; 4) Add an extensive set of tests in test_unicode; 5) Fix test_codeccallbacks because it was failing after this change.	2010-07-01 07:32:02 +00:00
Georg Brandl	952867aa30	#9078: fix some Unicode C API descriptions, in comments and docs.	2010-06-27 10:17:12 +00:00
Ezio Melotti	415f340a0c	Merged revisions 82252 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ................ r82252 | ezio.melotti | 2010-06-26 21:50:39 +0300 (Sat, 26 Jun 2010) | 9 lines Merged revisions 82248 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r82248 | ezio.melotti | 2010-06-26 21:44:42 +0300 (Sat, 26 Jun 2010) | 1 line Fix extra space. ........ ................	2010-06-26 18:52:26 +00:00
Ezio Melotti	c1897e716d	Merged revisions 82248 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r82248 | ezio.melotti | 2010-06-26 21:44:42 +0300 (Sat, 26 Jun 2010) | 1 line Fix extra space. ........	2010-06-26 18:50:39 +00:00
Victor Stinner	554f3f0081	Issue #850997: mbcs encoding (Windows only) handles errors argument: strict mode raises unicode errors. The encoder only supports "strict" and "replace" error handlers, the decoder only supports "strict" and "ignore" error handlers.	2010-06-16 23:33:54 +00:00
Mark Dickinson	7db923cc99	Silence 'unused variable' gcc warning. Patch by Éric Araujo.	2010-06-12 09:10:14 +00:00
Victor Stinner	313a120ab6	Issue #8969: On Windows, use mbcs codec in strict mode to encode and decode filenames and enable os.fsencode().	2010-06-11 23:56:51 +00:00
Antoine Pitrou	6107a688ee	Merged revisions 81908 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ................ r81908 | antoine.pitrou | 2010-06-11 23:46:32 +0200 (ven., 11 juin 2010) | 11 lines Merged revisions 81907 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (ven., 11 juin 2010) | 5 lines Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash the interpreter with characters outside the Basic Multilingual Plane (higher than 0x10000). ........ ................	2010-06-11 21:48:34 +00:00

1133 Commits