850 Commits

Author SHA1 Message Date
Victor Stinner ca4f20782e find_maxchar_surrogates() reuses surrogate macros 2011-11-22 03:38:40 +01:00
Victor Stinner 0d3721d986 Issue #13441: Temporarily disable the check on the maximum character until
the Solaris issue is solved.

But add assertion on the maximum character in various encoders: UTF-7, UTF-8,
wide character (wchar_t*, Py_UNICODE*), unicode-escape, raw-unicode-escape.

Fix also unicode_encode_ucs1() for backslashreplace error handler: Python is
now always "wide".
2011-11-22 03:27:53 +01:00
Victor Stinner f8facacf30 Fix compiler warnings 2011-11-22 02:30:47 +01:00
Victor Stinner b84d723509 (Merge 3.2) Issue #13093: Fix error handling on PyUnicode_EncodeDecimal() 2011-11-22 01:50:07 +01:00
Victor Stinner cfed46e00a PyUnicode_FromKindAndData() fails with a ValueError if size < 0 2011-11-22 01:29:14 +01:00
Victor Stinner 42885206ec UTF-8 decoder: set consumed value in the latin1 fast-path 2011-11-22 01:23:02 +01:00
Victor Stinner d3df8ab377 Replace _PyUnicode_READY_REPLACE() and _PyUnicode_ReadyReplace() with unicode_ready()
* unicode_ready() has a simpler API
 * try to reuse unicode_empty and latin1_char singleton everywhere
 * Fix a reference leak in _PyUnicode_TranslateCharmap()
 * PyUnicode_InternInPlace() doesn't try to get a singleton anymore, to avoid
   having to handle a failure
2011-11-22 01:22:34 +01:00
Victor Stinner f01245067a Rewrite PyUnicode_TransformDecimalToASCII() to use the new Unicode API 2011-11-21 23:12:56 +01:00
Victor Stinner 2d718f39a5 Remove an unused variable from PyUnicode_Copy() 2011-11-21 23:11:52 +01:00
Victor Stinner 87af4f2f3a Simplify PyUnicode_Copy()
Use PyUnicode_Copy() in fixup()
2011-11-21 23:03:47 +01:00
Victor Stinner 5bbe5e7c85 Fix a compiler warning in _PyUnicode_CheckConsistency() 2011-11-21 22:54:05 +01:00
Victor Stinner 42bf77537e Rewrite PyUnicode_EncodeDecimal() to use the new Unicode API
Add tests for PyUnicode_EncodeDecimal() and
PyUnicode_TransformDecimalToASCII().
2011-11-21 22:52:58 +01:00
Antoine Pitrou 0a3229de6b Issue #13417: speed up utf-8 decoding by around 2x for the non-fully-ASCII case.
This almost catches up with pre-PEP 393 performance, when decoding needed
only one pass.
2011-11-21 20:39:13 +01:00
Victor Stinner da29cc36aa Issue #13441: _PyUnicode_CheckConsistency() dumps the string if the maximum
character is bigger than U+10FFFF and locale.localeconv() dumps the string
before decoding it.

Temporary hack to debug the issue #13441.
2011-11-21 14:31:41 +01:00
Victor Stinner 9e30aa52fd Fix misuse of PyUnicode_GET_SIZE() => PyUnicode_GET_LENGTH()
And PyUnicode_GetSize() => PyUnicode_GetLength()
2011-11-21 02:49:52 +01:00
Victor Stinner 4ead7c7be8 PyObject_Str() ensures that the result string is ready
and check the string consistency.

_PyUnicode_CheckConsistency() doesn't check the hash anymore. It should be
possible to call this function even if hash(str) was already called.
2011-11-20 19:48:36 +01:00
Victor Stinner b960b34577 PyUnicode_AsUTF32String() calls directly _PyUnicode_EncodeUTF32(),
instead of calling the deprecated PyUnicode_EncodeUTF32() function
2011-11-20 19:12:52 +01:00
Victor Stinner 77faf69ca1 _PyUnicode_CheckConsistency() also checks maxchar maximum value,
not only its minimum value
2011-11-20 18:56:05 +01:00
Victor Stinner d5c4022d2a Remove the two ugly and unused WRITE_ASCII_OR_WSTR and WRITE_WSTR macros 2011-11-20 18:41:31 +01:00
Victor Stinner 2e9cfadd7c Reuse surrogate macros in UTF-16 decoder 2011-11-20 18:40:27 +01:00
Victor Stinner ae4f7c8e59 charmap_encoding_error() uses the new Unicode API 2011-11-20 18:28:55 +01:00
Victor Stinner ac931b1e5b Use PyUnicode_EncodeCodePage() instead of PyUnicode_EncodeMBCS() with
PyUnicode_AsUnicodeAndSize()
2011-11-20 18:27:03 +01:00
Victor Stinner 22168998f5 charmap encoders use Py_UCS4, not Py_UNICODE 2011-11-20 17:09:18 +01:00
Victor Stinner 1f7951711c Catch PyUnicode_AS_UNICODE() errors 2011-11-17 00:45:54 +01:00
Ezio Melotti 11060a4a48 #13406: silence deprecation warnings in test_codecs. 2011-11-16 09:39:10 +02:00
Antoine Pitrou 78edf7576e Issue #13333: The UTF-7 decoder now accepts lone surrogates
(the encoder already accepts them).
2011-11-15 01:44:16 +01:00
Antoine Pitrou 5418ee0b9a Issue #13333: The UTF-7 decoder now accepts lone surrogates
(the encoder already accepts them).
2011-11-15 01:42:21 +01:00
Antoine Pitrou 31b92a534f Sanitize reference management in the utf-8 encoder 2011-11-12 18:35:19 +01:00
Antoine Pitrou 0290c7a811 Fix regression on 2-byte wchar_t systems (Windows) 2011-11-11 13:29:12 +01:00
Antoine Pitrou 44c6affc79 Avoid crashing because of an unaligned word access 2011-11-11 02:59:42 +01:00
Antoine Pitrou de20b0b50e Issue #13149: Speed up append-only StringIO objects.
This is very similar to the "lazy strings" idea.
2011-11-10 21:47:38 +01:00
Victor Stinner 9f4b1e9c50 Fix and deprecate the unicode_internal codec
unicode_internal codec uses Py_UNICODE instead of the real internal
representation (PEP 393: Py_UCS1, Py_UCS2 or Py_UCS4) for backward
compatibility.
2011-11-10 20:56:30 +01:00
Victor Stinner 24729f36bf Prefer Py_UCS4 or wchar_t over Py_UNICODE 2011-11-10 20:31:37 +01:00
Victor Stinner ebf3ba808e PyUnicode_DecodeCharmap() uses the new Unicode API 2011-11-10 20:30:22 +01:00
Victor Stinner a98b28c1bf Avoid PyUnicode_AS_UNICODE in the UTF-8 encoder 2011-11-10 20:21:49 +01:00
Victor Stinner 3326cb6a36 Fix "unicode_escape" encoder 2011-11-10 20:15:25 +01:00
Victor Stinner 0e36826a04 Fix UTF-7 encoder on Windows 2011-11-10 20:12:49 +01:00
Martin v. Löwis 1db7c13be1 Port encoders from Py_UNICODE API to unicode object API. 2011-11-10 18:24:32 +01:00
Victor Stinner 62aa4d086a Strip trailing spaces 2011-11-09 00:03:45 +01:00
Victor Stinner 0a045efb49 Fix a compiler warning: use unsigned for maxchar in unicode_widen() 2011-11-09 00:02:42 +01:00
Victor Stinner 596a6c4ffc Fix the code page decoder
* unicode_decode_call_errorhandler() now supports the PyUnicode_WCHAR_KIND
   kind
 * unicode_decode_call_errorhandler() calls copy_characters() instead of
   PyUnicode_CopyCharacters()
2011-11-09 00:02:18 +01:00
Antoine Pitrou a8f63c02ef Fix missing goto 2011-11-08 18:37:16 +01:00
Martin v. Löwis d10759f6ed Make _PyUnicode_FromId return borrowed references.
http://mail.python.org/pipermail/python-dev/2011-November/114347.html
2011-11-07 13:00:05 +01:00
Martin v. Löwis e9b11c1cd8 Change decoders to use Unicode API instead of Py_UNICODE. 2011-11-08 17:35:34 +01:00
Victor Stinner e30c0a1014 Fix gdb/libpython.py for not ready Unicode strings
_PyUnicode_CheckConsistency() checks also hash and length value for not ready
Unicode strings.
2011-11-04 20:54:05 +01:00
Victor Stinner 2fc507fe45 Replace tabs by spaces 2011-11-04 20:06:39 +01:00
Martin v. Löwis 12be46ca84 Drop Py_UNICODE based encode exceptions. 2011-11-04 19:04:15 +01:00
Martin v. Löwis 3d325191bf Port code page codec to Unicode API. 2011-11-04 18:23:06 +01:00
Victor Stinner fcd9653667 Fix a compiler warning in unicode_encode_ucs1() 2011-11-04 00:28:50 +01:00
Victor Stinner fc026c98d8 Fix PyUnicode_EncodeCharmap() 2011-11-04 00:24:51 +01:00
Victor Stinner 7931d9a951 Replace PyUnicodeObject type by PyObject
* _PyUnicode_CheckConsistency() now takes a PyObject* instead of void*
 * Remove now useless casts to PyObject*
2011-11-04 00:22:48 +01:00
Victor Stinner 76a31a6bff Cleanup decode_code_page_stateful() and encode_code_page()
* Fix decode_code_page_errors() result
 * Inline decode_code_page() and encode_code_page_chunk()
 * Replace the PyUnicodeObject type by PyObject
2011-11-04 00:05:13 +01:00
Victor Stinner 7581cef699 Adapt the code page encoder to the new unicode_encode_call_errorhandler()
The code is not correct, but at least it doesn't crash anymore.
2011-11-03 22:32:33 +01:00
Brian Curtin 2787ea41fd Fix a compile error (apparently Windows only) introduced in 295fdfd4f422 2011-11-02 15:09:37 -05:00
Martin v. Löwis 23e275b3ad Port UCS1 and charmap codecs to new API. 2011-11-02 18:02:51 +01:00
Martin v. Löwis 9e8166843c Introduce PyObject* API for raising encode errors. 2011-11-02 12:45:42 +01:00
Martin v. Löwis 0d3072e98d Drop Py_UCS4_ functions. Closes #13246. 2011-10-31 08:40:56 +01:00
Victor Stinner 57ffa9d4ff PyUnicode_AsUnicodeCopy() uses PyUnicode_AsUnicodeAndSize() to get directly the length 2011-10-23 20:10:08 +02:00
Victor Stinner af9e4b8c29 Fix PyUnicode_InternImmortal(): PyUnicode_InternInPlace() may change *p 2011-10-23 20:07:00 +02:00
Victor Stinner 9faa384bed Cast directly to unsigned char, instead of using Py_CHARMASK
We don't need "& 0xff" on an unsigned char.
2011-10-23 20:06:00 +02:00
Victor Stinner 9db1a8b69f Replace PyUnicodeObject* by PyObject* where it was irrelevant
A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or
PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to
PyUnicodeObject* is wrong
2011-10-23 20:04:37 +02:00
Victor Stinner 0d60e87ad6 Fix data variable in _PyUnicode_Dump() for compact ASCII 2011-10-23 19:47:19 +02:00
Victor Stinner d8e61c348e Remove last references to the removed Unicode free list 2011-10-23 19:43:33 +02:00
Victor Stinner 065836ec9c PyUnicode_FSDecoder() ensures that the decoded string is ready 2011-10-27 01:56:33 +02:00
Victor Stinner dd18d3ad9e Fix unicode_subtype_new() on debug build
Patch written by Stefan Behnel.
2011-10-22 11:08:10 +02:00
Ezio Melotti f881751ded Remove unused variable. 2011-10-22 01:01:32 +03:00
Ezio Melotti 931b8aac80 #12753: Add support for Unicode name aliases and named sequences. 2011-10-21 21:57:36 +03:00
Victor Stinner 6707293e75 Add consistency check to _PyUnicode_New() 2011-10-18 22:10:14 +02:00
Victor Stinner 3a50e7056e Issue #12281: Rewrite the MBCS codec to handle correctly replace and ignore
error handlers on all Windows versions. The MBCS codec is now supporting all
error handlers, instead of only replace to encode and ignore to decode.
2011-10-18 21:21:00 +02:00
Benjamin Peterson 7a6debe79c remove some duplication 2011-10-15 09:25:28 -04:00
Victor Stinner f5cff56a1b Issue #13088: Add shared Py_hexdigits constant to format a number into base 16 2011-10-14 02:13:11 +02:00
Antoine Pitrou f0b934b01a Reuse the stringlib in findchar(), and make its signature more convenient 2011-10-13 18:55:09 +02:00
Victor Stinner 55c991197b Optimize unicode_subscript() for step != 1 and ascii strings 2011-10-13 01:17:06 +02:00
Victor Stinner 127226ba69 Don't use PyUnicode_MAX_CHAR_VALUE() macro in Py_MAX() 2011-10-13 01:12:34 +02:00
Victor Stinner 9e7a1bcfd6 Optimize findchar() for PyUnicode_1BYTE_KIND: use memchr and memrchr 2011-10-13 00:18:12 +02:00
Antoine Pitrou dd4e2f0153 Issue #13155: Optimize finding the optimal character width of a unicode string 2011-10-13 00:02:27 +02:00
Victor Stinner 49a0a21f37 Unicode replace() avoids calling unicode_adjust_maxchar() when it's useless
Add also a special case if the result is an empty string.
2011-10-12 23:46:10 +02:00
Victor Stinner 983b1434bd Backed out changeset 952d91a7d376
If maxchar == PyUnicode_MAX_CHAR_VALUE(unicode), we do a useless copy.
2011-10-12 00:54:35 +02:00
Antoine Pitrou e55ad2dff0 Relax condition 2011-10-12 00:36:51 +02:00
Victor Stinner 4e10100dee Fix compiler warning in _PyUnicode_FromUCS2() 2011-10-11 23:27:52 +02:00
Antoine Pitrou 950468e553 Use _PyUnicode_CONVERT_BYTES() where applicable. 2011-10-11 22:45:48 +02:00
Victor Stinner 577db2c9f0 PyUnicode_AsUnicodeCopy() now checks if PyUnicode_AsUnicode() failed 2011-10-11 22:12:48 +02:00
Victor Stinner c4f281eba3 Fix misuse of PyUnicode_GET_SIZE, use PyUnicode_GET_LENGTH instead 2011-10-11 22:11:42 +02:00
Antoine Pitrou e459a0877e Issue #13136: speed up conversion between different character widths. 2011-10-11 20:58:41 +02:00
Antoine Pitrou 2871698546 /* Remove unused code. It has been commented out since 2000 (!). */ 2011-10-11 03:17:47 +02:00
Antoine Pitrou 53bb548f22 Avoid exporting private helpers
(thanks "make smelly")
2011-10-10 23:49:24 +02:00
Victor Stinner 794d567b17 any_find_slice() doesn't use callbacks anymore
* Call directly the right find/rfind method: allow inlining functions
 * Remove Py_LOCAL_CALLBACK (added for any_find_slice)
2011-10-10 03:21:36 +02:00
Martin v. Löwis afe55bba33 Add API for static strings, primarily good for identifiers.
Thanks to Konrad Schöbel and Jasper Schulz for helping with the mass-editing.
2011-10-09 10:38:36 +02:00
Antoine Pitrou eaf139b3fc Fix typo in the PyUnicode_Find() implementation 2011-10-09 00:33:09 +02:00
Martin v. Löwis c47adb04b3 Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE. 2011-10-07 20:55:35 +02:00
Victor Stinner dd07732af5 PyUnicode_Join() calls directly memcpy() if all strings are of the same kind 2011-10-07 17:02:31 +02:00
Antoine Pitrou 978b9d2a27 Fix formatting memory consumption with very large padding specifications 2011-10-07 12:35:48 +02:00
Victor Stinner 59de0ee9e0 str.replace(a, a) is now returning str unchanged if a is a 2011-10-07 10:01:28 +02:00
Antoine Pitrou 5c0ba36d5f Fix massive slowdown in string formatting with the % operator 2011-10-07 01:54:09 +02:00
Antoine Pitrou 7c46da7993 Ensure that 1-char singletons get used 2011-10-06 22:07:51 +02:00
Victor Stinner c6f0df7b20 Fix PyUnicode_Join() for len==1 and non-exact string 2011-10-06 15:58:54 +02:00
Antoine Pitrou 15a66cf134 Fix compilation under Windows 2011-10-06 15:25:32 +02:00
Victor Stinner 200f21340d Fix assertion in unicode_adjust_maxchar() 2011-10-06 13:27:56 +02:00
Victor Stinner acf47b807f Fix my last change on PyUnicode_Join(): don't process separator if len==1 2011-10-06 12:32:37 +02:00
Victor Stinner 25a4b29c95 str.replace() avoids memory when it's possible 2011-10-06 12:31:55 +02:00