4020 Commits

Author SHA1 Message Date
Antoine Pitrou d73a9acb63 Fix the fix for issue #12149: it was incorrect, although it had the side
effect of appearing to resolve the issue.  Thanks to Mark Shannon for
noticing.
2011-12-15 14:17:36 +01:00
Antoine Pitrou 2e872082f6 Fix the fix for issue #12149: it was incorrect, although it had the side
effect of appearing to resolve the issue.  Thanks to Mark Shannon for
noticing.
2011-12-15 14:15:31 +01:00
Florent Xicluna aa6c1d240f Issue #13575: there is only one class type. 2011-12-12 18:54:29 +01:00
Antoine Pitrou 9d57481f04 Issue #13577: various kinds of descriptors now have a __qualname__ attribute.
Patch by sbt.
2011-12-12 13:47:25 +01:00
Victor Stinner 16e6a80923 PyUnicode_Resize(): warn about canonical representation
Also call unicode_resize() directly in unicodeobject.c
2011-12-12 13:24:15 +01:00
Victor Stinner b0a82a6a7f Fix PyUnicode_Resize() for compact string: leave the string unchanged on error
Also fix the PyUnicode_Resize() doc
2011-12-12 13:08:33 +01:00
Victor Stinner bf6e560d0c Make PyUnicode_Copy() private => _PyUnicode_Copy()
Undocument the function.

Also make decode_utf8_errors() private (static).
2011-12-12 01:53:47 +01:00
Victor Stinner 7a9105a380 resize_copy() now supports legacy ready strings 2011-12-12 00:13:42 +01:00
Victor Stinner 488fa49acf Rewrite PyUnicode_Append(); unicode_modifiable() is more strict
* Rename unicode_resizable() to unicode_modifiable()
* Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear
  that the function is private
* Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append()
  to simplify the code
* unicode_modifiable() returns 0 if the hash has been computed or if the string
  is not an exact unicode string
* Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the
  hash has already been computed, you cannot modify a string in place anymore
* PyUnicode_Concat() checks for integer overflow
2011-12-12 00:01:39 +01:00
Victor Stinner c4b495497a Create unicode_result_unchanged() subfunction 2011-12-11 22:44:26 +01:00
Victor Stinner eaab604829 Fix fixup() for unchanged unicode subtype
If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u.
2011-12-11 22:22:39 +01:00
Victor Stinner e6b2d4407a unicode_fromascii() doesn't check string content twice in debug mode
_PyUnicode_CheckConsistency() also checks string content.
2011-12-11 21:54:30 +01:00
Victor Stinner a1d12bb119 Call PyUnicode_DecodeUTF8Stateful() directly instead of PyUnicode_DecodeUTF8()
* Remove micro-optimization from PyUnicode_FromStringAndSize():
  PyUnicode_DecodeUTF8Stateful() already has these optimizations (for size=0
  and one ascii char).
* Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove a
  useless variable
2011-12-11 21:53:09 +01:00
Victor Stinner 382955ff4e Use unicode_empty directly instead of PyUnicode_New(0, 0) 2011-12-11 21:44:00 +01:00
Victor Stinner 785938eebd Move the slowest UTF-8 decoder to its own subfunction
* Create decode_utf8_errors()
* Reuse unicode_fromascii()
* decode_utf8_errors() doesn't refit at the beginning
* Remove refit_partial_string(), use unicode_adjust_maxchar() instead
2011-12-11 20:09:03 +01:00
Victor Stinner 84def3774d Fix error handling in resize_compact() 2011-12-11 20:04:56 +01:00
Victor Stinner 8faf8216e4 PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a
character is not in range [U+0000; U+10ffff].
2011-12-08 22:14:11 +01:00
Antoine Pitrou b0e1f8b38b Issue #13503: Use a more efficient reduction format for bytearrays with
pickle protocol >= 3.  The old reduction format is kept with older
protocols in order to allow unpickling under Python 2.

Patch by Irmen de Jong.
2011-12-05 20:40:08 +01:00
Victor Stinner 0a54cf12a0 Fix PyObject_Repr(): don't call PyUnicode_READY() if res is NULL 2011-12-01 03:22:44 +01:00
Victor Stinner b37b17423b Replace PyUnicode_FromUnicode(NULL, 0) by PyUnicode_New(0, 0)
Create an empty string with the new Unicode API.
2011-12-01 03:18:59 +01:00
Victor Stinner db88ae5d66 PyObject_Repr() ensures that the result is a ready Unicode string
And PyObject_Str() and PyObject_Repr() don't make strings ready in debug
mode to ensure that the caller makes the string ready before using it.
2011-12-01 02:15:00 +01:00
Victor Stinner 551ac95733 Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros
And use surrogates macros everywhere in unicodeobject.c
2011-11-29 22:58:13 +01:00
Antoine Pitrou c366117820 Merge heads 2011-11-26 01:13:12 +01:00
Antoine Pitrou f0effe6379 Better resolution for issue #11849: Ensure that free()d memory arenas are really released
on POSIX systems supporting anonymous memory mappings.  Patch by Charles-François Natali.
2011-11-26 01:11:02 +01:00
Victor Stinner 6345be9a14 Close #13093: PyUnicode_EncodeDecimal() doesn't support error handlers
different than "strict" anymore. The caller was unable to compute the
size of the output buffer: it depends on the error handler.
2011-11-25 20:09:01 +01:00
Antoine Pitrou 86a36b500a PEP 3155 / issue #13448: Qualified name for classes and functions. 2011-11-25 18:56:07 +01:00
Benjamin Peterson 1518e8713d and back to the "magic" formula (with a comment) it is 2011-11-23 10:44:52 -06:00
Benjamin Peterson 5944c36931 cave to those who like readable code 2011-11-22 19:05:49 -06:00
Benjamin Peterson 0268675193 fix compiler warning by implementing this more cleverly 2011-11-22 15:29:32 -05:00
Victor Stinner ca4f20782e find_maxchar_surrogates() reuses surrogate macros 2011-11-22 03:38:40 +01:00
Victor Stinner 0d3721d986 Issue #13441: Temporarily disable the check on the maximum character until
the Solaris issue is solved.

But add assertion on the maximum character in various encoders: UTF-7, UTF-8,
wide character (wchar_t*, Py_UNICODE*), unicode-escape, raw-unicode-escape.

Also fix unicode_encode_ucs1() for the backslashreplace error handler: Python is
now always "wide".
2011-11-22 03:27:53 +01:00
Victor Stinner f8facacf30 Fix compiler warnings 2011-11-22 02:30:47 +01:00
Victor Stinner 9d3b93ba30 Use the new Unicode API
* Replace PyUnicode_FromUnicode(NULL, 0) by PyUnicode_New(0, 0)
* Replace PyUnicode_FromUnicode(str, len) by PyUnicode_FromWideChar(str, len)
* Replace Py_UNICODE by wchar_t
* posix_putenv() uses PyUnicode_FromFormat() to create the string, instead
  of PyUnicode_FromUnicode() + _snwprintf()
2011-11-22 02:27:30 +01:00
Victor Stinner b84d723509 (Merge 3.2) Issue #13093: Fix error handling on PyUnicode_EncodeDecimal() 2011-11-22 01:50:07 +01:00
Victor Stinner cfed46e00a PyUnicode_FromKindAndData() fails with a ValueError if size < 0 2011-11-22 01:29:14 +01:00
Victor Stinner 42885206ec UTF-8 decoder: set consumed value in the latin1 fast-path 2011-11-22 01:23:02 +01:00
Victor Stinner d3df8ab377 Replace _PyUnicode_READY_REPLACE() and _PyUnicode_ReadyReplace() with unicode_ready()
* unicode_ready() has a simpler API
* try to reuse unicode_empty and latin1_char singleton everywhere
* Fix a reference leak in _PyUnicode_TranslateCharmap()
* PyUnicode_InternInPlace() doesn't try to get a singleton anymore, to avoid
  having to handle a failure
2011-11-22 01:22:34 +01:00
Victor Stinner f01245067a Rewrite PyUnicode_TransformDecimalToASCII() to use the new Unicode API 2011-11-21 23:12:56 +01:00
Victor Stinner 2d718f39a5 Remove an unused variable from PyUnicode_Copy() 2011-11-21 23:11:52 +01:00
Victor Stinner 87af4f2f3a Simplify PyUnicode_Copy()
Use PyUnicode_Copy() in fixup()
2011-11-21 23:03:47 +01:00
Victor Stinner 5bbe5e7c85 Fix a compiler warning in _PyUnicode_CheckConsistency() 2011-11-21 22:54:05 +01:00
Victor Stinner 42bf77537e Rewrite PyUnicode_EncodeDecimal() to use the new Unicode API
Add tests for PyUnicode_EncodeDecimal() and
PyUnicode_TransformDecimalToASCII().
2011-11-21 22:52:58 +01:00
Antoine Pitrou ce4a9da705 Issue #13411: memoryview objects are now hashable when the underlying object is hashable. 2011-11-21 20:46:33 +01:00
Antoine Pitrou 0a3229de6b Issue #13417: speed up utf-8 decoding by around 2x for the non-fully-ASCII case.
This almost catches up with pre-PEP 393 performance, when decoding needed
only one pass.
2011-11-21 20:39:13 +01:00
Victor Stinner da29cc36aa Issue #13441: _PyUnicode_CheckConsistency() dumps the string if the maximum
character is bigger than U+10FFFF and locale.localeconv() dumps the string
before decoding it.

Temporary hack to debug the issue #13441.
2011-11-21 14:31:41 +01:00
Victor Stinner 9e30aa52fd Fix misuse of PyUnicode_GET_SIZE() => PyUnicode_GET_LENGTH()
And PyUnicode_GetSize() => PyUnicode_GetLength()
2011-11-21 02:49:52 +01:00
Victor Stinner 53b33e767d UnicodeTranslateError uses the new Unicode API
The index is a character index, not an index in a Py_UNICODE* string.
2011-11-21 01:17:27 +01:00
Victor Stinner da1ddf37c6 UnicodeEncodeError uses the new Unicode API
The index is a character index, not an index in a Py_UNICODE* string.
2011-11-20 22:50:23 +01:00
Victor Stinner 4ead7c7be8 PyObject_Str() ensures that the result string is ready
and checks the string consistency.

_PyUnicode_CheckConsistency() doesn't check the hash anymore. It should be
possible to call this function even if hash(str) was already called.
2011-11-20 19:48:36 +01:00
Victor Stinner 0fc35196bb stringlib: remove unused STRINGLIB_FILL 2011-11-20 19:30:15 +01:00
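
A few of the commits above change behavior that is visible from Python code: 86a36b500a and 9d57481f04 (the __qualname__ attribute), ce4a9da705 (hashable memoryview objects) and b0e1f8b38b (bytearray pickling). The sketch below is not taken from the repository. It is a minimal illustration that assumes a current CPython 3 interpreter, and every class and function name in it is hypothetical.

```python
import pickle


class Outer:
    class Inner:
        def method(self):
            pass


def factory():
    def local_function():
        pass
    return local_function


def qualified_names():
    """Collect __qualname__ for nested definitions and a built-in descriptor."""
    return {
        "nested_class": Outer.Inner.__qualname__,
        "method": Outer.Inner.method.__qualname__,
        "local_function": factory().__qualname__,
        "descriptor": str.join.__qualname__,
    }


def memoryview_hash_matches(data):
    """Compare the hash of a memoryview with the hash of the object it wraps."""
    return hash(memoryview(data)) == hash(data)


def writable_memoryview_is_hashable():
    """A bytearray is not hashable, so hashing a view of it is expected to fail."""
    try:
        hash(memoryview(bytearray(b"abc")))
    except (TypeError, ValueError):
        return False
    return True


def bytearray_round_trip(payload, protocol):
    """Pickle and unpickle a bytearray with the given protocol number."""
    return pickle.loads(pickle.dumps(payload, protocol=protocol))


if __name__ == "__main__":
    for label, name in qualified_names().items():
        print(label, "->", name)
    print("bytes view hash matches:", memoryview_hash_matches(b"abc"))
    print("bytearray view hashable:", writable_memoryview_is_hashable())
    print("protocol 2 round trip:", bytearray_round_trip(bytearray(b"abc"), 2))
    print("protocol 3 round trip:", bytearray_round_trip(bytearray(b"abc"), 3))
```

The round trip is run with protocol 2 and protocol 3 because the commit message says the old reduction format is kept for older protocols. Both calls should return an equal bytearray, and only the pickled bytes differ.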