Victor Stinner
3fe553160c
Add a new PyUnicode_Fill() function
It is faster than the unicode_fill() function which was implemented in
formatter_unicode.c.
2012-01-04 00:33:50 +01:00
Benjamin Peterson
5e458f520c
also decref the right thing
2012-01-02 10:12:13 -06:00
Benjamin Peterson
4c13a4a352
ready the correct string
2012-01-02 09:07:38 -06:00
Benjamin Peterson
22a29708fd
fix some possible refleaks from PyUnicode_READY error conditions
2012-01-02 09:00:30 -06:00
Benjamin Peterson
9ca3ffac94
== -1 is convention
2012-01-01 16:04:29 -06:00
Benjamin Peterson
e157cf1012
make switch more robust
2012-01-01 15:56:20 -06:00
Benjamin Peterson
c0b95d18fa
4 space indentation
2011-12-20 17:24:05 -06:00
Benjamin Peterson
ead6b53659
fix spacing around switch statements
2011-12-20 17:23:42 -06:00
Benjamin Peterson
822c790527
merge 3.2
2011-12-20 13:32:50 -06:00
Benjamin Peterson
53aa1d7c57
fix possible, if unlikely, leak
2011-12-20 13:29:45 -06:00
Victor Stinner
6099a03202
Issue #13624: Write a specialized UTF-8 encoder to allow more optimization
The main bottleneck was the PyUnicode_READ() macro.
2011-12-18 14:22:26 +01:00
Victor Stinner
73f53b57d1
Optimize str * n for len(str)==1 and UCS-2 or UCS-4
2011-12-18 03:26:31 +01:00
Victor Stinner
f644110816
Issue #13621: Optimize str.replace(char1, char2)
Use findchar() which is more optimized than a dummy loop using
PyUnicode_READ(). PyUnicode_READ() is a complex and slow macro.
2011-12-18 02:43:08 +01:00
Victor Stinner
ab870218e3
Issue #10951: Fix compiler warnings in timemodule.c and unicodeobject.c
Thanks Jérémy Anger for the fix.
2011-12-17 22:39:43 +01:00
Victor Stinner
2f197078fb
The locale decoder raises a UnicodeDecodeError instead of an OSError
Search for the invalid character using mbrtowc().
2011-12-17 07:08:30 +01:00
Victor Stinner
1b57967b96
Issue #13560: Locale codec functions use the classic "errors" parameter,
instead of surrogateescape,
so that more error handlers could be supported later.
2011-12-17 05:47:23 +01:00
Victor Stinner
ab59594326
What's New in Python 3.3: complete the deprecation list
Also add FIXMEs in unicodeobject.c
2011-12-17 04:59:06 +01:00
Victor Stinner
1f33f2b0c3
Issue #13560: os.strerror() now uses the current locale encoding instead of UTF-8
2011-12-17 04:45:09 +01:00
Victor Stinner
f2ea71fcc8
Issue #13560: Add PyUnicode_EncodeLocale()
* Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not
available
* Document my last changes in Misc/NEWS
2011-12-17 04:13:41 +01:00
Victor Stinner
af02e1c85a
Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale()
* PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
from the current locale encoding
* _Py_char2wchar() writes an "error code" in the size argument to indicate
if the function failed because of memory allocation failure or because of a
decoding error. The function doesn't write the error message directly to
stderr.
* Fix time.strftime() (if wcsftime() is missing): decode strftime() result
from the current locale encoding, not from the filesystem encoding.
2011-12-16 23:56:01 +01:00
Victor Stinner
16e6a80923
PyUnicode_Resize(): warn about canonical representation
Also call unicode_resize() directly in unicodeobject.c
2011-12-12 13:24:15 +01:00
Victor Stinner
b0a82a6a7f
Fix PyUnicode_Resize() for compact string: leave the string unchanged on error
Also fix the PyUnicode_Resize() doc
2011-12-12 13:08:33 +01:00
Victor Stinner
bf6e560d0c
Make PyUnicode_Copy() private => _PyUnicode_Copy()
Undocument the function.
Also make decode_utf8_errors() private (static).
2011-12-12 01:53:47 +01:00
Victor Stinner
7a9105a380
resize_copy() now supports legacy ready strings
2011-12-12 00:13:42 +01:00
Victor Stinner
488fa49acf
Rewrite PyUnicode_Append(); unicode_modifiable() is more strict
* Rename unicode_resizable() to unicode_modifiable()
* Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear
that the function is private
* Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append()
to simplify the code
* unicode_modifiable() returns 0 if the hash has been computed or if the string
is not an exact unicode string
* Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the
hash has already been computed, the string can no longer be modified in place
* PyUnicode_Concat() checks for integer overflow
2011-12-12 00:01:39 +01:00
Victor Stinner
c4b495497a
Create unicode_result_unchanged() subfunction
2011-12-11 22:44:26 +01:00
Victor Stinner
eaab604829
Fix fixup() for unchanged unicode subtype
If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u.
2011-12-11 22:22:39 +01:00
Victor Stinner
e6b2d4407a
unicode_fromascii() doesn't check string content twice in debug mode
_PyUnicode_CheckConsistency() also checks string content.
2011-12-11 21:54:30 +01:00
Victor Stinner
a1d12bb119
Call PyUnicode_DecodeUTF8Stateful() directly instead of PyUnicode_DecodeUTF8()
* Remove micro-optimization from PyUnicode_FromStringAndSize():
PyUnicode_DecodeUTF8Stateful() already has these optimizations (for size=0
and a single ASCII char).
* Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove a
useless variable
2011-12-11 21:53:09 +01:00
Victor Stinner
382955ff4e
Use unicode_empty directly instead of PyUnicode_New(0, 0)
2011-12-11 21:44:00 +01:00
Victor Stinner
785938eebd
Move the slowest UTF-8 decoder to its own subfunction
* Create decode_utf8_errors()
* Reuse unicode_fromascii()
* decode_utf8_errors() doesn't refit at the beginning
* Remove refit_partial_string(), use unicode_adjust_maxchar() instead
2011-12-11 20:09:03 +01:00
Victor Stinner
84def3774d
Fix error handling in resize_compact()
2011-12-11 20:04:56 +01:00
Victor Stinner
8faf8216e4
PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a
character is not in range [U+0000; U+10ffff].
2011-12-08 22:14:11 +01:00
Victor Stinner
551ac95733
Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros
And use surrogate macros everywhere in unicodeobject.c
2011-11-29 22:58:13 +01:00
Victor Stinner
6345be9a14
Close #13093: PyUnicode_EncodeDecimal() doesn't support error handlers
different than "strict" anymore. The caller was unable to compute the
size of the output buffer: it depends on the error handler.
2011-11-25 20:09:01 +01:00
Benjamin Peterson
1518e8713d
and back to the "magic" formula (with a comment) it is
2011-11-23 10:44:52 -06:00
Benjamin Peterson
5944c36931
cave to those who like readable code
2011-11-22 19:05:49 -06:00
Benjamin Peterson
0268675193
fix compiler warning by implementing this more cleverly
2011-11-22 15:29:32 -05:00
Victor Stinner
ca4f20782e
find_maxchar_surrogates() reuses surrogate macros
2011-11-22 03:38:40 +01:00
Victor Stinner
0d3721d986
Issue #13441: Temporarily disable the check on the maximum character until
the Solaris issue is solved.
But add assertions on the maximum character in various encoders: UTF-7, UTF-8,
wide character (wchar_t*, Py_UNICODE*), unicode-escape, raw-unicode-escape.
Also fix unicode_encode_ucs1() for the backslashreplace error handler: Python is
now always "wide".
2011-11-22 03:27:53 +01:00
Victor Stinner
f8facacf30
Fix compiler warnings
2011-11-22 02:30:47 +01:00
Victor Stinner
b84d723509
(Merge 3.2) Issue #13093: Fix error handling in PyUnicode_EncodeDecimal()
2011-11-22 01:50:07 +01:00
Victor Stinner
cfed46e00a
PyUnicode_FromKindAndData() fails with a ValueError if size < 0
2011-11-22 01:29:14 +01:00
Victor Stinner
42885206ec
UTF-8 decoder: set consumed value in the latin1 fast-path
2011-11-22 01:23:02 +01:00
Victor Stinner
d3df8ab377
Replace _PyUnicode_READY_REPLACE() and _PyUnicode_ReadyReplace() with unicode_ready()
* unicode_ready() has a simpler API
* try to reuse the unicode_empty and latin1_char singletons everywhere
* Fix a reference leak in _PyUnicode_TranslateCharmap()
* PyUnicode_InternInPlace() doesn't try to get a singleton anymore, to avoid
having to handle a failure
2011-11-22 01:22:34 +01:00
Victor Stinner
f01245067a
Rewrite PyUnicode_TransformDecimalToASCII() to use the new Unicode API
2011-11-21 23:12:56 +01:00
Victor Stinner
2d718f39a5
Remove an unused variable from PyUnicode_Copy()
2011-11-21 23:11:52 +01:00
Victor Stinner
87af4f2f3a
Simplify PyUnicode_Copy()
Use PyUnicode_Copy() in fixup()
2011-11-21 23:03:47 +01:00
Victor Stinner
5bbe5e7c85
Fix a compiler warning in _PyUnicode_CheckConsistency()
2011-11-21 22:54:05 +01:00
Victor Stinner
42bf77537e
Rewrite PyUnicode_EncodeDecimal() to use the new Unicode API
Add tests for PyUnicode_EncodeDecimal() and
PyUnicode_TransformDecimalToASCII().
2011-11-21 22:52:58 +01:00