* Rename unicode_resizable() to unicode_modifiable()
* Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear
that the function is private
* Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append()
to simplify the code
* unicode_modifiable() returns 0 if the hash has been computed or if the string
is not an exact unicode string
* Remove _PyUnicode_DIRTY(): no need to reset the hash anymore, because if the
hash has already been computed, you cannot modify a string inplace anymore
* PyUnicode_Concat() checks for integer overflow
* Remove micro-optimization from PyUnicode_FromStringAndSize():
PyUnicode_DecodeUTF8Stateful() already has these optimizations (for size=0
and one ASCII char).
* Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove a
useless variable
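
The integer overflow check mentioned for PyUnicode_Concat() can be sketched
as follows. This is only a Python model of the usual C pattern, not the actual
implementation: concat_length() is a hypothetical helper, and sys.maxsize
stands in for PY_SSIZE_T_MAX.

```python
import sys

# Stand-in for C's PY_SSIZE_T_MAX: sys.maxsize is the largest Py_ssize_t value.
PY_SSIZE_T_MAX = sys.maxsize


def concat_length(left_len, right_len):
    """Return left_len + right_len, or raise if it would not fit in Py_ssize_t.

    Hypothetical helper, written only to illustrate the check.
    """
    if left_len < 0 or right_len < 0:
        raise ValueError("lengths must be non-negative")
    # Compare before adding: in C, left_len + right_len could overflow, so
    # the test is phrased as a subtraction that cannot overflow.
    if left_len > PY_SSIZE_T_MAX - right_len:
        raise OverflowError("strings are too large to concat")
    return left_len + right_len
```

Python integers never overflow, so the model only shows the order of the
comparison; in C the same test must run before the addition.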
Test the following functions:
* codecs.raw_unicode_escape_decode()
* PyUnicode_FromWideChar()
* PyUnicode_FromUnicode()
* "unicode_internal" and "unicode_escape" decoders
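
For reference, two of the decoders listed above can be exercised from Python
as in the short sketch below. The inputs are made up for illustration and are
not taken from the test suite; "unicode_internal" is left out because it is
not available in recent Python versions.

```python
import codecs

# raw_unicode_escape_decode() returns a (decoded_text, bytes_consumed) pair.
# The input is the six ASCII bytes: backslash, u, 2, 0, a, c.
text, consumed = codecs.raw_unicode_escape_decode(b"\\u20ac")

# The "unicode_escape" decoder also interprets escapes such as \n.
escaped = b"line\\nbreak".decode("unicode_escape")
```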
Skip locales triggering the mbstowcs() bug. I collected the locale list thanks to
my previous commit:
* hu_HU (ISO8859-2): character U+30000020
* de_AT (ISO8859-1): character U+30000076
* cs_CZ (ISO8859-2): character U+30000020
* sk_SK (ISO8859-2): character U+30000020
* pl_PL (ISO8859-2): character U+30000020
* fr_CA (ISO8859-1): character U+30000020
a mbstowcs() bug. For example, on Solaris, the hu_HU locale uses the locale
encoding ISO-8859-2, the thousands separator is b'\xA0' and it is decoded as
U+30000020 (an invalid character) by mbstowcs().
The workaround is not enabled yet (commented out): I would first like to get
more information about the failing locales.
bug. On Solaris, if the locale is hu_HU (and if the locale encoding is not
UTF-8), the thousands separator is b'\xA0' which is decoded as U+30000020
instead of U+0020 by mbstowcs().
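
Why U+30000020 is invalid can be checked with a small sketch. The helper name
is_valid_code_point() is hypothetical, and the check is a plain range test
(it does not treat surrogates specially).

```python
MAX_UNICODE = 0x10FFFF  # highest code point defined by Unicode


def is_valid_code_point(value):
    """Range check only: True if value lies within U+0000..U+10FFFF."""
    return 0 <= value <= MAX_UNICODE


bogus = 0x30000020  # value reported for the affected locales

# The low byte is 0x20 (space); the high bits push the value out of range.
low_byte = bogus & 0xFF

try:
    chr(bogus)
    chr_accepts = True
except ValueError:
    # chr() rejects anything outside range(0x110000).
    chr_accepts = False
```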
Australian Eastern Standard Time (UTC+10) is called "EST" (as Eastern Standard
Time, UTC-5) instead of "AEST" on some operating systems (e.g. FreeBSD), which
is wrong. See for example this bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=93810
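
The ambiguity can be illustrated with fixed-offset time zones. This is a
sketch only: the two zone objects are built by hand for the example and ignore
daylight saving time.

```python
from datetime import timedelta, timezone

# Australian Eastern Standard Time as named by the affected systems.
australian = timezone(timedelta(hours=10), "EST")
# North American Eastern Standard Time.
american = timezone(timedelta(hours=-5), "EST")

# Both zones report the same abbreviation...
same_name = australian.tzname(None) == american.tzname(None)
# ...but their UTC offsets are 15 hours apart.
offset_gap = australian.utcoffset(None) - american.utcoffset(None)
```

Comparing UTC offsets rather than abbreviations avoids the confusion.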