A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or
PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to
PyUnicodeObject* is wrong
Use PyUnicode_IS_ASCII instead of PyUnicode_IS_COMPACT_ASCII, so the following
test can be removed:
PyUnicode_DATA(op) == (((PyCompactUnicodeObject *)(op))->utf8)
* Create copy_characters() function which doesn't check for the maximum
character in release mode
* _PyUnicode_CheckConsistency() is no more static to be able to use it
in _PyUnicode_FormatAdvanced() (in formatter_unicode.c)
* _PyUnicode_CheckConsistency() checks the string hash
ucs1, ucs2 and ucs4 libraries have to scan created substring to find the
maximum character, whereas it is not need to ASCII strings. Because ASCII
strings are common, it is useful to optimize ASCII.
* str[a:b] doesn't scan the string for the maximum character if the string
is ascii only
* PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
shorter character type. For example, _PyUnicode_FromUCS1() stops if we
have at least one character in range U+0080-U+00FF
* Change its prototype: PyObject* instead of PyUnicodeoObject*.
* Remove an old assertion, the result of PyUnicode_READY (_PyUnicode_Ready)
must be checked instead
types. Added a new API function, PyUnicode_TransformDecimalToASCII(),
which transforms non-ASCII decimal digits in a Unicode string to their
ASCII equivalents.
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
Py_FileSystemDefaultEncoding is NULL
* redecode_filenames() functions and _Py_code_object_list (issue #9630)
are no more needed: remove them
Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
and return characters from the full Unicode range (Py_UCS4).
The differences from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
now return the correct value for large code points
- repr() may consider more characters as printable.
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.
* Don't specify surrogateescape error handler in the comments nor the
documentation, but PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_EncodeFSDefault() because these functions use strict error handler
for the mbcs encoding (on Windows).
* Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
inconsistency with unicodeobject.h.
object to Py_FileSystemDefaultEncoding with the "surrogateescape" error
handler, return a bytes object. If Py_FileSystemDefaultEncoding is not set,
fall back to UTF-8.
* Add paragraph titles to c-api/unicode.rst.
* Fix PyUnicode_DecodeFSDefault*() comment: it now uses the "surrogateescape"
error handler (and not "replace")
* Remove "The function is intended to be used for paths and file names only
during bootstrapping process where the codecs are not set up." from
PyUnicode_FSConverter() comment: it is used after the bootstrapping and for
other purposes than file names
svn+ssh://pythondev@svn.python.org/python/trunk
........
r72283 | antoine.pitrou | 2009-05-04 20:32:32 +0200 (lun., 04 mai 2009) | 4 lines
Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal sequences.
Patch by Nick Barnes and Victor Stinner.
........
r72284 | antoine.pitrou | 2009-05-04 20:32:50 +0200 (lun., 04 mai 2009) | 3 lines
Add Nick Barnes to ACKS.
........
Addresses the float -> string conversion, using David Gay's code which
was added in Mark Dickinson's checkin r71663.
Also addresses these, which are intertwined with the short repr
changes:
- Issue #5772: format(1e100, '<') produces '1e+100', not '1.0e+100'
- Issue #5515: 'n' formatting with commas no longer works poorly
with leading zeros.
- PEP 378 Format Specifier for Thousands Separator: implemented
for floats.
This is incomplete, but I want to get some version into the next alpha. I am still working on:
Documentation.
More tests.
Implement for floats.
In addition, there's an existing bug with 'n' formatting that carries forward to thousands grouping (issue 5515).
svn+ssh://pythondev@svn.python.org/python/trunk
........
r68167 | vinay.sajip | 2009-01-02 12:53:04 -0600 (Fri, 02 Jan 2009) | 1 line
Minor documentation changes relating to NullHandler, the module used for handlers and references to ConfigParser.
........
r68276 | tarek.ziade | 2009-01-03 18:04:49 -0600 (Sat, 03 Jan 2009) | 1 line
fixed#1702551: distutils sdist was not pruning VCS directories under win32
........
r68292 | skip.montanaro | 2009-01-04 04:36:58 -0600 (Sun, 04 Jan 2009) | 3 lines
If user configures --without-gcc give preference to $CC instead of blindly
assuming the compiler will be "cc".
........
r68293 | tarek.ziade | 2009-01-04 04:37:52 -0600 (Sun, 04 Jan 2009) | 1 line
using clearer syntax
........
r68344 | marc-andre.lemburg | 2009-01-05 13:43:35 -0600 (Mon, 05 Jan 2009) | 7 lines
Fix#4846 (Py_UNICODE_ISSPACE causes linker error) by moving the declaration
into the extern "C" section.
Add a few more comments and apply some minor edits to make the file contents
fit the original structure again.
........
PyUnicode_AsStringAndSize -> _PyUnicode_AsStringAndSize to mark
them for interpreter internal use only.
We'll have to rework these APIs or create new ones for the
purpose of accessing the UTF-8 representation of Unicode objects
for 3.1.
The repr() of a string now contains printable Unicode characters unescaped.
The new ascii() builtin can be used to get a repr() with only ASCII characters in it.
PEP and patch were written by Atsuo Ishimoto.
Use faster PyUnicode_FromEncodedObject() for bytes/bytearray.decode().
Add new PyCodec_KnownEncoding() API.
Add new PyUnicode_AsDecodedUnicode() and PyUnicode_AsEncodedUnicode() APIs.
Add missing PyUnicode_AsDecodedObject() to unicodeobject.h
Fix punicode codec to also work on memoryviews.
svn+ssh://pythondev@svn.python.org/python/trunk
When forward porting this, I added _PyUnicode_InsertThousandsGrouping.
........
r63078 | eric.smith | 2008-05-11 15:52:48 -0400 (Sun, 11 May 2008) | 14 lines
Addresses issue 2802: 'n' formatting for integers.
Adds 'n' as a format specifier for integers, to mirror the same
specifier which is already available for floats. 'n' is the same as
'd', but inserts the current locale-specific thousands grouping.
I added this as a stringlib function, but it's only used by str type,
not unicode. This is because of an implementation detail in
unicode.format(), which does its own str->unicode conversion. But the
unicode version will be needed in 3.0, and it may be needed by other
code eventually in 2.6 (maybe decimal?), so I left it as a stringlib
implementation. As long as the unicode version isn't instantiated,
there's no overhead for this.
........