works just like the Unicode one. The C APIs match the ones in the Unicode
implementation, but were extended to be able to reuse the existing
Unicode codecs for string purposes too.
Conversions from string to Unicode and back are done using the
default encoding.
Fix the string methods that implement slice-like semantics with
optional args (count, find, endswith, etc.) to properly handle
indeces outside [INT_MIN, INT_MAX]. Previously the "i" formatter
for PyArg_ParseTuple was used to get the indices. These could overflow.
This patch changes the string methods to use the "O&" formatter with
the slice_index() function from ceval.c which is used to do the same
job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]). slice_index()
is renamed _PyEval_SliceIndex() and is now exported. As well, the return
values for success/fail were changed to make slice_index directly
usable as required by the "O&" formatter.
[GvR: shouldn't a similar patch be applied to unicodeobject.c?]
gave bogus results for chars in the range 128-255, because their
implementation was using signed characters. Fixed this by using
unsigned character pointers (as opposed to using Py_CHARMASK()).
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.
(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode. I'm also holding back on his
change to main.c, which seems unnecessary to me.)
The maxsplit functionality in .splitlines() was replaced by the keepends
functionality which allows keeping the line end markers together
with the string.
Added support for '%r' % obj: this inserts repr(obj) rather
than str(obj).
* string_contains now calls PyUnicode_Contains() only when the other
operand is a Unicode string (not whenever it's not a string).
* New format style '%r' inserts repr(arg) instead of str(arg).
* '...%s...' % u"abc" now coerces to Unicode just like
string methods. Care is taken not to reevaluate already formatted
arguments -- only the first Unicode object appearing in the
argument mapping is looked up twice. Added test cases for
this to test_unicode.py.
Attached you find an update of the Unicode implementation.
The patch is against the current CVS version. I would appreciate
if someone with CVS checkin permissions could check the changes
in.
The patch contains all bugs and patches sent this week and also
fixes a leak in the codecs code and a bug in the free list code
for Unicode objects (which only shows up when compiling Python
with Py_DEBUG; thanks to MarkH for spotting this one).
specifier came from an int expression instead of a constant in the
format, a negative width was truncated to zero instead of taken to
mean the same as that negative constant plugged into the format. E.g.
"(%*s)" % (-5, "foo") yielded "(foo)" while "(%-5s)" yields "(foo )".
Now both yield the latter -- like sprintf() in C.
arbitrary nested parens in a %(...)X style format.
#Also folded two lines and added more detail to the error message for
#unsupported format character.
from the interned table. There are references in hard-to-find static
variables all over the interpreter, and it's not worth trying to get
rid of all those; but "uninterning" isn't fair either and may cause
subtle failures later -- so we have to keep them in the interned
table.
Also get rid of no-longer-needed insert of None in interned dict.