Also fixes two long-standing bugs (present in 2.0):
1. .join() didn't check that the result size fit in an int.
2. string.join(s) when len(s)==1 returned s[0] regardless of s[0]'s
type; e.g., "".join([3]) returned 3 (overly optimistic optimization).
I resisted a keen temptation to make .join() apply str() automagically.
in case the parameters are out of bounds and fixes error handling
for .count(), .startswith() and .endswith() for the case of
mixed string/Unicode objects.
This patch adds Python style index semantics to PyUnicode_Count()
indices (including the special handling of negative indices).
The patch is an extended version of patch #103249 submitted
by Michael Hudson (mwh) on SF. It also includes new test cases.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.
This closes SourceForge patch #101659 and bug #115323.
Note a curious extension to the std C rules: x, X and o formatting can never produce
a sign character in C, so the '+' and ' ' flags are meaningless for them. But
unbounded ints *can* produce a sign character under these conversions (no fixed-
width bitstring is wide enough to hold all negative values in 2's-comp form). So
these flags become meaningful in Python when formatting a Python long which is too
big to fit in a C long. This required shuffling around existing code, which hacked
x and X conversions to death when both the '#' and '0' flags were specified: the
hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or
'+' flags too, since signs were always meaningless before for x and X conversions.
Isomorphic shuffling was required in unicodeobject.c.
Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
which implements the automatic conversion from Unicode to a string
object using the default encoding.
The new API is then put to use to have eval() and exec accept
Unicode objects as code parameter. This closes bugs #110924
and #113890.
As side-effect, the traditional C APIs PyString_Size() and
PyString_AsString() will also accept Unicode objects as
parameters.
all, either to see whether the # of chars fit in an int, or that the
amount of memory needed fit in a size_t. Checking these is expensive, but
the alternative is silently wrong answers (as in the bug report) or
core dumps (which were easy to provoke using Unicode strings).
shutdown time, but CVS log entry for revision 2.45 explains why this
is so. Simply include a comment so we don't have to re-figure it out
again 5 years from now.
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
use PyString_AS_STRING macro on local string object
when resizing string, make sure resized string will always be big enough
split string containing error message across two lines
add test to string_tests that causes resizing
seqlen==1 clause, before returning item, we need to DECREF seq. In
the res=PyString... failure clause, we need to goto finally to also
decref seq (and the DECREF of res in finally is changed to a
XDECREF). Also, we need to DECREF seq just before the
PyUnicode_Join() return.
implementation -- use PySequence_Fast interface to iterate over elements
interface -- if instance object reports wrong length, ignore it;
previous version raised an IndexError if reported length was too high
was cascades of warnings about mismatching const decls. Overall,
I think const creates lots of headaches and solves almost
nothing. Added enough consts to shut up the warnings, but
this did require casting away const in one spot too (another
usual outcome of starting down this path): the function
mymemreplace can't return const char*, but sometimes wants to
return its first argument as-is, which latter must be declared
const char* in order to avoid const warnings at mymemreplace's
call sites. So, in the case the function wants to return the
first arg, that arg's declared constness must be subverted.
works just like the Unicode one. The C APIs match the ones in the Unicode
implementation, but were extended to be able to reuse the existing
Unicode codecs for string purposes too.
Conversions from string to Unicode and back are done using the
default encoding.
Fix the string methods that implement slice-like semantics with
optional args (count, find, endswith, etc.) to properly handle
indeces outside [INT_MIN, INT_MAX]. Previously the "i" formatter
for PyArg_ParseTuple was used to get the indices. These could overflow.
This patch changes the string methods to use the "O&" formatter with
the slice_index() function from ceval.c which is used to do the same
job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]). slice_index()
is renamed _PyEval_SliceIndex() and is now exported. As well, the return
values for success/fail were changed to make slice_index directly
usable as required by the "O&" formatter.
[GvR: shouldn't a similar patch be applied to unicodeobject.c?]
gave bogus results for chars in the range 128-255, because their
implementation was using signed characters. Fixed this by using
unsigned character pointers (as opposed to using Py_CHARMASK()).
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.
(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode. I'm also holding back on his
change to main.c, which seems unnecessary to me.)
The maxsplit functionality in .splitlines() was replaced by the keepends
functionality which allows keeping the line end markers together
with the string.
Added support for '%r' % obj: this inserts repr(obj) rather
than str(obj).
* string_contains now calls PyUnicode_Contains() only when the other
operand is a Unicode string (not whenever it's not a string).
* New format style '%r' inserts repr(arg) instead of str(arg).
* '...%s...' % u"abc" now coerces to Unicode just like
string methods. Care is taken not to reevaluate already formatted
arguments -- only the first Unicode object appearing in the
argument mapping is looked up twice. Added test cases for
this to test_unicode.py.
Attached you find an update of the Unicode implementation.
The patch is against the current CVS version. I would appreciate
if someone with CVS checkin permissions could check the changes
in.
The patch contains all bugs and patches sent this week and also
fixes a leak in the codecs code and a bug in the free list code
for Unicode objects (which only shows up when compiling Python
with Py_DEBUG; thanks to MarkH for spotting this one).
specifier came from an int expression instead of a constant in the
format, a negative width was truncated to zero instead of taken to
mean the same as that negative constant plugged into the format. E.g.
"(%*s)" % (-5, "foo") yielded "(foo)" while "(%-5s)" yields "(foo )".
Now both yield the latter -- like sprintf() in C.
arbitrary nested parens in a %(...)X style format.
#Also folded two lines and added more detail to the error message for
#unsupported format character.
from the interned table. There are references in hard-to-find static
variables all over the interpreter, and it's not worth trying to get
rid of all those; but "uninterning" isn't fair either and may cause
subtle failures later -- so we have to keep them in the interned
table.
Also get rid of no-longer-needed insert of None in interned dict.
entirely redone operator overloading. The rules for class
instances are now much more relaxed than for other built-in types
(whose coerce must still return two objects of the same type)
* Objects/floatobject.c: add overflow check when converting float
to int and implement truncation towards zero using ceil/float
* Objects/longobject.c: change ValueError to OverflowError when
converting to int
* Objects/rangeobject.c: modernized
* Objects/stringobject.c: use HAVE_LIMITS instead of __STDC__
* Objects/xxobject.c: changed to use new style (not finished?)
image objects, and lots of new methods.
* Added counting of allocations and deallocations of builtin types if
COUNT_ALLOCS is defined. Had to move calls to NEWREF down in some
files.
* Bug fix in sorting lists.
Added $(SYSDEF) to its build rule in Makefile.
* cgensupport.[ch], modsupport.[ch]: removed some old stuff. Also
changed files that still used it... And made several things static
that weren't but should have been... And other minor cleanups...
* listobject.[ch]: add external interfaces {set,get}listslice
* socketmodule.c: fix bugs in new send() argument parsing.
* sunaudiodevmodule.c: added flush() and close().
before it.
* ceval.c, object.c: moved testbool() to object.c (now extern visible)
* stringobject.c: fix bugs in and rationalize string resize in formatstring()
* tokenizer.[ch]: fix non-working code for lines longer than BUFSIZ
* Stubs for faster implementation of local variables (not yet finished)
* Added function name to code object. Print it for code and function
objects. THIS MAKES THE .PYC FILE FORMAT INCOMPATIBLE (the version
number has changed accordingly)
* Print address of self for built-in methods
* New internal functions getattro and setattro (getattr/setattr with
string object arg)
* Replaced "dictobject" with more powerful "mappingobject"
* New per-type functio tp_hash to implement arbitrary object hashing,
and hashobject() to interface to it
* Added built-in functions hash(v) and hasattr(v, 'name')
* classobject: made some functions static that accidentally weren't;
added __hash__ special instance method to implement hash()
* Added proper comparison for built-in methods and functions
* Fixcprt.py: added [-y file] option, do only files younger than file.
* modsupport.[ch]: added vmkvalue().
* intobject.c: use mkvalue().
* stringobject.c: added "formatstring"; renamed string* to string_*;
ceval.c: call formatstring for string % value.
* longobject.c: close memory leak in divmod.
* parsetok.c: set result node to NULL when returning an error.