his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent). Checkin messages:
New Unicode support for int(), float(), complex() and long().
- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
Unicode to a decimal char* string (used in the above new
APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above
Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
are masked, all other errors such as ValueError during
Unicode coercion are passed through (note that PyUnicode_Compare
does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
masked and 0 returned; all other errors are passed through
Better testing support for the standard codecs.
Misc minor enhancements, such as an alias dbcs for the mbcs codec.
Changes:
- PyLong_FromString() now applies the same error checks as
does PyInt_FromString(): trailing garbage is reported
as error and not longer silently ignored. The only characters
which may be trailing the digits are 'L' and 'l' -- these
are still silently ignored.
- string.ato?() now directly interface to int(), long() and
float(). The error strings are now a little different, but
the type still remains the same. These functions are now
ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
in the input string; PyNumber_Long() already did this (and
still does)
Followed by:
Looks like I've gone a step too far there... (and test_contains.py
seem to have a bug too).
I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
Attached you find an update of the Unicode implementation.
The patch is against the current CVS version. I would appreciate
if someone with CVS checkin permissions could check the changes
in.
The patch contains all bugs and patches sent this week and also
fixes a leak in the codecs code and a bug in the free list code
for Unicode objects (which only shows up when compiling Python
with Py_DEBUG; thanks to MarkH for spotting this one).
This (1) avoids thread unsafety whereby another thread could zap the
list while we were using it, and (2) now supports writing arbitrary
sequences of strings.
Added wrapping macros to dictobject.c, listobject.c, tupleobject.c,
frameobject.c, traceback.c that safely prevends core dumps
on stack overflow. Macros and functions in object.c, object.h.
The method is an "elevator destructor" that turns cascading
deletes into tail recursive behavior when some limit is hit.
diagnostics.
*** INCOMPATIBLE CHANGE: This changes append(), remove(), index(), and
*** count() to require exactly one argument -- previously, multiple
*** arguments were silently assumed to be a tuple.
messages from "OverflowError: integer pow()" to "OverflowError:
integer exponentiation". (Not that this takes care of the complaint
in general that the error messages could be greatly improved. :-)
trailing 'L' is appended to the representation,
otherwise not.
All existing call sites are modified to pass true for
addL.
Remove incorrect statement about external use of this
function from elsewhere; it's static!
long_str(): Handler for the tp_str slot in the type object.
Identical to long_repr(), but passes false as the addL
parameter of long_format().
specifier came from an int expression instead of a constant in the
format, a negative width was truncated to zero instead of taken to
mean the same as that negative constant plugged into the format. E.g.
"(%*s)" % (-5, "foo") yielded "(foo)" while "(%-5s)" yields "(foo )".
Now both yield the latter -- like sprintf() in C.
1. Fixes float divmod so that the quotient it returns is always an integral
value.
2. Fixes float % and float divmod so that the remainder always gets the
right sign (the current code uses a "are the signs different?" test that
doesn't work half the time <wink> when the product of the divisor and the
remainder underflows to 0).
a block cannot be freed, add its free items back to the free list, and
add its valid ints back to the small_ints array if they are in range.
This is necessary to avoid leaking when Python is reinitialized later.
represented by an explicit structure. (There are still too many casts
in the code, but that may be unavoidable.)
Also added code so that with -vv it is very chatty about what it does.
buffer increment, and sometimes the new buffer size. Make it do what
its name says, and fix the one place where this matters to the caller.
Also add a comment explaining why we call lseek() and then ftell().
The MS compiler doesn't call it 'long long', it uses __int64,
so a new #define, LONG_LONG, has been added and all occurrences
of 'long long' are replaced with it.
Previously, this said "unsubscriptable object"; in 1.5.1, the reverse
problem existed, where None[''] would complain about a non-integer
index. This fix does the right thing in all cases (for get, set and
del item).
before calling it. This check was there when the objects were of the
same type *before* coercion, but not if they initially differed but
became the same *after* coercion.
Sparc Solaris 2.6 (fully patched!) that I don't want to dig into, but
which I suspect is a bug in the multithreaded malloc library that only
shows up when run on a multiprocessor. (The program wasn't using
threads, it was just using the multithreaded C library.)
faster (using PyList_GetSlice()). Also added a test for a NULL
argument, as with PySequence_Tuple(). (Hmm... Better names for these
two would be PyList_FromSequence() and PyTuple_FromSequence(). Oh well.)
"indefinite length" sequences. These should still have a length, but
the length is only used as a hint -- the actual length of the sequence
is determined by the item that raises IndexError, which may be either
smaller or larger than what len() returns. (This is a novelty; map(),
filter() and reduce() only allow the actual length to be larger than
what len() returns, not shorter. I'll fix that shortly.)
conversions. Formerly, for example, int('-') would return 0 instead
of raising ValueError, and int(' 0') would raise ValueError
(complaining about a null byte!) instead of 0...
+ Took the "list" argument out of the other functions that no longer need
it. This speeds things up a little more.
+ Small comment changes in accord with that.
+ Exploited the now-safe ability to cache values in the partitioning loop.
Makes no timing difference on my flavor of Pentium, but this machine ran out
of registers 12 iterations ago. It should yield a small speedup on a RISC
machine, and not hurt in any case.
instead of testing whether the list changed size after each
comparison, temporarily set the type of the list to an immutable list
type. This should allow continued use of the list for legitimate
purposes but disallows all operations that can change it in any way.
(Changes to the internals of list items are not caught, of cause;
that's not possible to detect, and it's not necessary to protect the
sort code, either.)
not in restricted mode.
__dict__ can be set to any dictionary; the cl_getattr, cl_setattr and
cl_delattr slots are refreshed.
__name__ can be set to any string.
__bases__ can be set to to a tuple of classes, provided they are not
subclasses of the class whose attribute is being assigned.
__getattr__, __setattr__ and __delattr__ can be set to anything, or
deleted; the appropriate slot (cl_getattr, cl_setattr, cl_delattr) is
refreshed.
(Note: __name__ really doesn't need to be a special attribute, but
that would be more work.)
From: "Tim Peters" <tim_one@email.msn.com>
To: "Guido van Rossum" <guido@CNRI.Reston.VA.US>
Date: Sat, 23 May 1998 21:45:53 -0400
Guido, the overflow checking in PyLong_AsLong is off a little:
1) If the C in use sign-extends right shifts on signed longs, there's a
spurious overflow error when converting the most-negative int:
Python 1.5.1 (#0, Apr 13 1998, 20:22:04) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> x = -1L << 31
>>> x
-2147483648L
>>> int(x)
Traceback (innermost last):
File "<stdin>", line 1, in ?
OverflowError: long int too long to convert
>>>
2) If C does not sign-extend, some genuine overflows won't be caught.
The attached should repair both, and, because I installed a new disk and a C
compiler today, it's even been compiled this time <wink>.
Python 1.5.1 (#0, May 23 1998, 20:24:58) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> x = -1L << 31
>>> x
-2147483648L
>>> int(x)
-2147483648
>>> int(-x)
Traceback (innermost last):
File "<stdin>", line 1, in ?
OverflowError: long int too long to convert
>>> int(-x-1)
2147483647
>>> int(x-1)
Traceback (innermost last):
File "<stdin>", line 1, in ?
OverflowError: long int too long to convert
>>>
end-casing-ly y'rs - tim
Make sure that no tp_as_numbers->nb_<whatever> function is called
without checking for a NULL pointer. Marc-Andre Lemburg will love it!
(Except that he's just rewritten all this code for a different
approach to coercions ;-( )
programming style.
Recoded many routines to incorporate better error checking, and/or
better versions of the same function found elsewhere
(e.g. bltinmodule.c or ceval.c). In particular,
Py_Number_{Int,Long,Float}() now convert from strings, just like the
built-in functions int(), long() and float().
Sequences and mappings are now safe to have NULL function pointers
anywhere in their tp_as_sequence or tp_as_mapping fields. (A few
places in other files need to be checked in too.)
Renamed PySequence_In() to PySequence_Contains().
clear_carefully() used to do in import.c. Differences: leave only
__builtins__ alone in the 2nd pass; and don't clear the dictionary (on
the theory that as long as there are references left to the
dictionary, those might be destructors that might expect __builtins__
to be alive when they run; and __builtins__ can't normally be part of
a cycle).
PyNumber_Coerce() except that when the coercion can't be done and no
other exceptions happen, it returns 1 instead of raising an
exception.
Use this function in PyObject_Compare() to avoid raising an exception
simply because two objects with numeric behavior can't be coerced to a
common type; instead, proceed with the non-numeric default comparison.
Note that this is a somewhat questionable practice -- comparisons for
numeric objects shouldn't default to random behavior like this, but it
is required for backward compatibility. (Case in point, it broke
comparison of kjDict objects to integers in Aaron Watters' kjbuckets
extension.) A correct fix (for python 2.0) should involve a different
definiton of comparison altogether.
sys.stdin.readline(), you get a fatal error (no current thread). This
is because there was a call to PyErr_CheckSignals() while there was no
current thread. I wonder how many more of these we find... I bnetter
go hunting for PyErr_CheckSignals() now...
in libmath.a so they are available to mathmodule.so (in case it is
shared). While this still gets triggered on Solaris 2.x, this appears
to be harmless there.
__getitem__(). This method never raises an exception; if the key is
not in the dictionary, the second (optional) argument is returned. If
the second argument is not provided and the key is missing, None is
returned.
mapp_methods: added "get" method.
arbitrary nested parens in a %(...)X style format.
#Also folded two lines and added more detail to the error message for
#unsupported format character.
former lets you give an instance a set of new instance vars. The
latter lets you give it a new class. Both are typechecked and
disallowed in restricted mode.
For classes, the check for read-only special attributes is tightened
so that only assignments to __dict__, __bases__, __name__,
__getattr__, __setattr__, and __delattr__ (these could be made to work
as well, but I don't know if that's useful -- let's see first whether
mucking with instances will help).
from the interned table. There are references in hard-to-find static
variables all over the interpreter, and it's not worth trying to get
rid of all those; but "uninterning" isn't fair either and may cause
subtle failures later -- so we have to keep them in the interned
table.
Also get rid of no-longer-needed insert of None in interned dict.
no valid directory is passed in. This prevents __del__ to fail when
invoked after __builtins__ has already been discarded.
Also add PyFrame_Fini() to discard the cache of frames.
In _Py_PrintReferences(), no longer suppress once-referenced string.
Add Py_Malloc and friends and PyMem_Malloc and friends (malloc
wrappers for third parties).
complexity saved much any more. A simple benchmark (grail) showed
that there were 3 times as many misses as hits, and the same number of
times again the code was bypassed altogether due to the existence of
setattro/getattro.
this many bytes have been read, readlines stops. Because of
buffering, the amount of bytes read is usually at least 8K more than
the hint.
Also changed read() and readline() to use PyArg_ParseTuple().
(Note that the *previous* checkin also fixed error handling and
narrowed the range of thread unblocking for all methods using
fread().)
see if we can guess the #bytes until the end of the file. If we
can't, increment the buffer size increments up to 0.5Meg to avoid
realloc'ing too much.
The table size is now constrained to be a power of two, and we use a
variable increment based on GF(2^n)-{0} (not that I have the faintest
idea what that is :-) which helps avoid the expensive '%' operation.
Some of the entries in the table of polynomials have been modified
according to a post by Tim Peters.
Rather than allocating a list object for the fast locals and another
(extensible one) for the value stack and allocating the block stack
dynamically, allocate the block stack with a fixed size (CO_MAXBLOCKS
from compile.h), and stick the locals and value stack at the end of
the object (this is now possible since the stack size is known
beforehand). Get rid of the owner field and the nvalues argument --
it is available in the code object, like nlocals.
This requires small changes in ceval.c only.