* unified the way intobject, longobject and mystrtoul handle
values around -sys.maxint-1.
* in general, trying to entierely avoid overflows in any computation
involving signed ints or longs is extremely involved. Fixed a few
simple cases where a compiler might be too clever (but that's all
guesswork).
* more overflow checks against bad data in marshal.c.
* 2.5 specific: fixed a number of places that were still confusing int
and Py_ssize_t. Some of them could potentially have caused
"real-world" breakage.
* list.pop(x): fixing overflow issues on x was messy. I just reverted
to PyArg_ParseTuple("n"), which does the right thing. (An obscure
test was trying to give a Decimal to list.pop()... doesn't make
sense any more IMHO)
* trying to write a few tests...
I modified this patch some by fixing style, some error checking, and adding
XXX comments. This patch requires review and some changes are to be expected.
I'm checking in now to get the greatest possible review and establish a
baseline for moving forward. I don't want this to hold up release if possible.
a new comment) suggests there are almost certainly large input
integers in all non-binary input bases for which one Python digit
too few is initally allocated to hold the final result. Instead
of assert-failing when that happens, allocate more space. Alas,
I estimate it would take a few days to find a specific such case,
so this isn't backed up by a new test (not to mention that such
a case may take hours to run, since conversion time is quadratic
in the number of digits, and preliminary attempts suggested that
the smallest such inputs contain at least a million digits).
The SIGCHECK macro defined here has always been bizarre, but
it apparently causes compiler warnings on "Sun Studio 11".
I believe the warnings are bogus, but it doesn't hurt to make
the macro definition saner.
Bugfix candidate (but I'm not going to bother).
both mystrtoul.c and longobject.c. Share the table instead. Also
cut its size by 64 entries (they had been used for an inscrutable
trick originally, but the code no longer tries to use that trick).
longobject.c: also fix an ssize_t problem
<a> could have been NULL, so hoist the size calc to not use <a>.
_ssl.c: under fail: self is DECREF'd, but it would have been NULL.
_elementtree.c: delete self if there was an error.
_csv.c: I'm not sure if lineterminator could have been anything other than
a string. However, other string method calls are checked, so check this
one too.
to avoid confusing situations like:
>>> int("")
ValueError: invalid literal for int():
>>> int("2\n\n2")
ValueError: invalid literal for int(): 2
2
Also report the base used, to avoid:
ValueError: invalid literal for int(): 2
They now report:
>>> int("")
ValueError: invalid literal for int() with base 10: ''
>>> int("2\n\n2")
ValueError: invalid literal for int() with base 10: '2\n\n2'
>>> int("2", 2)
ValueError: invalid literal for int() with base 2: '2'
(Reporting the base could be avoided when base is 10, which is the default,
but hrm.) Another effect of these changes is that the errormessage can be
longer; before, it was cut off at about 250 characters. Now, it can be up to
four times as long, as the unrepr'ed string is cut off at 200 characters,
instead.
No tests were added or changed, since testing for exact errormsgs is (pardon
the pun) somewhat errorprone, and I consider not testing the exact text
preferable. The actually changed code is tested frequent enough in the
test_builtin test as it is (120 runs for each of ints and longs.)
PyTypeObject structures, I had to make prototypes for the functions, and
move the structure definition ahead of the functions. I'd dearly like a better
way to do this - to change this would make for a massive set of changes to
the codebase.
There's still some warnings - this is purely to get rid of errors first.
to protect against actual uninitialized usage.
Objects/longobject.c: In function ‘PyLong_AsDouble’:
Objects/longobject.c:655: warning: ‘e’ may be used uninitialized in this function
Objects/longobject.c: In function ‘long_true_divide’:
Objects/longobject.c:2263: warning: ‘aexp’ may be used uninitialized in this function
Objects/longobject.c:2263: warning: ‘bexp’ may be used uninitialized in this function
In C++, it's an error to pass a string literal to a char* function
without a const_cast(). Rather than require every C++ extension
module to put a cast around string literals, fix the API to state the
const-ness.
I focused on parts of the API where people usually pass literals:
PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type
slots, etc. Predictably, there were a large set of functions that
needed to be fixed as a result of these changes. The most pervasive
change was to make the keyword args list passed to
PyArg_ParseTupleAndKewords() to be a const char *kwlist[].
One cast was required as a result of the changes: A type object
mallocs the memory for its tp_doc slot and later frees it.
PyTypeObject says that tp_doc is const char *; but if the type was
created by type_new(), we know it is safe to cast to char *.
In addition, long_pow() skipped a necessary (albeit extremely unlikely
to trigger) error check when converting an int modulus to long.
Alas, I was unable to write a test case that crashed due to either
cause.
Bugfix candidate.
conversion using the proper magic slot (e.g., __int__()). Also move conversion
code out of PyNumber_*() functions in the C API into the nb_* function.
Applied patch #1109424. Thanks Walter Doewald.
This checkin is adapted from part 2 (of 3) of Trevor Perrin's patch set.
BACKWARD INCOMPATIBILITY: SHIFT must now be divisible by 5. AFAIK,
nobody will care. long_pow() could be complicated to worm around that,
if necessary.
long_pow():
- BUGFIX: This leaked the base and power when the power was negative
(and so the computation delegated to float pow).
- Instead of doing right-to-left exponentiation, do left-to-right. This
is more efficient for small bases, which is the common case.
- In addition, if the exponent is large (more than FIVEARY_CUTOFF
digits), precompute [a**i % c for i in range(32)], and go left to
right 5 bits at a time.
l_divmod():
- The signature changed so that callers who don't want the quotient,
or don't want the remainder, can pass NULL in the slot they don't
want. This saves them from having to declare a vrbl for unwanted
stuff, and remembering to decref it.
long_mod(), long_div(), long_classic_div():
- Adjust to new l_divmod() signature, and simplified as a result.
This checkin is adapted from part 1 (of 3) of Trevor Perrin's patch set.
x_mul()
- sped a little by optimizing the C
- sped a lot (~2X) if it's doing a square; note that long_pow() squares
often
k_mul()
- more cache-friendly now if it's doing a square
KARATSUBA_CUTOFF
- boosted; gradeschool mult is quicker now, and it may have been too low
for many platforms anyway
KARATSUBA_SQUARE_CUTOFF
- new
- since x_mul is a lot faster at squaring now, the point at which
Karatsuba pays for squaring is much higher than for general mult
Some version of gcc in the "RTEMS port running on the Coldfire (m5200)
processor" generates bad code for a loop in long_from_binary_base(),
comparing the wrong half of an int to a short. The patch changes the
decl of the short temp to be an int temp instead. This "simplifies"
the code enough that gcc no longer blows it.
New functions:
unsigned long PyInt_AsUnsignedLongMask(PyObject *);
unsigned PY_LONG_LONG) PyInt_AsUnsignedLongLongMask(PyObject *);
unsigned long PyLong_AsUnsignedLongMask(PyObject *);
unsigned PY_LONG_LONG) PyLong_AsUnsignedLongLongMask(PyObject *);
New and changed format codes:
b unsigned char 0..UCHAR_MAX
B unsigned char none **
h unsigned short 0..USHRT_MAX
H unsigned short none **
i int INT_MIN..INT_MAX
I * unsigned int 0..UINT_MAX
l long LONG_MIN..LONG_MAX
k * unsigned long none
L long long LLONG_MIN..LLONG_MAX
K * unsigned long long none
Notes:
* New format codes.
** Changed from previous "range-and-a-half" to "none"; the
range-and-a-half checking wasn't particularly useful.
New test test_getargs2.py, to verify all this.
wasn't used outside the assert (and hence caused a compiler warning
about an unused variable in NDEBUG mode). The assert wasn't very
useful any more.
_PyLong_NumBits(): moved the calculation of ndigits after asserting
that v != NULL.
Assorted code cleanups; e.g., sizeof(char) is 1 by definition, so there's
no need to do things like multiply by sizeof(char) in hairy malloc
arguments. Fixed an undetected-overflow bug in readline_file().
longobject.c: Fixed a really stupid bug in the new _PyLong_NumBits.
pickle.py: Fixed stupid bug in save_long(): When proto is 2, it
wrote LONG1 or LONG4, but forgot to return then -- it went on to
append the proto 1 LONG opcode too.
Fixed equally stupid cancelling bugs in load_long1() and
load_long4(): they *returned* the unpickled long instead of pushing
it on the stack. The return values were ignored. Tests passed
before only because save_long() pickled the long twice.
Fixed bugs in encode_long().
Noted that decode_long() is quadratic-time despite our hopes,
because long(string, 16) is still quadratic-time in len(string).
It's hex() that's linear-time. I don't know a way to make decode_long()
linear-time in Python, short of maybe transforming the 256's-complement
bytes into marshal's funky internal format, and letting marshal decode
that. It would be more valuable to make long(string, 16) linear time.
pickletester.py: Added a global "protocols" vector so tests can try
all the protocols in a sane way. Changed test_ints() and test_unicode()
to do so. Added a new test_long(), but the tail end of it is disabled
because it "takes forever" under pickle.py (but runs very quickly under
cPickle: cPickle proto 2 for longs is linear-time).
needs of pickling longs. Backed off to a definition that's much easier
to understand. The pickler will have to work a little harder, but other
uses are more likely to be correct <0.5 wink>.
_PyLong_Sign(): New teensy function to characterize a long, as to <0, ==0,
or >0.
types. The special handling for these can now be removed from save_newobj().
Add some testing for this.
Also add support for setting the 'fast' flag on the Python Pickler class,
which suppresses use of the memo.
start for the C implemention of new pickle LONG1 and LONG4 opcodes (the
linear-time way to pickle a long is to call _PyLong_AsByteArray, but
the caller has no idea how big an array to allocate, and correct
calculation is a bit subtle).
globals, _Py_Ticker and _Py_CheckInterval. This also implements Jeremy's
shortcut in Py_AddPendingCall that zeroes out _Py_Ticker. This allows the
test in the main loop to only test a single value.
The gory details are at
http://python.org/sf/602191