classes was called with three arguments. This makes no sense, there's
no way to pass in the "modulo" 3rd argument as for __pow__, and
classic classes don't do this. [SF bug 620179]
I don't want to backport this to 2.2.2, because it could break
existing code that has developed a work-around. Code in 2.2.2 that
wants to use __ipow__ and wants to be forward compatible with 2.3
should be written like this:
def __ipow__(self, exponent, modulo=None):
...
macros. The 'op' argument is then the result from PyObject_MALLOC,
and that can of course be NULL. In that case, PyObject_Init[Var]
would raise a SystemError with "NULL object passed to
PyObject_Init[Var]". But there's nothing the caller of the macro can
do about this. So PyObject_Init[Var] should call just PyErr_NoMemory.
Will backport.
'%2147483647d' % -123 segfaults. This was because an integer overflow
in a comparison caused the string resize to be skipped. After fixing
the overflow, this could call _PyString_Resize() with a negative size,
so I (1) test for that and raise MemoryError instead; (2) also added a
test for negative newsize to _PyString_Resize(), raising SystemError
as for all bad arguments.
An identical bug existed in unicodeobject.c, of course.
Will backport to 2.2.2.
Also fixed an error message -- %s argument has non-string str()
doesn't make sense for %r, so the error message now differentiates
between %s and %r.
because PyObject_Repr() and PyObject_Str() ensure that this can never
happen. Added a helpful comment instead.
sees a Unicode argument. Unfortunately this test was also executed
for %r, because %s and %r share almost all of their code. This meant
that, if u is a unicode object while repr(u) is an 8-bit string
containing ASCII characters, '%r' % u is a *unicode* string containing
only ASCII characters!
Fixed by executing the test only for %s.
Also fixed an error message -- %s argument has non-string str()
doesn't make sense for %r, so the error message now differentiates
between %s and %r.
but returns r->len which is a long. This doesn't even cause a warning
on 32-bit platforms, but can return bogus values on 64-bit platforms
(and should cause a compiler warning). Fix this by inserting a range
check when LONG_MAX != INT_MAX, and adding an explicit cast to (int)
when the test passes. When r->len is out of range, PySequence_Size()
and hence len() will report an error (but an iterator will still
work).
Unicode strings (with arbitrary length) are allowed
as entries in the unicode.translate mapping.
Add a test case for multicharacter replacements.
(Multicharacter replacements were enabled by the
PEP 293 patch)
globals, _Py_Ticker and _Py_CheckInterval. This also implements Jeremy's
shortcut in Py_AddPendingCall that zeroes out _Py_Ticker. This allows the
test in the main loop to only test a single value.
The gory details are at
http://python.org/sf/602191
of PyString_DecodeEscape(). This prevents a call to
_PyString_Resize() for the empty string, which would
result in a PyErr_BadInternalCall(), because the
empty string has more than one reference.
This closes SF bug http://www.python.org/sf/603937
possible. This always called PyUnicode_Check() and PyString_Check(),
at least one of which would call PyType_IsSubtype(). Also, this would
call PyString_Size() on known string objects.
wrong thing for a unicode subclass when there were zero string
replacements. The example given in the SF bug report was only one way
to trigger this; replacing a string of length >= 2 that's not found is
another. The code would actually write outside allocated memory if
replacement string was longer than the search string.
(I wonder how many more of these are lurking? The unicode code base
is full of wonders.)
Bugfix candidate; this same bug is present in 2.2.1.
SHIFT and MASK, and widen digit. One problem is that code of the form
digit << small_integer
implicitly assumes that the result fits in an int or unsigned int
(platform-dependent, but "int sized" in any case), since digit is
promoted "just" to int or unsigned via the usual integer promotions.
But if digit is typedef'ed as unsigned int, this loses information.
The cure for this is just to cast digit to twodigits first.
interning. I modified Oren's patch significantly, but the basic idea
and most of the implementation is unchanged. Interned strings created
with PyString_InternInPlace() are now mortal, and you must keep a
reference to the resulting string around; use the new function
PyString_InternImmortal() to create immortal interned strings.
comments everywhere that bugged me: /* Foo is inlined */ instead of
/* Inline Foo */. Somehow the "is inlined" phrase always confused me
for half a second (thinking, "No it isn't" until I added the missing
"here"). The new phrase is hopefully unambiguous.
expensive and overly general PyObject_IsInstance(), call
PyObject_TypeCheck() which is a macro that often avoids a call, and if
it does make a call, calls the much more efficient PyType_IsSubtype().
This saved 6% on a benchmark for slot lookups.
-- replace then with slightly faster PyObject_Call(o,a,NULL). (The
difference is that the latter requires a to be a tuple; the former
allows other values and wraps them in a tuple if necessary; it
involves two more levels of C function calls to accomplish all that.)
rigorous instead of hoping for testing not to turn up counterexamples.
Call me heretical, but despite that I'm wholly confident in the proof,
and have done it two different ways now, I still put more faith in
testing ...
[ 587993 ] SET_LINENO killer
Remove SET_LINENO. Tracing is now supported by inspecting co_lnotab.
Many sundry changes to document and adapt to this change.
ah*bh and al*bl. This is much easier than explaining why that's true
for (ah+al)*(bh+bl), and follows directly from the simple part of the
(ah+al)*(bh+bl) explanation.
space is no longer needed, so removed the code. It was only possible when
a degenerate (ah->ob_size == 0) split happened, but after that fix went
in I added k_lopsided_mul(), which saves the body of k_mul() from seeing
a degenerate split. So this removes code, and adds a honking long comment
block explaining why spilling out of bounds isn't possible anymore. Note:
ff we end up spilling out of bounds anyway <wink>, an assert in v_iadd()
is certain to trigger.
(rev. 2.86). The other type is only disqualified from sq_repeat when
it has the CHECKTYPES flag. This means that for extension types that
only support "old-style" numeric ops, such as Zope 2's ExtensionClass,
sq_repeat still trumps nb_multiply.
k_mul() when inputs have vastly different sizes, and a little more
efficient when they're close to a factor of 2 out of whack.
I consider this done now, although I'll set up some more correctness
tests to run overnight.
cases, overflow the allocated result object by 1 bit. In such cases,
it would have been brought back into range if we subtracted al*bl and
ah*bh from it first, but I don't want to do that because it hurts cache
behavior. Instead we just ignore the excess bit when it appears -- in
effect, this is forcing unsigned mod BASE**(asize + bsize) arithmetic
in a case where that doesn't happen all by itself.
1. You can now have __dict__ and/or __weakref__ in your __slots__
(before only __weakref__ was supported). This is treated
differently than before: it merely sets a flag that the object
should support the corresponding magic.
2. Dynamic types now always have descriptors __dict__ and __weakref__
thrust upon them. If the type in fact does not support one or the
other, that descriptor's __get__ method will raise AttributeError.
3. (This is the reason for all this; it fixes SF bug 575229, reported
by Cesar Douady.) Given this code:
class A(object): __slots__ = []
class B(object): pass
class C(A, B): __slots__ = []
the class object for C was broken; its size was less than that of
B, and some descriptors on B could cause a segfault. C now
correctly inherits __weakrefs__ and __dict__ from B, even though A
is the "primary" base (C.__base__ is A).
4. Some code cleanup, and a few comments added.
algorithm. MSVC 6 wasn't impressed <wink>.
Something odd: the x_mul algorithm appears to get substantially worse
than quadratic time as the inputs grow larger:
bits in each input x_mul time k_mul time
------------------ ---------- ----------
15360 0.01 0.00
30720 0.04 0.01
61440 0.16 0.04
122880 0.64 0.14
245760 2.56 0.40
491520 10.76 1.23
983040 71.28 3.69
1966080 459.31 11.07
That is, x_mul is perfectly quadratic-time until a little burp at
2.56->10.76, and after that goes to hell in a hurry. Under Karatsuba,
doubling the input size "should take" 3 times longer instead of 4, and
that remains the case throughout this range. I conclude that my "be nice
to the cache" reworkings of k_mul() are paying.
correct now, so added some final comments, did some cleanup, and enabled
it for all long-int multiplies. The KARAT envar no longer matters,
although I left some #if 0'ed code in there for my own use (temporary).
k_mul() is still much slower than x_mul() if the inputs have very
differenent sizes, and that still needs to be addressed.
(it's possible, but should be harmless -- this requires more thought,
and allocating enough space in advance to prevent it requires exactly
as much thought, to know exactly how much that is -- the end result
certainly fits in the allocated space -- hmm, but that's really all
the thought it needs! borrows/carries out of the high digits really
are harmless).
k_mul(): This didn't allocate enough result space when one input had
more than twice as many bits as the other. This was partly hidden by
that x_mul() didn't normalize its result.
The Karatsuba recurrence is pretty much hosed if the inputs aren't
roughly the same size. If one has at least twice as many bits as the
other, we get a degenerate case where the "high half" of the smaller
input is 0. Added a special case for that, for speed, but despite that
it helped, this can still be much slower than the "grade school" method.
It seems to take a really wild imbalance to trigger that; e.g., a
2**22-bit input times a 1000-bit input on my box runs about twice as slow
under k_mul than under x_mul. This still needs to be addressed.
I'm also not sure that allocating a->ob_size + b->ob_size digits is
enough, given that this is computing k = (ah+al)*(bh+bl) instead of
k = (ah-al)*(bl-bh); i.e., it's certainly enough for the final result,
but it's vaguely possible that adding in the "artificially" large k may
overflow that temporarily. If so, an assert will trigger in the debug
build, but we'll probably compute the right result anyway(!).
addition and subtraction. Reworked the tail end of k_mul() to use them.
This saves oodles of one-shot longobject allocations (this is a triply-
recursive routine, so saving one allocation in the body saves 3**n
allocations at depth n; we actually save 2 allocations in the body).
SF 560379: Karatsuba multiplication.
Lots of things were changed from that. This needs a lot more testing,
for correctness and speed, the latter especially when bit lengths are
unbalanced. For now, the Karatsuba code gets invoked if and only if
envar KARAT exists.
currently return inconsistent results for ints and longs; in
particular: hex/oct/%u/%o/%x/%X of negative short ints, and x<<n that
either loses bits or changes sign. (No warnings for repr() of a long,
though that will also change to lose the trailing 'L' eventually.)
This introduces some warnings in the test suite; I'll take care of
those later.
This is friendlier for caches.
2. Cut MIN_GALLOP to 7, but added a per-sort min_gallop vrbl that adapts
the "get into galloping mode" threshold higher when galloping isn't
paying, and lower when it is. There's no known case where this hurts.
It's (of course) neutral for /sort, \sort and =sort. It also happens
to be neutral for !sort. It cuts a tiny # of compares in 3sort and +sort.
For *sort, it reduces the # of compares to better than what this used to
do when MIN_GALLOP was hardcoded to 10 (it did about 0.1% more *sort
compares before, but given how close we are to the limit, this is "a
lot"!). %sort used to do about 1.5% more compares, and ~sort about
3.6% more. Here are exact counts:
i *sort 3sort +sort %sort ~sort !sort
15 449235 33019 33016 51328 188720 65534 before
448885 33016 33007 50426 182083 65534 after
0.08% 0.01% 0.03% 1.79% 3.65% 0.00% %ch from after
16 963714 65824 65809 103409 377634 131070
962991 65821 65808 101667 364341 131070
0.08% 0.00% 0.00% 1.71% 3.65% 0.00%
17 2059092 131413 131362 209130 755476 262142
2057533 131410 131361 206193 728871 262142
0.08% 0.00% 0.00% 1.42% 3.65% 0.00%
18 4380687 262440 262460 421998 1511174 524286
4377402 262437 262459 416347 1457945 524286
0.08% 0.00% 0.00% 1.36% 3.65% 0.00%
19 9285709 524581 524634 848590 3022584 1048574
9278734 524580 524633 837947 2916107 1048574
0.08% 0.00% 0.00% 1.27% 3.65% 0.00%
20 19621118 1048960 1048942 1715806 6045418 2097150
19606028 1048958 1048941 1694896 5832445 2097150
0.08% 0.00% 0.00% 1.23% 3.65% 0.00%
3. Added some key asserts I overlooked before.
4. Updated the doc file.
before %sort was introduced. Redid them (the numbers change, but the
conclusions don't). Also did the samplesort counts with the released
2.2.1, as they're slightly different under the last CVS 2.3 samplesort
(some higher, some lower -- CVS had been changed to stop doing the
special-case business on recursive samplesort calls).
example of where this changes behavior is when a new-style instance
defines '__mul__' and '__rmul__' and is multiplied by an int. Before the
change the '__rmul__' method is never called, even if the int is the
left operand.
trampolining going on with the tp_new descriptor, where the inherited
PyType_GenericNew was overwritten with the much slower slot_tp_new
which would end up calling tp_new_wrapper which would eventually call
PyType_GenericNew. Add a special case for this to update_one_slot().
XXX Hope there isn't a loophole in this. I'll buy the first person to
point out a bug in the reasoning a beer.
Backport candidate (but I won't do it).