folded; this will change in Python 2.4. On a 32-bit machine, this
happens for 0x80000000 through 0xffffffff, and for octal constants in
the same value range. No warning is issued if an explicit base is
given, *or* if the string contains a sign (since in those cases no
sign folding ever happens).
descr_check(); it wasn't useful. Change the type argument of the
various _get() methods to PyObject * because the call signature of
tp_descr_get doesn't guarantee its type.
when Python code calls a descriptor's __get__ method. It should
translate None to NULL in both argument positions, and insist that at
least one of the argument positions is not NULL after this
transformation.
For the case where the current globals match the previous frame's
globals, eliminates three tests in two if statements. For the case
where we just get __builtins__ from a module, eliminate a couple of
tests.
wasn't used outside the assert (and hence caused a compiler warning
about an unused variable in NDEBUG mode). The assert wasn't very
useful any more.
_PyLong_NumBits(): moved the calculation of ndigits after asserting
that v != NULL.
Assorted code cleanups; e.g., sizeof(char) is 1 by definition, so there's
no need to do things like multiply by sizeof(char) in hairy malloc
arguments. Fixed an undetected-overflow bug in readline_file().
longobject.c: Fixed a really stupid bug in the new _PyLong_NumBits.
pickle.py: Fixed stupid bug in save_long(): When proto is 2, it
wrote LONG1 or LONG4, but forgot to return then -- it went on to
append the proto 1 LONG opcode too.
Fixed equally stupid cancelling bugs in load_long1() and
load_long4(): they *returned* the unpickled long instead of pushing
it on the stack. The return values were ignored. Tests passed
before only because save_long() pickled the long twice.
Fixed bugs in encode_long().
Noted that decode_long() is quadratic-time despite our hopes,
because long(string, 16) is still quadratic-time in len(string).
It's hex() that's linear-time. I don't know a way to make decode_long()
linear-time in Python, short of maybe transforming the 256's-complement
bytes into marshal's funky internal format, and letting marshal decode
that. It would be more valuable to make long(string, 16) linear time.
pickletester.py: Added a global "protocols" vector so tests can try
all the protocols in a sane way. Changed test_ints() and test_unicode()
to do so. Added a new test_long(), but the tail end of it is disabled
because it "takes forever" under pickle.py (but runs very quickly under
cPickle: cPickle proto 2 for longs is linear-time).
__module__ is the string name of the module the function was defined
in, just like __module__ of classes. In some cases, particularly for
C functions, the __module__ may be None.
Change PyCFunction_New() from a function to a macro, but keep an
unused copy of the function around so that we don't change the binary
API.
Change pickle's save_global() to use whichmodule() if __module__ is
None, but add the __module__ logic to whichmodule() since it might be
used outside of pickle.
error handers in the Unicode codecs: Negative
positions are treated as being relative to the end of
the input and out of bounds positions result in an
IndexError.
Also update the PEP and include an explanation of
this in the documentation for codecs.register_error.
Fixes a small bug in iconv_codecs: if the position
from the callback is negative *add* it to the size
instead of substracting it.
From SF patch #677429.
needs of pickling longs. Backed off to a definition that's much easier
to understand. The pickler will have to work a little harder, but other
uses are more likely to be correct <0.5 wink>.
_PyLong_Sign(): New teensy function to characterize a long, as to <0, ==0,
or >0.
types. The special handling for these can now be removed from save_newobj().
Add some testing for this.
Also add support for setting the 'fast' flag on the Python Pickler class,
which suppresses use of the memo.
start for the C implemention of new pickle LONG1 and LONG4 opcodes (the
linear-time way to pickle a long is to call _PyLong_AsByteArray, but
the caller has no idea how big an array to allocate, and correct
calculation is a bit subtle).
was broken because new-in-2.3 code added a tp_as_mapping slot to tuples.
Repaired that.
Added basic docs to check_recursion().
The code that intended to exempt tuples and strings was also broken here,
and in 2.2: these should use PyXYZ_CheckExact(), not PyXYZ_Check() -- we
can't know whether subclass instances are immutable. This part (and this
part alone) is a bugfix candidate.
Christian Tismer pointed out the high cost of the loop overhead and
function call overhead for 'c' * n where n is large. Accordingly,
the new code only makes lg2(n) loops.
Interestingly, 'c' * 1000 * 1000 ran a bit faster with old code. At some
point, the loop and function call overhead became cheaper than invalidating
the cache with lengthy memcpys. But for more typical sizes of n, the new
code runs much faster and for larger values of n it runs only a bit slower.
Refactor code in PyCFunction_Call giving a modest (tiny) speed boost,
a slight improvement in semantics (now detects invalid flag combinations),
and (arguably) improved clarity (making it blindingly clear which flag
combinations are allowed). All this comes at a cost of a few lines of
code duplication.
* Folded test for METH_KEYWORDS into the switch/case.
* Deferred testing for an empty dictionary until when and where needed.
* Make a similar deferral for filling the "size" variable.
* Inverted the dictionary test so that the common case falls though
instead of making a jump.
645404). I'm not 100% sure this is the right fix, so I'll keep the
bug report open for Samuele, but this fixes the index error and passes
the test suite (and I can't see why it *shouldn't* be the right fix
:-).
Initialize the small integers and __builtins__ in startup.
This removes some if conditions.
Change XDECREF to DECREF for values which shouldn't be NULL.
andsq_inplace_repeat. This fixes a number of corner case bugs (see #624807).
Consolidate the int and long sequence repeat code. Before the change, integers
checked for integer overflow but longs did not.
Obtain cleaner coding and a system wide
performance boost by using the fast, pre-parsed
PyArg_Unpack function instead of PyArg_ParseTuple
function which is driven by a format string.
[ 643835 ] Set Next Statement for Python debuggers
with a few tweaks by me: adding an unsigned or two, mentioning that
not all jumps are allowed in the doc for pdb, adding a NEWS item and
a note to whatsnew, and AuCTeX doing something cosmetic to libpdb.tex.
[#521782] unreliable file.read() error handling
* Objects/fileobject.c
(file_read): Clear errors before leaving the loop in all situations,
and also check if some data was read before exiting the loop with an
EWOULDBLOCK exception.
* Doc/lib/libstdtypes.tex
* Objects/fileobject.c
Document that sometimes a read() operation can return less data than
what the user asked, if running in non-blocking mode.
* Misc/NEWS
Document the fix.
containing class objects) are allowed as the second argument.
This makes issubclass() more similar to isinstance() where recursive
tuples are allowed too.
supported as the second argument. This has the same meaning as
for isinstance(), i.e. issubclass(X, (A, B)) is equivalent
to issubclass(X, A) or issubclass(X, B). Compared to isinstance(),
this patch does not search the tuple recursively for classes, i.e.
any entry in the tuple that is not a class, will result in a
TypeError.
This closes SF patch #649608.
Most of these patches are from Thomas Heller, with long lines folded
by Tim. The change to test_descr.py is from Guido. See the bug report.
Not a bugfix candidate -- METH_CLASS is new in 2.3.
Just van Rossum showed a weird, but clever way for pure python code to
trigger the BadInternalCall. The C code had assumed that calling a class
constructor would return an instance of that class; however, classes that
abuse __new__ can invalidate that assumption.
see problems with my code that I didn't see before the checkin, but:
When a subtype .mro() fails, we need to reset the type whose __bases__
are being changed, too. Fix + test.
[ 635933 ] make some type attrs writable
Plus a couple of extra tests beyond what's up there.
It hasn't been as carefully reviewed as it perhaps should, so all readers
are encouraged, nay exhorted, to give this a close reading.
There are still a couple of oddities related to assigning to __name__,
but I intend to solicit python-dev's opinions on these.
messages about MRO conflicts. (The tweaks include correcting spelling
errors, some refactoring to get the name of classic classes, and a
style nit or two.)
long but the double is too big to fit in a long. Prevent that. This
closes some recent bug or patch on SF, but SF is down now so I can't
say which.
Bugfix candidate.
Py_Init crash". refchain cannot be cleared because objects can live across
Py_Finalize() and Py_Initialize() if they are kept alive by circular
references.
619475; also closing SF bug 618704). I tweaked his code a bit for
style.
This raises TypeError for MRO order disagreements, which is an
improvement (previously these went undetected) but also a degradation:
what if the order disagreement doesn't affect any method lookups?
I don't think I care.
When mwh added extended slicing, strings and unicode became mappings.
Thus, dict was set which prevented an error when doing:
newstr = 'format without a percent' % string_value
This fix raises an exception again when there are no formats
and % with a string value.
Armin Rigo's Draconian but effective fix for
SF bug 453523: list.sort crasher
slightly fiddled to catch more cases of list mutation. The dreaded
internal "immutable list type" is gone! OTOH, if you look at a list
*while* it's being sorted now, it will appear to be empty. Better
than a core dump.
/* this is harder to get right than you might think */
angered some God somewhere. After noticing
>>> range(5000000)[slice(96360, None, 439)]
[]
I found that my cute test for the slice being empty failed due to
overflow. Fixed, and added simple test (not the above!).
classes was called with three arguments. This makes no sense, there's
no way to pass in the "modulo" 3rd argument as for __pow__, and
classic classes don't do this. [SF bug 620179]
I don't want to backport this to 2.2.2, because it could break
existing code that has developed a work-around. Code in 2.2.2 that
wants to use __ipow__ and wants to be forward compatible with 2.3
should be written like this:
def __ipow__(self, exponent, modulo=None):
...
macros. The 'op' argument is then the result from PyObject_MALLOC,
and that can of course be NULL. In that case, PyObject_Init[Var]
would raise a SystemError with "NULL object passed to
PyObject_Init[Var]". But there's nothing the caller of the macro can
do about this. So PyObject_Init[Var] should call just PyErr_NoMemory.
Will backport.
'%2147483647d' % -123 segfaults. This was because an integer overflow
in a comparison caused the string resize to be skipped. After fixing
the overflow, this could call _PyString_Resize() with a negative size,
so I (1) test for that and raise MemoryError instead; (2) also added a
test for negative newsize to _PyString_Resize(), raising SystemError
as for all bad arguments.
An identical bug existed in unicodeobject.c, of course.
Will backport to 2.2.2.
Also fixed an error message -- %s argument has non-string str()
doesn't make sense for %r, so the error message now differentiates
between %s and %r.
because PyObject_Repr() and PyObject_Str() ensure that this can never
happen. Added a helpful comment instead.
sees a Unicode argument. Unfortunately this test was also executed
for %r, because %s and %r share almost all of their code. This meant
that, if u is a unicode object while repr(u) is an 8-bit string
containing ASCII characters, '%r' % u is a *unicode* string containing
only ASCII characters!
Fixed by executing the test only for %s.
Also fixed an error message -- %s argument has non-string str()
doesn't make sense for %r, so the error message now differentiates
between %s and %r.
but returns r->len which is a long. This doesn't even cause a warning
on 32-bit platforms, but can return bogus values on 64-bit platforms
(and should cause a compiler warning). Fix this by inserting a range
check when LONG_MAX != INT_MAX, and adding an explicit cast to (int)
when the test passes. When r->len is out of range, PySequence_Size()
and hence len() will report an error (but an iterator will still
work).
Unicode strings (with arbitrary length) are allowed
as entries in the unicode.translate mapping.
Add a test case for multicharacter replacements.
(Multicharacter replacements were enabled by the
PEP 293 patch)
globals, _Py_Ticker and _Py_CheckInterval. This also implements Jeremy's
shortcut in Py_AddPendingCall that zeroes out _Py_Ticker. This allows the
test in the main loop to only test a single value.
The gory details are at
http://python.org/sf/602191
of PyString_DecodeEscape(). This prevents a call to
_PyString_Resize() for the empty string, which would
result in a PyErr_BadInternalCall(), because the
empty string has more than one reference.
This closes SF bug http://www.python.org/sf/603937
possible. This always called PyUnicode_Check() and PyString_Check(),
at least one of which would call PyType_IsSubtype(). Also, this would
call PyString_Size() on known string objects.
wrong thing for a unicode subclass when there were zero string
replacements. The example given in the SF bug report was only one way
to trigger this; replacing a string of length >= 2 that's not found is
another. The code would actually write outside allocated memory if
replacement string was longer than the search string.
(I wonder how many more of these are lurking? The unicode code base
is full of wonders.)
Bugfix candidate; this same bug is present in 2.2.1.
SHIFT and MASK, and widen digit. One problem is that code of the form
digit << small_integer
implicitly assumes that the result fits in an int or unsigned int
(platform-dependent, but "int sized" in any case), since digit is
promoted "just" to int or unsigned via the usual integer promotions.
But if digit is typedef'ed as unsigned int, this loses information.
The cure for this is just to cast digit to twodigits first.
interning. I modified Oren's patch significantly, but the basic idea
and most of the implementation is unchanged. Interned strings created
with PyString_InternInPlace() are now mortal, and you must keep a
reference to the resulting string around; use the new function
PyString_InternImmortal() to create immortal interned strings.
comments everywhere that bugged me: /* Foo is inlined */ instead of
/* Inline Foo */. Somehow the "is inlined" phrase always confused me
for half a second (thinking, "No it isn't" until I added the missing
"here"). The new phrase is hopefully unambiguous.
expensive and overly general PyObject_IsInstance(), call
PyObject_TypeCheck() which is a macro that often avoids a call, and if
it does make a call, calls the much more efficient PyType_IsSubtype().
This saved 6% on a benchmark for slot lookups.
-- replace then with slightly faster PyObject_Call(o,a,NULL). (The
difference is that the latter requires a to be a tuple; the former
allows other values and wraps them in a tuple if necessary; it
involves two more levels of C function calls to accomplish all that.)
rigorous instead of hoping for testing not to turn up counterexamples.
Call me heretical, but despite that I'm wholly confident in the proof,
and have done it two different ways now, I still put more faith in
testing ...
[ 587993 ] SET_LINENO killer
Remove SET_LINENO. Tracing is now supported by inspecting co_lnotab.
Many sundry changes to document and adapt to this change.
ah*bh and al*bl. This is much easier than explaining why that's true
for (ah+al)*(bh+bl), and follows directly from the simple part of the
(ah+al)*(bh+bl) explanation.
space is no longer needed, so removed the code. It was only possible when
a degenerate (ah->ob_size == 0) split happened, but after that fix went
in I added k_lopsided_mul(), which saves the body of k_mul() from seeing
a degenerate split. So this removes code, and adds a honking long comment
block explaining why spilling out of bounds isn't possible anymore. Note:
ff we end up spilling out of bounds anyway <wink>, an assert in v_iadd()
is certain to trigger.
(rev. 2.86). The other type is only disqualified from sq_repeat when
it has the CHECKTYPES flag. This means that for extension types that
only support "old-style" numeric ops, such as Zope 2's ExtensionClass,
sq_repeat still trumps nb_multiply.
k_mul() when inputs have vastly different sizes, and a little more
efficient when they're close to a factor of 2 out of whack.
I consider this done now, although I'll set up some more correctness
tests to run overnight.
cases, overflow the allocated result object by 1 bit. In such cases,
it would have been brought back into range if we subtracted al*bl and
ah*bh from it first, but I don't want to do that because it hurts cache
behavior. Instead we just ignore the excess bit when it appears -- in
effect, this is forcing unsigned mod BASE**(asize + bsize) arithmetic
in a case where that doesn't happen all by itself.
1. You can now have __dict__ and/or __weakref__ in your __slots__
(before only __weakref__ was supported). This is treated
differently than before: it merely sets a flag that the object
should support the corresponding magic.
2. Dynamic types now always have descriptors __dict__ and __weakref__
thrust upon them. If the type in fact does not support one or the
other, that descriptor's __get__ method will raise AttributeError.
3. (This is the reason for all this; it fixes SF bug 575229, reported
by Cesar Douady.) Given this code:
class A(object): __slots__ = []
class B(object): pass
class C(A, B): __slots__ = []
the class object for C was broken; its size was less than that of
B, and some descriptors on B could cause a segfault. C now
correctly inherits __weakrefs__ and __dict__ from B, even though A
is the "primary" base (C.__base__ is A).
4. Some code cleanup, and a few comments added.
algorithm. MSVC 6 wasn't impressed <wink>.
Something odd: the x_mul algorithm appears to get substantially worse
than quadratic time as the inputs grow larger:
bits in each input x_mul time k_mul time
------------------ ---------- ----------
15360 0.01 0.00
30720 0.04 0.01
61440 0.16 0.04
122880 0.64 0.14
245760 2.56 0.40
491520 10.76 1.23
983040 71.28 3.69
1966080 459.31 11.07
That is, x_mul is perfectly quadratic-time until a little burp at
2.56->10.76, and after that goes to hell in a hurry. Under Karatsuba,
doubling the input size "should take" 3 times longer instead of 4, and
that remains the case throughout this range. I conclude that my "be nice
to the cache" reworkings of k_mul() are paying.
correct now, so added some final comments, did some cleanup, and enabled
it for all long-int multiplies. The KARAT envar no longer matters,
although I left some #if 0'ed code in there for my own use (temporary).
k_mul() is still much slower than x_mul() if the inputs have very
differenent sizes, and that still needs to be addressed.
(it's possible, but should be harmless -- this requires more thought,
and allocating enough space in advance to prevent it requires exactly
as much thought, to know exactly how much that is -- the end result
certainly fits in the allocated space -- hmm, but that's really all
the thought it needs! borrows/carries out of the high digits really
are harmless).
k_mul(): This didn't allocate enough result space when one input had
more than twice as many bits as the other. This was partly hidden by
that x_mul() didn't normalize its result.
The Karatsuba recurrence is pretty much hosed if the inputs aren't
roughly the same size. If one has at least twice as many bits as the
other, we get a degenerate case where the "high half" of the smaller
input is 0. Added a special case for that, for speed, but despite that
it helped, this can still be much slower than the "grade school" method.
It seems to take a really wild imbalance to trigger that; e.g., a
2**22-bit input times a 1000-bit input on my box runs about twice as slow
under k_mul than under x_mul. This still needs to be addressed.
I'm also not sure that allocating a->ob_size + b->ob_size digits is
enough, given that this is computing k = (ah+al)*(bh+bl) instead of
k = (ah-al)*(bl-bh); i.e., it's certainly enough for the final result,
but it's vaguely possible that adding in the "artificially" large k may
overflow that temporarily. If so, an assert will trigger in the debug
build, but we'll probably compute the right result anyway(!).
addition and subtraction. Reworked the tail end of k_mul() to use them.
This saves oodles of one-shot longobject allocations (this is a triply-
recursive routine, so saving one allocation in the body saves 3**n
allocations at depth n; we actually save 2 allocations in the body).
SF 560379: Karatsuba multiplication.
Lots of things were changed from that. This needs a lot more testing,
for correctness and speed, the latter especially when bit lengths are
unbalanced. For now, the Karatsuba code gets invoked if and only if
envar KARAT exists.
currently return inconsistent results for ints and longs; in
particular: hex/oct/%u/%o/%x/%X of negative short ints, and x<<n that
either loses bits or changes sign. (No warnings for repr() of a long,
though that will also change to lose the trailing 'L' eventually.)
This introduces some warnings in the test suite; I'll take care of
those later.
This is friendlier for caches.
2. Cut MIN_GALLOP to 7, but added a per-sort min_gallop vrbl that adapts
the "get into galloping mode" threshold higher when galloping isn't
paying, and lower when it is. There's no known case where this hurts.
It's (of course) neutral for /sort, \sort and =sort. It also happens
to be neutral for !sort. It cuts a tiny # of compares in 3sort and +sort.
For *sort, it reduces the # of compares to better than what this used to
do when MIN_GALLOP was hardcoded to 10 (it did about 0.1% more *sort
compares before, but given how close we are to the limit, this is "a
lot"!). %sort used to do about 1.5% more compares, and ~sort about
3.6% more. Here are exact counts:
i *sort 3sort +sort %sort ~sort !sort
15 449235 33019 33016 51328 188720 65534 before
448885 33016 33007 50426 182083 65534 after
0.08% 0.01% 0.03% 1.79% 3.65% 0.00% %ch from after
16 963714 65824 65809 103409 377634 131070
962991 65821 65808 101667 364341 131070
0.08% 0.00% 0.00% 1.71% 3.65% 0.00%
17 2059092 131413 131362 209130 755476 262142
2057533 131410 131361 206193 728871 262142
0.08% 0.00% 0.00% 1.42% 3.65% 0.00%
18 4380687 262440 262460 421998 1511174 524286
4377402 262437 262459 416347 1457945 524286
0.08% 0.00% 0.00% 1.36% 3.65% 0.00%
19 9285709 524581 524634 848590 3022584 1048574
9278734 524580 524633 837947 2916107 1048574
0.08% 0.00% 0.00% 1.27% 3.65% 0.00%
20 19621118 1048960 1048942 1715806 6045418 2097150
19606028 1048958 1048941 1694896 5832445 2097150
0.08% 0.00% 0.00% 1.23% 3.65% 0.00%
3. Added some key asserts I overlooked before.
4. Updated the doc file.
before %sort was introduced. Redid them (the numbers change, but the
conclusions don't). Also did the samplesort counts with the released
2.2.1, as they're slightly different under the last CVS 2.3 samplesort
(some higher, some lower -- CVS had been changed to stop doing the
special-case business on recursive samplesort calls).
example of where this changes behavior is when a new-style instance
defines '__mul__' and '__rmul__' and is multiplied by an int. Before the
change the '__rmul__' method is never called, even if the int is the
left operand.
trampolining going on with the tp_new descriptor, where the inherited
PyType_GenericNew was overwritten with the much slower slot_tp_new
which would end up calling tp_new_wrapper which would eventually call
PyType_GenericNew. Add a special case for this to update_one_slot().
XXX Hope there isn't a loophole in this. I'll buy the first person to
point out a bug in the reasoning a beer.
Backport candidate (but I won't do it).
intern the string "__new__" so we can call PyObject_GetAttr() rather
than PyObject_GetAttrString(). (Though it's a mystery why slot_tp_new
is being called when a class doesn't define __new__. I'll look into
that tomorrow.)
2.2 backport candidate (but I won't do it).
a lot of work: it had to save and restore the current exception around
a call to lookup_maybe(), because that could fail in rare cases, and
most objects don't have a __del__ method, so the whole exercise was
usually a waste of time. Changed this to cache the __del__ method in
the type object just like all other special methods, in a new slot
tp_del. So now subtype_dealloc() can test whether tp_del is NULL and
skip the whole exercise if it is. The new slot doesn't need a new
flag bit: subtype_dealloc() is only called if the type was dynamically
allocated by type_new(), so it's guaranteed to have all current slots.
Types defined in C cannot fill in tp_del with a function of their own,
so there's no corresponding "wrapper". (That functionality is already
available through tp_dealloc.)
subtype_dealloc().
When call_finalizer() failed, it would return without going through
the trashcan end macro, thereby unbalancing the trashcan nesting level
counter, and thereby defeating the test case (slottrash() in
test_descr.py). This in turn meant that the assert in the GC_UNTRACK
macro wasn't triggered by the slottrash() test despite a bug in the
code: _PyTrash_destroy_chain() calls the dealloc routine with an
object that's untracked, and the assert in the GC_UNTRACK macro would
fail on this; but because of an earlier test that resurrects an
object, causing call_finalizer() to fail and the trashcan nesting
level to be unbalanced, so _PyTrash_destroy_chain() was never called.
Calling the slottrash() test in isolation *did* trigger the assert,
however.
So the fix is twofold: (1) call the GC_UnTrack() function instead of
the GC_UNTRACK macro, because the function is safe when the object is
already untracked; (2) when call_finalizer() fails, jump to a label
that exits through the trashcan end macro, keeping the trashcan
nesting balanced.
This is inspired by SF patch 581742 (by Jonathan Hogg, who also
submitted the bug report, and two other suggested patches), but
separates the non-GC case from the GC case to avoid testing for GC
several times.
Had to fix an assert() from call_finalizer() that asserted that the
object wasn't untracked, because it's possible that the object isn't
GC'ed!
For a file f, iter(f) now returns f (unless f is closed), and f.next()
is similar to f.readline() when EOF is not reached; however, f.next()
uses a readahead buffer that messes up the file position, so mixing
f.next() and f.readline() (or other methods) doesn't work right.
Calling f.seek() drops the readahead buffer, but other operations
don't.
The real purpose of this change is to reduce the confusion between
objects and their iterators. By making a file its own iterator, it's
made clearer that using the iterator modifies the file object's state
(in particular the current position).
A nice side effect is that this speeds up "for line in f:" by not
having to use the xreadlines module. The f.xreadlines() method is
still supported for backwards compatibility, though it is the same as
iter(f) now.
(I made some cosmetic changes to Oren's code, and added a test for
"file closed" to file_iternext() and file_iter().)
directly when no comparison function is specified. This saves a layer
of function call on every compare then. Measured speedups:
i 2**i *sort \sort /sort 3sort +sort %sort ~sort =sort !sort
15 32768 12.5% 0.0% 0.0% 100.0% 0.0% 50.0% 100.0% 100.0% -50.0%
16 65536 8.7% 0.0% 0.0% 0.0% 0.0% 0.0% 12.5% 0.0% 0.0%
17 131072 8.0% 25.0% 0.0% 25.0% 0.0% 14.3% 5.9% 0.0% 0.0%
18 262144 6.3% -10.0% 12.5% 11.1% 0.0% 6.3% 5.6% 12.5% 0.0%
19 524288 5.3% 5.9% 0.0% 5.6% 0.0% 5.9% 5.4% 0.0% 2.9%
20 1048576 5.3% 2.9% 2.9% 5.1% 2.8% 1.3% 5.9% 2.9% 4.2%
The best indicators are those that take significant time (larger i), and
where sort doesn't do very few compares (so *sort and ~sort benefit most
reliably). The large numbers are due to roundoff noise combined with
platform variability; e.g., the 14.3% speedup for %sort at i=17 reflects
a printed elapsed time of 0.18 seconds falling to 0.17, but a change in
the last digit isn't really meaningful (indeed, if it really took 0.175
seconds, one electron having a lazy nanosecond could shift it to either
value <wink>). Similarly the 25% at 3sort i=17 was a meaningless change
from 0.05 to 0.04. However, almost all the "meaningless changes" were
in the same direction, which is good. The before-and-after times for
*sort are clearest:
before after
0.18 0.16
0.25 0.23
0.54 0.50
1.18 1.11
2.57 2.44
5.58 5.30
longer to run than normal. A profiler run showed that this was due to
PyFrame_New() taking up an unreasonable amount of time. A little
thinking showed that this was due to the while loop clearing the space
available for the stack. The solution is to only clear the local
variables (and cells and free variables), not the space available for
the stack, since anything beyond the stack top is considered to be
garbage anyway. Also, use memset() instead of a while loop counting
backwards. This should be a time savings for normal code too! (By a
probably unmeasurable amount. :-)
version of PySlice_GetIndicesEx"):
> OK. Michael, if you want to check in indices(), go ahead.
Then I did what was needed, but didn't check it in. Here it is.
listsort. If the former calls itself recursively, they're a waste of
time, since it's called on a random permutation of a random subset of
elements. OTOH, for exactly the same reason, they're an immeasurably
small waste of time (the odds of finding exploitable order in a random
permutation are ~= 0, so the special-case loops looking for order give
up quickly). The point is more for conceptual clarity.
Also changed some "assert comments" into real asserts; when this code
was first written, Python.h didn't supply assert.h.
introduced, list.sort() was rewritten to use only the "< or not <?"
distinction. After rich comparisons were introduced, docompare() was
fiddled to translate a Py_LT Boolean result into the old "-1 for <,
0 for ==, 1 for >" flavor of outcome, and the sorting code was left
alone. This left things more obscure than they should be, and turns
out it also cost measurable cycles.
So: The old CMPERROR novelty is gone. docompare() is renamed to islt(),
and now has the same return conditinos as PyObject_RichCompareBool. The
SETK macro is renamed to ISLT, and is even weirder than before (don't
complain unless you want to maintain the sort code <wink>).
Overall, this yields a 1-2% speedup in the usual (no explicit function
passed to list.sort()) case when sorting arrays of floats (as sortperf.py
does). The boost is higher for arrays of ints.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) botched the
static keyword when it was used with a forward declaration of a static
initialized structure. Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers. (In
fact, we expect that the compilers are all fixed eight years later.)
I'm leaving staticforward and statichere defined in object.h as
static. This is only for backwards compatibility with C extensions
that might still use it.
XXX I haven't updated the documentation.
PyType_Ready() because the tp_iternext slot is set (fortunately,
because using the tp_iternext implementation for the the next()
implementation is buggy). Also changed the allocation order in
enum_next() so that the underlying iterator is only moved ahead when
we have successfully allocated the result tuple and index.
di_dict field when the end of the list is reached. Also make the
error ("dictionary changed size during iteration") a sticky state.
Also remove the next() method -- one is supplied automatically by
PyType_Ready() because the tp_iternext slot is set. That's a good
thing, because the implementation given here was buggy (it never
raised StopIteration).
object references (it_seq for seqiterobject, it_callable and
it_sentinel for calliterobject) when the end of the list is reached.
Also remove the next() methods -- one is supplied automatically by
PyType_Ready() because the tp_iternext slot is set. That's a good
thing, because the implementation given here was buggy (it never
raised StopIteration).
it_seq field when the end of the list is reached.
Also remove the next() method -- one is supplied automatically by
PyType_Ready() because the tp_iternext slot is set. That's a good
thing, because the implementation given here was buggy (it never
raised StopIteration).
If the object is an ExtensionClass, for example, the slot is not even
defined. So we must check that the type has the slot (implied by
HAVE_CLASS) before calling tp_init().
explicit comparison function case: use PyObject_Call instead of
PyEval_CallObject. Same thing in context, but gives a 2.4% overall
speedup when sorting a list of ints via list.sort(__builtin__.cmp).
MSDN sample programs use it, apparently in error. The correct name
is WIN32_LEAN_AND_MEAN. After switching to the correct name, in two
cases more was needed because the code actually relied on things that
disappear when WIN32_LEAN_AND_MEAN is defined.
arg tuple. This was suggested on c.l.py but afraid I can't find the msg
again for proper attribution. For
list.sort(cmp)
where list is a list of random ints, and cmp is __builtin__.cmp, this
yields an overall 50-60% speedup on my Win2K box. Of course this is a
best case, because the overhead of calling cmp relative to the cost of
actually comparing two ints is at an extreme. Nevertheless it's huge
bang for the buck. An additionak 20-30% can be bought by making the arg
tuple an immortal static (avoiding all but "the first" PyTuple_New), but
that's tricky to make correct since docompare needs to be reentrant. So
this picks the cherry and leaves the pits for Fred <wink>.
Note that this makes no difference to the
list.sort()
case; an arg tuple gets built only if the user specifies an explicit
sort function.
helper macros to something saner, and used them appropriately in other
files too, to reduce #ifdef blocks.
classobject.c, instance_dealloc(): One of my worst Python Memories is
trying to fix this routine a few years ago when COUNT_ALLOCS was defined
but Py_TRACE_REFS wasn't. The special-build code here is way too
complicated. Now it's much simpler. Difference: in a Py_TRACE_REFS
build, the instance is no longer in the doubly-linked list of live
objects while its __del__ method is executing, and that may be visible
via sys.getobjects() called from a __del__ method. Tough -- the object
is presumed dead while its __del__ is executing anyway, and not calling
_Py_NewReference() at the start allows enormous code simplification.
typeobject.c, call_finalizer(): The special-build instance_dealloc()
pain apparently spread to here too via cut-'n-paste, and this is much
simpler now too. In addition, I didn't understand why this routine
was calling _PyObject_GC_TRACK() after a resurrection, since there's no
plausible way _PyObject_GC_UNTRACK() could have been called on the
object by this point. I suspect it was left over from pasting the
instance_delloc() code. Instead asserted that the object is still
tracked. Caution: I suspect we don't have a test that actually
exercises the subtype_dealloc() __del__-resurrected-me code.
more trivial lexical helper macros so that uses of these guys expand
to nothing at all when they're not enabled. This should help sub-
standard compilers that can't do a good job of optimizing away the
previous "(void)0" expressions.
Py_DECREF: There's only one definition of this now. Yay! That
was that last one in the family defined multiple times in an #ifdef
maze.
Py_FatalError(): Changed the char* signature to const char*.
_Py_NegativeRefcount(): New helper function for the Py_REF_DEBUG
expansion of Py_DECREF. Calling an external function cuts down on
the volume of generated code. The previous inline expansion of abort()
didn't work as intended on Windows (the program often kept going, and
the error msg scrolled off the screen unseen). _Py_NegativeRefcount
calls Py_FatalError instead, which captures our best knowledge of
how to abort effectively across platforms.
Repair segfaults and infinite loops in COUNT_ALLOCS builds in the
presence of new-style (heap-allocated) classes/types.
Bugfix candidate. I'll backport this to 2.2. It's irrelevant in 2.1.
that have taken me "too long" to reverse-engineer over the years.
Vastly reduced the nesting level and redundancy of #ifdef-ery.
Took a light stab at repairing comments that are no longer true.
sys_gettotalrefcount(): Changed to enable under Py_REF_DEBUG.
It was enabled under Py_TRACE_REFS, which was much heavier than
necessary. sys.gettotalrefcount() is now available in a
Py_REF_DEBUG-only build.
mechanism is no longer evil: it no longer plays dangerous games with
the type pointer or refcounts, and objects in extension modules can play
along too without needing to edit the core first.
Rewrote all the comments to explain this, and (I hope) give clear
guidance to extension authors who do want to play along. Documented
all the functions. Added more asserts (it may no longer be evil, but
it's still dangerous <0.9 wink>). Rearranged the generated code to
make it clearer, and to tolerate either the presence or absence of a
semicolon after the macros. Rewrote _PyTrash_destroy_chain() to call
tp_dealloc directly; it was doing a Py_DECREF again, and that has all
sorts of obscure distorting effects in non-release builds (Py_DECREF
was already called on the object!). Removed Christian's little "embedded
change log" comments -- that's what checkin messages are for, and since
it was impossible to correlate the comments with the code that changed,
I found them merely distracting.
In a fresh interpreter, type.mro(tuple) would segfault, because
PyType_Ready() isn't called for tuple yet. To fix, call
PyType_Ready(type) if type->tp_dict is NULL.
These built-in functions are replaced by their (now callable) type:
slice()
buffer()
and these types can also be called (but have no built-in named
function named after them)
classobj (type name used to be "class")
code
function
instance
instancemethod (type name used to be "instance method")
The module "new" has been replaced with a small backward compatibility
placeholder in Python.
A large portion of the patch simply removes the new module from
various platform-specific build recipes. The following binary Mac
project files still have references to it:
Mac/Build/PythonCore.mcp
Mac/Build/PythonStandSmall.mcp
Mac/Build/PythonStandalone.mcp
[I've tweaked the code layout and the doc strings here and there, and
added a comment to types.py about StringTypes vs. basestring. --Guido]
gotten from a weak reference to NULL instead of to None. This caused
the following assert() to fail (but only in 2.2 in the debug build --
I have to find a better test case). Will backport.
optional attribute, only clear the exception when the internal getattr
operation raised AttributeError. Many places in this file already had
that policy; but just as many didn't, and there didn't seem to be any
rhyme or reason to it. Be consistently cautious.
Question: should I backport this? On the one hand it's a bugfix. On
the other hand it's a change in behavior. Certain forms of buggy or
just weird code would work in the past but raise an exception under
the new rules; e.g. if you define a __getattr__ method that raises a
non-AttributeError exception.
473985. Through a subtle rearrangement of some members in the etype
struct (!), mapping methods are now preferred over sequence methods,
which is necessary to support str.__getitem__("hello", slice(4)) etc.
[ 400998 ] experimental support for extended slicing on lists
somewhat spruced up and better tested than it was when I wrote it.
Includes docs & tests. The whatsnew section needs expanding, and arrays
should support extended slices -- later.
discovered that subtype_traverse must traverse the type if it is a
heap type, because otherwise some cycles involving a type and its
instance would not be collected. Simplest example:
while 1:
class C(object): pass
C.ref = C()
This program grows without bounds before this fix. (It grows ever
slower since it spends ever more time in the collector.)
Simply adding the right visit() call to subtype_traverse() revealed
other problems. With MvL's help we re-learned that type_clear()
doesn't have to clear *all* references, only the ones that may not be
cleared by other means. Careful analysis (see comments in the code)
revealed that only tp_mro needs to be cleared. (The previous checkin
to this file adds a test for tp_mro==NULL to _PyType_Lookup() that's
essential to prevent crashes due to tp_mro being NULL when
subtype_dealloc() tries to look for a __del__ method.) The same kind
of analysis also revealed that subtype_clear() doesn't need to clear
the instance dict.
With this fix, a useful property of the collector is once again
guaranteed: a single gc.collect() call will clear out all garbage.
(It didn't always before, which put us on the track of this bug.)
Will backport to 2.2.
about the test case, slot_nb_power gets called on behalf of its second
argument, but with a non-None modulus it wouldn't check this, and
believes it is called on behalf of its first argument. Fix this
properly, and get rid of the code in _PyType_Lookup() that tries to
call _PyType_Ready(). But do leave a check for a NULL tp_mro there,
because this can still legitimately occur.
I'll fix this in 2.2.x too.
While I was at it, I added a tp_clear handler and changed the
tp_dealloc handler to use the clear_slots helper for the tp_clear
handler.
Also tightened the rules for slot names: they must now be proper
identifiers (ignoring the dirty little fact that <ctype.h> is locale
sensitive).
Also set mp->flags = READONLY for the __weakref__ pseudo-slot.
Most of this is a 2.2 bugfix candidate; I'll apply it there myself.
Change the module constructor (module_init) to have the signature
__init__(name:str, doc=None); this prevents the call from type_new()
to succeed. While we're at it, prevent repeated calling of
module_init for the same module from leaking the dict, changing the
semantics so that __dict__ is only initialized if NULL.
Also adding a unittest, test_module.py.
This is an incompatibility with 2.2, if anybody was instantiating the
module class before, their argument list was probably empty; so this
can't be backported to 2.2.x.
In the past, an object's tp_compare could return any value. In 2.2
the docs were tightened to require it to return -1, 0 or 1; and -1 for
an error.
We now issue a warning if the value is not in this range. When an
exception is raised, we allow -1 or -2 as return value, since -2 will
the recommended return value for errors in the future. (Eventually
tp_compare will also be allowed to return +2, to indicate
NotImplemented; but that can only be implemented once we know all
extensions return a value in [-2...1]. Or perhaps it will require the
type to set a flag bit.)
I haven't decided yet whether to backport this to 2.2.x. The patch
applies fine. But is it fair to start warning in 2.2.2 about code
that worked flawlessly in 2.2.1?
for 'str' and 'unicode', and can be used instead of
types.StringTypes, e.g. to test whether something is "a string":
isinstance(x, string) is True for Unicode and 8-bit strings. This
is an abstract base class and cannot be instantiated directly.
A MemoryError is now raised when the list cannot be created.
There is a test, but as the comment says, it really only
works for 32 bit systems. I don't know how to improve
the test for other systems (ie, 64 bit or systems
where the data size != addressable size,
e.g. 64 bit data, but 48 bit addressable memory)
returned a proxy for __class__ whose __bases__ was also a proxy. The
merge_class_dict() helper for dir() assumed incorrectly that __bases__
would always be a tuple and used the in-line tuple API on the proxy.
I will backport this to 2.2 as well.
handlers were both set, but were not compatible. This change uses only the
tp_getattro handler with a more "modern" approach.
This fixes SF bug #551285.
don't understand how this function works, also beefed up the docs. The
most common usage error is of this form (often spread out across gotos):
if (_PyString_Resize(&s, n) < 0) {
Py_DECREF(s);
s = NULL;
goto outtahere;
}
The error is that if _PyString_Resize runs out of memory, it automatically
decrefs the input string object s (which also deallocates it, since its
refcount must be 1 upon entry), and sets s to NULL. So if the "if"
branch ever triggers, it's an error to call Py_DECREF(s): s is already
NULL! A correct way to write the above is the simpler (and intended)
if (_PyString_Resize(&s, n) < 0)
goto outtahere;
Bugfix candidate.
This implements ideas from Marc-Andre, Martin, Guido and me on Python-Dev.
"Short" Unicode strings are encoded into a "big enough" stack buffer,
then exactly as much string space as they turn out to need is allocated
at the end. This should have speed benefits akin to Martin's "measure
once, allocate once" strategy, but without needing a distinct measuring
pass.
"Long" Unicode strings allocate as much heap space as they could possibly
need (4 x # Unicode chars), and do a realloc at the end to return the
untouched excess. Since the overallocation is likely to be substantial,
this shouldn't burden the platform realloc with unusably small excess
blocks.
Also simplified uses of the PyString_xyz functions. Also added a release-
build check that 4*size doesn't overflow a C int. Sooner or later, that's
going to happen.
left and right type were of the same type and not classic instances.
This shortcut is dangerous for proxy types, because it means that
coerce(Proxy(1), Proxy(2.1)) leaves Proxy(1) unchanged rather than
turning it into Proxy(1.0).
In an ever-so-slight change of semantics, I now only take the shortcut
when the left and right types are of the same type and don't have the
CHECKTYPES feature. It so happens that classic instances have this
flag, so the shortcut is still skipped in this case (i.e. nothing
changes for classic instances). Proxies also have this flag set
(otherwise implementing numeric operations on proxies would become
nightmarish) and this means that the shortcut is also skipped there,
as desired. It so happens that int, long and float also have this
flag set; that means that e.g. coerce(1, 1) will now invoke
int_coerce(). This is fine: int_coerce() can deal with this, and I'm
not worried about the performance; int_coerce() is only invoked when
the user explicitly calls coerce(), which should be rarer than rare.
This fixes the problem that Barry reported on python-dev:
>>> 23000 .__class__ = bool
crashes in the deallocator. This was because int inherited tp_free
from object, which uses the default allocator.
2.2. Bugfix candidate.
states can be for this function, and ensure that only AttributeErrors
are masked. Any other exception raised via the equivalent of
getattr(cls, '__bases__') should be propagated up.
abstract_issubclass(): If abstract_get_bases() returns NULL, we must
call PyErr_Occurred() to see if an exception is being propagated, and
return -1 or 0 as appropriate. This is the specific fix for a problem
whereby if getattr(derived, '__bases__') raised an exception, an
"undetected error" would occur (under a debug build). This nasty
situation was uncovered when writing a security proxy extension type
for the Zope3 project, where the security proxy raised a Forbidden
exception on getattr of __bases__.
PyObject_IsInstance(), PyObject_IsSubclass(): After both calls to
abstract_get_bases(), where we're setting the TypeError if the return
value is NULL, we must first check to see if an exception occurred,
and /not/ mask an existing exception.
Neil Schemenauer should double check that these changes don't break
his ExtensionClass examples (there aren't any test cases for those
examples and abstract_get_bases() was added by him in response to
problems with ExtensionClass). Neil, please add test cases if
possible!
I belive this is a bug fix candidate for Python 2.2.2.