* Split into three separate types that share everything except the
code for iternext. Saves run time decision making and allows
each iternext function to be specialized.
* Inlined PyDict_Next(). In addition to saving a function call, this
allows a redundant test to be eliminated and further specialization
of the code for the unique needs of each iterator type.
* Created a reusable result tuple for iteritems(). Saves the malloc
time for tuples when the previous result was not kept by client code
(this is the typical use case for iteritems). If the client code
does keep the reference, then a new tuple is created.
Results in a 20% to 30% speedup depending on the size and sparsity
of the dictionary.
* Factored constant structure references out of the inner loops for
PyDict_Next(), dict_keys(), dict_values(), and dict_items().
Gave measurable speedups to each (the improvement varies depending
on the sparseness of the dictionary being measured).
* Added a freelist scheme styled after that for tuples. Saves around
80% of the calls to malloc and free. About 10% of the time, the
previous dictionary was completely empty; in those cases, the
dictionary initialization with memset() can be skipped.
scheme in situations that likely won't benefit from it. This further
improves memory utilization from Py2.3 which always over-allocates
except for PyList_New().
Situations expected to benefit from over-allocation:
list.insert(), list.pop(), list.append(), and list.extend()
Situations deemed unlikely to benefit:
list_inplace_repeat, list_ass_slice, list_ass_subscript
The most gray area was for listextend_internal() which only runs
when the argument is a list or a tuple. This could be viewed as
a one-time fixed length addition or it could be viewed as wrapping
a series of appends. I left its over-allocation turned on but
could be convinced otherwise.
worth it to in-line the call to PyIter_Next().
Saves another 15% on most list operations that acceptable a general
iterable argument (such as the list constructor).
avoids creating an intermediate tuple for iterable arguments other than
lists or tuples.
In other words, a+=b no longer requires extra memory when b is not a
list or tuple. The list and tuple cases are unchanged.
for xrange and list objects).
* list.__reversed__ now checks the length of the sequence object before
calling PyList_GET_ITEM() because the mutable could have changed length.
* all three implementations are now tranparent with respect to length and
maintain the invariant len(it) == len(list(it)) even when the underlying
sequence mutates.
* __builtin__.reversed() now frees the underlying sequence as soon
as the iterator is exhausted.
* the code paths were rearranged so that the most common paths
do not require a jump.
* Replace sprintf message with a constant message string -- this error
message ran on every invocation except straight deletions but it was
only needed when the rhs was not iterable. The message was also
out-of-date and did not reflect that iterable arguments were allowed.
* For inner loops that do not make ref count adjustments, use memmove()
for fast copying and better readability.
* For inner loops that do make ref count adjustments, speed them up by
factoring out the constant structure reference and using vitem[] instead.
* Using addition instead of substraction on array indices allows the
compiler to use a fast addressing mode. Saves about 10%.
* Using PyTuple_GET_ITEM and PyList_SET_ITEM is about 7% faster than
PySequenceFast_GET_ITEM which has to make a list check on every pass.
(Championed by Bob Ippolito.)
The update() method for mappings now accepts all the same argument forms
as the dict() constructor. This includes item lists and/or keyword
arguments.
recent gcc on Linux/x86)
[ 899109 ] 1==float('nan')
by implementing rich comparisons for floats.
Seems to make comparisons involving NaNs somewhat less surprising
when the underlying C compiler actually implements C99 semantics.
utilization, and speed:
* Moved the responsibility for emptying the previous list from list_fill
to list_init.
* Replaced the code in list_extend with the superior code from list_fill.
* Eliminated list_fill.
Results:
* list.extend() no longer creates an intermediate tuple except to handle
the special case of x.extend(x). The saves memory and time.
* list.extend(x) runs
5 to 10% faster when x is a list or tuple
15% faster when x is an iterable not defining __len__
twice as fast when x is an iterable defining __len__
* the code is about 15 lines shorter and no longer duplicates
functionality.
The Py2.3 approach overallocated small lists by up to 8 elements.
The last checkin would limited this to one but slowed down (by 20 to 30%)
the creation of small lists between 3 to 8 elements.
This tune-up balances the two, limiting overallocation to 3 elements
(significantly reducing space consumption from Py2.3) and running faster
than the previous checkin.
The first part of the growth pattern (0, 4, 8, 16) neatly meshes with
allocators that trigger data movement only when crossing a power of two
boundary. Also, then even numbers mesh well with common data alignments.
realloc(). This is achieved by tracking the overallocation size in a new
field and using that information to skip calls to realloc() whenever
possible.
* Simplified and tightened the amount of overallocation. For larger lists,
this overallocates by 1/8th (compared to the previous scheme which ranged
between 1/4th to 1/32nd over-allocation). For smaller lists (n<6), the
maximum overallocation is one byte (formerly it could be upto eight bytes).
This saves memory in applications with large numbers of small lists.
* Eliminated the NRESIZE macro in favor of a new, static list_resize function
that encapsulates the resizing logic. Coverting this back to macro would
give a small (under 1%) speed-up. This was too small to warrant the loss
of readability, maintainability, and de-coupling.
* Some functions using NRESIZE had grown unnecessarily complex in their
efforts to bend to the macro's calling pattern. With the new list_resize
function in place, those other functions could be simplified. That is
being saved for a separate patch.
* The ob_item==NULL check could be eliminated from the new list_resize
function. This would entail finding each piece of code that sets ob_item
to NULL and adding a new line to invalidate the overallocation tracking
field. Rather than impose a new requirement on other pieces of list code,
it was preferred to leave the NULL check in place and retain the benefits
of decoupling, maintainability and information hiding (only PyList_New()
and list_sort() need to know about the new field). This approach also
reduces the odds of breaking an extension module.
(Collaborative effort by Raymond Hettinger, Hye-Shik Chang, Tim Peters,
and Armin Rigo.)
the same object to be collected by the cyclic GC support if they are
only referenced by a cycle. If the weakref being collected was one of
the weakrefs without callbacks, some local variables for the
constructor became invalid and have to be re-computed.
The test caused a segfault under a debug build without the fix applied.
Formerly, length data fetched from sequence objects.
Now, any object that reports its length can benefit from pre-sizing.
On one sample timing, it gave a threefold speedup for list(s) where s
was a set object.
The special-case code that was removed could return a value indicating
success but leave an exception set. test_fileinput failed in a debug
build as a result.
which can be reviewed via
http://coding.derkeiler.com/Archive/Python/comp.lang.python/2003-12/1011.html
Duncan Booth investigated, and discovered that an "optimisation" was
in fact a pessimisation for small numbers of elements in a source list,
compared to not having the optimisation, although with large numbers
of elements in the source list the optimisation was quite beneficial.
He posted his change to comp.lang.python (but not to SF).
Further research has confirmed his assessment that the optimisation only
becomes a net win when the source list has more than 100 elements.
I also found that the optimisation could apply to tuples as well,
but the gains only arrive with source tuples larger than about 320
elements and are nowhere near as significant as the gains with lists,
(~95% gain @ 10000 elements for lists, ~20% gain @ 10000 elements for
tuples) so I haven't proceeded with this.
The code as it was applied the optimisation to list subclasses as
well, and this also appears to be a net loss for all reasonable sized
sources (~80-100% for up to 100 elements, ~20% for more than 500
elements; I tested up to 10000 elements).
Duncan also suggested special casing empty lists, which I've extended
to all empty sequences.
On the basis that list_fill() is only ever called with a list for the
result argument, testing for the source being the destination has
now happens before testing source types.
bit by checking the value of UCHAR_MAX in Include/Python.h. There was a
check in Objects/stringobject.c. Remove that. (Note that we don't define
UCHAR_MAX if it's not defined as the old test did.)
and left shifts. (Thanks to Kalle Svensson for SF patch 849227.)
This addresses most of the remaining semantic changes promised by
PEP 237, except for repr() of a long, which still shows the trailing
'L'. The PEP appears to promise warnings for operations that
changed semantics compared to Python 2.3, but this is not
implemented; we've suffered through enough warnings related to
hex/oct literals and I think it's best to be silent now.
* Add more tests
* Refactor and neaten the code a bit.
* Rename union_update() to update().
* Improve the algorithms (making them a closer to sets.py).
function.
* Add a better test for deepcopying.
* Add tests to show the __init__() function works like it does for list
and tuple. Add related test.
* Have shallow copies of frozensets return self. Add related test.
* Have frozenset(f) return f if f is already a frozenset. Add related test.
* Beefed-up some existing tests.
by the function object or by the method object, the function
object's attribute usually wins. Christian Tismer pointed out that
that this is really a mistake, because this only happens for special
methods (like __reduce__) where the method object's version is
really more appropriate than the function's attribute. So from now
on, all method attributes will have precedence over function
attributes with the same name.
* Improve the hash function to increase the chance that distinct sets will
have distinct xor'd hash totals.
* Use PyDict_Merge where possible (it is faster than an equivalent iter/set
pair).
* Don't rebuild dictionaries where the input already has one.
Also SF patch 843455.
This is a critical bugfix.
I'll backport to 2.3 maint, but not beyond that. The bugs this fixes
have been there since weakrefs were introduced.
* Install the unittests, docs, newsitem, include file, and makefile update.
* Exercise the new functions whereever sets.py was being used.
Includes the docs for libfuncs.tex. Separate docs for the types are
forthcoming.
subtype_dealloc(): This left the dying object exposed to gc, so that
if cyclic gc triggered during the weakref callback, gc tried to delete
the dying object a second time. That's a disaster. subtype_dealloc()
had a (I hope!) unique problem here, as every normal dealloc routine
untracks the object (from gc) before fiddling with weakrefs etc. But
subtype_dealloc has obscure technical reasons for re-registering the
dying object with gc (already explained in a large comment block at
the bottom of the function).
The fix amounts to simply refraining from reregistering the dying object
with gc until after the weakref callback (if any) has been called.
This is a critical bug (hard to predict, and causes seemingly random
memory corruption when it occurs). I'll backport it to 2.3 later.
charmaptranslate_makespace() allocated more memory than required for the
next replacement but didn't remember that fact, so memory size was growing
exponentially every time a replacement string is longer that one character.
This fixes SF bug #828737.
key provides C support for the decorate-sort-undecorate pattern.
reverse provide a stable sort of the list with the comparisions reversed.
* Amended the docs to guarantee sort stability.
If a length-1 Unicode string was in the freelist and it was
uninitialized or pointed to a very large (magnitude) negative number,
the check
unicode_latin1[unicode->str[0]] == unicode
could cause a segmentation violation, e.g. unicode->str[0] is 0xcbcbcbcb.
Fix this in two ways:
1. Change guard befor unicode_latin1[] to test against 256U. If I
understand correctly, the unsigned long used to store UCS4 on my
box was getting converted to a signed long to compare with the
signed constant 256.
2. Change _PyUnicode_New() to make sure the first element of str is
always initialized to zero. There are several places in the code
where the caller can exit with an error before initializing any
of str, which would leave junk in str[0].
Also, silence a compiler warning on pointer vs. int arithmetic.
Bug fix candidate.
The unicode_resize() family only returns -1 or 0 so simply checking
for != 0 is sufficient, but somewhat unclear. Many Python API
functions return < 0 on error, reserving the right to return 0 or 1 on
success. Change the call sites for consistency with these calls.
file_truncate(): C doesn't define what fflush(fp) does if fp is open
for update, and the preceding I/O operation on fp was input. On Windows,
fflush() actually changes the current file position then. Because
Windows doesn't support ftruncate() directly, this not only caused
Python's file.truncate() to change the file position (contra our docs),
it also caused the file not to change size.
Repaired by getting the initial file position at the start, restoring
it at the end, and tossing all the complicated micro-efficiency checks
trying to avoid "provably unnecessary" seeks. file.truncate() can't
be a frequent operation, and seeking to the current file position has
got to be cheap anyway.
Bugfix candidate.
[ 784825 ] fix obscure crash in descriptor handling
Should be applied to release23-maint and in all likelyhood
release22-maint, too.
Certainly doesn't apply to release21-maint.
number. This accounts for the 2 refcount leaks per test_complex run
Michael Hudson discovered (I figured only I would have the stomach to
look for leaks in floating-point code <wink>).
when an encoding error occurs and the callback name is unknown,
i.e. when the callback has to be called. The problem was that
the fact that the callback has already been looked up was only
recorded in a local variable in charmap_encoding_error(), because
charmap_encoding_error() got it's own copy of the errorHandler
pointer instead of a pointer to the pointer in
PyUnicode_EncodeCharmap().
Now test_descr only appears to leak two references & I think this
are in fact illusory (it's to do with things getting resurrected in
__del__ methods & it's easy to be believe confusion occurs when that
happens <wink>). Woohoo!
Sure looks like it to me! <wink>
When I run the leak2.py script I posted to python-dev, I only see
three reference leaks in all of test_descr. When I run
test_descr.test_main, I still see 46 leaks. This clearly demands
posting a yelp to python-dev :-)
This certainly should be applied to release23-maint, and in all
likelyhood release22-maint as well.
The !PyType_Check(base) check snuck in as part of rev 2.215, but was
unrelated to the SF patch that is mentioned in the checkin comment.
The test is currently unnecessary because base is set to the return
value of best_bases(), which returns a type or NULL.
float_pow(): Don't let the platform pow() raise -1.0 to an integer power
anymore; at least glibc gets it wrong in some cases. Note that
math.pow() will continue to deliver wrong (but platform-native) results
in such cases.
tp_free is NULL or PyObject_Del at the end. Because it's a base type
it must call tp_free in its dealloc function, and because it's gc'able
it must not call PyObject_Del.
inherit_slots(): Don't inherit tp_free unless the type and its base
agree about whether they're gc'able. If the type is gc'able and the
base is not, and the base uses the default PyObject_Del for its
tp_free, give the type PyObject_GC_Del for its tp_free (the appropriate
default for a gc'able type).
cPickle.c: The Pickler and Unpickler types claim to be base classes
and gc'able, but their dealloc functions didn't call tp_free.
Repaired that. Also call PyType_Ready() on these typeobjects, so
that the correct (PyObject_GC_Del) default memory-freeing function
gets plugged into these types' tp_free slots.
Reverted a Py2.3b1 change to iterator in subclasses of list and tuple.
They had been changed to use __getitem__ whenever it had been overriden
in the subclass.
This caused some usabilty and performance problems. Also, it was
inconsistent with the rest of python where many container methods
access the underlying object directly without first checking for
an overridden getter. Users needing a change in iterator behavior
should override it directly.
* Increase dictionary growth rate resulting in more sparse dictionaries,
fewer lookup collisions, increased memory use, and better cache
performance. For dicts with over 50k entries, keep the current
growth rate in case an application is suffering from tight memory
constraints.
* Set the most common case (no resize) to fall-through the test.
Some version of gcc in the "RTEMS port running on the Coldfire (m5200)
processor" generates bad code for a loop in long_from_binary_base(),
comparing the wrong half of an int to a short. The patch changes the
decl of the short temp to be an int temp instead. This "simplifies"
the code enough that gcc no longer blows it.
As a side issue on this bug, it was noted that list and tuple iterators
used macros to directly access containers and would not recognize
__getitem__ overrides. If the method is overridden, the patch returns
a generic sequence iterator which calls the __getitem__ method; otherwise,
it returns a high custom iterator with direct access to container elements.
raising an exception. This is consistent with calling the
constructors for the other builtin types -- called without argument
they all return the false value of that type. (SF patch #724135)
Thanks to Alex Martelli.
I'm finding some pretty baffling output, like reprs consisting entirely
of three left parens. At least this will let us know what type the object
is (it's not str -- there's no quote character in the repr).
New tool combinerefs.py, to combine the two output blocks produced via
PYTHONDUMPREFS.
new line.
New pvt API function _Py_PrintReferenceAddresses(): Prints only the
addresses and refcnts of the live objects. This is always safe to call,
because it has no dependence on Python's C API.
Py_Finalize(): If envar PYTHONDUMPREFS is set, call (the new)
_Py_PrintReferenceAddresses() right before dumping final pymalloc stats.
We can't print the reprs of the objects here because too much of the
interpreter has been shut down. You need to correlate the addresses
displayed here with the object reprs printed by the earlier
PYTHONDUMPREFS call to _Py_PrintReferences().
New functions:
unsigned long PyInt_AsUnsignedLongMask(PyObject *);
unsigned PY_LONG_LONG) PyInt_AsUnsignedLongLongMask(PyObject *);
unsigned long PyLong_AsUnsignedLongMask(PyObject *);
unsigned PY_LONG_LONG) PyLong_AsUnsignedLongLongMask(PyObject *);
New and changed format codes:
b unsigned char 0..UCHAR_MAX
B unsigned char none **
h unsigned short 0..USHRT_MAX
H unsigned short none **
i int INT_MIN..INT_MAX
I * unsigned int 0..UINT_MAX
l long LONG_MIN..LONG_MAX
k * unsigned long none
L long long LLONG_MIN..LLONG_MAX
K * unsigned long long none
Notes:
* New format codes.
** Changed from previous "range-and-a-half" to "none"; the
range-and-a-half checking wasn't particularly useful.
New test test_getargs2.py, to verify all this.
even farther down, to just before the call to
_PyObject_DebugMallocStats(). This required the following changes:
- pystate.c, PyThreadState_GetDict(): changed not to raise an
exception or issue a fatal error when no current thread state is
available, but simply return NULL without raising an exception
(ever).
- object.c, Py_ReprEnter(): when PyThreadState_GetDict() returns NULL,
don't raise an exception but return 0. This means that when
printing a container that's recursive, printing will go on and on
and on. But that shouldn't happen in the case we care about (see
first bullet).
- Updated Misc/NEWS and Doc/api/init.tex to reflect changes to
PyThreadState_GetDict() definition.
interpreted by slicing, so negative values count from the end of the
list. This was the only place where such an interpretation was not
placed on a list index.
* Doc - add doc for when functions were added
* UserString
* string object methods
* string module functions
'chars' is used for the last parameter everywhere.
These changes will be backported, since part of the changes
have already been made, but they were inconsistent.
If a class was defined inside a function, used a static or class
method, and used super() inside the method body, it would be caught in
an uncollectable cycle. (Simplified version: The static/class method
object would point to a function object with a closure that referred
to the class.)
Bugfix candidate.
Arranged that all the objects exposed by __builtin__ appear in the list
of all objects. I basically peed away two days tracking down a mystery
leak in sys.gettotalrefcount() in a ZODB app (== tons of code), because
the object leaking the references didn't appear in the sys.getobjects(0)
list. The object happened to be False. Now False is in the list, along
with other popular & previously missing leak candidates (like None).
Alas, we still don't have a choke point covering *all* Python objects,
so the list of all objects may still be incomplete.
_Py_AddToAllObjects() that simply inserts an object at the front of
the doubly-linked list of all objects. Changed PyType_Ready() (the
closest thing we've got to a choke point for type objects) to call
that.
a doubly-linked list, exposed by sys.getobjects(). Unfortunately, it's not
really all live objects, and it seems my fate to bump into programs where
sys.gettotalrefcount() keeps going up but where the reference leaks aren't
accounted for by anything in the list of all objects.
This patch helps a little: if COUNT_ALLOCS is also defined, from now on
type objects will also appear in this list, provided at least one object
of a type has been allocated.
constructor, when passed a single complex argument, returns the
argument unchanged. This should be done only for the complex base
class; a complex subclass should of course cast the value to the
subclass in this case.
The fix also revealed a segfault in complex_getnewargs(): the argument
for the Py_BuildValue() format code "D" is the *address* of a
Py_complex struct, not the value. (This corroborated by the API
documentation.)
I expect this needs to be backported to 2.2.3.
This still falls back to helpers in copy_reg for:
- pickle protocols < 2
- calculating the list of slot names (done only once per class)
- the __newobj__ function (which is used as a token but never called)
the PyInt_AsLong function, and this returns a long, the value is first
retrieved with PyLong_AsLong, but afterwards overwritten by a call to
PyInt_AS_LONG.
Fixes SF #690253.