Fine tune the speed/space trade-off for overallocating small lists.
The Py2.3 approach overallocated small lists by up to 8 elements. The last checkin would limited this to one but slowed down (by 20 to 30%) the creation of small lists between 3 to 8 elements. This tune-up balances the two, limiting overallocation to 3 elements (significantly reducing space consumption from Py2.3) and running faster than the previous checkin. The first part of the growth pattern (0, 4, 8, 16) neatly meshes with allocators that trigger data movement only when crossing a power of two boundary. Also, then even numbers mesh well with common data alignments.
This commit is contained in:
parent
2731ae4d6d
commit
ab517d2eac
|
@ -260,8 +260,8 @@ yellow 5
|
|||
\begin{itemize}
|
||||
|
||||
\item The machinery for growing and shrinking lists was optimized
|
||||
for speed and for space efficiency. Small lists (under six elements)
|
||||
never over-allocate by more than one element. Large lists do not
|
||||
for speed and for space efficiency. Small lists (under eight elements)
|
||||
never over-allocate by more than three elements. Large lists do not
|
||||
over-allocate by more than 1/8th. Appending and popping from lists
|
||||
now runs faster due to more efficient code paths and less frequent
|
||||
use of the underlying system realloc(). List comprehensions also
|
||||
|
|
|
@ -19,8 +19,8 @@ Core and builtins
|
|||
|
||||
- Changed the internal list over-allocation scheme. For larger lists,
|
||||
overallocation ranged between 3% and 25%. Now, it is a constant 12%.
|
||||
For smaller lists (n<=5), overallocation was upto eight bytes. Now,
|
||||
the overallocation is no more than one byte -- this improves space
|
||||
For smaller lists (n<8), overallocation was upto eight elements. Now,
|
||||
the overallocation is no more than three elements -- this improves space
|
||||
utilization for applications that have large numbers of small lists.
|
||||
|
||||
- Support for arbitrary objects supporting the read-only buffer
|
||||
|
|
|
@ -30,13 +30,10 @@ list_resize(PyListObject *self, int newsize)
|
|||
* for additional growth. The over-allocation is mild, but is
|
||||
* enough to give linear-time amortized behavior over a long
|
||||
* sequence of appends() in the presence of a poorly-performing
|
||||
* system realloc() (which is a reality, e.g., across all flavors
|
||||
* of Windows, with Win9x behavior being particularly bad -- and
|
||||
* we've still got address space fragmentation problems on Win9x
|
||||
* even with this scheme, although it requires much longer lists to
|
||||
* provoke them than it used to).
|
||||
* system realloc().
|
||||
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
|
||||
*/
|
||||
_new_size = (newsize >> 3) + (self->ob_size < 3 ? 1 : 6) + newsize;
|
||||
_new_size = (newsize >> 3) + (self->ob_size < 8 ? 3 : 6) + newsize;
|
||||
items = self->ob_item;
|
||||
if (_new_size <= ((~(size_t)0) / sizeof(PyObject *)))
|
||||
PyMem_RESIZE(items, PyObject *, _new_size);
|
||||
|
@ -316,8 +313,6 @@ list_length(PyListObject *a)
|
|||
return a->ob_size;
|
||||
}
|
||||
|
||||
|
||||
|
||||
static int
|
||||
list_contains(PyListObject *a, PyObject *el)
|
||||
{
|
||||
|
|
Loading…
Reference in New Issue