cpython/Objects
Andrew Dalke 525eab3712 Changes to string.split/rsplit on whitespace to preallocate space in the
results list.

Originally it allocated 0 items and used the list growth during append.  Now
it preallocates 12 items so the first few appends don't need list reallocs.

("Here are some words ."*2).split(None, 1) is 7% faster
("Here are some words ."*2).split() is 15% faster

  (Your mileage may vary, see dealership for details.)
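
A minimal sketch of how the two figures above could be re-measured with the
standard timeit module.  The helper name time_split and the repeat/number
values are illustrative assumptions, not part of the original change, and the
percentages depend on the interpreter build and the machine.

```python
import timeit


def time_split(stmt, number=100_000, repeat=5):
    """Return the best observed per-call time, in seconds, for `stmt`."""
    timer = timeit.Timer(stmt)
    # Taking the minimum of several runs reduces noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    statements = (
        '("Here are some words ."*2).split(None, 1)',
        '("Here are some words ."*2).split()',
    )
    for stmt in statements:
        print(f"{stmt}: {time_split(stmt) * 1e9:.1f} ns per call")
```

Running the script against a build from before the change and one from after
it gives the two numbers to compare.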

File parsing like this

    for line in f:
        count += len(line.split())

is also about 15% faster.  There is a slowdown of about 3% for large
strings because of the additional overhead of checking if the append is
to a preallocated region of the list or not.  This will be the rare case.
It could be improved with special case code but we decided it was not
useful enough.
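
The idea can be modelled in pure Python.  This is only a sketch of the
technique, not the C code in stringobject.c; the names PREALLOC and
split_whitespace are made up for illustration.  The check against PREALLOC on
every item is the extra overhead mentioned above.

```python
PREALLOC = 12  # preallocation size quoted in this commit message


def split_whitespace(s):
    """Split on whitespace, filling preallocated slots before appending."""
    result = [None] * PREALLOC  # reserve the first PREALLOC slots up front
    count = 0
    i, n = 0, len(s)
    while i < n:
        # Skip the run of whitespace before the next word.
        while i < n and s[i].isspace():
            i += 1
        j = i
        # Scan to the end of the word.
        while j < n and not s[j].isspace():
            j += 1
        if j > i:
            if count < PREALLOC:
                result[count] = s[i:j]  # slot already exists, no growth
            else:
                result.append(s[i:j])   # past the reserved region: grow
            count += 1
        i = j
    del result[count:]  # drop unused reserved slots
    return result
```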

There is a cost of 12*sizeof(PyObject *) bytes per list.  For the normal
case of file parsing this is not a problem because the lists have
a short lifetime.  We have not come up with cases where this is a problem
in real life.
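
As a rough illustration of that cost, the following sketch computes the
reserved space on the running platform.  It assumes that
ctypes.sizeof(ctypes.c_void_p) equals sizeof(PyObject *), which holds on
mainstream platforms.

```python
import ctypes

PREALLOC = 12  # preallocation size quoted in this commit message

# Assumption: a void pointer has the same size as a PyObject pointer.
pointer_size = ctypes.sizeof(ctypes.c_void_p)
reserved_bytes = PREALLOC * pointer_size

print(f"{PREALLOC} slots * {pointer_size} bytes = {reserved_bytes} bytes per list")
```

That works out to 48 bytes with 4-byte pointers and 96 bytes with 8-byte
pointers.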

I chose 12 because human text averages about 11 words per line in books,
one of my data sets averages 6.2 words with a final peak at 11 words per
line, and I work with a tab delimited data set with 8 tabs per line (or
9 words per line).  12 encompasses all of these.
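
To check whether 12 also suits another data set, a sketch like the following
could be used.  The function name words_per_line_stats is hypothetical; it
reports the mean number of words per line and the fraction of lines that fit
in the preallocated region.

```python
def words_per_line_stats(lines, prealloc=12):
    """Return (mean words per line, fraction of lines with <= prealloc words)."""
    counts = [len(line.split()) for line in lines]
    if not counts:
        return 0.0, 1.0  # no lines: nothing exceeds the preallocation
    mean = sum(counts) / len(counts)
    fits = sum(1 for c in counts if c <= prealloc) / len(counts)
    return mean, fits
```

Any iterable of strings works as input, including an open text file.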

Also changed the rsplit code to append then reverse, rather than
doing insert(0).  The split() and rsplit() times are now comparable.
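
The difference between the two strategies can be sketched in Python.  This is
a model of the technique, not the C implementation, and all three function
names are illustrative.  Each insert(0) shifts every item already in the
list, while append followed by a single reverse() touches each item a
constant number of times.

```python
def _words_from_right(s):
    """Yield whitespace-separated words, scanning from the end of `s`."""
    j = len(s)
    while j > 0:
        # Skip trailing whitespace.
        while j > 0 and s[j - 1].isspace():
            j -= 1
        i = j
        # Walk back to the start of the word.
        while i > 0 and not s[i - 1].isspace():
            i -= 1
        if i < j:
            yield s[i:j]
        j = i


def rsplit_insert(s):
    """Old approach: insert each word at the front (quadratic shifting)."""
    result = []
    for word in _words_from_right(s):
        result.insert(0, word)
    return result


def rsplit_reverse(s):
    """New approach: append each word, then reverse the list once."""
    result = []
    for word in _words_from_right(s):
        result.append(word)
    result.reverse()
    return result
```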
2006-05-26 14:00:45 +00:00
abstract.c C++ compilation cleanup: Migrate declaration of 2006-04-18 00:27:46 +00:00
boolobject.c Remove unnecessary casts in type object initializers. 2006-03-30 11:57:00 +00:00
bufferobject.c More C++-compliance. Note especially listobject.c - to get C++ to accept the 2006-04-11 06:54:30 +00:00
cellobject.c Use Py_VISIT in all tp_traverse methods, instead of traversing manually or 2006-04-15 21:47:09 +00:00
classobject.c Replace PyObject_CallFunction calls with only object args 2006-05-25 19:15:31 +00:00
cobject.c Remove unnecessary casts in type object initializers. 2006-03-30 11:57:00 +00:00
codeobject.c Merge from rjones-funccall branch. 2006-05-23 10:37:38 +00:00
complexobject.c C++ compiler cleanup: bunch-o-casts, plus use of unsigned loop index var in a couple places 2006-04-18 00:35:43 +00:00
descrobject.c Use Py_VISIT in all tp_traverse methods, instead of traversing manually or 2006-04-15 21:47:09 +00:00
dictnotes.txt Fix typos and add some elaborations 2004-03-15 15:52:22 +00:00
dictobject.c Use Py_VISIT in all tp_traverse methods, instead of traversing manually or 2006-04-15 21:47:09 +00:00
enumobject.c Use Py_VISIT in all tp_traverse methods, instead of traversing manually or 2006-04-15 21:47:09 +00:00
fileobject.c Bug #1462152: file() now checks more thoroughly for invalid mode 2006-05-18 07:01:27 +00:00
floatobject.c Added a new macro, Py_IS_FINITE(X). On Windows there is an intrinsic for this, and it is more efficient than using !Py_IS_INFINITE(X) && !Py_IS_NAN(X). No change on other platforms. 2006-05-25 15:53:30 +00:00
frameobject.c fix broken merge 2006-05-23 18:32:11 +00:00
funcobject.c Use Py_VISIT in all tp_traverse methods, instead of traversing manually or 2006-04-15 21:47:09 +00:00
genobject.c gen_del(): Looks like much of this was copy/pasted from 2006-04-15 22:59:10 +00:00
intobject.c C++ compiler cleanup: bunch-o-casts, plus use of unsigned loop index var in a couple places 2006-04-18 00:35:43 +00:00
iterobject.c Use Py_VISIT in all tp_traverse methods, instead of traversing manually or 2006-04-15 21:47:09 +00:00
listobject.c Remove now-unused variables from tp_traverse and tp_clear methods. 2006-04-15 22:51:26 +00:00
listsort.txt The key to the various sort columns got lost. Pulled from 2005-09-23 17:14:22 +00:00
longobject.c Patch #1494387: SVN longobject.c compiler warnings 2006-05-25 22:28:46 +00:00
methodobject.c Use Py_VISIT in all tp_traverse methods, instead of traversing manually or 2006-04-15 21:47:09 +00:00
moduleobject.c Use Py_VISIT in all tp_traverse methods, instead of traversing manually or 2006-04-15 21:47:09 +00:00
object.c Comment typo fix 2006-04-18 11:49:53 +00:00
obmalloc.c Get compiling again 2006-04-11 07:58:54 +00:00
rangeobject.c Remove "static forward" declaration. Move constructors 2006-04-11 09:04:12 +00:00
setobject.c Clear dummy and emptyfrozenset, so that we don't have 2006-04-15 12:47:23 +00:00
sliceobject.c Allow long integers in PySlice_GetIndices. 2006-04-03 11:38:08 +00:00
stringobject.c Changes to string.split/rsplit on whitespace to preallocate space in the 2006-05-26 14:00:45 +00:00
structseq.c Unlink the structseq type from the global list of 2006-04-15 12:45:05 +00:00
tupleobject.c Use Py_VISIT in all tp_traverse methods, instead of traversing manually or 2006-04-15 21:47:09 +00:00
typeobject.c Replace PyObject_CallFunction calls with only object args 2006-05-25 19:15:31 +00:00
unicodectype.c Enhance the performance of two important Unicode character 2005-10-20 19:06:35 +00:00
unicodeobject.c use Py_LOCAL also for string and unicode objects 2006-05-26 11:38:15 +00:00
unicodetype_db.h Update Unicode database to Unicode 4.1. 2006-03-09 23:38:20 +00:00
weakrefobject.c Replace PyObject_CallFunction calls with only object args 2006-05-25 19:15:31 +00:00