286 Commits

Author SHA1 Message Date
Andrew Dalke 525eab3712 Changes to string.split/rsplit on whitespace to preallocate space in the
results list.

Originally it allocated 0 items and used the list growth during append.  Now
it preallocates 12 items so the first few appends don't need list reallocs.

("Here are some words ."*2).split(None, 1) is 7% faster
("Here are some words ."*2).split() is 15% faster

  (Your mileage may vary, see dealership for details.)

File parsing like this

    for line in f:
        count += len(line.split())

is also about 15% faster.  There is a slowdown of about 3% for large
strings because of the additional overhead of checking if the append is
to a preallocated region of the list or not.  This will be the rare case.
It could be improved with special case code but we decided it was not
useful enough.

There is a cost of 12*sizeof(PyObject *) bytes per list.  For the normal
case of file parsing this is not a problem because the lists have
a short lifetime.  We have not come up with cases where this is a problem
in real life.

I chose 12 because human text averages about 11 words per line in books,
one of my data sets averages 6.2 words with a final peak at 11 words per
line, and I work with a tab delimited data set with 8 tabs per line (or
9 words per line).  12 encompasses all of these.

Also changed the last rsplit code to append then reverse, rather than
doing insert(0).  The split() and rsplit() times are now comparable.
2006-05-26 14:00:45 +00:00
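A minimal pure-Python sketch of the two ideas in the commit above: fill a preallocated result list before falling back to append, and build a right-to-left split by appending and then reversing. The function names and the Python modelling are assumptions for illustration; the actual change is C code and is not reproduced here.

```python
PREALLOC = 12  # the preallocation size named in the commit message


def split_whitespace(s, prealloc=PREALLOC):
    """Model of a whitespace split that fills reserved slots first."""
    result = [None] * prealloc      # slots reserved up front
    count = 0                       # number of words stored so far
    i, n = 0, len(s)
    while i < n:
        while i < n and s[i].isspace():      # skip leading whitespace
            i += 1
        if i == n:
            break
        j = i
        while j < n and not s[j].isspace():  # scan to the end of the word
            j += 1
        if count < prealloc:
            result[count] = s[i:j]  # common case: reuse a reserved slot
        else:
            result.append(s[i:j])   # rare case: ordinary list growth
        count += 1
        i = j
    del result[count:]              # drop reserved slots that were never used
    return result


def rsplit_whitespace(s):
    """Model of a right-to-left split: append each word, reverse once at the end."""
    words = []
    j = len(s)
    while j > 0:
        while j > 0 and s[j - 1].isspace():      # skip trailing whitespace
            j -= 1
        if j == 0:
            break
        i = j
        while i > 0 and not s[i - 1].isspace():  # scan to the start of the word
            i -= 1
        words.append(s[i:j])        # cheap append instead of insert(0, ...)
        j = i
    words.reverse()                 # restore left-to-right order
    return words
```

The extra `count < prealloc` check on every word is the Python analogue of the overhead the message mentions for large strings.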
Fredrik Lundh 95e2a91615 use Py_LOCAL also for string and unicode objects 2006-05-26 11:38:15 +00:00
Fredrik Lundh f2c0dfdb13 needforspeed: use Py_ssize_t for the fastsearch counter and skip
length (thanks, neal!).  and yes, I've verified that this doesn't
slow things down ;-)
2006-05-26 10:27:17 +00:00
Fredrik Lundh 450277fef5 needforspeed: use METH_O for argument handling, which made partition some
~15% faster for the current tests (which is noticeably faster than a
corresponding find call).  thanks to neal-who-never-sleeps for the tip.
2006-05-26 09:46:59 +00:00
Fredrik Lundh 06a69dd8ff needforspeed: partition implementation, part two.
feel free to improve the documentation and the docstrings.
2006-05-26 08:54:28 +00:00
Fredrik Lundh fe5bb7e6d9 needforspeed: partition for 8-bit strings. for some simple tests,
this is on par with a corresponding find, and nearly twice as fast
as split(sep, 1)

full tests, a unicode version, and documentation will follow tomorrow.
2006-05-25 23:27:53 +00:00
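For readers comparing the three approaches the commit above mentions, here is a short sketch of how partition relates to find and to split(sep, 1). The helper name `partition_with_find` is hypothetical and only emulates the result shape; it says nothing about how the C implementation works.

```python
def partition_with_find(s, sep):
    """Emulate s.partition(sep) using find and slicing."""
    pos = s.find(sep)
    if pos < 0:
        return (s, "", "")          # separator absent: whole string, two empties
    return (s[:pos], sep, s[pos + len(sep):])


line = "key=value=more"
head, sep, tail = line.partition("=")    # always a 3-tuple
pieces = line.split("=", 1)              # a list of one or two items
```

Unlike split(sep, 1), partition always returns three items, so the caller can unpack the result without first checking whether the separator was found.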
Bob Ippolito 955b64c031 squelch gcc4 darwin/x86 compiler warnings 2006-05-25 20:52:38 +00:00
Fredrik Lundh 554da412a8 needforspeed: use insert+reverse instead of append 2006-05-25 19:19:05 +00:00
Jack Diederich 60cbb3fe49 * eliminate warning by reverting tmp_s type to 'const char*' 2006-05-25 18:47:15 +00:00
Fredrik Lundh c3434b3834 needforspeed: use fastsearch also for find/index and contains. the
related tests are now about 10x faster.
2006-05-25 18:44:29 +00:00
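The commit above does not describe how "fastsearch" works. As a rough illustration of the general idea of skipping ahead after a mismatch, here is a plain Horspool-style search. This is a swapped-in textbook technique, not the fastsearch code itself, and the function name is an assumption.

```python
def horspool_find(text, pattern):
    """Return the lowest index of pattern in text, or -1 if absent."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each character except the last, record how far the pattern can
    # shift when that character is aligned with the end of the window.
    shift = {ch: m - 1 - k for k, ch in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        # Characters not in the pattern allow a full-length skip.
        i += shift.get(text[i + m - 1], m)
    return -1
```

The skip table is what lets the search avoid examining every position, which is the kind of saving a speedup on find, index and contains would rely on.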
Andrew Dalke 598710c727 Added overflow test for adding two (very) large strings where the
new string is over max Py_ssize_t.  I have no way to test it on my
box or any box I have access to.  At least it doesn't break anything.
2006-05-25 18:18:39 +00:00
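A sketch of the kind of size check the commit above tests for, assuming the check is made before any allocation. The function name and error text are hypothetical; `sys.maxsize` stands in for the largest Py_ssize_t value.

```python
import sys


def checked_concat_size(a_len, b_len, max_size=sys.maxsize):
    """Return a_len + b_len, or raise if the sum would exceed max_size."""
    if a_len < 0 or b_len < 0:
        raise ValueError("lengths must be non-negative")
    # Compare by subtraction so the check itself cannot overflow in C.
    if a_len > max_size - b_len:
        raise OverflowError("combined string is too large")
    return a_len + b_len
```

Python integers do not overflow, so the subtraction form matters only as a model of fixed-width arithmetic.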
Andrew M. Kuchling f344c94c85 Comment typo 2006-05-25 18:11:16 +00:00
Fredrik Lundh af72237abc needforspeed: use "fastsearch" for count. this results in a 3x speedup
for the related stringbench tests.
2006-05-25 17:55:31 +00:00
Andrew Dalke 8c9091074b Fixed problem identified by Georg. The special-case in-place code for replace
made a copy of the string using PyString_FromStringAndSize(s, n) and modified
the copied string in-place.  However, 1 (and 0) character strings are shared
from a cache.  This caused "A".replace("A", "a") to change the cached version
of "A" -- used by everyone.

Now make the copy with NULL as the string and do the memcpy manually.  I've
added regression tests to check if this happens in the future.  Perhaps
there should be a PyString_Copy for this case?
2006-05-25 17:53:00 +00:00
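The aliasing problem described above can be modelled in Python with mutable buffers. Everything here is a hypothetical stand-in: `_short_cache` plays the role of the shared cache of 0- and 1-character strings, and `from_string_and_size` only imitates a constructor that may return a shared object.

```python
_short_cache = {}  # hypothetical model of the cache of very short strings


def from_string_and_size(data):
    """Return a shared buffer for 0- and 1-byte inputs, a fresh one otherwise."""
    if len(data) <= 1:
        return _short_cache.setdefault(bytes(data), bytearray(data))
    return bytearray(data)


def replace_byte_buggy(data, old, new):
    buf = from_string_and_size(data)   # may be the shared cached object
    for i, b in enumerate(buf):
        if b == old:
            buf[i] = new               # corrupts the cache for short inputs
    return buf


def replace_byte_fixed(data, old, new):
    buf = bytearray(len(data))         # always a fresh, unshared buffer
    buf[:] = data                      # manual copy, the memcpy step
    for i, b in enumerate(buf):
        if b == old:
            buf[i] = new
    return buf
```

After one call to `replace_byte_buggy(b"A", ord("A"), ord("a"))`, every later request for the cached `b"A"` sees `b"a"`, which mirrors the bug; the fixed version never touches the cache.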
Fredrik Lundh e68955cf32 needforspeed: new replace implementation by Andrew Dalke. replace is
now about 3x faster on my machine, for the replace tests from
stringbench.
2006-05-25 17:08:14 +00:00
Fredrik Lundh 0c71f88fc9 needforspeed: check for overflow in replace (from Andrew Dalke) 2006-05-25 16:46:54 +00:00
Fredrik Lundh dfe503d3f0 needforspeed: _toupper/_tolower is a SUSv2 thing; fall back on ISO C
versions if they're not defined.
2006-05-25 16:10:12 +00:00
Fredrik Lundh 4b4e33ef14 needforspeed: make new upper/lower work properly for single-character
strings too... (thanks to georg brandl for spotting the exact problem
faster than anyone else)
2006-05-25 15:49:45 +00:00
Fredrik Lundh 39ccef607e needforspeed: speed up upper and lower for 8-bit string objects.
(the unicode versions of these are still 2x faster on windows,
though...)

based on work by Andrew Dalke, with tweaks by yours truly.
2006-05-25 15:22:03 +00:00
Fredrik Lundh 763b50f9d9 docstring tweaks: count counts non-overlapping substrings, not
total number of occurrences
2006-05-22 15:35:12 +00:00
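The distinction in the docstring tweak above is easy to see with a small example. The helper `count_overlapping` is hypothetical and assumes a non-empty substring; it is included only to contrast with the built-in behaviour.

```python
def count_overlapping(s, sub):
    """Count every position where sub occurs, including overlapping matches."""
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        start = s.find(sub, start + 1)   # advance by one, not by len(sub)
    return count


non_overlapping = "aaaa".count("aa")      # matches at 0 and 2 only
overlapping = count_overlapping("aaaa", "aa")   # matches at 0, 1 and 2
```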
Tim Peters 8931ff1f67 Teach PyString_FromFormat, PyErr_Format, and PyString_FromFormatV
about "%u", "%lu" and "%zu" formats.

Since PyString_FromFormat and PyErr_Format have exactly the same rules
(both inherited from PyString_FromFormatV), it would be good if someone
with more LaTeX Fu changed one of them to just point to the other.
Their docs were way out of synch before this patch, and I just did a
mass copy+paste to repair that.

Not a backport candidate (this is a new feature).
2006-05-13 23:28:20 +00:00
Martin v. Löwis 822f34a848 Revert 43315: Printing of %zd must be signed. 2006-05-13 13:34:04 +00:00
Thomas Wouters 568f1d0eed Py_ssize_t issue; repr()'ing a very large string would result in a teensy
string, because of a cast to int.
2006-04-21 13:54:43 +00:00
Thomas Wouters dc5f808cbc Make s.replace() work with explicit counts exceeding 2Gb. 2006-04-19 15:38:01 +00:00
Thomas Wouters 4abb3660ca Use Py_ssize_t to hold the 'width' argument to the ljust, rjust, center and
zfill stringmethods, so they can create strings larger than 2Gb on 64bit
systems (even win64.) The unicode versions of these methods already did this
right.
2006-04-19 14:50:15 +00:00
Skip Montanaro 429433b30b C++ compiler cleanup: bunch-o-casts, plus use of unsigned loop index var in a couple places 2006-04-18 00:35:43 +00:00
Neal Norwitz 0e2cbabb8d No need to cast a Py_ssize_t, use %z in PyErr_Format 2006-04-17 05:56:32 +00:00
Martin v. Löwis 5cb6936672 Make Py_BuildValue, PyObject_CallFunction and
PyObject_CallMethod aware of PY_SSIZE_T_CLEAN.
2006-04-14 09:08:42 +00:00
Martin v. Löwis 83687c98dc Change more occurrences of maxsplit to Py_ssize_t. 2006-04-13 08:52:56 +00:00
Martin v. Löwis 9c83076b7b Change maxsplit types to Py_ssize_t. 2006-04-13 08:37:17 +00:00
Martin v. Löwis 8ce358f5fe Replace most INT_MAX with PY_SSIZE_T_MAX. 2006-04-13 07:22:51 +00:00
Anthony Baxter a62862120d More low-hanging fruit. Still need to re-arrange some code (or find a better
solution) in the same way as listobject.c got changed. Hoping for a better
solution.
2006-04-11 07:42:36 +00:00
Neal Norwitz 7e957d38b7 Remove dead code (reported by HP compiler).
Can probably be backported if anyone cares.
2006-04-06 08:17:41 +00:00
Georg Brandl 347b30042b Remove unnecessary casts in type object initializers. 2006-03-30 11:57:00 +00:00
Neal Norwitz 7fbd6916b6 Get rid of warnings on some platforms by using %u for a size_t. 2006-03-25 23:55:39 +00:00
Neal Norwitz 2aa9a5dfdd Use macro versions instead of function versions when we already know the type.
This will hopefully get rid of some Coverity warnings, be a hint to
developers, and be marginally faster.

Some asserts were added when the type is currently known, but depends
on values from another function.
2006-03-20 01:53:23 +00:00
Tim Peters ae1d0c978d Introduced symbol PY_FORMAT_SIZE_T. See the new comments
in pyport.h.  Changed PyString_FromFormatV() to use it
instead of inlining its own maze of #if'ery.
2006-03-17 03:29:34 +00:00
Guido van Rossum 38fff8c4e4 Checking in the code for PEP 357.
This was mostly written by Travis Oliphant.
I've inspected it all; Neal Norwitz and MvL have also looked at it
(in an earlier incarnation).
2006-03-07 18:50:55 +00:00
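PEP 357 lets any object act as a slice index by defining `__index__`. A small sketch of what that enables at the Python level; the class name `Offset` is made up for illustration.

```python
import operator


class Offset:
    """An integer-like object that is not an int subclass."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Called whenever Python needs a true integer index.
        return self._value


text = "needforspeed"
middle = text[Offset(4):Offset(7)]      # slicing accepts __index__ objects
as_int = operator.index(Offset(3))      # explicit conversion via the same hook
```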
Hye-Shik Chang 4af5c8cee4 SF #1444030: Fix several potential defects found by Coverity.
(reviewed by Neal Norwitz)
2006-03-07 15:39:21 +00:00
Martin v. Löwis 725507b52e Change int to Py_ssize_t in several places.
Add (int) casts to silence compiler warnings.
Raise Python exceptions for overflows.
2006-03-07 12:08:51 +00:00
Martin v. Löwis 15e62742fa Revert backwards-incompatible const changes. 2006-02-27 16:46:16 +00:00
Thomas Wouters 977485d888 Use Py_ssize_t in helper function between Py_ssize_t-using functions. 2006-02-16 15:59:12 +00:00
Martin v. Löwis eb079f1c25 Use Py_ssize_t for counts and sizes.
Convert Py_ssize_t using PyInt_FromSsize_t
2006-02-16 14:32:27 +00:00
Martin v. Löwis 2c95cc6d72 Support %zd in PyErr_Format and PyString_FromFormat. 2006-02-16 06:54:25 +00:00
Martin v. Löwis 18e165558b Merge ssize_t branch. 2006-02-15 17:27:45 +00:00
Jeremy Hylton af68c874a6 Add const to several API functions that take char *.
In C++, it's an error to pass a string literal to a char* function
without a const_cast().  Rather than require every C++ extension
module to put a cast around string literals, fix the API to state the
const-ness.

I focused on parts of the API where people usually pass literals:
PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type
slots, etc.  Predictably, there was a large set of functions that
needed to be fixed as a result of these changes.  The most pervasive
change was to make the keyword args list passed to
PyArg_ParseTupleAndKeywords() to be a const char *kwlist[].

One cast was required as a result of the changes:  A type object
mallocs the memory for its tp_doc slot and later frees it.
PyTypeObject says that tp_doc is const char *; but if the type was
created by type_new(), we know it is safe to cast to char *.
2005-12-10 18:50:16 +00:00
Michael W. Hudson b2308bb9be Fix bug:
[ 1327110 ] wrong TypeError traceback in generator expressions

by removing the code that can stomp on the users' TypeError raised by the
iterable argument to ''.join() -- PySequence_Fast (now?) gives a perfectly
reasonable message itself.  Also, a couple of tests.
2005-10-21 11:45:01 +00:00
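The behaviour the fix above aims for can be checked with a generator that raises its own TypeError while join is consuming it. This is a sketch run against a current Python 3 interpreter rather than the version the commit targeted, and the generator name and message text are made up.

```python
def words_then_error():
    """Yield one item, then fail with a user-defined TypeError."""
    yield "a"
    raise TypeError("user error from generator")


try:
    "".join(words_then_error())
    message = None
except TypeError as exc:
    message = str(exc)     # the user's own message, not a replacement from join
```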
Neal Norwitz 95c1e5065c SF bug #1331563: string_subscript doesn't check for failed PyMem_Malloc. Will backport 2005-10-20 04:15:52 +00:00
Georg Brandl d45014b236 Fix PyString_Format so that the "%s" format works again when Unicode is not
enabled.
2005-10-01 17:06:00 +00:00
Neil Schemenauer ab61923637 Fix bug in last checkin (2.231). To match previous behavior, unicode
subclasses should be substituted as-is and not have tp_str called on
them.
2005-08-31 23:02:05 +00:00