Commit Graph

2590 Commits

Author SHA1 Message Date
Thomas Wouters c1282eef0c Make last patch valid C89 so Windows compilers can deal with it. 2006-05-28 21:32:12 +00:00
Michael W. Hudson 27596279a2 use the UnicodeError traversal and clearing functions in UnicodeError
subclasses.
2006-05-28 21:19:03 +00:00
Georg Brandl 43ab100cdc Fix refleaks in UnicodeError get and set methods. 2006-05-28 20:57:09 +00:00
Michael W. Hudson 96495ee6dd Quality control, meet exceptions.c, round two.
Make some functions that should have been static static.

Fix a bunch of refleaks by fixing the definition of
MiddlingExtendsException.

Remove all the __new__ implementations apart from
BaseException_new.  Rewrite most code that needs it to cope with
NULL fields (such code could get excercised anyway, the
__new__-removal just makes it more likely).  This involved
editing the code for WindowsError, which I can't test.

This fixes all the refleaks in at least the start of a regrtest
-R :: run.
2006-05-28 17:40:29 +00:00
Michael W. Hudson 22a80e7cb0 Quality control, meet exceptions.c.
Fix a number of problems with the need for speed code:

One is doing this sort of thing:

    Py_DECREF(self->field);
    self->field = newval;
    Py_INCREF(self->field);

without being very sure that self->field doesn't start with a
value that has a __del__, because that almost certainly can lead
to segfaults.

As self->args is constrained to be an exact tuple we may as well
exploit this fact consistently.  This leads to quite a lot of
simplification (and, hey, probably better performance).

Add some error checking in places lacking it.

Fix some rather strange indentation in the Unicode code.

Delete some trailing whitespace.

More to come, I haven't fixed all the reference leaks yet...
2006-05-28 15:51:40 +00:00
Fredrik Lundh 80f8e80c15 needforspeed: added Py_MEMCPY macro (currently tuned for Visual C only),
and use it for string copy operations.  this gives a 20% speedup on some
string benchmarks.
2006-05-28 12:06:46 +00:00
Richard Jones 2d555b356a move semicolons 2006-05-27 16:15:11 +00:00
Richard Jones c5b2a2e7b9 doc string additions and tweaks 2006-05-27 16:07:28 +00:00
Fredrik Lundh 0b7ef46950 needforspeed: stringlib refactoring: use find_slice for stringobject 2006-05-27 15:26:19 +00:00
Fredrik Lundh 60d8b18831 needforspeed: stringlib refactoring: changed find_obj to find_slice,
to enable use from stringobject
2006-05-27 15:20:22 +00:00
Fredrik Lundh c2d29c5a6d needforspeed: replace improvements, changed to Py_LOCAL_INLINE
where appropriate
2006-05-27 14:58:20 +00:00
Georg Brandl 94b8c122fd Remove spurious semicolons after macro invocations. 2006-05-27 14:41:55 +00:00
Andrew Dalke d49d5c49ba cleanup - removed trailing whitespace 2006-05-27 14:16:40 +00:00
Richard Jones 7b9558d37d Conversion of exceptions over from faked-up classes to new-style C types. 2006-05-27 12:29:24 +00:00
Martin v. Löwis 2e3f6b77d5 Revert bogus change committed in 46432 to this file. 2006-05-27 11:07:49 +00:00
Andrew Dalke e0df762719 fixed typo 2006-05-27 11:04:36 +00:00
Fredrik Lundh 2d23d5bf2e needforspeed: more stringlib refactoring 2006-05-27 10:05:10 +00:00
Martin v. Löwis d004fc810a Patch 1494554: Update numeric properties to Unicode 4.1. 2006-05-27 08:36:52 +00:00
Neal Norwitz d1b6cd7bfb Fix Coverity warnings.
- Check the correct variable (str_obj, not str) for NULL
 - sep_len was already verified it wasn't 0
2006-05-27 05:21:30 +00:00
Andrew Dalke 7e0a62ea90 Added description of why splitlines doesn't use the prealloc strategy 2006-05-26 22:49:03 +00:00
Andrew Dalke 5132407868 Added limits to the replace code so it does not count all of the matching
patterns in a string, only the number needed by the max limit.
2006-05-26 20:25:22 +00:00
Georg Brandl e4e023c4d3 Simplify calling. 2006-05-26 20:22:50 +00:00
Andrew M. Kuchling 07bbfc6a51 Comment typo 2006-05-26 19:51:10 +00:00
Fredrik Lundh e6e43c867d needforspeed: stringlib refactoring: use stringlib/find for string find 2006-05-26 19:48:07 +00:00
Fredrik Lundh c816281304 needforspeed: use a macro to fix slice indexes 2006-05-26 19:33:03 +00:00
Fredrik Lundh ce4eccb0c4 needforspeed: stringlib refactoring: use stringlib/find for unicode
find
2006-05-26 19:29:05 +00:00
Fredrik Lundh 58b5e84d52 needforspeed: stringlib refactoring, continued. added count and
find helpers; updated unicodeobject to use stringlib_count
2006-05-26 19:24:53 +00:00
Andrew Dalke c5da53ba78 substring split now uses /F's fast string matching algorithm.
(If compiled without FAST search support, changed the pre-memcmp test
   to check the last character as well as the first.  This gave a 25%
   speedup for my test case.)

Rewrote the split algorithms so they stop when maxsplit gets to 0.
Previously they did a string match first then checked if the maxsplit
was reached.  The new way prevents a needless string search.
2006-05-26 19:02:09 +00:00
Fredrik Lundh 9c0e9c089c needspeed: rpartition documentation, tests, and a bug fixes.
feel free to add more tests and improve the documentation.
2006-05-26 18:24:15 +00:00
Fredrik Lundh b3167cbcd7 needforspeed: added rpartition implementation 2006-05-26 18:15:38 +00:00
Fredrik Lundh be9f219e40 removed unnecessary include 2006-05-26 18:05:34 +00:00
Fredrik Lundh 3a65d87e8c needforspeed: remove remaining USE_FAST macros; if fastsearch was
broken, someone would have noticed by now ;-)
2006-05-26 17:31:41 +00:00
Fredrik Lundh c2032fb86a needforspeed: cleanup 2006-05-26 17:26:39 +00:00
Fredrik Lundh b947948c61 needforspeed: stringlib refactoring (in progress) 2006-05-26 17:22:38 +00:00
Fredrik Lundh a50d201bd9 needforspeed: stringlib refactoring (in progress) 2006-05-26 17:04:58 +00:00
Fredrik Lundh 7c940d1d68 needforspeed: use Py_LOCAL on a few more locals in stringobject.c 2006-05-26 16:32:42 +00:00
Andrew Dalke 02758d66ce Eeked out another 3% or so performance in split whitespace by cleaning up the algorithm. 2006-05-26 15:21:01 +00:00
Andrew Dalke 525eab3712 Changes to string.split/rsplit on whitespace to preallocate space in the
results list.

Originally it allocated 0 items and used the list growth during append.  Now
it preallocates 12 items so the first few appends don't need list reallocs.

("Here are some words ."*2).split(None, 1) is 7% faster
("Here are some words ."*2).split() is is 15% faster

  (Your milage may vary, see dealership for details.)

File parsing like this

    for line in f:
        count += len(line.split())

is also about 15% faster.  There is a slowdown of about 3% for large
strings because of the additional overhead of checking if the append is
to a preallocated region of the list or not.  This will be the rare case.
It could be improved with special case code but we decided it was not
useful enough.

There is a cost of 12*sizeof(PyObject *) bytes per list.  For the normal
case of file parsing this is not a problem because of the lists have
a short lifetime.  We have not come up with cases where this is a problem
in real life.

I chose 12 because human text averages about 11 words per line in books,
one of my data sets averages 6.2 words with a final peak at 11 words per
line, and I work with a tab delimited data set with 8 tabs per line (or
9 words per line).  12 encompasses all of these.

Also changed the last rstrip code to append then reverse, rather than
doing insert(0).  The strip() and rstrip() times are now comparable.
2006-05-26 14:00:45 +00:00
Fredrik Lundh 95e2a91615 use Py_LOCAL also for string and unicode objects 2006-05-26 11:38:15 +00:00
Fredrik Lundh f2c0dfdb13 needforspeed: use Py_ssize_t for the fastsearch counter and skip
length (thanks, neal!).  and yes, I've verified that this doesn't
slow things down ;-)
2006-05-26 10:27:17 +00:00
Fredrik Lundh 450277fef5 needforspeed: use METH_O for argument handling, which made partition some
~15% faster for the current tests (which is noticable faster than a corre-
sponding find call).  thanks to neal-who-never-sleeps for the tip.
2006-05-26 09:46:59 +00:00
Fredrik Lundh 06a69dd8ff needforspeed: partition implementation, part two.
feel free to improve the documentation and the docstrings.
2006-05-26 08:54:28 +00:00
Fredrik Lundh fe5bb7e6d9 needforspeed: partition for 8-bit strings. for some simple tests,
this is on par with a corresponding find, and nearly twice as fast
as split(sep, 1)

full tests, a unicode version, and documentation will follow to-
morrow.
2006-05-25 23:27:53 +00:00
Tim Peters d89fc22dc6 Patch #1494387: SVN longobject.c compiler warnings
The SIGCHECK macro defined here has always been bizarre, but
it apparently causes compiler warnings on "Sun Studio 11".
I believe the warnings are bogus, but it doesn't hurt to make
the macro definition saner.

Bugfix candidate (but I'm not going to bother).
2006-05-25 22:28:46 +00:00
Bob Ippolito 955b64c031 squelch gcc4 darwin/x86 compiler warnings 2006-05-25 20:52:38 +00:00
Fredrik Lundh 554da412a8 needforspeed: use insert+reverse instead of append 2006-05-25 19:19:05 +00:00
Georg Brandl 684fd0c8ec Replace PyObject_CallFunction calls with only object args
with PyObject_CallFunctionObjArgs, which is 30% faster.
2006-05-25 19:15:31 +00:00
Jack Diederich 60cbb3fe49 * eliminate warning by reverting tmp_s type to 'const char*' 2006-05-25 18:47:15 +00:00
Fredrik Lundh c3434b3834 needforspeed: use fastsearch also for find/index and contains. the
related tests are now about 10x faster.
2006-05-25 18:44:29 +00:00
Bob Ippolito a85bf202ac Faster path for PyLong_FromLongLong, using PyLong_FromLong algorithm 2006-05-25 18:20:23 +00:00
Andrew Dalke 598710c727 Added overflow test for adding two (very) large strings where the
new string is over max Py_ssize_t.  I have no way to test it on my
box or any box I have access to.  At least it doesn't break anything.
2006-05-25 18:18:39 +00:00
Andrew M. Kuchling f344c94c85 Comment typo 2006-05-25 18:11:16 +00:00
Andrew Dalke b552c4d848 Code had returned an ssize_t, upcast to long, then converted with PyInt_FromLong.
Now using PyInt_FromSsize_t.
2006-05-25 18:03:25 +00:00
Fredrik Lundh af72237abc needforspeed: use "fastsearch" for count. this results in a 3x speedup
for the related stringbench tests.
2006-05-25 17:55:31 +00:00
Andrew Dalke 8c9091074b Fixed problem identified by Georg. The special-case in-place code for replace
made a copy of the string using PyString_FromStringAndSize(s, n) and modify
the copied string in-place.  However, 1 (and 0) character strings are shared
from a cache.  This cause "A".replace("A", "a") to change the cached version
of "A" -- used by everyone.

Now may the copy with NULL as the string and do the memcpy manually.  I've
added regression tests to check if this happens in the future.  Perhaps
there should be a PyString_Copy for this case?
2006-05-25 17:53:00 +00:00
Tim Peters da53afa1b0 A new table to help string->integer conversion was added yesterday to
both mystrtoul.c and longobject.c.  Share the table instead.  Also
cut its size by 64 entries (they had been used for an inscrutable
trick originally, but the code no longer tries to use that trick).
2006-05-25 17:34:03 +00:00
Fredrik Lundh e68955cf32 needforspeed: new replace implementation by Andrew Dalke. replace is
now about 3x faster on my machine, for the replace tests from string-
bench.
2006-05-25 17:08:14 +00:00
Fredrik Lundh 0c71f88fc9 needforspeed: check for overflow in replace (from Andrew Dalke) 2006-05-25 16:46:54 +00:00
Fredrik Lundh dfe503d3f0 needforspeed: _toupper/_tolower is a SUSv2 thing; fall back on ISO C
versions if they're not defined.
2006-05-25 16:10:12 +00:00
Kristján Valur Jónsson f94323fbb4 Added a new macro, Py_IS_FINITE(X). On windows there is an intrinsic for this and it is more efficient than to use !Py_IS_INFINITE(X) && !Py_IS_NAN(X). No change on other platforms 2006-05-25 15:53:30 +00:00
Fredrik Lundh 4b4e33ef14 needforspeed: make new upper/lower work properly for single-character
strings too... (thanks to georg brandl for spotting the exact problem
faster than anyone else)
2006-05-25 15:49:45 +00:00
Fredrik Lundh 39ccef607e needforspeed: speed up upper and lower for 8-bit string objects.
(the unicode versions of these are still 2x faster on windows,
though...)

based on work by Andrew Dalke, with tweaks by yours truly.
2006-05-25 15:22:03 +00:00
Tim Peters 696cf43b58 Heavily fiddled variant of patch #1442927: PyLong_FromString optimization.
``long(str, base)`` is now up to 6x faster for non-power-of-2 bases.  The
largest speedup is for inputs with about 1000 decimal digits.  Conversion
from non-power-of-2 bases remains quadratic-time in the number of input
digits (it was and remains linear-time for bases 2, 4, 8, 16 and 32).

Speedups at various lengths for decimal inputs, comparing 2.4.3 with
current trunk.  Note that it's actually a bit slower for 1-digit strings:

  len  speedup
 ----  -------
   1     -4.5%
   2      4.6%
   3      8.3%
   4     12.7%
   5     16.9%
   6     28.6%
   7     35.5%
   8     44.3%
   9     46.6%
  10     55.3%
  11     65.7%
  12     77.7%
  13     73.4%
  14     75.3%
  15     85.2%
  16    103.0%
  17     95.1%
  18    112.8%
  19    117.9%
  20    128.3%
  30    174.5%
  40    209.3%
  50    236.3%
  60    254.3%
  70    262.9%
  80    295.8%
  90    297.3%
 100    324.5%
 200    374.6%
 300    403.1%
 400    391.1%
 500    388.7%
 600    440.6%
 700    468.7%
 800    498.0%
 900    507.2%
1000    501.2%
2000    450.2%
3000    463.2%
4000    452.5%
5000    440.6%
6000    439.6%
7000    424.8%
8000    418.1%
9000    417.7%
2006-05-24 21:10:40 +00:00
Fredrik Lundh 347ee277aa needforspeed: refactored the replace code slightly; special-case
constant-length changes; use fastsearch to locate the first match.
2006-05-24 16:35:18 +00:00
Fredrik Lundh d5e0dc51cf needforspeedindeed: use fastsearch also for __contains__ 2006-05-24 15:11:01 +00:00
Fredrik Lundh 6471ee4f18 needforspeed: use "fastsearch" for count and findstring helpers. this
results in a 2.5x speedup on the stringbench count tests, and a 20x (!)
speedup on the stringbench search/find/contains test, compared to 2.5a2.

for more on the algorithm, see:

    http://effbot.org/zone/stringlib.htm

if you get weird results, you can disable the new algoritm by undefining
USE_FAST in Objects/unicodeobject.c.

enjoy /F
2006-05-24 14:28:11 +00:00
Fredrik Lundh 240bf2a8e4 use Py_ssize_t for string indexes (thanks, neal!) 2006-05-24 10:20:36 +00:00
Fredrik Lundh 7763351808 return 0 on misses, not -1. 2006-05-23 19:47:35 +00:00
Fredrik Lundh b63588c188 needforspeed: use append+reverse for rsplit, use "bloom filters" to
speed up splitlines and strip with charsets; etc.  rsplit is now as
fast as split in all our tests (reverse takes no time at all), and
splitlines() is nearly as fast as a plain split("\n") in our tests.
and we're not done yet... ;-)
2006-05-23 18:44:25 +00:00
Richard Jones a372711fcc fix broken merge 2006-05-23 18:32:11 +00:00
Richard Jones cebbefc98d Applied patch 1337051 by Neal Norwitz, saving 4 ints on frame objects. 2006-05-23 18:28:17 +00:00
Richard Jones 7c88dcc5ab Merge from rjones-funccall branch.
Applied patch zombie-frames-2.diff from sf patch 876206 with updates for
Python 2.5 and also modified to retain the free_list to avoid the 67%
slow-down in pybench recursion test. 5% speed up in function call pybench.
2006-05-23 10:37:38 +00:00
Fredrik Lundh 833bf9422e needforspeed: fixed unicode "in" operator to use same implementation
approach as find/index
2006-05-23 10:12:21 +00:00
Tim Peters 1bacc641a0 unicode_repeat(): Change type of local to Py_ssize_t,
since that's what it should be.
2006-05-23 05:47:16 +00:00
Tim Peters 286085c781 PyUnicode_Join(): Recent code changes introduced new
compiler warnings on Windows (signed vs unsigned mismatch
in comparisons).  Cleaned that up by switching more locals
to Py_ssize_t.  Simplified overflow checking (it can _be_
simpler because while these things are declared as
Py_ssize_t, then should in fact never be negative).
2006-05-22 19:17:04 +00:00
Fredrik Lundh 8a8e05a2b9 needforspeed: use memcpy for "long" strings; use a better algorithm
for long repeats.
2006-05-22 17:12:58 +00:00
Fredrik Lundh f1d60a5384 needforspeed: speed up unicode repeat, unicode string copy 2006-05-22 16:29:30 +00:00
Fredrik Lundh 763b50f9d9 docstring tweaks: count counts non-overlapping substrings, not
total number of occurences
2006-05-22 15:35:12 +00:00
Georg Brandl 7b90e168f3 Bug #1462152: file() now checks more thoroughly for invalid mode
strings and removes a possible "U" before passing the mode to the
C library function.
2006-05-18 07:01:27 +00:00
Neal Norwitz 1004a5339a Patch #1488312, Fix memory alignment problem on SPARC in unicode. Will backport 2006-05-15 07:17:23 +00:00
Tim Peters 8931ff1f67 Teach PyString_FromFormat, PyErr_Format, and PyString_FromFormatV
about "%u", "%lu" and "%zu" formats.

Since PyString_FromFormat and PyErr_Format have exactly the same rules
(both inherited from PyString_FromFormatV), it would be good if someone
with more LaTeX Fu changed one of them to just point to the other.
Their docs were way out of synch before this patch, and I just did a
mass copy+paste to repair that.

Not a backport candidate (this is a new feature).
2006-05-13 23:28:20 +00:00
Martin v. Löwis 822f34a848 Revert 43315: Printing of %zd must be signed. 2006-05-13 13:34:04 +00:00
Neal Norwitz c6a989ac3a Fix problems found by Coverity.
longobject.c: also fix an ssize_t problem
  <a> could have been NULL, so hoist the size calc to not use <a>.

_ssl.c: under fail: self is DECREF'd, but it would have been NULL.

_elementtree.c: delete self if there was an error.

_csv.c: I'm not sure if lineterminator could have been anything other than
a string.  However, other string method calls are checked, so check this
one too.
2006-05-10 06:57:58 +00:00
Guido van Rossum da5b701aee Get rid of __context__, per the latest changes to PEP 343 and python-dev
discussion.
There are two places of documentation that still mention __context__:
Doc/lib/libstdtypes.tex -- I wasn't quite sure how to rewrite that without
spending a whole lot of time thinking about it; and whatsnew, which Andrew
usually likes to change himself.
2006-05-02 19:47:52 +00:00
Neal Norwitz c4edb0ec81 SF #1479181: split open() and file() from being aliases for each other. 2006-05-02 04:43:14 +00:00
Martin v. Löwis 6685128b97 Fix more ssize_t issues. 2006-04-22 11:40:03 +00:00
Thomas Wouters 568f1d0eed Py_ssize_t issue; repr()'ing a very large string would result in a teensy
string, because of a cast to int.
2006-04-21 13:54:43 +00:00
Thomas Wouters 4e908107b0 Fix variable/format-char discrepancy in new-style class __getitem__,
__delitem__, __setslice__ and __delslice__ hooks. This caused test_weakref
and test_userlist to fail in the p3yk branch (where UserList, like all
classes, is new-style) on amd64 systems, with open-ended slices: the
sys.maxint value for empty-endpoint was transformed into -1.
2006-04-21 11:26:56 +00:00
Thomas Wouters dc5f808cbc Make s.replace() work with explicit counts exceeding 2Gb. 2006-04-19 15:38:01 +00:00
Thomas Wouters 4abb3660ca Use Py_ssize_t to hold the 'width' argument to the ljust, rjust, center and
zfill stringmethods, so they can create strings larger than 2Gb on 64bit
systems (even win64.) The unicode versions of these methods already did this
right.
2006-04-19 14:50:15 +00:00
Jeremy Hylton a4ebc135ac Refactor: Move code that uses co_lnotab from ceval to codeobject 2006-04-18 14:47:00 +00:00
Andrew M. Kuchling 2060d1bd27 Comment typo fix 2006-04-18 11:49:53 +00:00
Martin v. Löwis 45294a9562 Remove types from type_list if they have no objects
and unlist_types_without_objects is set.
Give dump_counts a FILE* argument.
2006-04-18 06:24:08 +00:00
Skip Montanaro 429433b30b C++ compiler cleanup: bunch-o-casts, plus use of unsigned loop index var in a couple places 2006-04-18 00:35:43 +00:00
Skip Montanaro 54e964d253 C++ compilation cleanup: Migrate declaration of
_PyObject_Call(Function|Method)_SizeT into Include/abstract.h.  This gets
them under the umbrella of the extern "C" { ... } block in that file.
2006-04-18 00:27:46 +00:00
Neal Norwitz 0e2cbabb8d No need to cast a Py_ssize_t, use %z in PyErr_Format 2006-04-17 05:56:32 +00:00
Thomas Wouters 715a4cdea2 Use %zd instead of %i as format character (in call to PyErr_Format) for
Py_ssize_t argument.
2006-04-16 22:04:49 +00:00
Tim Peters 81b092d0e6 gen_del(): Looks like much this was copy/pasted from
slot_tp_del(), but while the latter had to cater to types
that don't participate in GC, we know that generators do.
That allows strengthing an assert().
2006-04-15 22:59:10 +00:00
Tim Peters ffe2395777 Remove now-unused variables from tp_traverse and tp_clear methods. 2006-04-15 22:51:26 +00:00
Thomas Wouters c6e55068ca Use Py_VISIT in all tp_traverse methods, instead of traversing manually or
using a custom, nearly-identical macro. This probably changes how some of
these functions are compiled, which may result in fractionally slower (or
faster) execution. Considering the nature of traversal, visiting much of the
address space in unpredictable patterns, I'd argue the code readability and
maintainability is well worth it ;P
2006-04-15 21:47:09 +00:00