made a copy of the string using PyString_FromStringAndSize(s, n) and modify
the copied string in-place. However, 1 (and 0) character strings are shared
from a cache. This cause "A".replace("A", "a") to change the cached version
of "A" -- used by everyone.
Now may the copy with NULL as the string and do the memcpy manually. I've
added regression tests to check if this happens in the future. Perhaps
there should be a PyString_Copy for this case?
both mystrtoul.c and longobject.c. Share the table instead. Also
cut its size by 64 entries (they had been used for an inscrutable
trick originally, but the code no longer tries to use that trick).
results in a 2.5x speedup on the stringbench count tests, and a 20x (!)
speedup on the stringbench search/find/contains test, compared to 2.5a2.
for more on the algorithm, see:
http://effbot.org/zone/stringlib.htm
if you get weird results, you can disable the new algoritm by undefining
USE_FAST in Objects/unicodeobject.c.
enjoy /F
speed up splitlines and strip with charsets; etc. rsplit is now as
fast as split in all our tests (reverse takes no time at all), and
splitlines() is nearly as fast as a plain split("\n") in our tests.
and we're not done yet... ;-)
Applied patch zombie-frames-2.diff from sf patch 876206 with updates for
Python 2.5 and also modified to retain the free_list to avoid the 67%
slow-down in pybench recursion test. 5% speed up in function call pybench.
compiler warnings on Windows (signed vs unsigned mismatch
in comparisons). Cleaned that up by switching more locals
to Py_ssize_t. Simplified overflow checking (it can _be_
simpler because while these things are declared as
Py_ssize_t, then should in fact never be negative).
about "%u", "%lu" and "%zu" formats.
Since PyString_FromFormat and PyErr_Format have exactly the same rules
(both inherited from PyString_FromFormatV), it would be good if someone
with more LaTeX Fu changed one of them to just point to the other.
Their docs were way out of synch before this patch, and I just did a
mass copy+paste to repair that.
Not a backport candidate (this is a new feature).
longobject.c: also fix an ssize_t problem
<a> could have been NULL, so hoist the size calc to not use <a>.
_ssl.c: under fail: self is DECREF'd, but it would have been NULL.
_elementtree.c: delete self if there was an error.
_csv.c: I'm not sure if lineterminator could have been anything other than
a string. However, other string method calls are checked, so check this
one too.
discussion.
There are two places of documentation that still mention __context__:
Doc/lib/libstdtypes.tex -- I wasn't quite sure how to rewrite that without
spending a whole lot of time thinking about it; and whatsnew, which Andrew
usually likes to change himself.
__delitem__, __setslice__ and __delslice__ hooks. This caused test_weakref
and test_userlist to fail in the p3yk branch (where UserList, like all
classes, is new-style) on amd64 systems, with open-ended slices: the
sys.maxint value for empty-endpoint was transformed into -1.
zfill stringmethods, so they can create strings larger than 2Gb on 64bit
systems (even win64.) The unicode versions of these methods already did this
right.
slot_tp_del(), but while the latter had to cater to types
that don't participate in GC, we know that generators do.
That allows strengthing an assert().
using a custom, nearly-identical macro. This probably changes how some of
these functions are compiled, which may result in fractionally slower (or
faster) execution. Considering the nature of traversal, visiting much of the
address space in unpredictable patterns, I'd argue the code readability and
maintainability is well worth it ;P
- In functions where we already hold the same object in differently typed
pointers, use the correctly typed pointer instead of casting the other
pointer a second time.
objects before initializing it. It might be linked
already if there was a Py_Initialize/Py_Finalize
cycle earlier; not unlinking it would break the global
list.
Py_VISIT: cast the `op` argument to PyObject* when calling
`visit()`. Else the caller has to pay too much attention to
this silly detail (e.g., frame_traverse needs to traverse
`struct _frame *` and `PyCodeObject *` pointers too).
problems: first, PyGen_NeedsFinalizing() had an off-by-one bug that
prevented it from ever saying a generator didn't need finalizing, and
second, frame objects cleared themselves in a way that caused their
owning generator to think they were still executable, causing a double
deallocation of objects on the value stack if there was still a loop
on the block stack. This revision also removes some unnecessary
close() operations from test_generators that are now appropriately
handled by the cycle collector.
to avoid confusing situations like:
>>> int("")
ValueError: invalid literal for int():
>>> int("2\n\n2")
ValueError: invalid literal for int(): 2
2
Also report the base used, to avoid:
ValueError: invalid literal for int(): 2
They now report:
>>> int("")
ValueError: invalid literal for int() with base 10: ''
>>> int("2\n\n2")
ValueError: invalid literal for int() with base 10: '2\n\n2'
>>> int("2", 2)
ValueError: invalid literal for int() with base 2: '2'
(Reporting the base could be avoided when base is 10, which is the default,
but hrm.) Another effect of these changes is that the errormessage can be
longer; before, it was cut off at about 250 characters. Now, it can be up to
four times as long, as the unrepr'ed string is cut off at 200 characters,
instead.
No tests were added or changed, since testing for exact errormsgs is (pardon
the pun) somewhat errorprone, and I consider not testing the exact text
preferable. The actually changed code is tested frequent enough in the
test_builtin test as it is (120 runs for each of ints and longs.)
interpolate PY_FORMAT_SIZE_T for refcount display
instead of casting refcounts to long.
I understand that gcc on some boxes delivers
nuisance warnings about this, but if any new ones
appear because of this they'll get fixed by magic
when the others get fixed.
PyTypeObject structures, I had to make prototypes for the functions, and
move the structure definition ahead of the functions. I'd dearly like a better
way to do this - to change this would make for a massive set of changes to
the codebase.
There's still some warnings - this is purely to get rid of errors first.
that are suspended outside of any try/except/finally blocks to be
garbage collected even if they are part of a cycle. Generators that
suspend inside of an active try/except or try/finally block (including
those created by a ``with`` statement) are still not GC-able if they
are part of a cycle, however.
least as big as a long. I believe this to be a safe assumption that is being
made in many parts of CPython, but a check could be added.
len(xrange(sys.maxint)) works now, so fix the testsuite's odd exception for
64-bit platforms too. It also fixes 'zip(xrange(sys.maxint), it)' as a
portable-ish (if expensive) alternative to enumerate(it); since zip() now
calls len(), this was breaking on (real) 64-bit platforms. No additional
test was added for that behaviour.
adds the following API calls: PySet_Clear(), _PySet_Next(), and
_PySet_Update(). The latter two are considered non-public. Tests and
documentation (for the public API) are included.
This will hopefully get rid of some Coverity warnings, be a hint to
developers, and be marginally faster.
Some asserts were added when the type is currently known, but depends
on values from another function.
This is a heavily altered derivative of SF patch 1123430, Evan
Jones's heroic effort to make obmalloc return unused arenas to
the system free(), with some heuristic strategies to make it
more likley that arenas eventually _can_ be freed.
PyObject_Unicode(). This problem was originally reported from Coverity
and addresses mail on python-dev "checkin r43015".
This inlines the conversion of the string to unicode and cleans
up/simplifies some code at the end of the PyObject_Unicode().
We really need a complete C API test module for all public APIs
and passing good and bad parameter values.
Will backport.
there)
- Add missing DECREFs of inner-scope 'temp' variable
- Add various missing DECREFs by changing 'return NULL' into 'goto onError'
- Avoid double DECREF when last _PyUnicode_Resize() fails
Coverity found one of the missing DECREFs, but oddly enough not the others.
Anyway, this is the changes to the with-statement
so that __exit__ must return a true value in order
for a pending exception to be ignored.
The PEP (343) is already updated.
added message attribute compared to the previous version of Exception. It is
also a new-style class, making all exceptions now new-style. KeyboardInterrupt
and SystemExit inherit from BaseException directly. String exceptions now
raise DeprecationWarning.
Applies patch 1104669, and closes bugs 1012952 and 518846.
- New semantics for __exit__() -- it must re-raise the exception
if type is not None; the with-statement itself doesn't do this.
(See the updated PEP for motivation.)
- Added context managers to:
- file
- thread.LockType
- threading.{Lock,RLock,Condition,Semaphore,BoundedSemaphore}
- decimal.Context
- Added contextlib.py, which defines @contextmanager, nested(), closing().
- Unit tests all around; bot no docs yet.
- The copy module now "copies" function objects (as atomic objects).
- dict.__getitem__ now looks for a __missing__ hook before raising
KeyError.
- Added a new type, defaultdict, to the collections module.
This uses the new __missing__ hook behavior added to dict (see above).
Py_SAFE_DOWNCAST can evaluate its first argument multiple
times in a debug build. This caused two distinct assert-
failures in test_unicode run under a debug build. Rewrote
the code in trivial ways so that multiple evaluation of the
first argument doesn't hurt.
* Allow the 3rd argument to generator.throw() to be None.
The 'raise' statement does the same, and anyway it follows the
general policy that optional arguments of built-ins should, when
reasonable, have a default value specifiable from Python.
readline/readlines/read/readinto, loudly break by raising ValueError, rather
than silently deliver data out of order or hitting EOF prematurely.
Probably not a bugfix candidate, even though it affects no 'working' code.
to protect against actual uninitialized usage.
Objects/longobject.c: In function ‘PyLong_AsDouble’:
Objects/longobject.c:655: warning: ‘e’ may be used uninitialized in this function
Objects/longobject.c: In function ‘long_true_divide’:
Objects/longobject.c:2263: warning: ‘aexp’ may be used uninitialized in this function
Objects/longobject.c:2263: warning: ‘bexp’ may be used uninitialized in this function
This is how string objects work. u'%f' could use , instead of .
for the decimal point. Now both strings and unicode always use periods.
This is the code that would break:
import locale
locale.setlocale(locale.LC_NUMERIC, 'de_DE')
u'%.1f' % 1.0
assert '1.0' == u'%.1f' % 1.0
I couldn't create a test case which fails, but this fixes the problem.
Will backport.
* set sq_repeat and sq_concat to NULL for user-defined new-style
classes, as a way to fix a number of related problems. See
test_descr.notimplemented()). One of these problems was fixed
in r25556 and r25557 but many more existed; this is a general
fix and thus reverts r25556-r25557.
* to avoid having PySequence_Repeat()/PySequence_Concat() failing
on user-defined classes, they now fall back to nb_add/nb_mul if
sq_concat/sq_repeat are not defined and the arguments appear to
be sequences.
* added tests.
Backport candidate.
In C++, it's an error to pass a string literal to a char* function
without a const_cast(). Rather than require every C++ extension
module to put a cast around string literals, fix the API to state the
const-ness.
I focused on parts of the API where people usually pass literals:
PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type
slots, etc. Predictably, there were a large set of functions that
needed to be fixed as a result of these changes. The most pervasive
change was to make the keyword args list passed to
PyArg_ParseTupleAndKewords() to be a const char *kwlist[].
One cast was required as a result of the changes: A type object
mallocs the memory for its tp_doc slot and later frees it.
PyTypeObject says that tp_doc is const char *; but if the type was
created by type_new(), we know it is safe to cast to char *.
[ 1346144 ] Segfaults from unaligned loads in floatobject.c
by using memcpy and not just blinding casting char* to double*.
Thanks to Rune Holm for the report.
'[].__add__', to match what the other internal descriptor types provide:
'__objclass__' attribute, '__self__' member, and reasonable repr and
comparison.
Added a test.
[ 1327110 ] wrong TypeError traceback in generator expressions
by removing the code that can stomp on the users' TypeError raised by the
iterable argument to ''.join() -- PySequence_Fast (now?) gives a perfectly
reasonable message itself. Also, a couple of tests.
This change implements a new bytecode compiler, based on a
transformation of the parse tree to an abstract syntax defined in
Parser/Python.asdl.
The compiler implementation is not complete, but it is in stable
enough shape to run the entire test suite excepting two disabled
tests.
type lookups: whitespace and linebreak.
These lookup tables are from the Python 1.6 version with the addition
of the 205F code point which was added as whitespace code point to Unicode
since then.
PyUnicode_DecodeCharmap() the accept a unicode string as the mapping
argument which is used as a mapping table.
This code isn't used by any of the codecs yet.
represented as a C int, raise OverflowError.
(Forward port from 2.4.2; the patch to classobject.c was already in
but needed a correction in the error message text.)
containing a value that doesn't fit in a C int, raise OverflowError
rather than truncating silently (and having 50% chance of hitting the
"it should be >= 0" error).
about illegal code points. The codec now supports PEP 293 style error handlers.
(This is a variant of the Nik Haldimann's patch that detects truncated data)
Fix over-aggressive PyErr_Clear(). The same code fragment appears in
various guises in list.extend(), map(), filter(), zip(), and internally
in PySequence_Tuple().
* set_merge() cannot assume that the table doesn't resize during iteration.
* convert some unnecessary tests to asserts -- they were necessary in
dictobject.c because PyDict_Next() is a public function. The same is
not true for set_next().
* re-arrange the order of functions to more closely match the order
in dictobject.c. This makes it must easier to compare the two
and ought to simplify any issues of maintaining both.
was never called during interpreter shutdown GC, so the f_back!=NULL
assertion was correct. Now that generators get close()d during GC,
the assertion was being triggered because the generator close() was being
called as the top-level frame. However, nothing actually is broken by
this; it's just that the condition was unexpected in previous Python
versions.
a frozenset conversion when the initial search attempt fails with a
TypeError and the key is some type of set. Add a testcase.
* Eliminate a duplicate if-stmt.
s|=s, s&=s, s-=s, or s^=s). Add related tests.
* Improve names for several variables and functions.
* Provide alternate table access functions (next, contains, add, and discard)
that work with an entry argument instead of just a key. This improves
set-vs-set operations because we already have a hash value for each key
and can avoid unnecessary calls to PyObject_Hash(). Provides a 5% to 20%
speed-up for quick hashing elements like strings and integers. Provides
much more substantial improvements for slow hashing elements like tuples
or objects defining a custom __hash__() function.
* Have difference operations resize() when 1/5 of the elements are dummies.
Formerly, it was 1/6. The new ratio triggers less frequently and only
in cases that it can resize quicker and with greater benefit. The right
answer is probably either 1/4, 1/5, or 1/6. Picked the middle value for
an even trade-off between resize time and the space/time costs of dummy
entries.
* Bring in free list from dictionary code.
* Improve several comments.
* Differencing can leave many dummy entries. If more than
1/6 are dummies, then resize them away.
* Factor-out common code with new macro, PyAnySet_CheckExact.
* Give set_lookkey_string() a fast alternate path when no dummy entries
are present.
* Have set_swap_bodies() reset the hash field to -1 whenever either of
bodies is not a frozenset. Maintains the invariant of regular sets
always having -1 in the hash field; otherwise, any mutation would make
the hash value invalid.
* Use an entry pointer to simplify the code in frozenset_hash().
dictobject.c.
* Have frozenset_hash() use entry->hash instead of re-computing each
individual hash with PyObject_Hash(o);
* Finalize the dummy entry before a system exit.
- Handle both frozenset() and frozenset([]).
- Do not use singleton for frozenset subclasses.
- Finalize the singleton.
- Add test cases.
* Factor-out set_update_internal() from set_update(). Simplifies the
code for several internal callers.
* Factor constant expressions out of loop in set_merge_internal().
* Minor comment touch-ups.
In addition, long_pow() skipped a necessary (albeit extremely unlikely
to trigger) error check when converting an int modulus to long.
Alas, I was unable to write a test case that crashed due to either
cause.
Bugfix candidate.
[ 1229429 ] missing Py_DECREF in PyObject_CallMethod
Add a test in test_enumerate, which is a bit random, but suffices
(reversed_new calls PyObject_CallMethod under some circumstances).
managed by C, because it's possible for the block to be smaller than the
new requested size, and at the end of allocated VM. Trying to copy over
nbytes bytes to a Python small-object block can segfault then, and there's
no portable way to avoid this (we would have to know how many bytes
starting at p are addressable, and std C has no means to determine that).
Bugfix candidate. Should be backported to 2.4, but I'm out of time.
[ 1181301 ] make float packing copy bytes when they can
which hasn't been reviewed, despite numerous threats to check it in
anyway if noone reviews it. Please read the diff on the checkin list,
at least!
The basic idea is to examine the bytes of some 'probe values' to see if
the current platform is a IEEE 754-ish platform, and if so
_PyFloat_{Pack,Unpack}{4,8} just copy bytes around.
The rest is hair for testing, and tests.
conversion using the proper magic slot (e.g., __int__()). Also move conversion
code out of PyNumber_*() functions in the C API into the nb_* function.
Applied patch #1109424. Thanks Walter Doewald.
[ 1165306 ] Property access with decorator makes interpreter crash
Don't allow the creation of unbound methods with NULL im_class, because
attempting to call such crashes.
Backport candidate.
* Speed-up "x in y" where x has more than one character.
The existing code made excessive calls to the expensive memcmp() function.
The new code uses memchr() to rapidly find a start point for memcmp().
In addition to knowing that the first character is a match, the new code
also checks that the last character is a match. This significantly reduces
the incidence of false starts (saving memcmp() calls and making quadratic
behavior less likely).
Improves the timings on:
python -m timeit -r7 -s"x='a'*1000" "'ab' in x"
python -m timeit -r7 -s"x='a'*1000" "'bc' in x"
Once this code has proven itself, then string_find_internal() should refer
to it rather than running its own version. Also, something similar may
apply to unicode objects.
[ 1124295 ] Function's __name__ no longer accessible in restricted mode
which I introduced with a bit of mindless copy-paste when making
__name__ writable. You can't assign to __name__ in restricted mode,
which I'm going to pretend was intentional :)
* Added missing error checks.
* Fixed O(n**2) growth pattern. Modeled after lists to achieve linear
amortized resizing. Improves construction of "tuple(it)" when "it" is
large and does not have a __len__ method. Other cases are unaffected.
In cyclic gc, clear weakrefs to unreachable objects before allowing any
Python code (weakref callbacks or __del__ methods) to run.
This is a critical bugfix, affecting all versions of Python since weakrefs
were introduced. I'll backport to 2.3.
exposed in header files. Fixed a few comments in these headers.
As we might have expected, writing down invariants systematically exposed a
(minor) bug. In this case, function objects have a writeable func_code
attribute, which could be set to code objects with the wrong number of
free variables. Calling the resulting function segfaulted the interpreter.
Added a corresponding test.
Also, add a testcase.
Formerly, the list_extend() code used several local variables to remember
its state across iterations. Since an iteration could call arbitrary
Python code, it was possible for the list state to be changed. The new
code uses dynamic structure references instead of C locals. So, they
are always up-to-date.
After list_resize() is called, its size has been updated but the new
cells are filled with NULLs. These needed to be filled before arbitrary
iteration code was called; otherwise, that code could attempt to modify
a list that was in a semi-invalid state. The solution was to change
the ob->size field back to a value reflecting the actual number of valid
cells.
When an integer is compared to a float now, the int isn't coerced to float.
This avoids spurious overflow exceptions and insane results. This should
compute correct results, without raising spurious exceptions, in all cases
now -- although I expect that what happens when an int/long is compared to
a NaN is still a platform accident.
Note that we had potential problems here even with "short" ints, on boxes
where sizeof(long)==8. There's #ifdef'ed code here to handle that, but
I can't test it as intended. I tested it by changing the #ifdef to
trigger on my 32-bit box instead.
I suppose this is a bugfix candidate, but I won't backport it. It's
long-winded (for speed) and messy (because the problem is messy). Note
that this also depends on a previous 2.4 patch that introduced
_Py_SwappedOp[] as an extern.
Make PySequence_Check() and PyMapping_Check() handle NULL inputs. This
goes beyond what most of the other checks do, but it is nice defensive
programming and solves the OP's problem.
module type with silly arguments. (The exact name can be quibbled
over, if you care).
This was partially inspired by bug #1014215 and so on, but is also
just a good idea.
The list resizing scheme only downsized when more than 16 elements were
removed in a single step: del a[100:120]. As a result, the list would
never shrink when popping elements off one at a time.
This patch makes it shrink whenever more than half of the space is unused.
Also, at Tim's suggestion, renamed _new_size to new_allocated. This makes
the code easier to understand.
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
This checkin is adapted from part 2 (of 3) of Trevor Perrin's patch set.
BACKWARD INCOMPATIBILITY: SHIFT must now be divisible by 5. AFAIK,
nobody will care. long_pow() could be complicated to worm around that,
if necessary.
long_pow():
- BUGFIX: This leaked the base and power when the power was negative
(and so the computation delegated to float pow).
- Instead of doing right-to-left exponentiation, do left-to-right. This
is more efficient for small bases, which is the common case.
- In addition, if the exponent is large (more than FIVEARY_CUTOFF
digits), precompute [a**i % c for i in range(32)], and go left to
right 5 bits at a time.
l_divmod():
- The signature changed so that callers who don't want the quotient,
or don't want the remainder, can pass NULL in the slot they don't
want. This saves them from having to declare a vrbl for unwanted
stuff, and remembering to decref it.
long_mod(), long_div(), long_classic_div():
- Adjust to new l_divmod() signature, and simplified as a result.
This checkin is adapted from part 1 (of 3) of Trevor Perrin's patch set.
x_mul()
- sped a little by optimizing the C
- sped a lot (~2X) if it's doing a square; note that long_pow() squares
often
k_mul()
- more cache-friendly now if it's doing a square
KARATSUBA_CUTOFF
- boosted; gradeschool mult is quicker now, and it may have been too low
for many platforms anyway
KARATSUBA_SQUARE_CUTOFF
- new
- since x_mul is a lot faster at squaring now, the point at which
Karatsuba pays for squaring is much higher than for general mult
need to convert str objects from the iterable to unicode. So, if
someone set the system default encoding to something nasty enough,
the conversion process could mutate the input iterable as a side
effect, and PySequence_Fast doesn't hide that from us if the input was
a list. IOW, can't assume the size of PySequence_Fast's result is
invariant across PyUnicode_FromObject() calls.
much to reduce the size of the code, but greatly improves its clarity.
It's also quicker in what's probably the most common case (the argument
iterable is a list). Against it, if the iterable isn't a list or a tuple,
a temp tuple is materialized containing the entire input sequence, and
that's a bigger temp memory burden. Yawn.
1. u1.join([u2]) is u2
2. Be more careful about C-level int overflow.
Since PySequence_Fast() isn't needed to achieve #1, it's not used -- but
the code could sure be simpler if it were.
happen in 2.3, but nobody noticed it still was getting generated (the
warning was disabled by default). OverflowWarning and
PyExc_OverflowWarning should be removed for 2.5, and left notes all over
saying so.
(Patch contributed by Nick Coghlan.)
Now joining string subtypes will always return a string.
Formerly, if there were only one item, it was returned unchanged.
Added XXX comment about why the undocumented PyRange_New() API function
is too broken to be worth the considerable pain of repairing.
Changed range_new() to stop using PyRange_New(). This fixes a variety
of bogus errors. Nothing in the core uses PyRange_New() now.
Documented that xrange() is intended to be simple and fast, and that
CPython restricts its arguments, and length of its result sequence, to
native C longs.
Added some tests that failed before the patch, and repaired a test that
relied on a bogus OverflowError getting raised.
hack: it would resize *interned* strings in-place! This occurred because
their reference counts do not have their expected value -- stringobject.c
hacks them. Mea culpa.
interning were not clear here -- a subclass could be mutable, for
example -- and had bugs. Explicitly interning a subclass of string
via intern() will raise a TypeError. Internal operations that attempt
to intern a string subclass will have no effect.
Added a few tests to test_builtin that includes the old buggy code and
verifies that calls like PyObject_SetAttr() don't fail. Perhaps these
tests should have gone in test_string.
unicodedata.east_asian_width(). You can still implement your own
simple width() function using it like this:
def width(u):
w = 0
for c in unicodedata.normalize('NFC', u):
cwidth = unicodedata.east_asian_width(c)
if cwidth in ('W', 'F'): w += 2
else: w += 1
return w
the case of __del__ resurrecting an object.
This makes the apparent reference leaks in test_descr go away (which I
expected) and also kills off those in test_gc (which is more surprising
but less so once you actually think about it a bit).