i_divmod: New and simpler algorithm. Old one returned gibberish on most
boxes when the numerator was -sys.maxint-1. Oddly enough, it worked in the
release (not debug) build on Windows, because the compiler optimized away
some tricky sign manipulations that were incorrect in this case.
Makes you wonder <wink> ...
Bugfix candidate.
Gave Python linear-time repr() implementations for dicts, lists, strings.
This means, e.g., that repr(range(50000)) is no longer 50x slower than
pprint.pprint() in 2.2 <wink>.
I don't consider this a bugfix candidate, as it's a performance boost.
Added _PyString_Join() to the internal string API. If we want that in the
public API, fine, but then it requires runtime error checks instead of
asserts.
is allocated than needed (used to allocate 80 bytes of digit space no
matter how small the long input). This also runs faster, at least on 32-
bit boxes.
Replaced PyLong_{As,From}{Unsigned,}LongLong guts with calls
to _PyLong_{As,From}ByteArray.
_testcapimodule.c:
Added strong tests of PyLong_{As,From}{Unsigned,}LongLong.
Fixes SF bug #432552 PyLong_AsLongLong() problems.
Possible bugfix candidate, but the fix relies on code added to longobject
to support the new q/Q structmodule format codes.
This completes the q/Q project.
longobject.c _PyLong_AsByteArray: The original code had a gross bug:
the most-significant Python digit doesn't necessarily have SHIFT
significant bits, and you really need to count how many copies of the sign
bit it has else spurious overflow errors result.
test_struct.py: This now does exhaustive std q/Q testing at, and on both
sides of, all relevant power-of-2 boundaries, both positive and negative.
NEWS: Added brief dict news while I was at it.
_PyLong_FromByteArray
_PyLong_AsByteArray
Untested and probably buggy -- they compile OK, but nothing calls them
yet. Will soon be called by the struct module, to implement x-platform
'q' and 'Q'.
If other people have uses for them, we could move them into the public API.
See longobject.h for usage details.
case of objects with equal types which support tp_compare. Give
type objects a tp_compare function.
Also add c<0 tests before a few PyErr_Occurred tests.
frequently used, and in particular this allows to drop the last
remaining obvious time-waster in the crucial lookdict() and
lookdict_string() functions. Other changes consist mostly of changing
"i < ma_size" to "i <= ma_mask" everywhere.
be possible to provoke unbounded recursion now, but leaving that to someone
else to provoke and repair.
Bugfix candidate -- although this is getting harder to backstitch, and the
cases it's protecting against are mondo contrived.
code, less memory. Tests have uncovered no drawbacks. Christian and
Vladimir are the other two people who have burned many brain cells on the
dict code in recent years, and they like the approach too, so I'm checking
it in without further ado.
instead of multiplication to generate the probe sequence. The idea is
recorded in Python-Dev for Dec 2000, but that version is prone to rare
infinite loops.
The value is in getting *all* the bits of the hash code to participate;
and, e.g., this speeds up querying every key in a dict with keys
[i << 16 for i in range(20000)] by a factor of 500. Should be equally
valuable in any bad case where the high-order hash bits were getting
ignored.
Also wrote up some of the motivations behind Python's ever-more-subtle
hash table strategy.
resizing.
Accurate timings are impossible on my Win98SE box, but this is obviously
faster even on this box for reasonable list.append() cases. I give
credit for this not to the resizing strategy but to getting rid of integer
multiplication and divsion (in favor of shifting) when computing the
rounded-up size.
For unreasonable list.append() cases, Win98SE now displays linear behavior
for one-at-time appends up to a list with about 35 million elements. Then
it dies with a MemoryError, due to fatally fragmented *address space*
(there's plenty of VM available, but by this point Win9X has broken user
space into many distinct heaps none of which has enough contiguous space
left to resize the list, and for whatever reason Win9x isn't coalescing
the dead heaps). Before the patch it got a MemoryError for the same
reason, but once the list reached about 2 million elements.
Haven't yet tried on Win2K but have high hopes extreme list.append()
will be much better behaved now (NT & Win2K didn't fragment address space,
but suffered obvious quadratic-time behavior before as lists got large).
For other systems I'm relying on common sense: replacing integer * and /
by << and >> can't plausibly hurt, the number of function calls hasn't
changed, and the total operation count for reasonably small lists is about
the same (while the operations are cheaper now).
dictresize() was too aggressive about never ever resizing small dicts.
If a small dict is entirely full, it needs to rebuild it despite that
it won't actually resize it, in order to purge old dummy entries thus
creating at least one virgin slot (lookdict assumes at least one such
exists).
Also took the opportunity to add some high-level comments to dictresize.
The idea is Marc-Andre Lemburg's, the implementation is Tim's.
Add a new ma_smalltable member to dictobjects, an embedded vector of
MINSIZE (8) dictentry structs. Short course is that this lets us avoid
additional malloc(s) for dicts with no more than 5 entries.
The changes are widespread but mostly small.
Long course: WRT speed, all scalar operations (getitem, setitem, delitem)
on non-empty dicts benefit from no longer needing NULL-pointer checks
(ma_table is never NULL anymore). Bulk operations (copy, update, resize,
clearing slots during dealloc) benefit in some cases from now looping
on the ma_fill count rather than on ma_size, but that was an unexpected
benefit: the original reason to loop on ma_fill was to let bulk
operations on empty dicts end quickly (since the NULL-pointer checks
went away, empty dicts aren't special-cased any more).
Special considerations:
For dicts that remain empty, this change is a lose on two counts:
the dict object contains 8 new dictentry slots now that weren't
needed before, and dict object creation also spends time memset'ing
these doomed-to-be-unsused slots to NULLs.
For dicts with one or two entries that never get larger than 2, it's
a mix: a malloc()/free() pair is no longer needed, and the 2-entry case
gets to use 8 slots (instead of 4) thus decreasing the chance of
collision. Against that, dict object creation spends time memset'ing
4 slots that aren't strictly needed in this case.
For dicts with 3 through 5 entries that never get larger than 5, it's a
pure win: the dict is created with all the space they need, and they
never need to resize. Before they suffered two malloc()/free() calls,
plus 1 dict resize, to get enough space. In addition, the 8-slot
table they ended with consumed more memory overall, because of the
hidden overhead due to the additional malloc.
For dicts with 6 or more entries, the ma_smalltable member is wasted
space, but then these are large(r) dicts so 8 slots more or less doesn't
make much difference. They still benefit all the time from removing
ubiquitous dynamic null-pointer checks, and get a small benefit (but
relatively smaller the larger the dict) from not having to do two
mallocs, two frees, and a resize on the way *to* getting their sixth
entry.
All in all it appears a small but definite general win, with larger
benefits in specific cases. It's especially nice that it allowed to
get rid of several branches, gotos and labels, and overall made the
code smaller.
This should be faster.
This means:
(1) "for line in file:" won't work if the xreadlines module can't be
imported.
(2) The body of "for line in file:" shouldn't use the file directly;
the effects (e.g. of file.readline(), file.seek() or even
file.tell()) would be undefined because of the buffering that goes
on in the xreadlines module.
UTF-16 codec will now interpret and remove a *leading* BOM mark. Sub-
sequent BOM characters are no longer interpreted and removed.
UTF-16-LE and -BE pass through all BOM mark characters.
These changes should get the UTF-16 codec more in line with what
the Unicode FAQ recommends w/r to BOM marks.
Two exceedingly unlikely errors in dictresize():
1. The loop for finding the new size had an off-by-one error at the
end (could over-index the polys[] vector).
2. The polys[] vector ended with a 0, apparently intended as a sentinel
value but never used as such; i.e., it was never checked, so 0 could
have been used *as* a polynomial.
Neither bug could trigger unless a dict grew to 2**30 slots; since that
would consume at least 12GB of memory just to hold the dict pointers,
I'm betting it's not the cause of the bug Fred's tracking down <wink>.
in the comments for using two passes was bogus, as the only object that
can get decref'ed due to the copy is the dummy key, and decref'ing dummy
can't have side effects (for one thing, dummy is immortal! for another,
it's a string object, not a potentially dangerous user-defined object).
1. Omit the early-out EQ/NE "lengths different?" test. Was unable to find
any real code where it triggered, but it always costs. The same is not
true of list richcmps, where different-size lists appeared to get
compared about half the time.
2. Because tuples are immutable, there's no need to refetch the lengths of
both tuples from memory again on each loop trip.
BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong,
because it won't believe there's any difference unless Py_EQ returns false
for some corresponding elements:
>>> class C:
... def __lt__(x, y): return 1
... __eq__ = __lt__
...
>>> C() < C()
1
>>> (C(),) < (C(),)
0
>>>
That doesn't make sense -- provided you believe the defn. of C makes sense.
and introduces a new method .decode().
The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.
Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII rendering this auto-conversion mechanism useless for
most Unicode encodings.
The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.
As demonstration of the new feature, the patch includes a few new
codecs which allow string to string encoding and decoding (rot13,
hex, zip, uu, base64).
Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
to reason that me_key is much more likely to match the key we're looking
for than to match dummy, and if the key is absent me_key is much more
likely to be NULL than dummy: most dicts don't even have a dummy entry.
Running instrumented dict code over the test suite and some apps confirmed
that matching dummy was 200-300x less frequent than matching key in
practice. So this reorders the tests to try the common case first.
It can lose if a large dict with many collisions is mostly deleted, not
resized, and then frequently searched, but that's hardly a case we
should be favoring.
The comment following used to say:
/* We use ~hash instead of hash, as degenerate hash functions, such
as for ints <sigh>, can have lots of leading zeros. It's not
really a performance risk, but better safe than sorry.
12-Dec-00 tim: so ~hash produces lots of leading ones instead --
what's the gain? */
That is, there was never a good reason for doing it. And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collison) the same "too often" across distinct hashes.
Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== # number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.
Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items(). A number
of std tests suffered bogus failures as a result. For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,
>>> d = {}
>>> for i in range(10):
... d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>
Unfortunately. people may latch on to that in small examples and draw a
bogus conclusion.
test_support.py
Moved test_extcall's sortdict() into test_support, made it stronger,
and imported sortdict into other std tests that needed it.
test_unicode.py
Excluced cp875 from the "roundtrip over range(128)" test, because
cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
See Python-Dev for excruciating details.
Cookie.py
Chaged various output functions to sort dicts before building
strings from them.
test_extcall
Fiddled the expected-result file. This remains sensitive to native
dict ordering, because, e.g., if there are multiple errors in a
keyword-arg dict (and test_extcall sets up many cases like that), the
specific error Python complains about first depends on native dict
ordering.
Allow module getattr and setattr to exploit string interning, via the
previously null module object tp_getattro and tp_setattro slots. Yields
a very nice speedup for things like random.random and os.path etc.
For rich comparisons, use instance_getattr2() when possible to avoid
the expense of setting an AttributeError. Also intern the name_op[]
table and use the interned strings rather than creating a new string
and interning it each time through.
doesn't know how to do LE, LT, GE, GT. dict_richcompare can't do the
latter any faster than dict_compare can. More importantly, for
cmp(dict1, dict2), Python *first* tries rich compares with EQ, LT, and
GT one at a time, even if the tp_compare slot is defined, and
dict_richcompare called dict_compare for the latter two because
it couldn't do them itself. The result was a lot of wasted calls to
dict_compare. Now dict_richcompare gives up at once the times Python
calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare
is called only once (when Python gets around to trying the tp_compare
slot).
Continued mystery: despite that this cut the number of calls to
dict_compare approximately in half in test_mutants.py, the latter still
runs amazingly slowly. Running under the debugger doesn't show excessive
activity in the dict comparison code anymore, so I'm guessing the culprit
is somewhere else -- but where? Perhaps in the element (key/value)
comparison code? We clearly spend a lot of time figuring out how to
compare things.
Fixed a half dozen ways in which general dict comparison could crash
Python (even cause Win98SE to reboot) in the presence of kay and/or
value comparison routines that mutate the dict during dict comparison.
Bugfix candidate.
interned when created, so the cached versions generally aren't ever
interned. With the patch, the
Py_INCREF(t);
*p = t;
Py_DECREF(s);
return;
indirection block in PyString_InternInPlace() is never executed during a
full run of the test suite, but was executed very many times before. So
I'm trading more work when creating one-character strings for doing less
work later. Note that the "more work" here can happen at most 256 times
per program run, so it's trivial. The same reasoning accounts for the
patch's simplification of string_item (the new version can call
PyString_FromStringAndSize() no more than 256 times per run, so there's
no point to inlining that stuff -- if we were serious about saving time
here, we'd pre-initialize the characters vector so that no runtime testing
at all was needed!).
Store floats and doubles to full precision in marshal.
Test that floats read from .pyc/.pyo closely match those read from .py.
Declare PyFloat_AsString() in floatobject header file.
Add new PyFloat_AsReprString() API function.
Document the functions declared in floatobject.h.
d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2
don't support comparisons other than ==, and testing dicts for equality
is faster now (especially when inequality obtains).
safely together and don't duplicate logic (the common logic was factored
out into new private API function _PySequence_IterContains()).
Visible change:
some_complex_number in some_instance
no longer blows up if some_instance has __getitem__ but neither
__contains__ nor __iter__. test_iter changed to ensure that remains true.
NEEDS DOC CHANGES
A few more AttributeErrors turned into TypeErrors, but in test_contains
this time.
The full story for instance objects is pretty much unexplainable, because
instance_contains() tries its own flavor of iteration-based containment
testing first, and PySequence_Contains doesn't get a chance at it unless
instance_contains() blows up. A consequence is that
some_complex_number in some_instance
dies with a TypeError unless some_instance.__class__ defines __iter__ but
does not define __getitem__.
to string.join(), so that when the latter figures out in midstream that
it really needs unicode.join() instead, unicode.join() can actually get
all the sequence elements (i.e., there's no guarantee that the sequence
passed to string.join() can be iterated over *again* by unicode.join(),
so string.join() must not pass on the original sequence object anymore).
NEEDS DOC CHANGES.
This one surprised me! While I expected tuple() to be a no-brainer, turns
out it's actually dripping with consequences:
1. It will *allow* the popular PySequence_Fast() to work with any iterable
object (code for that not yet checked in, but should be trivial).
2. It caused two std tests to fail. This because some places used
PyTuple_Sequence() (the C spelling of tuple()) as an indirect way to test
whether something *is* a sequence. But tuple() code only looked for the
existence of sq->item to determine that, and e.g. an instance passed
that test whether or not it supported the other operations tuple()
needed (e.g., __len__). So some things the tests *expected* to fail
with an AttributeError now fail with a TypeError instead. This looks
like an improvement to me; e.g., test_coercion used to produce 559
TypeErrors and 2 AttributeErrors, and now they're all TypeErrors. The
error details are more informative too, because the places calling this
were *looking* for TypeErrors in order to replace the generic tuple()
"not a sequence" msg with their own more specific text, and
AttributeErrors snuck by that.
the code necessary to accomplish this is simpler and faster if confined to
the object implementations, so we only do this there.
This causes no behaviorial changes beyond a (very slight) speedup.
need to be specified in the type structures independently. The flag
exists only for binary compatibility.
This is a "source cleanliness" issue and introduces no behavioral changes.
dictionary size was comparing ma_size, the hash table size, which is
always a power of two, rather than ma_used, wich changes on each
insertion or deletion. Fixed this.
to no longer insist that len(seq) be defined.
NEEDS DOC CHANGES.
This is meant to be a model for how other functions of this ilk (max,
filter, etc) can be generalized similarly. Feel encouraged to grab your
favorite and convert it!
Note some cute consequences:
list(file) == file.readlines() == list(file.xreadlines())
list(dict) == dict.keys()
list(dict.iteritems()) = dict.items()
list(xrange(i, j, k)) == range(i, j, k)
object's type didn't define tp_print, there were still cases where the
full "print uses str() which falls back to repr()" semantics weren't
honored. This resulted in
>>> print None
<None object at 0x80bd674>
>>> print type(u'')
<type object at 0x80c0a80>
Fixed this by always using the appropriate PyObject_Repr() or
PyObject_Str() call, rather than trying to emulate what they would do.
Also simplified PyObject_Str() to always fall back on PyObject_Repr()
when tp_str is not defined (rather than making an extra check for
instances with a __str__ method). And got rid of the special case for
strings.
Patch #419651: Metrowerks on Mac adds 0x itself
C std says %#x and %#X conversion of 0 do not add the 0x/0X base marker.
Metrowerks apparently does. Mark Favas reported the same bug under a
Compaq compiler on Tru64 Unix, but no other libc broken in this respect
is known (known to be OK under MSVC and gcc).
So just try the damn thing at runtime and see what the platform does.
Note that we've always had bugs here, but never knew it before because
a relevant test case didn't exist before 2.1.
Fix a very old flaw in PyObject_Print(). Amazing! When an object
type defines tp_str but not tp_repr, 'print x' to a real file
object would not call the tp_str slot but rather print a default style
representation: <foo object at 0x....>. This even though 'print x' to
a file-like-object would correctly call the tp_str slot.
patch for sharing single character Unicode objects.
Martin's patch had to be reworked in a number of ways to take Unicode
resizing into consideration as well. Here's what the updated patch
implements:
* Single character Unicode strings in the Latin-1 range are shared
(not only ASCII chars as in Martin's original patch).
* The ASCII and Latin-1 codecs make use of this optimization,
providing a noticable speedup for single character strings. Most
Unicode methods can use the optimization as well (by virtue
of using PyUnicode_FromUnicode()).
* Some code cleanup was done (replacing memcpy with Py_UNICODE_COPY)
* The PyUnicode_Resize() can now also handle the case of resizing
unicode_empty which previously resulted in an error.
* Modified the internal API _PyUnicode_Resize() and
the public PyUnicode_Resize() API to handle references to
shared objects correctly. The _PyUnicode_Resize() signature
changed due to this.
* Callers of PyUnicode_FromUnicode() may now only modify the Unicode
object contents of the returned object in case they called the API
with NULL as content template.
Note that even though this patch passes the regression tests, there
may still be subtle bugs in the sharing code.
sees it (test_iter.py is unchanged).
- Added a tp_iternext slot, which calls the iterator's next() method;
this is much faster for built-in iterators over built-in types
such as lists and dicts, speeding up pybench's ForLoop with about
25% compared to Python 2.1. (Now there's a good argument for
iterators. ;-)
- Renamed the built-in sequence iterator SeqIter, affecting the C API
functions for it. (This frees up the PyIter prefix for generic
iterator operations.)
- Added PyIter_Check(obj), which checks that obj's type has a
tp_iternext slot and that the proper feature flag is set.
- Added PyIter_Next(obj) which calls the tp_iternext slot. It has a
somewhat complex return condition due to the need for speed: when it
returns NULL, it may not have set an exception condition, meaning
the iterator is exhausted; when the exception StopIteration is set
(or a derived exception class), it means the same thing; any other
exception means some other error occurred.
new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
new C API PyObject_GetIter(), calls tp_iter
new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
new internal object types iterobject and calliterobject
new exception StopIteration
new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
new magic number for .pyc files
new special method for instances: __iter__() returns an iterator
iteration over dictionaries: "for x in dict" iterates over the keys
iteration over files: "for x in file" iterates over lines
TODO:
documentation
test suite
decide whether to use a different way to spell iter(function, sentinal)
decide whether "for key in dict" is a good idea
use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
speed tuning (make next() a slot tp_next???)
I know some people don't like this -- if it's really controversial,
I'll take it out again. (If it's only Alex Martelli who doesn't like
it, that doesn't count as "real controversial" though. :-)
That's why this is a separate checkin from the iterators stuff I'm
about to check in next.
PyTuple_New() could *conceivably* clear the dict, so move the test for
an empty dict after the tuple allocation. It means that we waste time
allocating and deallocating a 2-tuple when the dict is empty, but who
cares. It also means that when the dict is empty *and* there's no
memory to allocate a 2-tuple, we raise MemoryError, not KeyError --
but that may actually a good idea: if there's no room for a lousy
2-tuple, what are the chances that there's room for a KeyError
instance?
and reported to python-dev: because we were calling dict_resize() in
PyDict_Next(), and because GC's dict_traverse() uses PyDict_Next(),
and because PyTuple_New() can cause GC, and because dict_items() calls
PyTuple_New(), it was possible for dict_items() to have the dict
resized right under its nose.
The solution is convoluted, and touches several places: keys(),
values(), items(), popitem(), PyDict_Next(), and PyDict_SetItem().
There are two parts to it. First, we no longer call dict_resize() in
PyDict_Next(), which seems to solve the immediate problem. But then
PyDict_SetItem() must have a different policy about when *it* calls
dict_resize(), because we want to guarantee (e.g. for an algorithm
that Jeremy uses in the compiler) that you can loop over a dict using
PyDict_Next() and make changes to the dict as long as those changes
are only value replacements for existing keys using PyDict_SetItem().
This is done by resizing *after* the insertion instead of before, and
by remembering the size before we insert the item, and if the size is
still the same, we don't bother to even check if we might need to
resize. An additional detail is that if the dict starts out empty, we
must still resize it before the insertion.
That was the first part. :-)
The second part is to make keys(), values(), items(), and popitem()
safe against side effects on the dict caused by allocations, under the
assumption that if the GC can cause arbitrary Python code to run, it
can cause other threads to run, and it's not inconceivable that our
dict could be resized -- it would be insane to write code that relies
on this, but not all code is sane.
Now, I have this nagging feeling that the loops in lookdict probably
are blissfully assuming that doing a simple key comparison does not
change the dict's size. This is not necessarily true (the keys could
be class instances after all). But that's a battle for another day.
"%#x" % 0
blew up, at heart because C sprintf supplies a base marker if and only if
the value is not 0. I then fixed that, by tolerating C's inconsistency
when it does %#x, and taking away that *Python* produced 0x0 when
formatting 0L (the "long" flavor of 0) under %#x itself. But after talking
with Guido, we agreed it would be better to supply 0x for the short int
case too, despite that it's inconsistent with C, because C is inconsistent
with itself and with Python's hex(0) (plus, while "%#x" % 0 didn't work
before, "%#x" % 0L *did*, and returned "0x0"). Similarly for %#X conversion.
http://sourceforge.net/tracker/index.php?func=detail&aid=415514&group_id=5470&atid=105470
For short ints, Python defers to the platform C library to figure out what
%#x should do. The code asserted that the platform C returned a string
beginning with "0x". However, that's not true when-- and only when --the
*value* being formatted is 0. Changed the code to live with C's inconsistency
here. In the meantime, the problem does not arise if you format a long 0 (0L)
instead. However, that's because the code *we* wrote to do %#x conversions on
longs produces a leading "0x" regardless of value. That's probably wrong too:
we should drop leading "0x", for consistency with C, when (& only when) formatting
0L. So I changed the long formatting code to do that too.
must now initialize the extra field used by the weak-ref machinery to
NULL themselves, to avoid having to require PyObject_INIT() to check
if the type supports weak references and do it there. This causes less
work to be done for all objects (the type object does not need to be
consulted to check for the Py_TPFLAGS_HAVE_WEAKREFS bit).
frees. Note there doesn't seem to be any way to test LocalsToFast(),
because the instructions that trigger it are illegal in nested scopes
with free variables.
Fix allocation strategy for cells that are also formal parameters.
Instead of emitting LOAD_FAST / STORE_DEREF pairs for each parameter,
have the argument handling code in eval_code2() do the right thing.
A side-effect of this change is that cell variables that are also
arguments are listed at the front of co_cellvars in the order they
appear in the argument list.
hashable
This patch changes the behavior of slice objects in the following
manner:
- Slice objects are now comparable with other slice objects as though
they were logically tuples of (start,stop,step). The tuple is not
created in the comparison function, but the comparison behavior is
logically equivalent.
- Slice objects are not hashable. With the above change to being
comparable, slice objects now cannot be used as keys in dictionaries.
[I've edited the patch for style. Note that this fixes the problem
that dict[i:j] seemed to work but was meaningless. --GvR]
with free variables. Thanks to Martin v. Loewis for finding two of
the problems. This fixes SF buf 405583.
There is also a C API change: PyFrame_New() is reverting to its
pre-2.1 signature. The change introduced by nested scopes was a
mistake. XXX Is this okay between beta releases?
cell_clear(), the GC helper, must decref its reference to break
cycles.
frame_dealloc() must dealloc all cell vars and free vars in addition
to locals.
eval_code2() setup code must INCREF cells it copies out of the
closure.
The STORE_DEREF opcode implementation must DECREF the object it passes
to PyCell_Set().
May or may not be related to bug 407680 (obmalloc.c - looks like it's
corrupted). This repairs the illegal vrbl names, but leaves a pile of
illegal macro names (_THIS_xxx, _SYSTEM_xxx, _SET_HOOKS, _FETCH_HOOKS).
- In _portable_ftell(), try fgetpos() before ftello() and ftell64().
I ran into a situation on a 64-bit capable Linux where the C
library's ftello() and ftell64() returned negative numbers despite
fpos_t and off_t both being 64-bit types; fgetpos() did the right
thing.
- Define a new typedef, Py_off_t, which is either fpos_t or off_t,
depending on which one is 64 bits. This removes the need for a lot
of #ifdefs later on. (XXX Should this be moved to pyport.h? That
file currently seems oblivious to large fille support, so for now
I'll leave it here where it's needed.)
set a function attribute on a method (either bound or unbound). This
reverts to Python 2.0 behavior that no attributes of the method are
writable, but provides a more informative error message.
release the interned string dictionary. This is useful for memory
use debugging because it eliminates a huge source of noise from the
reports. Only defined when INTERN_STRINGS is defined.
and the test for errors, so that an error in the default compare
doesn't go undetected. This fixes SF Bug #132933 (submitted by
effbot) -- list.sort doesn't detect comparision errors.
This fixes SF bug #132008, reported by Warren J. Hack.
The copyright for this patch (and this patch only) belongs to CNRI, as
part of the (yet to be issued) 1.6.1 release.
This is now checked into the HEAD branch. Tim will check in a test
case to check for this specific bug, and an assertion in
PyArgs_ParseTuple() to catch similar bugs in the future.
This change eliminates an extra malloc/free when a frame with free
variables is created. Any cell vars or free vars are stored in
f_localsplus after the locals and before the stack.
eval_code2() fills in the appropriate values after handling
initialization of locals.
To track the size the frame has an f_size member that tracks the total
size of f_localsplus. It used to be implicitly f_nlocals + f_stacksize.
* Removed func_hash and func_compare, so they can be treated as immutable
content-less objects (address hash and comparison)
* Added tests to that affect to test_funcattrs (also testing func_code
is writable)
* Reverse meaning of tests in test_opcodes which checked identical code
gets identical functions
The majority of the changes are in the compiler. The mainloop changes
primarily to implement the new opcodes and to pass a function's
closure to eval_code2(). Frames and functions got new slots to hold
the closure.
Include/compile.h
Add co_freevars and co_cellvars slots to code objects.
Update PyCode_New() to take freevars and cellvars as arguments
Include/funcobject.h
Add func_closure slot to function objects.
Add GetClosure()/SetClosure() functions (and corresponding
macros) for getting at the closure.
Include/frameobject.h
PyFrame_New() now takes a closure.
Include/opcode.h
Add four new opcodes: MAKE_CLOSURE, LOAD_CLOSURE, LOAD_DEREF,
STORE_DEREF.
Remove comment about old requirement for opcodes to fit in 7
bits.
compile.c
Implement changes to code objects for co_freevars and co_cellvars.
Modify symbol table to use st_cur_name (string object for the name
of the current scope) and st_cur_children (list of nested blocks).
Also define st_nested, which might more properly be called
st_cur_nested. Add several DEF_XXX flags to track def-use
information for free variables.
New or modified functions of note:
com_make_closure(struct compiling *, PyCodeObject *)
Emit LOAD_CLOSURE opcodes as needed to pass cells for free
variables into nested scope.
com_addop_varname(struct compiling *, int, char *)
Emits opcodes for LOAD_DEREF and STORE_DEREF.
get_ref_type(struct compiling *, char *name)
Return NAME_CLOSURE if ref type is FREE or CELL
symtable_load_symbols(struct compiling *)
Decides what variables are cell or free based on def-use info.
Can now raise SyntaxError if nested scopes are mixed with
exec or from blah import *.
make_scope_info(PyObject *, PyObject *, int, int)
Helper functions for symtable scope stack.
symtable_update_free_vars(struct symtable *)
After a code block has been analyzed, it must check each of
its children for free variables that are not defined in the
block. If a variable is free in a child and not defined in
the parent, then it is defined by block the enclosing the
current one or it is a global. This does the right logic.
symtable_add_use() is now a macro for symtable_add_def()
symtable_assign(struct symtable *, node *)
Use goto instead of for (;;)
Fixed bug in symtable where name of keyword argument in function
call was treated as assignment in the scope of the call site. Ex:
def f():
g(a=2) # a was considered a local of f
ceval.c
eval_code2() now take one more argument, a closure.
Implement LOAD_CLOSURE, LOAD_DEREF, STORE_DEREF, MAKE_CLOSURE>
Also: When name error occurs for global variable, report that the
name was global in the error mesage.
Objects/frameobject.c
Initialize f_closure to be a tuple containing space for cellvars
and freevars. f_closure is NULL if neither are present.
Objects/funcobject.c
Add support for func_closure.
Python/import.c
Change the magic number.
Python/marshal.c
Track changes to code objects.
PyObject_Dump(): New function that is useful when debugging Python's C
runtime. In something like gdb it can be a pain to get some useful
information out of PyObject*'s. This function prints the str() of the
object to stderr, along with the object's refcount and hex address.
PyGC_Dump(): Similar to PyObject_Dump() but knows how to cast from the
garbage collector prefix back to the PyObject* structure.
[See Misc/gdbinit for some useful gdb hooks]
none_dealloc(): Rather than SEGV if we accidentally decref None out of
existance, we assign None's and NotImplemented's destructor slot to
this function, which just calls abort().
Barry, that comment belongs in the code, not in the checkin msg.
The code *used* to do this correctly (as you well know, since you
& I went thru considerable pain to fix this the first time).
However, because the *reason* for the convolution wasn't recorded
in the code as a comment, somebody threw it all away the first
time it got reworked.
c-code-isn't-often-self-explanatory-ly y'rs - tim
default_3way_compare(): Stick the checkin message from 2.110 in a
comment.
to integer types (i.e. Py_uintptr_t, our spelling of C9X's uintptr_t).
ANSI specifies that pointer compares other than == and != to
non-related structures are undefined. This quiets an Insure
portability warning.
del'ing func.func_dict. I took the opportunity to also clean up some
other nits with the code, namely core dumps when del'ing func_defaults
and KeyError instead of AttributeError when del'ing a non-existant
function attribute.
Specifically,
func_memberlist: Move func_dict and __dict__ into here instead of
special casing them in the setattro and getattro methods. I don't
remember why I took them out of here before I first uploaded the PEP
232 patch. :/
func_getattro(): No need to special case __dict__/func_dict since
their now in the func_memberlist and PyMember_Get() should Do The
Right Thing (i.e. transforms NULL values into Py_None).
func_setattro(): Document the intended behavior of del'ing or setting
to None one of the special func_* attributes. I.e.:
func_code - can only be set to a code object. It can't be del'd
or set to None.
func_defaults - can be del'd. Can only be set to None or a tuple.
func_dict - can be del'd. Can only be set to None or a
dictionary.
Fix core dumps and incorrect exceptions as described above. Also, if
we're del'ing an arbitrary function attribute but func_dict is NULL,
don't create func_dict before discovering that we'll get an
AttributeError anyway.
implementation details inside the ucnhash module.
also cleaned up the unicode copyright blurb a little; Secret Labs'
internal revision history isn't that interesting...
Also fixes two long-standing bugs (present in 2.0):
1. .join() didn't check that the result size fit in an int.
2. string.join(s) when len(s)==1 returned s[0] regardless of s[0]'s
type; e.g., "".join([3]) returned 3 (overly optimistic optimization).
I resisted a keen temptation to make .join() apply str() automagically.
I found where rich comparison of unequal recursive objects gave
unintuituve results. In a discussion with Tim, where we discovered
that our intuition on when a<=b should be true was failing, we decided
to outlaw ordering comparisons on recursive objects. (Once we have
fixed our intuition and designed a matching algorithm that's practical
and reasonable to implement, we can allow such orderings again.)
- Refactored the recursive-object comparison framework; more is now
done in the support routines so less needs to be done in the calling
routines (even at the expense of slowing it down a bit -- this
should normally never be invoked, it's mostly just there to avoid
blowing up the interpreter).
- Changed the framework so that the comparison operator used is also
stored. (The dictionary now stores triples (v, w, op) instead of
pairs (v, w).)
- Changed the nesting limit to a more reasonable small 20; this only
slows down comparisons of very deeply nested objects (unlikely to
occur in practice), while speeding up comparisons of recursive
objects (previously, this would first waste time and space on 500
nested comparisons before it would start detecting recursion).
- Changed rich comparisons for recursive objects to raise a ValueError
exception when recursion is detected for ordering oprators (<, <=,
>, >=).
Unrelated change:
- Moved PyObject_Unicode() to just under PyObject_Str(), where it
belongs. MAL's patch must've inserted in a random spot between two
functions in the file -- between two helpers for rich comparison...
exceptions when compared using <, <=, > or >=.
NOTE: This is a tentative change: this means that cmp() involving
complex numbers will raise an exception when the numbers differ, and
that in turn means that e.g. dictionaries and certain other compounds
(e.g. UserLists) containing complex numbers can't be compared either.
So we'll have to decide whether this is acceptable. The alpha test
cycle is a good time to keep an eye on this!
- Use PyObject_RichCompareBool() when comparing keys; this makes the
error handling cleaner.
- There were two implementations for dictionary comparison, an old one
(#ifdef'ed out) and a new one. Got rid of the old one, which was
abandoned years ago.
- In the characterize() function, part of dictionary comparison, use
PyObject_RichCompareBool() to compare keys and values instead. But
continue to use PyObject_Compare() for comparing the final
(deciding) elements.
- Align the comments in the type struct initializer.
Note: I don't implement rich comparison for dictionaries -- there
doesn't seem to be much to be gained. (The existing comparison
already decides that shorter dicts are always smaller than longer
dicts.)
- tuplecontains(): call RichCompare(Py_EQ).
- Get rid of tuplecompare(), in favor of new tuplerichcompare() (a
clone of list_compare()).
- Aligned the comments for large struct initializers.
earlier coercion changes, not by rich comparisons. When a coercion
function returns 1 (meaning it cannot do it), it should not INCREF the
arguments. When no __coerce__() method was found, instance_coerce()
originally returned 0, pretending it did it. Neil changed the return
value to 1, more accurately reflecting that it didn't do anything, but
forgot to take out the two INCREF calls.
- sort's docompare() calls RichCompare(Py_LT).
- list_contains(), list_index(), listcount(), listremove() call
RichCompare(Py_EQ).
- Get rid of list_compare(), in favor of new list_richcompare(). The
latter does some nice shortcuts, like when == or != is requested, it
first compares the lengths for trivial accept/reject. Then it goes
over the items until it finds an index where the items differe; then
it does more shortcut magic to minimize the number of additional
comparisons.
- Aligned the comments for large struct initializers.
- Use the compare nesting level and in-progress dictionary properly in
PyObject_RichCompare().
- Change the in-progress code to use static variables instead of
globals (both the nesting level and the key for the thread dict were
globals but have no reason to be globals; the key can even be a
function-static variable in get_inprogress_dict()).
- Rewrote try_rich_to_3way_compare() to benefit from the similarity of
the three cases, making it table-driven.
- In try_rich_to_3way_compare(), test for EQ before LT and GT. This
turns out essential when comparing recursive UserList instances;
with the old code, these would recurse into rich comparison three
times for each nesting level up to NESTING_LIMIT/2, making the total
number of calls in the order of 3**(NESTING_LIMIT/2)!
NOTE: I'm not 100% comfortable with this. It works for the standard
test suite (which compares a few trivial recursive data structures
only), but I'm not sure that the in-progress dictionary is used
properly by the rich comparison code. Jeremy suggested that maybe the
operation should be included in the dict. Currently I presume that
objects in the dict are equal unless proven otherwise, and I set the
outcome for the rich comparison accordingly: true for operators EQ,
LE, GE, and false for the other three. But Jeremy seems to think that
there may be counter-examples where this doesn't do the right thing.
except that it always returns Unicode objects.
A new C API PyObject_Unicode() is also provided.
This closes patch #101664.
Written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.
- Got rid of instance_cmp(); refactored instance_compare().
- Added instance_richcompare() which calls __lt__() etc.
Some unrelated stuff mixed in:
- Aligned comments in various large struct initializers.
- Better test to avoid recursion if __coerce__ returns self as the
first argument (this is an unrelated fix by Neil Schemenauer!).
- Style nit: don't use Py_DECREF(Py_NotImplemented); use
Py_DECREF(result) -- it just looks better. :-)
PyObject_RichCompare() and PyObject_RichCompareBool().
XXX Note: the code that checks for deeply nested rich comparisons is
bogus -- it assumes the two objects are always identical, rather than
using the same logic as PyObject_Compare(). I'll fix that later.
simpler if we use fgetpos and fsetpos, rather than trying to mess with
platform-specific TELL64 alternatives.
Of course, this hasn't been tested on a 64-bit platform, so I may have
to withdraw this -- but I'm hopeful, and Trent Mick supports this
patch!
in case the parameters are out of bounds and fixes error handling
for .count(), .startswith() and .endswith() for the case of
mixed string/Unicode objects.
This patch adds Python style index semantics to PyUnicode_Count()
indices (including the special handling of negative indices).
The patch is an extended version of patch #103249 submitted
by Michael Hudson (mwh) on SF. It also includes new test cases.
Closes SF patch #103123.
funcobject.h:
PyFunctionObject: add the func_dict slot.
funcobject.c:
PyFunction_New(): Initialize the func_dict slot to NULL.
func_getattr(): Rename to func_getattro() and change the
signature. It's more efficient to use attro methods and dig the C
string out than it is to re-convert a C string to a PyString.
Also, add support for getting the __dict__ (a.k.a. func_dict)
attribute, and for getting an arbitrary function attribute.
func_setattr(): Rename to func_setattro() and change the signature
for the same reason. Also add support for setting __dict__
(a.k.a. func_dict) and any arbitrary function attribute.
func_dealloc(): Be sure to DECREF the func_dict slot.
func_traverse(): Be sure to traverse func_dict too.
PyFunction_Type: make the necessary func_?etattro() changes.
classobject.c:
instancemethod_memberlist: Add __dict__
instancemethod_setattro(): New method to set arbitrary attributes
on methods (really the underlying im_func). Raise TypeError when
the instance is bound or when you're trying to set one of the
reserved im_* attributes.
instancemethod_getattr(): Renamed to instancemethod_getattro()
since that's what it really is. Also, added support fo getting
arbitrary attributes through the im_func.
PyMethod_Type: Do the ?etattr{,o} dance.
object.
This fixes potential overflows in xrange()'s internal calculations on
64-bit platforms. The fix is complicated because the sq_length slot
function can only return an int; we want to support
xrange(sys.maxint), which is a 64-bit quantity on most 64-bit
platforms (except Win64). The solution is hacky but the best
possible: when the range is that long, we can use it in a for loop but
we can't ask for its length (nor can we actually iterate beyond
2**31-1, because the sq_item slot function has the same restrictions
on its arguments. Fixing those restrictions is a project for another
day...
faster than the other. Should be faster for Mark Favas's 254-character
mail log lines, and *is* 3-4% quicker for my test case with much shorter
lines (but they're typical of *my* text files, and I'm tired of optimizing
for everyone else at my expense <wink> -- in fact, the only one who loses
here is Guido ...).
Tim discovered another "bug" in my get_line() code: while the comments
said that n<0 was invalid, it was in fact still called with n<0 (when
PyFile_GetLine() was called with n<0). In that case fortunately
executed the same code as for n==0.
Changed the comment to admit this fact, and changed Tim's MS speed
hack code to use 'n <= 0' as the criteria for the speed hack.
code duplication is to let us get away without a realloc whenever possible;
boosted the init buf size (the cutoff at which we *can* get away without
a realloc) from 100 to 200 so that more files can enjoy this boost; and
allowed other threads to run in all cases. The last two cost something,
but not significantly: in my fat test case, less than a 1% slowdown total.
Since my test case has a great many short lines, that's probably the worst
slowdown, too. While the logic barely changed, there were lots of edits.
This also gets rid of the reference to fp->_cnt, so the last platform
assumption being made here is that fgets doesn't overwrite bytes
capriciously (== beyond the terminating null byte it must write).
variant that never needs to "search from the right".
Also fixed unlikely memory leak in get_line, if string size overflows INTMAX.
Also new std test test_bufio to make sure .readline() works.
realized that this behavior is already present in PyFile_GetLine(),
which is the only place that needs it. A little refactoring of that
function made get_line_raw() redundant.
The mapping dictionaries can now contain 1-n mappings, meaning
that character ordinals may be mapped to strings or Unicode object,
e.g. 0x0078 ('x') -> u"abc", causing the ordinal to be replaced by
the complete string or Unicode object instead of just one character.
Another feature introduced by the patch is that of mapping oridnals to
the emtpy string. This allows removing characters.
The patch is different from patch #103100 in that it does not cause a
performance hit for the normal use case of 1-1 mappings.
Written by Marc-Andre Lemburg, copyright assigned to Guido van Rossum.
- The raw_input() functionality is moved to a separate function.
- Drop GNU getline() in favor of getc_unlocked(), which exists on more
platforms (and is even a tad faster on my system).
codec to not apply Latin-1 mappings for keys which are not found
in the mapping dictionaries, but instead treat them as undefined
mappings.
The patch was originally written by Martin v. Loewis with some
additional (cosmetic) changes and an updated test script
by Marc-Andre Lemburg.
The standard codecs were recreated from the most current files
available at the Unicode.org site using the Tools/scripts/gencodec.py
tool.
This patch closes the bugs #116285 and #119960.
raise ValueError. Checked in the patch as far as it went, but also changed
all of ints, longs and floats to raise ZeroDivisionError instead when raising
0 to a negative number. This is what 754-inspired stds require, as the "true
result" is an infinity obtained from finite operands, i.e. it's a singularity.
Also changed float pow to not be so timid about using its square-and-multiply
algorithm. Note that what math.pow does is unrelated to what builtin pow
does, and will still vary by platform.
result-object-pointer that is passed in, when an exception occurs during
coercion. The pointer has to be explicitly initialized in the caller to avoid
putting trash on the Python stack.
#define'd to an unreasonable value (several recent gcc systems have
misdefined it, causing bogus overflows in integer multiplication). Nuke
CHAR_BIT entirely.
after unicode_empty has been freed, otherwise it might not point to
the real start of the unicode_freelist. Final closure for SF bug
#110681, Jitterbug PR#398.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.
This closes SourceForge patch #101659 and bug #115323.
- use unidb compression for the unicodectype module. smaller, faster,
and slightly more portable...
(note: this commit doesn't include the unicodectype.c file itself; I'm
still waiting for the reviewers...)
I fixed the specific complaint but left the (many) large issues untouched.
See the (very long) bug report discussion for why:
http://sourceforge.net/bugs/?func=detailbug&group_id=5470&bug_id=110624
Note that while I left the interface to the undocumented public API function
PyFloat_FromString alone, its 2nd argument is useless. From a comment block
in the code:
RED_FLAG 22-Sep-2000 tim
PyFloat_FromString's pend argument is braindead. Prior to this RED_FLAG,
1. If v was a regular string, *pend was set to point to its terminating
null byte. That's useless (the caller can find that without any
help from this function!).
2. If v was a Unicode string, or an object convertible to a character
buffer, *pend was set to point into stack trash (the auto temp
vector holding the character buffer). That was downright dangerous.
Since we can't change the interface of a public API function, pend is
still supported but now *officially* useless: if pend is not NULL,
*pend is set to NULL.
Note a curious extension to the std C rules: x, X and o formatting can never produce
a sign character in C, so the '+' and ' ' flags are meaningless for them. But
unbounded ints *can* produce a sign character under these conversions (no fixed-
width bitstring is wide enough to hold all negative values in 2's-comp form). So
these flags become meaningful in Python when formatting a Python long which is too
big to fit in a C long. This required shuffling around existing code, which hacked
x and X conversions to death when both the '#' and '0' flags were specified: the
hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or
'+' flags too, since signs were always meaningless before for x and X conversions.
Isomorphic shuffling was required in unicodeobject.c.
Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
which implements the automatic conversion from Unicode to a string
object using the default encoding.
The new API is then put to use to have eval() and exec accept
Unicode objects as code parameter. This closes bugs #110924
and #113890.
As side-effect, the traditional C APIs PyString_Size() and
PyString_AsString() will also accept Unicode objects as
parameters.
objects for the attribute name. Unicode objects are converted to
a string using the default encoding before trying the lookup.
Note that previously it was allowed to pass arbitrary objects as
attribute name in case the tp_getattro/setattro slots were defined.
This patch fixes this by applying an explicit string check first:
all uses of these slots expect string objects and do not check
for the type resulting in a core dump. The tp_getattro/setattro
are still useful as optimization for lookups using interned
string objects though.
This patch fixes bug #113829.
that Py_INCREF boosts global _Py_RefTotal when Py_REF_DEBUG is defined
but Py_TRACE_REFS isn't.
There are, IMO, way too many preprocessor gimmicks in use for refcount
debugging (at least 3 distinct true/false symbols, but not all 8 combos
are supported by the code, etc etc), and no coherent documentation of
this stuff -- 'twas too painful to track this one down.
all, either to see whether the # of chars fit in an int, or that the
amount of memory needed fit in a size_t. Checking these is expensive, but
the alternative is silently wrong answers (as in the bug report) or
core dumps (which were easy to provoke using Unicode strings).
exception context. This avoids improperly propogating errors raised by
a user-defined __cmp__() by a subsequent lookup operation.
This patch does *not* include the performance enhancement patch for
dictionaries with string keys only; that will be checked in separately.
This closes SourceForge patch #101277 and bug #112558.
file.writelines() now tries to emulate the behaviour of file.write()
as closely as possible. Due to the problems with releasing the
interpreter lock the solution isn't exactly optimal, but still better
than not supporting the file.write() semantics at all.
types (i.e. Py_uintptr_t, our spelling of C9X's uintptr_t). ANSI
specifies that pointer compares other than == and != to non-related
structures are undefined. This quiets an Insure portability warning.
scope. Previously, s_buffer[] was defined inside the
PyUnicode_Check() scope, but referred to in the outer scope via
assignment to s. This quiets an Insure portability warning.
to integer types (i.e. Py_uintptr_t, our spelling of C9X's uintptr_t).
ANSI specifies that pointer compares other than == and != to
non-related structures are undefined. This quiets an Insure
portability warning.
is no __getslice__ available. Also does the same for C extension types.
Includes rudimentary documentation (it could use a cross reference to the
section on slice objects, I couldn't figure out how to do that) and a test
suite for all Python __hooks__ I could think of, including the new
behaviour.
shutdown time, but CVS log entry for revision 2.45 explains why this
is so. Simply include a comment so we don't have to re-figure it out
again 5 years from now.
This was a misleading bug -- the true "bug" was that hash(x) gave an error
return when x is an infinity. Fixed that. Added new Py_IS_INFINITY macro to
pyport.h. Rearranged code to reduce growing duplication in hashing of float and
complex numbers, pushing Trent's earlier stab at that to a logical conclusion.
Fixed exceedingly rare bug where hashing of floats could return -1 even if there
wasn't an error (didn't waste time trying to construct a test case, it was simply
obvious from the code that it *could* happen). Improved complex hash so that
hash(complex(x, y)) doesn't systematically equal hash(complex(y, x)) anymore.
resized after creation. 0-length strings are usually shared
and _PyString_Resize() fails on these shared strings.
Fixes [ Bug #111667 ] unicode core dump.
Properly end a comment block. It was terminated fine later but by a subsequent
block and. It was also in #if 0. This patch is so trivial I can't believe I am
talking about it. :)
function (together with other locale aware ones) should into a new collation
support module. See python-dev for a discussion of this removal.
Note: This patch should also be applied to the 1.6 branch.
the Python Unicode implementation.
The internal buffer used for implementing the buffer protocol
is renamed to defenc to make this change visible. It now holds the
default encoded version of the Unicode object and is calculated
on demand (NULL otherwise).
Since the default encoding defaults to ASCII, this will mean that
Unicode objects which hold non-ASCII characters will no longer
work on C APIs using the "s" or "t" parser markers. C APIs must now
explicitly provide Unicode support via the "u", "U" or "es"/"es#"
parser markers in order to work with non-ASCII Unicode strings.
(Note: this patch will also have to be applied to the 1.6 branch
of the CVS tree.)
This doesn't change the copyright status for these files -- just the
markings! Doing it on the main branch for these three files for which
the HEAD revision was pushed back into 1.6.
The UTF-8 decoder is still buggy (i.e. it doesn't pass Markus Kuhn's
stress test), mainly due to the following construct:
#define UTF8_ERROR(details) do { \
if (utf8_decoding_error(&s, &p, errors, details)) \
goto onError; \
continue; \
} while (0)
(The "continue" statement is supposed to exit from the outer loop,
but of course, it doesn't. Indeed, this is a marvelous example of
the dangers of the C programming language and especially of the C
preprocessor.)
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
use PyString_AS_STRING macro on local string object
when resizing string, make sure resized string will always be big enough
split string containing error message across two lines
add test to string_tests that causes resizing
seqlen==1 clause, before returning item, we need to DECREF seq. In
the res=PyString... failure clause, we need to goto finally to also
decref seq (and the DECREF of res in finally is changed to a
XDECREF). Also, we need to DECREF seq just before the
PyUnicode_Join() return.
implementation -- use PySequence_Fast interface to iterate over elements
interface -- if instance object reports wrong length, ignore it;
previous version raised an IndexError if reported length was too high
value is calculated from the character values, in a way
that makes sure an 8-bit ASCII string and a unicode string
with the same contents get the same hash value.
(as a side effect, this also works for ISO Latin 1 strings).
for more details, see the python-dev discussion.
was cascades of warnings about mismatching const decls. Overall,
I think const creates lots of headaches and solves almost
nothing. Added enough consts to shut up the warnings, but
this did require casting away const in one spot too (another
usual outcome of starting down this path): the function
mymemreplace can't return const char*, but sometimes wants to
return its first argument as-is, which latter must be declared
const char* in order to avoid const warnings at mymemreplace's
call sites. So, in the case the function wants to return the
first arg, that arg's declared constness must be subverted.
This was a convenient excuse to create the pyport.h file recently
discussed!
Please use new Py_ARITHMETIC_RIGHT_SHIFT when right-shifting a
signed int and you *need* sign-extension. This is #define'd in
pyport.h, keying off new config symbol SIGNED_RIGHT_SHIFT_ZERO_FILLS.
If you're running on a platform that needs that symbol #define'd,
the std tests never would have worked for you (in particular,
at least test_long would have failed).
The autoconfig stuff got added to Python after my Unix days, so
I don't know how that works. Would someone please look into doing
& testing an auto-config of the SIGNED_RIGHT_SHIFT_ZERO_FILLS
symbol? It needs to be defined if & only if, e.g., (-1) >> 3 is
not -1.
Stein -- thanks!). Incidentally removed all the Py_PROTO macros
from object.h, as they prevented my editor from magically finding
the definitions of the "coercion", "cmpfunc" and "reprfunc"
typedefs that were being redundantly applied in longobject.c.
works just like the Unicode one. The C APIs match the ones in the Unicode
implementation, but were extended to be able to reuse the existing
Unicode codecs for string purposes too.
Conversions from string to Unicode and back are done using the
default encoding.
implementation. This was really to test whether my new CVS+SSH
setup is more usable than the old one -- and turns out it is (for
whatever reason, it was impossible to do a commit before that
involved more than one directory).
The common technique for printing out a pointer has been to cast to a long
and use the "%lx" printf modifier. This is incorrect on Win64 where casting
to a long truncates the pointer. The "%p" formatter should be used instead.
The problem as stated by Tim:
> Unfortunately, the C committee refused to define what %p conversion "looks
> like" -- they explicitly allowed it to be implementation-defined. Older
> versions of Microsoft C even stuck a colon in the middle of the address (in
> the days of segment+offset addressing)!
The result is that the hex value of a pointer will maybe/maybe not have a 0x
prepended to it.
Notes on the patch:
There are two main classes of changes:
- in the various repr() functions that print out pointers
- debugging printf's in the various thread_*.h files (these are why the
patch is large)
Closes SourceForge patch #100505.
errors in some of the hash algorithms. For exmaple, in float_hash and
complex_hash a certain part of the value is not included in the hash
calculation. See Tim's, Guido's, and my discussion of this on
python-dev in May under the title "fix float_hash and complex_hash for
64-bit *nix"
(2) The hash algorithms that use pointers (e.g. func_hash, code_hash)
are universally not correct on Win64 (they assume that sizeof(long) ==
sizeof(void*))
As well, this patch significantly cleans up the hash code. It adds the
two function _Py_HashDouble and _PyHash_VoidPtr that the various
hashing routine are changed to use.
These help maintain the hash function invariant: (a==b) =>
(hash(a)==hash(b))) I have added Lib/test/test_hash.py and
Lib/test/output/test_hash to test this for some cases.
Avoid calling the dealloc function, previously triggered with
DECREF(inst). This caused a segfault in PyDict_GetItem, called with a
NULL dict, whenever inst->in_dict fails under low-memory conditions.
This patch modifies the type structures of objects that
participate in GC. The object's tp_basicsize is increased when
GC is enabled. GC information is prefixed to the object to
maintain binary compatibility. GC objects also define the
tp_flag Py_TPFLAGS_GC.
Fixed a bug in PyUnicode_Count() which would have caused a
core dump in case of substring coercion failure.
Synchronized .count() with the string method of the same name
to return len(s)+1 for s.count('').
The following patch adds "sq_contains" support to rangeobject, and enables
the already-written support for sq_contains in listobject and tupleobject.
The rangeobject "contains" code should be a bit more efficient than the
current default "in" implementation ;-) It might not get used much, but it's
not that much to add.
listobject.c and tupleobject.c already had code for sq_contains, and the
proper struct member was set, but the PyType structure was not extended to
include tp_flags, so the object-specific code was not getting called (Go
ahead, test it ;-). I also did this for the immutable_list_type in
listobject.c, eventhough it is probably never used. Symmetry and all that.
Fixed %c formatting to check for one character arguments. Thanks
to Finn Bock for finding this bug.
Added a fix for bug PR#348 which originated from not resetting
the globals correctly in _PyUnicode_Fini().
Change the default encoding to 'ascii' (it was previously
defined as UTF-8).
Note: The implementation still uses UTF-8 to implement
the buffer protocol, so C APIs will still see UTF-8. This
is on purpose: rather than fixing the Unicode implementation,
the C APIs should be made Unicode aware.
This patch correct bounds checking in PyLong_FromLongLong. Currently, it does
not check properly for negative values when checking to see if the incoming
value fits in a long or unsigned long. This results in possible silent
truncation of the value for very large negative values.
Added support for user settable default encodings. The
current implementation uses a per-process global which
defines the value of the encoding parameter in case it
is set to NULL (meaning: use the default encoding).
Fix the string methods that implement slice-like semantics with
optional args (count, find, endswith, etc.) to properly handle
indeces outside [INT_MIN, INT_MAX]. Previously the "i" formatter
for PyArg_ParseTuple was used to get the indices. These could overflow.
This patch changes the string methods to use the "O&" formatter with
the slice_index() function from ceval.c which is used to do the same
job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]).
Fix the string methods that implement slice-like semantics with
optional args (count, find, endswith, etc.) to properly handle
indeces outside [INT_MIN, INT_MAX]. Previously the "i" formatter
for PyArg_ParseTuple was used to get the indices. These could overflow.
This patch changes the string methods to use the "O&" formatter with
the slice_index() function from ceval.c which is used to do the same
job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]). slice_index()
is renamed _PyEval_SliceIndex() and is now exported. As well, the return
values for success/fail were changed to make slice_index directly
usable as required by the "O&" formatter.
[GvR: shouldn't a similar patch be applied to unicodeobject.c?]
gave bogus results for chars in the range 128-255, because their
implementation was using signed characters. Fixed this by using
unsigned character pointers (as opposed to using Py_CHARMASK()).
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.
(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode. I'm also holding back on his
change to main.c, which seems unnecessary to me.)
Fixed a reference leak in the allocator.
Renamed utf8_string to _PyUnicode_AsUTF8String() and made
it external for use by other parts of the interpreter.
The previous checkin (2.84) added a PyErr_Format call that made the
cost of raising an AttributeError much more expensive. In general
this doesn't matter, except that checks for __init__ and
__del__ methods, where exceptions are caught and cleared in C, also
got much more expensive.
The fix is to split instance_getattr1 into two calls:
instance_getattr2 checks the instance and the class for the attribute
and returns it or returns NULL on error. It does not raise an
exception.
instance_getattr1 does rexec checks, then calls instance_getattr2. It
raises an exception if instance_getattr2 returns NULL.
PyInstance_New and instance_dealloc now call instance_getattr2
directly.
Improvements:
- does no longer need any extra memory
- has no relationship to tstate
- works in debug mode
- can easily be modified for free threading (hi Greg:)
Side effects:
Trashcan does change the order of object destruction.
Prevending that would be quite an immense effort, as
my attempts have shown. This version works always
the same, with debug mode or not. The slightly
changed destruction order should therefore be no problem.
Algorithm:
While the old idea of delaying the destruction of some
obejcts at a certain recursion level was kept, we now
no longer aloocate an object to hold these objects.
The delayed objects are instead chained together
via their ob_type field. The type is encoded via
ob_refcnt. When it comes to the destruction of the
chain of waiting objects, the topmost object is popped
off the chain and revived with type and refcount 1,
then it gets a normal Py_DECREF.
I am confident that this solution is near optimum
for minimizing side effects and code bloat.
_PyTuple_Resize(). In addition, a change suggested by Jeremy Hylton
to limit the size of the free lists is also merged into this patch.
Charles wrote initially:
"""
Test Case: run the following code:
class Nothing:
def __len__(self):
return 5
def __getitem__(self, i):
if i < 3:
return i
else:
raise IndexError, i
def g(a,*b,**c):
return
for x in xrange(1000000):
g(*Nothing())
and watch Python's memory use go up and up.
Diagnosis:
The analysis begins with the call to PySequence_Tuple at line 1641 in
ceval.c - the argument to g is seen to be a sequence but not a tuple,
so it needs to be converted from an abstract sequence to a concrete
tuple. PySequence_Tuple starts off by creating a new tuple of length
5 (line 1122 in abstract.c). Then at line 1149, since only 3 elements
were assigned, _PyTuple_Resize is called to make the 5-tuple into a
3-tuple. When we're all done the 3-tuple is decrefed, but rather than
being freed it is placed on the free_tuples cache.
The basic problem is that the 3-tuples are being added to the cache
but never picked up again, since _PyTuple_Resize doesn't make use of
the free_tuples cache. If you are resizing a 5-tuple to a 3-tuple and
there is already a 3-tuple in free_tuples[3], instead of using this
tuple, _PyTuple_Resize will realloc the 5-tuple to a 3-tuple. It
would more efficient to use the existing 3-tuple and cache the
5-tuple.
By making _PyTuple_Resize aware of the free_tuples (just as
PyTuple_New), we not only save a few calls to realloc, but also
prevent this misbehavior whereby tuples are being added to the
free_tuples list but never properly "recycled".
"""
And later:
"""
This patch replaces my submission of Sun, 16 Apr and addresses Jeremy
Hylton's suggestions that we also limit the size of the free tuple
list. I chose 2000 as the maximum number of tuples of any particular
size to save.
There was also a problem with the previous version of this patch
causing a core dump if Python was built with Py_TRACE_REFS. This is
fixed in the below version of the patch, which uses tupledealloc
instead of _Py_Dealloc.
"""
The maxsplit functionality in .splitlines() was replaced by the keepends
functionality which allows keeping the line end markers together
with the string.
Added support for '%r' % obj: this inserts repr(obj) rather
than str(obj).
The maxsplit functionality in .splitlines() was replaced by the keepends
functionality which allows keeping the line end markers together
with the string.
* New exported API PyUnicode_Resize()
* The experimental Keep-Alive optimization was turned back
on after some tweaks to the implementation. It should now
work without causing core dumps... this has yet to tested
though (switching it off is easy: see the unicodeobject.c
file for details).
* Fixed a memory leak in the Unicode freelist cleanup code.
* Added tests to correctly process the return code from
_PyUnicode_Resize().
* Fixed a bug in the 'ignore' error handling routines
of some builtin codecs. Added test cases for these to
test_unicode.py.
* string_contains now calls PyUnicode_Contains() only when the other
operand is a Unicode string (not whenever it's not a string).
* New format style '%r' inserts repr(arg) instead of str(arg).
* '...%s...' % u"abc" now coerces to Unicode just like
string methods. Care is taken not to reevaluate already formatted
arguments -- only the first Unicode object appearing in the
argument mapping is looked up twice. Added test cases for
this to test_unicode.py.
In line with a similar checkin to object.c a while ago, this patch
gives a more descriptive error message for an attribute error on a
class instance. The message now looks like:
AttributeError: 'Descriptor' instance has no attribute 'GetReturnType'
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent). Checkin messages:
New Unicode support for int(), float(), complex() and long().
- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
Unicode to a decimal char* string (used in the above new
APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above
Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
are masked, all other errors such as ValueError during
Unicode coercion are passed through (note that PyUnicode_Compare
does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
masked and 0 returned; all other errors are passed through
Better testing support for the standard codecs.
Misc minor enhancements, such as an alias dbcs for the mbcs codec.
Changes:
- PyLong_FromString() now applies the same error checks as
does PyInt_FromString(): trailing garbage is reported
as error and not longer silently ignored. The only characters
which may be trailing the digits are 'L' and 'l' -- these
are still silently ignored.
- string.ato?() now directly interface to int(), long() and
float(). The error strings are now a little different, but
the type still remains the same. These functions are now
ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
in the input string; PyNumber_Long() already did this (and
still does)
Followed by:
Looks like I've gone a step too far there... (and test_contains.py
seem to have a bug too).
I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
Attached you find an update of the Unicode implementation.
The patch is against the current CVS version. I would appreciate
if someone with CVS checkin permissions could check the changes
in.
The patch contains all bugs and patches sent this week and also
fixes a leak in the codecs code and a bug in the free list code
for Unicode objects (which only shows up when compiling Python
with Py_DEBUG; thanks to MarkH for spotting this one).
This (1) avoids thread unsafety whereby another thread could zap the
list while we were using it, and (2) now supports writing arbitrary
sequences of strings.
Added wrapping macros to dictobject.c, listobject.c, tupleobject.c,
frameobject.c, traceback.c that safely prevends core dumps
on stack overflow. Macros and functions in object.c, object.h.
The method is an "elevator destructor" that turns cascading
deletes into tail recursive behavior when some limit is hit.
diagnostics.
*** INCOMPATIBLE CHANGE: This changes append(), remove(), index(), and
*** count() to require exactly one argument -- previously, multiple
*** arguments were silently assumed to be a tuple.
messages from "OverflowError: integer pow()" to "OverflowError:
integer exponentiation". (Not that this takes care of the complaint
in general that the error messages could be greatly improved. :-)
trailing 'L' is appended to the representation,
otherwise not.
All existing call sites are modified to pass true for
addL.
Remove incorrect statement about external use of this
function from elsewhere; it's static!
long_str(): Handler for the tp_str slot in the type object.
Identical to long_repr(), but passes false as the addL
parameter of long_format().
specifier came from an int expression instead of a constant in the
format, a negative width was truncated to zero instead of taken to
mean the same as that negative constant plugged into the format. E.g.
"(%*s)" % (-5, "foo") yielded "(foo)" while "(%-5s)" yields "(foo )".
Now both yield the latter -- like sprintf() in C.
1. Fixes float divmod so that the quotient it returns is always an integral
value.
2. Fixes float % and float divmod so that the remainder always gets the
right sign (the current code uses a "are the signs different?" test that
doesn't work half the time <wink> when the product of the divisor and the
remainder underflows to 0).
a block cannot be freed, add its free items back to the free list, and
add its valid ints back to the small_ints array if they are in range.
This is necessary to avoid leaking when Python is reinitialized later.
represented by an explicit structure. (There are still too many casts
in the code, but that may be unavoidable.)
Also added code so that with -vv it is very chatty about what it does.
buffer increment, and sometimes the new buffer size. Make it do what
its name says, and fix the one place where this matters to the caller.
Also add a comment explaining why we call lseek() and then ftell().
The MS compiler doesn't call it 'long long', it uses __int64,
so a new #define, LONG_LONG, has been added and all occurrences
of 'long long' are replaced with it.
Previously, this said "unsubscriptable object"; in 1.5.1, the reverse
problem existed, where None[''] would complain about a non-integer
index. This fix does the right thing in all cases (for get, set and
del item).
before calling it. This check was there when the objects were of the
same type *before* coercion, but not if they initially differed but
became the same *after* coercion.
Sparc Solaris 2.6 (fully patched!) that I don't want to dig into, but
which I suspect is a bug in the multithreaded malloc library that only
shows up when run on a multiprocessor. (The program wasn't using
threads, it was just using the multithreaded C library.)
faster (using PyList_GetSlice()). Also added a test for a NULL
argument, as with PySequence_Tuple(). (Hmm... Better names for these
two would be PyList_FromSequence() and PyTuple_FromSequence(). Oh well.)
"indefinite length" sequences. These should still have a length, but
the length is only used as a hint -- the actual length of the sequence
is determined by the item that raises IndexError, which may be either
smaller or larger than what len() returns. (This is a novelty; map(),
filter() and reduce() only allow the actual length to be larger than
what len() returns, not shorter. I'll fix that shortly.)
conversions. Formerly, for example, int('-') would return 0 instead
of raising ValueError, and int(' 0') would raise ValueError
(complaining about a null byte!) instead of 0...
+ Took the "list" argument out of the other functions that no longer need
it. This speeds things up a little more.
+ Small comment changes in accord with that.
+ Exploited the now-safe ability to cache values in the partitioning loop.
Makes no timing difference on my flavor of Pentium, but this machine ran out
of registers 12 iterations ago. It should yield a small speedup on a RISC
machine, and not hurt in any case.
instead of testing whether the list changed size after each
comparison, temporarily set the type of the list to an immutable list
type. This should allow continued use of the list for legitimate
purposes but disallows all operations that can change it in any way.
(Changes to the internals of list items are not caught, of cause;
that's not possible to detect, and it's not necessary to protect the
sort code, either.)
not in restricted mode.
__dict__ can be set to any dictionary; the cl_getattr, cl_setattr and
cl_delattr slots are refreshed.
__name__ can be set to any string.
__bases__ can be set to to a tuple of classes, provided they are not
subclasses of the class whose attribute is being assigned.
__getattr__, __setattr__ and __delattr__ can be set to anything, or
deleted; the appropriate slot (cl_getattr, cl_setattr, cl_delattr) is
refreshed.
(Note: __name__ really doesn't need to be a special attribute, but
that would be more work.)
From: "Tim Peters" <tim_one@email.msn.com>
To: "Guido van Rossum" <guido@CNRI.Reston.VA.US>
Date: Sat, 23 May 1998 21:45:53 -0400
Guido, the overflow checking in PyLong_AsLong is off a little:
1) If the C in use sign-extends right shifts on signed longs, there's a
spurious overflow error when converting the most-negative int:
Python 1.5.1 (#0, Apr 13 1998, 20:22:04) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> x = -1L << 31
>>> x
-2147483648L
>>> int(x)
Traceback (innermost last):
File "<stdin>", line 1, in ?
OverflowError: long int too long to convert
>>>
2) If C does not sign-extend, some genuine overflows won't be caught.
The attached should repair both, and, because I installed a new disk and a C
compiler today, it's even been compiled this time <wink>.
Python 1.5.1 (#0, May 23 1998, 20:24:58) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> x = -1L << 31
>>> x
-2147483648L
>>> int(x)
-2147483648
>>> int(-x)
Traceback (innermost last):
File "<stdin>", line 1, in ?
OverflowError: long int too long to convert
>>> int(-x-1)
2147483647
>>> int(x-1)
Traceback (innermost last):
File "<stdin>", line 1, in ?
OverflowError: long int too long to convert
>>>
end-casing-ly y'rs - tim
Make sure that no tp_as_numbers->nb_<whatever> function is called
without checking for a NULL pointer. Marc-Andre Lemburg will love it!
(Except that he's just rewritten all this code for a different
approach to coercions ;-( )
programming style.
Recoded many routines to incorporate better error checking, and/or
better versions of the same function found elsewhere
(e.g. bltinmodule.c or ceval.c). In particular,
Py_Number_{Int,Long,Float}() now convert from strings, just like the
built-in functions int(), long() and float().
Sequences and mappings are now safe to have NULL function pointers
anywhere in their tp_as_sequence or tp_as_mapping fields. (A few
places in other files need to be checked in too.)
Renamed PySequence_In() to PySequence_Contains().
clear_carefully() used to do in import.c. Differences: leave only
__builtins__ alone in the 2nd pass; and don't clear the dictionary (on
the theory that as long as there are references left to the
dictionary, those might be destructors that might expect __builtins__
to be alive when they run; and __builtins__ can't normally be part of
a cycle).
PyNumber_Coerce() except that when the coercion can't be done and no
other exceptions happen, it returns 1 instead of raising an
exception.
Use this function in PyObject_Compare() to avoid raising an exception
simply because two objects with numeric behavior can't be coerced to a
common type; instead, proceed with the non-numeric default comparison.
Note that this is a somewhat questionable practice -- comparisons for
numeric objects shouldn't default to random behavior like this, but it
is required for backward compatibility. (Case in point, it broke
comparison of kjDict objects to integers in Aaron Watters' kjbuckets
extension.) A correct fix (for python 2.0) should involve a different
definiton of comparison altogether.
sys.stdin.readline(), you get a fatal error (no current thread). This
is because there was a call to PyErr_CheckSignals() while there was no
current thread. I wonder how many more of these we find... I bnetter
go hunting for PyErr_CheckSignals() now...
in libmath.a so they are available to mathmodule.so (in case it is
shared). While this still gets triggered on Solaris 2.x, this appears
to be harmless there.
__getitem__(). This method never raises an exception; if the key is
not in the dictionary, the second (optional) argument is returned. If
the second argument is not provided and the key is missing, None is
returned.
mapp_methods: added "get" method.
arbitrary nested parens in a %(...)X style format.
#Also folded two lines and added more detail to the error message for
#unsupported format character.
former lets you give an instance a set of new instance vars. The
latter lets you give it a new class. Both are typechecked and
disallowed in restricted mode.
For classes, the check for read-only special attributes is tightened
so that only assignments to __dict__, __bases__, __name__,
__getattr__, __setattr__, and __delattr__ (these could be made to work
as well, but I don't know if that's useful -- let's see first whether
mucking with instances will help).
from the interned table. There are references in hard-to-find static
variables all over the interpreter, and it's not worth trying to get
rid of all those; but "uninterning" isn't fair either and may cause
subtle failures later -- so we have to keep them in the interned
table.
Also get rid of no-longer-needed insert of None in interned dict.
no valid directory is passed in. This prevents __del__ to fail when
invoked after __builtins__ has already been discarded.
Also add PyFrame_Fini() to discard the cache of frames.
In _Py_PrintReferences(), no longer suppress once-referenced string.
Add Py_Malloc and friends and PyMem_Malloc and friends (malloc
wrappers for third parties).
complexity saved much any more. A simple benchmark (grail) showed
that there were 3 times as many misses as hits, and the same number of
times again the code was bypassed altogether due to the existence of
setattro/getattro.
this many bytes have been read, readlines stops. Because of
buffering, the amount of bytes read is usually at least 8K more than
the hint.
Also changed read() and readline() to use PyArg_ParseTuple().
(Note that the *previous* checkin also fixed error handling and
narrowed the range of thread unblocking for all methods using
fread().)
see if we can guess the #bytes until the end of the file. If we
can't, increment the buffer size increments up to 0.5Meg to avoid
realloc'ing too much.
The table size is now constrained to be a power of two, and we use a
variable increment based on GF(2^n)-{0} (not that I have the faintest
idea what that is :-) which helps avoid the expensive '%' operation.
Some of the entries in the table of polynomials have been modified
according to a post by Tim Peters.
Rather than allocating a list object for the fast locals and another
(extensible one) for the value stack and allocating the block stack
dynamically, allocate the block stack with a fixed size (CO_MAXBLOCKS
from compile.h), and stick the locals and value stack at the end of
the object (this is now possible since the stack size is known
beforehand). Get rid of the owner field and the nvalues argument --
it is available in the code object, like nlocals.
This requires small changes in ceval.c only.
sequence, otherwise
operator.indexOf([4, 3, 2, 1], 9) would raise a SystemError!
Note: it might be wise to double check all these functions. I haven't
done that yet.
object pointers. Should be a bit faster than the C library's qsort(),
and doesn't have the prohibition on recursion that Solaris qsort() has
in the threaded version of their C library.
Thanks to discussions with Tim Peters.
defines that a shorter dictionary is always smaller than a longer one.
For dictionaries of the same size, the smallest differing element
determines the outcome (which yields the same results as before,
without explicit sorting).
be Ellipsis!).
Bumped the API version because a linker-visible symbol is affected.
Old C code will still compile -- there's a b/w compat macro.
Similarly, old Python code will still run, builtin exports both
Ellipses and Ellipsis.
Removed im_doc attribute; __name__ and __doc__ are now handled by
special casing in instancemethodgetattr(). This saves a few bytes and
INCREF/DECREF calls per i.m. object allocation/deallocation.
getcounts() returns a list of counts of allocations and
deallocations for all different object types.
getobjects(n [, type ]) returns a list of recently allocated
and not-yet-freed objects of the given type (all
objects if no type given). Only the n most recent
(all if n==0) objects are returned.
getcounts is only available if compiled with -DCOUNT_ALLOCS,
getobjects is only available if compiled with -DTRACE_REFS. Note that
everything must be compiled with these options!
entirely redone operator overloading. The rules for class
instances are now much more relaxed than for other built-in types
(whose coerce must still return two objects of the same type)
* Objects/floatobject.c: add overflow check when converting float
to int and implement truncation towards zero using ceil/float
* Objects/longobject.c: change ValueError to OverflowError when
converting to int
* Objects/rangeobject.c: modernized
* Objects/stringobject.c: use HAVE_LIMITS instead of __STDC__
* Objects/xxobject.c: changed to use new style (not finished?)
and __setattr__ support to override getattr(x, name) and
setattr(x, name, value) for class instances. This uses a special
hack whereby the class is supposed to be static: the __getattr__
and __setattr__ methods are looked up only once and saved in the
instance structure for speed
* funcobject.c (func_repr): don't call getstringvalue(None) for anonymous
functions.
* bltinmodule.c: removed lambda (which is now a built-in function);
removed implied lambda for string arg to filter/map/reduce.
* Grammar, graminit.[ch], compile.[ch]: replaced lambda as built-in
function by lambda as grammar entity: instead of "lambda('x: x+1')" you
write "lambda x: x+1".
* Xtmodule.c (checkargdict): return 0, not NULL, for error.
* posixmodule.c: don't prototype getcwd() -- it's not portable...
* mappingobject.c: double-check validity of last_name_char in
dict{lookup,insert,remove}.
* arraymodule.c: need memmove only for non-STDC Suns.
* Makefile: comment out HTML_LIBS and XT_USE by default
* pythonmain.c: don't prototype getopt() -- it's not standardized
* socketmodule.c: cast flags arg to {get,set}sockopt() and addrbuf arg to
recvfrom() to (ANY*).
* pythonrun.c (initsigs): fix prototype, make it static
* intobject.c (LONG_BIT): only #define it if not already defined
* classobject.[ch]: remove all references to unused instance_convert()
* mappingobject.c (getmappingsize): Don't return NULL in int function.
each dir in sys.path, try each possible extension. (Note: C extensions
are loaded before Python modules in the same directory, to allow having
a C version used when dynamic loading is supported and a Python version
as a back-up.)
* import.c (reload_module): test for error from getmodulename()
* moduleobject.c: implement module name as dict entry '__name__' instead
of special-casing it in module_getattr(); this way a module (or
function!) can access its own module name, and programs that know what
they are doing can rename modules.
* stdwinmodule.c (initstdwin): strip ".py" suffix of argv[0].
* {tuple,list,mapping,array}object.c: call printobject with 0 for flags
* compile.c (parsestr): use quote instead of '\'' at one crucial point
* arraymodule.c (array_getattr): Added __members__ attribute
* object.[ch], bltinmodule.c, fileobject.c: changed str() to call
strobject() which calls an object's __str__ method if it has one.
strobject() is also called by writeobject() when PRINT_RAW is passed.
* ceval.c: rationalize code for PRINT_ITEM (no change in function!)
* funcobject.c, codeobject.c: added compare and hash functionality.
Functions with identical code objects and the same global dictionary are
equal. Code objects are equal when their code, constants list and names
list are identical (i.e. the filename and code name don't count).
(hash doesn't work yet since the constants are in a list and lists can't
be hashed -- suppose this should really be done with a tuple now we have
resizetuple!)
setlistslice() can be used to cut the unused part out of a freshly made
slice (as done by bagof()). [needed by the next mod!]
* structural changes to bagof(), map() etc.
* PROTO.h, mymalloc.h: added #ifdefs for TURBOC and GNUC.
* allobjects.h: added #include "rangeobject.h"
* Grammar: added lambda_input; relaxed syntax for exec.
* bltinmodule.c: added bagof, map, reduce, lambda, xrange.
* tupleobject.[ch]: added resizetuple().
* rangeobject.[ch]: new object type to speed up range operations (not
convinced this is needed!!!)
* Grammar: add exec statement; allow testlist in expr statement.
* ceval.c, compile.c, opcode.h: support exec statement;
avoid optimizing locals when it is used
* fileobject.{c,h}: add getfilename() internal function.
shared. The default is to save references to the integers in
the range -1..99. The lower limit can be set by defining
NSMALLNEGINTS (absolute value of smallest integer to be saved)
and NSMALLPOSINTS (1 more than the largest integer to be
saved).
tupleobject.c: Save a reference to the empty tuple to be returned
whenever a tuple of size 0 is requested. Tuples of size 1
upto, but not including, MAXSAVESIZE (default 20) are put in
free lists when deallocated. When MAXSAVESIZE equals 1, only
share references to the empty tuple, when MAXSAVESIZE equals
0, don't include the code at all and revert to the old
behavior.
object.c: Print some more statistics when COUNT_ALLOCS is defined.
without .py file); Bill's dynamic loading for SunOS using shared
libraries.
pwdmodule.c (mkgrent): remove DECREF of uninitialized variable.
classobject.c (instance_getattr): Fix case when class lookup returns
unbound method instead of function.
image objects, and lots of new methods.
* Added counting of allocations and deallocations of builtin types if
COUNT_ALLOCS is defined. Had to move calls to NEWREF down in some
files.
* Bug fix in sorting lists.
objects of its derived classes; allow anything that has an attribute
named "__privileged__" access to anything.
* object.[ch]: added hasattr() -- test whether getattr() will succeed.
* many files: made some functions static; removed "extern int errno;".
* frozenmain.c: fixed bugs introduced on 24 June...
* flmodule.c: remove 1.5 bw compat hacks, add new functions in 2.2a
(and some old functions that were omitted).
* timemodule.c: added MSDOS floatsleep version .
* pgenmain.c: changed exit() to goaway() and added defn of goaway().
* intrcheck.c: add hack (to UNIX only) so interrupting 3 times
will exit from a hanging program. The second interrupt prints
a message explaining this to the user.
Added $(SYSDEF) to its build rule in Makefile.
* cgensupport.[ch], modsupport.[ch]: removed some old stuff. Also
changed files that still used it... And made several things static
that weren't but should have been... And other minor cleanups...
* listobject.[ch]: add external interfaces {set,get}listslice
* socketmodule.c: fix bugs in new send() argument parsing.
* sunaudiodevmodule.c: added flush() and close().
function found as instance data.
* socketmodule.c: added 'flags' argument sendto/recvfrom, rewrite
argument parsing in send/recv.
* More changes related to access (terminology change: owner instead of
class; allow any object as owner; local/global variables are owned
by their dictionary, only class/instance data is owned by the class;
"from...import *" now only imports objects with public access; etc.)
* Added "access *: ...", made access work for class methods.
* Introduced subclass check: make sure that when calling
ClassName.methodname(instance, ...), the instance is an instance of
ClassName or of a subclass thereof (this might break some old code!)
yet). The class is now passed to eval_code and stored in the current
frame. It is also stored in instance method objects. An "unbound"
instance method is now returned when a function is retrieved through
"classname.funcname", which when called passes the class to eval_code.
(1) dictionaries/mappings now have attributes values() and items() as
well as keys(); at the C level, use the new function mappinggetnext()
to iterate over a dictionary.
(2) "class C(): ..." is now illegal; you must write "class C: ...".
(3) Class objects now know their own name (finally!); and minor
improvements to the way how classes, functions and methods are
represented as strings.
(4) Added an "access" statement and semantics. (This is still
experimental -- as long as you don't use the keyword 'access' nothing
should be changed.)
before it.
* ceval.c, object.c: moved testbool() to object.c (now extern visible)
* stringobject.c: fix bugs in and rationalize string resize in formatstring()
* tokenizer.[ch]: fix non-working code for lines longer than BUFSIZ
f_fastlocals in a traceback object (this is a core dump hazard
if there are <nil> entries), but instead eval_code() merges the fast
locals back into the locals dictionary if it looks like the local
variables will be retained. Also, the merge routines save
exceptions since this is sometimes needed (alas!).
* Added id() to bltinmodule.c, which returns an object's address
(identity). Useful to walk arbitrary data structures containing
cycles.
* Added compile() to bltinmodule.c and compile_string() to
pythonrun.[ch]: support to exec/eval arbitrary code objects. The
code that defaults globals and locals is moved from run_node in
pythonrun.c (which is now identical to eval_node) to eval_code in
ceval.c. [XXX For elegance a clean-up session is necessary.]
lookup (opcode.h, ceval.[ch], compile.c, frameobject.[ch],
pythonrun.c, import.c). The .pyc MAGIC number is changed again.
Added get_menu_text to flmodule.
* Stubs for faster implementation of local variables (not yet finished)
* Added function name to code object. Print it for code and function
objects. THIS MAKES THE .PYC FILE FORMAT INCOMPATIBLE (the version
number has changed accordingly)
* Print address of self for built-in methods
* New internal functions getattro and setattro (getattr/setattr with
string object arg)
* Replaced "dictobject" with more powerful "mappingobject"
* New per-type functio tp_hash to implement arbitrary object hashing,
and hashobject() to interface to it
* Added built-in functions hash(v) and hasattr(v, 'name')
* classobject: made some functions static that accidentally weren't;
added __hash__ special instance method to implement hash()
* Added proper comparison for built-in methods and functions
* Fixcprt.py: added [-y file] option, do only files younger than file.
* modsupport.[ch]: added vmkvalue().
* intobject.c: use mkvalue().
* stringobject.c: added "formatstring"; renamed string* to string_*;
ceval.c: call formatstring for string % value.
* longobject.c: close memory leak in divmod.
* parsetok.c: set result node to NULL when returning an error.
listfontnames, bitmap ops.
* listobject.c: use mkvalue() when possible; avoid weird error when
calling append() without args.
* modsupport.c: new feature in getargs(): if the format string
contains a semicolor the string after that is used as the error
message instead of "bad argument list (format %s)" when there's an
error.
* various modules: added 1993 to copyright.
* thread.c: added copyright notice.
* ceval.c: minor change to error message for "+"
* stdwinmodule.c: check for error from wfetchcolor
* config.c: MS-DOS fixes (define PYTHONPATH, use DELIM, use osdefs.h)
* Add declaration of inittab to import.h
* sysmodule.c: added sys.builtin_module_names
* xxmodule.c, xxobject.c: fix minor errors
stdwinmodule.c: wsetfont can now return an error
Makefile: add CL_USE and CL_LIB*S; config.c: move CL part around
New things in imgfile; also in Makefile.
longobject.c: fix comparison of negative long ints... [REAL BUG!]
marshal.c: add dumps() and loads() to read/write strings
timemodule.c: make sure there's always a floatsleep()
posixmodule.c: rationalize struct returned by times()
Makefile: add test target, disable imgfile by default
thread.c: Improved coexistance with dl module (sjoerd)
stdwinmodule.c: Change include stdwin.h if macintosh
rotormodule.c: added missing last argument to RTR_?_region calls
confic.c: merged with configmac.c, added 1993 to copyright message
fileobject.c: int compared to NULL in writestring(); change fopenRF ifdef
timemodule.c: simplify times() using mkvalue; include myselect.h
earlier (for sequent).
posixmodule: for sequent, include unistd.h instead of explicit
extern definitions and don't define rename()
Makefile: change misleading/wrong MD5 comments
* posixmodule.c: move extern function declarations to top
* listobject.c: cmp() arguments must be void* if __STDC__
* Makefile, allobjects.h, panelmodule.c, modsupport.c: get rid of
strdup() -- it is a portability risk
* Makefile: enclosed ranlib command in parentheses for Sequent Make
which aborts if the command is not found even if '-' is present
* timemodule.c: time() returns a floating point number, in microsecond
precision if BSD_TIME is defined.
* flmodule.c: added {do,check}_only_forms to fl's list of functions;
and don't print a message when an unknown object is returned.
* pythonrun.c: catch SIGHUP and SIGTERM to do essential cleanup.
* Made jpegmodule.c smaller by using getargs() and mkvalue() consistently.
* Increased parser stack size to 500 in parser.h.
* Implemented custom allocation of stack frames to frameobject.c and
added dynamic stack overflow checks (value stack only) to ceval.c.
(There seems to be a bug left: sometimes stack traces don't make sense.)
sys.stderr or sys.stdin, and to work with any object as long as it has
a write() (respectively readline()) methods. Some functions that took
a FILE* argument now take an object* argument.
* flmodule.c: added some missing functions; changed readonly flags of
some data members based upon FORMS documentation.
* listobject.c: fixed int/long arg lint bug (bites PC compilers).
* several: removed redundant print methods (repr is good enough).
* posixmodule.c: added (still experimental) process group functions.
calls the repr function. When the refcount is bad, don't print
the object at all (chances of crashes).
Changes to checking and printing of references: the consistency
check is somewhat faster; don't print strings referenced once
(most occur in function's name lists).
argument to malloc() (size_t or unsigned int)
* listobject.c: check for overflow of the size of the object,
so things like range(0x7fffffff) will raise MemoryError instead
of calling malloc() with -4 (and then crashing -- malloc's fault)
coercion is now completely generic.
* ceval.c: for instances, don't coerce for + and *; * reverses
arguments if left one is non-instance numeric and right one sequence.
* socketmodule.c: get rid of makepair(); fix makesocketaddr to fix
broken recvfrom()
* socketmodule: get rid of getStrarg()
* ceval.h: move eval_code() to new file eval.h, so compile.h is no
longer needed.
* ceval.c: move thread comments to ceval.h; always make save/restore
thread functions available (for dynloaded modules)
* cdmodule.c, listobject.c: don't include compile.h
* flmodule.c: include ceval.h
* import.c: include eval.h instead of ceval.h
* cgen.py: add forground(); noport(); winopen(""); to initgl().
* bltinmodule.c, socketmodule.c, fileobject.c, posixmodule.c,
selectmodule.c:
adapt to threads (add BGN/END SAVE macros)
* stdwinmodule.c: adapt to threads and use a special stdwin lock.
* pythonmain.c: don't include getpythonpath().
* pythonrun.c: use BGN/END SAVE instead of direct calls; also more
BGN/END SAVE calls etc.
* thread.c: bigger stack size for sun; change exit() to _exit()
* threadmodule.c: use BGN/END SAVE macros where possible
* timemodule.c: adapt better to threads; use BGN/END SAVE; add
longsleep internal function if BSD_TIME; cosmetics
* split pythonmain.c in two: most stuff goes to pythonrun.c, in the library.
* new optional built-in threadmodule.c, build upon Sjoerd's thread.{c,h}.
* new module from Sjoerd: mmmodule.c (dynamically loaded).
* new module from Sjoerd: sv (svgen.py, svmodule.c.proto).
* new files thread.{c,h} (from Sjoerd).
* new xxmodule.c (example only).
* myselect.h: bzero -> memset
* select.c: bzero -> memset; removed global variable
arguments crashed in INCREF() calls which should be XINCREF() calls.
timemodule.c: fix for SEQUENT port (sys/select, struct timezone) by
Jaap Vermeulen
xxobject.c: include modsupport.h
must (pretend to) support all operations except assignments;
if you don't want to support an operation you have to provide
a dummy function that raises an exception...