elements when crunching a list, dict or tuple. Now takes linear time
instead -- huge speedup for even moderately large containers, and the
code is notably simpler too.
Added some basic "is the output correct?" tests to test_pprint.
1) it didn't obey the "start" parameter (and when it does, we must validate
the value)
2) the return value needs to be an absolute index, rather than relative to
some arbitrary point in the file
(checking CVS, it appears this method never worked; these changes bring it
into line with typical .find() behavior)
to reason that me_key is much more likely to match the key we're looking
for than to match dummy, and if the key is absent me_key is much more
likely to be NULL than dummy: most dicts don't even have a dummy entry.
Running instrumented dict code over the test suite and some apps confirmed
that matching dummy was 200-300x less frequent than matching key in
practice. So this reorders the tests to try the common case first.
It can lose if a large dict with many collisions is mostly deleted, not
resized, and then frequently searched, but that's hardly a case we
should be favoring.
The comment following used to say:
/* We use ~hash instead of hash, as degenerate hash functions, such
as for ints <sigh>, can have lots of leading zeros. It's not
really a performance risk, but better safe than sorry.
12-Dec-00 tim: so ~hash produces lots of leading ones instead --
what's the gain? */
That is, there was never a good reason for doing it. And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collison) the same "too often" across distinct hashes.
Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== # number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.
Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items(). A number
of std tests suffered bogus failures as a result. For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,
>>> d = {}
>>> for i in range(10):
... d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>
Unfortunately. people may latch on to that in small examples and draw a
bogus conclusion.
test_support.py
Moved test_extcall's sortdict() into test_support, made it stronger,
and imported sortdict into other std tests that needed it.
test_unicode.py
Excluced cp875 from the "roundtrip over range(128)" test, because
cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
See Python-Dev for excruciating details.
Cookie.py
Chaged various output functions to sort dicts before building
strings from them.
test_extcall
Fiddled the expected-result file. This remains sensitive to native
dict ordering, because, e.g., if there are multiple errors in a
keyword-arg dict (and test_extcall sets up many cases like that), the
specific error Python complains about first depends on native dict
ordering.
are including Carbon/Carbon.h in stead of the old headers (unless WITHOUT_FRAMEWORKS
is defined, as it will be for classic MacPython) and selectively disabling all the
stuff that is unneeded in a unix-Python (event handling, etc).
rather than the idle.py script. This has advantages and
disadvantages; the biggest advantage being that we can more easily
have an alternative main program.
Allow module getattr and setattr to exploit string interning, via the
previously null module object tp_getattro and tp_setattro slots. Yields
a very nice speedup for things like random.random and os.path etc.
When getting a string buffer for a string we just created, use
PyString_AS_STRING() instead of PyString_AsString() to avoid the
call overhead and extra type check.
For rich comparisons, use instance_getattr2() when possible to avoid
the expense of setting an AttributeError. Also intern the name_op[]
table and use the interned strings rather than creating a new string
and interning it each time through.
class without providing any information about the constructor. This
should be used for classes which only exist to act as containers rather
than as factories for instances.