247 Commits

Author SHA1 Message Date
Raymond Hettinger 574aa32578 SF patch #798467: Update docstring of has_key for bool changes
(Contributed by George Yoshida.)
2003-09-01 22:12:08 +00:00
Raymond Hettinger c8d2290c8c SF patch #729395: Dictionary tuning
Adjust resize argument for dict.update() and dict.copy().
Extends the previous change to dict.__setitem__().
2003-05-07 00:49:40 +00:00
Raymond Hettinger 3539f6b895 SF patch #729395: Dictionary tuning
* Increase dictionary growth rate resulting in more sparse dictionaries,
  fewer lookup collisions, increased memory use, and better cache
  performance.  For dicts with over 50k entries, keep the current
  growth rate in case an application is suffering from tight memory
  constraints.

* Set the most common case (no resize) to fall-through the test.
2003-05-05 22:22:10 +00:00
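The growth policy described in commit 3539f6b895 can be sketched roughly as follows. This is only an illustration: the factors 4 and 2, the 8-slot minimum, and the helper name `new_table_size` are assumptions recalled from later versions of dictobject.c, not details stated in the commit message.

```python
MINSIZE = 8  # assumed minimum table size


def new_table_size(used):
    """Return the (power-of-two) table size chosen after a resize.

    Assumption: small dicts ask for 4x their entry count, while dicts
    with more than 50000 entries keep the older 2x growth rate.
    """
    factor = 2 if used > 50000 else 4
    min_used = used * factor
    size = MINSIZE
    while size <= min_used:
        size <<= 1  # table sizes are always powers of two
    return size


small = new_table_size(6)       # 6 * 4 = 24, next power of two above it
large = new_table_size(60000)   # 60000 * 2 = 120000
```

A sparser table trades memory for fewer collisions, which is the trade-off the commit message names.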
Raymond Hettinger 930427b892 Add a reference to dictnotes.txt. It does no good if you don't know it's
there or where to find it.
2003-05-03 06:51:59 +00:00
Raymond Hettinger 1da1dbf458 Renamed PyObject_GenericGetIter to PyObject_SelfIter
to more accurately describe what the function does.

Suggested by Thomas Wouters.
2003-03-17 19:46:11 +00:00
Raymond Hettinger 0153826964 Created PyObject_GenericGetIter().
Factors out the common case of returning self.
2003-03-17 08:24:35 +00:00
Raymond Hettinger a3e1e4cd79 SF patch #693753: fix for bug 639806: default for dict.pop
(contributed by Michael Stone.)
2003-03-06 23:54:28 +00:00
Neal Norwitz 0732301738 Add closing ) in comment 2003-02-15 14:45:12 +00:00
Tim Peters 080c88b912 cPickle.c, load_build(): Taught cPickle how to pick apart
the optional proto 2 slot state.

pickle.py, load_build():  CAUTION:  Noted that cPickle's
load_build and pickle's load_build really don't do the same
things with the state, and didn't before this patch either.
cPickle never tries to do .update(), and has no backoff if
instance.__dict__ can't be retrieved.  There are no tests
that can tell the difference, and part of what cPickle's
load_build() did looked accidental to me, so I don't know
what the true intent is here.

pickletester.py, test_pickle.py:  Got rid of the hack for
exempting cPickle from running some of the proto 2 tests.

dictobject.c, PyDict_Next():  documented intended use.
2003-02-15 03:01:11 +00:00
Raymond Hettinger ea3fdf44a2 SF patch #659536: Use PyArg_UnpackTuple where possible.
Obtain cleaner coding and a system wide
performance boost by using the fast, pre-parsed
PyArg_UnpackTuple function instead of the PyArg_ParseTuple
function, which is driven by a format string.
2002-12-29 16:33:45 +00:00
Martin v. Löwis 32b4a1ba62 Constify char* API. Fixes #651363. 2.2 candidate. 2002-12-11 13:21:12 +00:00
Tim Peters bca1cbc6f8 SF 548651: Fix the METH_CLASS implementation.
Most of these patches are from Thomas Heller, with long lines folded
by Tim.  The change to test_descr.py is from Guido.  See the bug report.

Not a bugfix candidate -- METH_CLASS is new in 2.3.
2002-12-09 22:56:13 +00:00
Raymond Hettinger e03e5b1f91 Remove assumption that cls is a subclass of dict.
Simplifies the code and gets Just van Rossum's example to work.
2002-12-07 08:10:51 +00:00
Raymond Hettinger b02bb5ed0a Replace BadInternalCall with TypeError. Add a test case. Fix whitespace.
Just van Rossum showed a weird, but clever way for pure python code to
trigger the BadInternalCall.  The C code had assumed that calling a class
constructor would return an instance of that class; however, classes that
abuse __new__ can invalidate that assumption.
2002-12-04 07:32:25 +00:00
Neal Norwitz ef786ae1a5 Add missing decref 2002-11-27 19:38:00 +00:00
Raymond Hettinger e33d3df030 SF Patch 643443. Added dict.fromkeys(iterable, value=None), a class
method for constructing new dictionaries from sequences of keys.
2002-11-27 07:29:33 +00:00
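A short usage sketch for the dict.fromkeys() class method added in commit e33d3df030, run against a current Python 3 interpreter (the variable names are illustrative only):

```python
keys = ["host", "port", "user"]

# With no value argument every key maps to None.
unset = dict.fromkeys(keys)

# With a value argument every key maps to that same object.
zeros = dict.fromkeys(keys, 0)

# Any iterable of hashable keys works, including a string.
letters = dict.fromkeys("abc", True)
```

Because one value object is shared by all keys, a mutable default such as a list would be shared too.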
Just van Rossum a797d8150d Patch #642500 with slight modifications: allow keyword arguments in
dict() constructor. Example:
  >>> dict(a=1, b=2)
  {'a': 1, 'b': 2}
  >>>
2002-11-23 09:45:04 +00:00
Guido van Rossum efae8862fe In doc strings, use 'k in D' rather than D.has_key(k). 2002-09-04 11:29:45 +00:00
Guido van Rossum 45ec02aed1 SF patch 576101, by Oren Tirosh: alternative implementation of
interning.  I modified Oren's patch significantly, but the basic idea
and most of the implementation is unchanged.  Interned strings created
with PyString_InternInPlace() are now mortal, and you must keep a
reference to the resulting string around; use the new function
PyString_InternImmortal() to create immortal interned strings.
2002-08-19 21:43:18 +00:00
Jeremy Hylton 938ace69a0 staticforward bites the dust.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) that botched the
static keyword when it was used with a forward declaration of a static
initialized structure.  Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers.  (In
fact, we expect that the compilers are all fixed eight years later.)

I'm leaving staticforward and statichere defined in object.h as
static.  This is only for backwards compatibility with C extensions
that might still use it.

XXX I haven't updated the documentation.
2002-07-17 16:30:39 +00:00
Guido van Rossum 2147df748f Make StopIteration a sink state. This is done by clearing out the
di_dict field when the end of the list is reached.  Also make the
error ("dictionary changed size during iteration") a sticky state.

Also remove the next() method -- one is supplied automatically by
PyType_Ready() because the tp_iternext slot is set.  That's a good
thing, because the implementation given here was buggy (it never
raised StopIteration).
2002-07-16 20:30:22 +00:00
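The two "sticky" states described in commit 2147df748f can be observed from Python code. The sketch below assumes a current Python 3 interpreter, where the same behaviour is still present; it does not show the C implementation.

```python
d = {"a": 1}
it = iter(d)
first = next(it)  # consumes the only key

# Sink state: once exhausted, the iterator keeps raising StopIteration,
# even if the dict gains new keys afterwards.
outcomes = []
for _ in range(2):
    try:
        next(it)
        outcomes.append("item")
    except StopIteration:
        outcomes.append("stop")
    d["b"] = 2

# Sticky error: after a size change is detected, every later call fails too.
it2 = iter(d)
next(it2)
d["c"] = 3
errors = []
for _ in range(2):
    try:
        next(it2)
    except RuntimeError as exc:
        errors.append(str(exc))
```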
Martin v. Löwis 14f8b4cfcb Patch #568124: Add doc string macros. 2002-06-13 20:33:02 +00:00
Guido van Rossum e027d9818f Add Raymond Hettinger's d.pop(). See SF patch 539949. 2002-04-12 15:11:59 +00:00
Neil Schemenauer 6189b89cc5 PyObject_GC_Del and PyObject_Del can now be used as function
designators.

Remove PyMalloc_New.
2002-04-12 02:43:00 +00:00
Guido van Rossum 77f6a65eb0 Add the 'bool' type and its values 'False' and 'True', as described in
PEP 285.  Everything described in the PEP is here, and there is even
some documentation.  I had to fix 12 unit tests; all but one of these
were printing Boolean outcomes that changed from 0/1 to False/True.
(The exception is test_unicode.py, which did a type(x) == type(y)
style comparison.  I could've fixed that with a single line using
issubtype(x, type(y)), but instead chose to be explicit about those
places where a bool is expected.)

Still to do: perhaps more documentation; change standard library
modules to return False/True from predicates.
2002-04-03 22:41:51 +00:00
Tim Peters 1f7df3595a Remove the CACHE_HASH and INTERN_STRINGS preprocessor symbols. 2002-03-29 03:29:08 +00:00
Guido van Rossum ff413af605 This is Neil's fix for SF bug 535905 (Evil Trashcan and GC interaction).
The fix makes it possible to call PyObject_GC_UnTrack() more than once
on the same object, and then move the PyObject_GC_UnTrack() call to
*before* the trashcan code is invoked.

BUGFIX CANDIDATE!
2002-03-28 20:34:59 +00:00
Neil Schemenauer dcc819a5c9 Use pymalloc if it's enabled. 2002-03-22 15:33:15 +00:00
Tim Peters f582b82fe9 SF bug #491415 PyDict_UpdateFromSeq2() unused
PyDict_UpdateFromSeq2():  removed it.
PyDict_MergeFromSeq2():  made it public and documented it.
PyDict_Merge() docs:  updated to reveal <wink> that the second
argument can be any mapping object.
2001-12-11 18:51:08 +00:00
Guido van Rossum dbb53d9918 Fix of SF bug #475877 (Mutable subtype instances are hashable).
Rather than tweaking the inheritance of type object slots (which turns
out to be too messy to try), this fix adds a __hash__ to the list and
dict types (the only mutable types I'm aware of) that explicitly
raises an error.  This has the advantage that list.__hash__([]) also
raises an error (previously, this would invoke object.__hash__([]),
returning the argument's address); ditto for dict.__hash__.

The disadvantage for this fix is that 3rd party mutable types aren't
automatically fixed.  This should be added to the rules for creating
subclassable extension types: if you don't want your object to be
hashable, add a tp_hash function that raises an exception.

Also, it's possible that I've forgotten about other mutable types for
which this should be done.
2001-12-03 16:32:18 +00:00
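The effect of commit dbb53d9918 is visible from Python: lists, dicts, and their subclasses refuse to be hashed. The helper `is_hashable` and the class `MyList` below are illustrative names, and the sketch assumes a current Python 3 interpreter.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class MyList(list):
    """A mutable subtype that does not define its own __hash__."""


results = {
    "list": is_hashable([]),
    "dict": is_hashable({}),
    "list subclass": is_hashable(MyList()),
    "tuple": is_hashable(()),
}
```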
Tim Peters a427a2b8d0 Rename "dictionary" (type and constructor) to "dict". 2001-10-29 22:25:45 +00:00
Tim Peters 4d85953fe6 dictionary() constructor:
+ Change keyword arg name from "x" to "items".  People passing a mapping
  object can stretch their imaginations <wink>.
+ Simplify the docstring text.
2001-10-27 18:27:48 +00:00
Tim Peters 1fc240e851 Generalize dictionary() to accept a sequence of 2-sequences. At the
outer level, the iterator protocol is used for memory-efficiency (the
outer sequence may be very large if fully materialized); at the inner
level, PySequence_Fast() is used for time-efficiency (these should
always be sequences of length 2).

dictobject.c, new functions PyDict_{Merge,Update}FromSeq2.  These are
wholly analogous to PyDict_{Merge,Update}, but process a sequence-of-2-
sequences argument instead of a mapping object.  For now, I left these
functions file static, so no corresponding doc changes.  It's tempting
to change dict.update() to allow a sequence-of-2-seqs argument too.

Also changed the name of dictionary's keyword argument from "mapping"
to "x".  Got a better name?  "mapping_or_sequence_of_pairs" isn't
attractive, although more so than "mosop" <wink>.

abstract.h, abstract.tex:  Added new PySequence_Fast_GET_SIZE function,
much faster than going thru the all-purpose PySequence_Size.

libfuncs.tex:
- Document dictionary().
- Fiddle tuple() and list() to admit that their argument is optional.
- The long-winded repetitions of "a sequence, a container that supports
  iteration, or an iterator object" is getting to be a PITA.  Many
  months ago I suggested factoring this out into "iterable object",
  where the definition of that could include being explicit about
  generators too (as is, I'm not sure a reader outside of PythonLabs
  could guess that "an iterator object" includes a generator call).
- Please check my curly braces -- I'm going blind <0.9 wink>.

abstract.c, PySequence_Tuple():  When PyObject_GetIter() fails, leave
its error msg alone now (the msg it produces has improved since
PySequence_Tuple was generalized to accept iterable objects, and
PySequence_Tuple was also stomping on the msg in cases it shouldn't
have even before PyObject_GetIter grew a better msg).
2001-10-26 05:06:50 +00:00
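The sequence-of-2-sequences form from commit 1fc240e851 can be tried as below, using the later name dict() rather than dictionary(). This is a sketch against a current Python 3 interpreter; the data is made up for illustration.

```python
# The outer level only needs to be iterable, so a generator works and
# is never fully materialized.
pairs = ((str(i), i * i) for i in range(3))
squares = dict(pairs)

# The inner items may be any sequences of length 2.
mixed = dict([("a", 1), ["b", 2]])

# An inner item of the wrong length is rejected.
try:
    dict([("a", 1, 2)])
    rejected = False
except ValueError:
    rejected = True
```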
Guido van Rossum 9475a2310d Enable GC for new-style instances. This touches lots of files, since
many types were subclassable but had a xxx_dealloc function that
called PyObject_DEL(self) directly instead of deferring to
self->ob_type->tp_free(self).  It is permissible to set tp_free in the
type object directly to _PyObject_Del, for non-GC types, or to
_PyObject_GC_Del, for GC types.  Still, PyObject_DEL was a tad faster,
so I'm fearing that our pystone rating is going down again.  I'm not
sure if doing something like

void xxx_dealloc(PyObject *self)
{
	if (PyXxxCheckExact(self))
		PyObject_DEL(self);
	else
		self->ob_type->tp_free(self);
}

is any faster than always calling the else branch, so I haven't
attempted that -- however those types whose own dealloc is fancier
(int, float, unicode) do use this pattern.
2001-10-05 20:51:39 +00:00
Tim Peters 0ab085c4cb Changed the dict implementation to take "string shortcuts" only when
keys are true strings -- no subclasses need apply.  This may be debatable.

The problem is that a str subclass may very well want to override __eq__
and/or __hash__ (see the new example of case-insensitive strings in
test_descr), but go-fast shortcuts for strings are ubiquitous in our dicts
(and subclass overrides aren't even looked for then).  Another go-fast
reason for the change is that PyString_CheckExact() is a quicker test
than PyString_Check(), and we make such a test on virtually every access
to every dict.

OTOH, a str subclass may also be perfectly happy using the base str eq
and hash, and this change slows them a lot.  But those cases are still
hypothetical, while Python's own reliance on true-string dicts is not.
2001-09-14 00:25:33 +00:00
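The case-insensitive string example that commit 0ab085c4cb refers to might look something like the sketch below. The class name `CaseInsensitiveStr` is hypothetical and this is not the code from test_descr; it only shows why a dict must honor a subclass's __eq__ and __hash__ overrides.

```python
class CaseInsensitiveStr(str):
    """A str subclass whose equality and hash ignore case."""

    def __hash__(self):
        return hash(str(self).lower())

    def __eq__(self, other):
        if isinstance(other, str):
            return str(self).lower() == str(other).lower()
        return NotImplemented


table = {CaseInsensitiveStr("Spam"): 1}

# The lookup succeeds only because the dict calls the overridden
# methods instead of taking a shortcut for plain strings.
found = table[CaseInsensitiveStr("SPAM")]
```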
Tim Peters b95ec09a44 Repair typo in comment. 2001-09-02 18:35:54 +00:00
Tim Peters 25786c0851 Make dictionary() a real constructor. Accepts at most one argument, "a
mapping object", in the same sense dict.update(x) requires of x (that x
has a keys() method and a getitem).
Questionable:  The other type constructors accept a keyword argument, so I
did that here too (e.g., dictionary(mapping={1:2}) works).  But type_call
doesn't pass the keyword args to the tp_new slot (it passes NULL), it only
passes them to the tp_init slot, so getting at them required adding a
tp_init slot to dicts.  Looks like that makes the normal case (i.e., no
args at all) a little slower (the time it takes to call dict.tp_init and
have it figure out there's nothing to do).
2001-09-02 08:22:48 +00:00
Neil Schemenauer e83c00efd0 Use new GC API. 2001-08-29 23:54:21 +00:00
Martin v. Löwis e3eb1f2b23 Patch #427190: Implement and use METH_NOARGS and METH_O. 2001-08-16 13:15:00 +00:00
Guido van Rossum 05ac6de2d5 Add PyDict_Merge(a, b, override):
PyDict_Merge(a, b, 1) is the same as PyDict_Update(a, b).
PyDict_Merge(a, b, 0) does something similar but leaves existing items
unchanged.
2001-08-10 20:28:28 +00:00
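The override flag described in commit 05ac6de2d5 can be modelled in pure Python. The function `merge` below is a hypothetical stand-in for the C-level PyDict_Merge(a, b, override), written only to show the two behaviours.

```python
def merge(a, b, override):
    """Copy items from mapping b into dict a.

    With override true, existing keys in a are replaced (like a.update(b)).
    With override false, existing keys in a are left unchanged.
    """
    for key in b.keys():
        if override or key not in a:
            a[key] = b[key]


replaced = {"x": 1, "y": 2}
merge(replaced, {"y": 20, "z": 30}, override=True)

kept = {"x": 1, "y": 2}
merge(kept, {"y": 20, "z": 30}, override=False)
```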
Tim Peters 6d6c1a35e0 Merge of descr-branch back into trunk. 2001-08-02 04:15:00 +00:00
Barry Warsaw 66a0d1d9b9 dict_update(): Generalize this method so {}.update() accepts any
"mapping" object, specifically one that supports PyMapping_Keys() and
PyObject_GetItem().  This allows you to say e.g. {}.update(UserDict())

We keep the special case for concrete dict objects, although that
seems moderately questionable.  OTOH, the code exists and works, so
why change that?

.update()'s docstring already claims that D.update(E) implies calling
E.keys() so it's appropriate not to transform AttributeErrors in
PyMapping_Keys() to TypeErrors.

Patch eyeballed by Tim.
2001-06-26 20:08:32 +00:00
Tim Peters c605784174 dict_repr: Reuse one of the int vars (minor code simplification). 2001-06-16 07:52:53 +00:00
Tim Peters a7259597f1 SF bug 433228: repr(list) woes when len(list) big.
Gave Python linear-time repr() implementations for dicts, lists, strings.
This means, e.g., that repr(range(50000)) is no longer 50x slower than
pprint.pprint() in 2.2 <wink>.

I don't consider this a bugfix candidate, as it's a performance boost.

Added _PyString_Join() to the internal string API.  If we want that in the
public API, fine, but then it requires runtime error checks instead of
asserts.
2001-06-16 05:11:17 +00:00
Tim Peters afb6ae8452 Store the mask instead of the size in dictobjects. The mask is more
frequently used, and in particular this allows to drop the last
remaining obvious time-waster in the crucial lookdict() and
lookdict_string() functions.  Other changes consist mostly of changing
"i < ma_size" to "i <= ma_mask" everywhere.
2001-06-04 21:00:21 +00:00
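Storing the mask works because the table size is always a power of two, so masking gives the same index as a modulo. A small check of that identity (the names `size` and `mask` stand in for ma_size and ma_mask):

```python
size = 8          # table size, always a power of two
mask = size - 1   # what the dict stores after this change

# For a power-of-two size, "h & mask" equals "h % size" for every integer h.
same_index = all((h & mask) == (h % size) for h in range(-1000, 1000))

# Every valid index i satisfies both forms of the bounds test.
same_bounds = all((i < size) == (i <= mask) for i in range(-2, size + 2))
```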
Tim Peters 453163d842 lookdict: stop more insane core-dump mutating comparison cases. Should
be possible to provoke unbounded recursion now, but leaving that to someone
else to provoke and repair.
Bugfix candidate -- although this is getting harder to backstitch, and the
cases it's protecting against are mondo contrived.
2001-06-03 04:54:32 +00:00
Tim Peters 7b5d0afb1e lookdict: Reduce obfuscating code duplication with a judicious goto.
This code is likely to get even hairier to squash core dumps due to
mutating comparisons, and it's hard enough to follow without that.
2001-06-03 04:14:43 +00:00
Tim Peters 19b77cfc4b Finish the dict->string coredump fix. Need sleep.
Bugfix candidate.
2001-06-02 08:27:39 +00:00
Tim Peters 23cf6be23c Coredumpers from Michael Hudson, mutating dicts while printing or
converting to string.
Critical bugfix candidate -- if you take this seriously <wink>.
2001-06-02 08:02:56 +00:00
Tim Peters f4b33f61fb dict_popitem(): Repaired last-second 2.1 comment, which misidentified the
true reason for allocating the tuple before checking the dict size.
2001-06-02 05:42:29 +00:00
Tim Peters eb28ef209e New collision resolution scheme: no polynomials, simpler, faster, less
code, less memory.  Tests have uncovered no drawbacks.  Christian and
Vladimir are the other two people who have burned many brain cells on the
dict code in recent years, and they like the approach too, so I'm checking
it in without further ado.
2001-06-02 05:27:19 +00:00
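The commit message for eb28ef209e does not spell out the new scheme. As recalled from the comments in later versions of dictobject.c, it is a recurrence of the form i = 5*i + perturb + 1 with perturb shifted right on each step; treat the constants and the function name `probe_sequence` below as assumptions rather than a transcription of this commit.

```python
PERTURB_SHIFT = 5  # assumed shift, as in later dictobject.c


def probe_sequence(hash_value, mask, count):
    """Return the first `count` table indices probed for a non-negative hash."""
    perturb = hash_value
    i = hash_value & mask
    indices = [i]
    for _ in range(count - 1):
        # Higher hash bits feed in through perturb until it reaches zero.
        i = (5 * i + perturb + 1) & mask
        perturb >>= PERTURB_SHIFT
        indices.append(i)
    return indices


# With perturb at zero, i = 5*i + 1 modulo a power of two visits every slot.
order = probe_sequence(0, 7, 8)
```

The point of the perturbation is that all bits of the hash eventually influence the probe order, while the plain recurrence guarantees that an empty slot is eventually found.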
Tim Peters 15d4929ae4 Implement an old idea of Christian Tismer's: use polynomial division
instead of multiplication to generate the probe sequence.  The idea is
recorded in Python-Dev for Dec 2000, but that version is prone to rare
infinite loops.

The value is in getting *all* the bits of the hash code to participate;
and, e.g., this speeds up querying every key in a dict with keys
 [i << 16 for i in range(20000)] by a factor of 500.  Should be equally
valuable in any bad case where the high-order hash bits were getting
ignored.

Also wrote up some of the motivations behind Python's ever-more-subtle
hash table strategy.
2001-05-27 07:39:22 +00:00
Martin v. Löwis cd35306a25 Patch #424335: Implement string_richcompare, remove string_compare.
Use new _PyString_Eq in lookdict_string.
2001-05-24 16:56:35 +00:00
Tim Peters f8a548c23c dictresize(): Rebuild small tables if there are any dummies, not just if
they're entirely full.  Not a question of correctness, but of temporarily
misplaced common sense.
2001-05-24 16:26:40 +00:00
Tim Peters 0c6010be75 Jack Jansen hit a bug in the new dict code, reported on python-dev.
dictresize() was too aggressive about never ever resizing small dicts.
If a small dict is entirely full, it needs to rebuild it despite that
it won't actually resize it, in order to purge old dummy entries thus
creating at least one virgin slot (lookdict assumes at least one such
exists).

Also took the opportunity to add some high-level comments to dictresize.
2001-05-23 23:33:57 +00:00
Fred Drake 0c23231f6e Remove unused variable. 2001-05-22 22:36:52 +00:00
Tim Peters dea48ec581 SF patch #425242: Patch which "inlines" small dictionaries.
The idea is Marc-Andre Lemburg's, the implementation is Tim's.
Add a new ma_smalltable member to dictobjects, an embedded vector of
MINSIZE (8) dictentry structs.  Short course is that this lets us avoid
additional malloc(s) for dicts with no more than 5 entries.

The changes are widespread but mostly small.

Long course:  WRT speed, all scalar operations (getitem, setitem, delitem)
on non-empty dicts benefit from no longer needing NULL-pointer checks
(ma_table is never NULL anymore).  Bulk operations (copy, update, resize,
clearing slots during dealloc) benefit in some cases from now looping
on the ma_fill count rather than on ma_size, but that was an unexpected
benefit:  the original reason to loop on ma_fill was to let bulk
operations on empty dicts end quickly (since the NULL-pointer checks
went away, empty dicts aren't special-cased any more).

Special considerations:

For dicts that remain empty, this change is a lose on two counts:
the dict object contains 8 new dictentry slots now that weren't
needed before, and dict object creation also spends time memset'ing
these doomed-to-be-unused slots to NULLs.

For dicts with one or two entries that never get larger than 2, it's
a mix:  a malloc()/free() pair is no longer needed, and the 2-entry case
gets to use 8 slots (instead of 4) thus decreasing the chance of
collision.  Against that, dict object creation spends time memset'ing
4 slots that aren't strictly needed in this case.

For dicts with 3 through 5 entries that never get larger than 5, it's a
pure win:  the dict is created with all the space they need, and they
never need to resize.  Before they suffered two malloc()/free() calls,
plus 1 dict resize, to get enough space.  In addition, the 8-slot
table they ended with consumed more memory overall, because of the
hidden overhead due to the additional malloc.

For dicts with 6 or more entries, the ma_smalltable member is wasted
space, but then these are large(r) dicts so 8 slots more or less doesn't
make much difference.  They still benefit all the time from removing
ubiquitous dynamic null-pointer checks, and get a small benefit (but
relatively smaller the larger the dict) from not having to do two
mallocs, two frees, and a resize on the way *to* getting their sixth
entry.

All in all it appears a small but definite general win, with larger
benefits in specific cases.  It's especially nice that it allowed to
get rid of several branches, gotos and labels, and overall made the
code smaller.
2001-05-22 20:40:22 +00:00
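The figure of 5 entries for an 8-slot table in commit dea48ec581 is consistent with a two-thirds load limit. That limit is an assumption here (it is not stated in the message), and `needs_resize` is a hypothetical helper:

```python
MINSIZE = 8  # slots in the embedded ma_smalltable


def needs_resize(fill, size):
    """Assumed rule: resize once the table is at least two-thirds full."""
    return fill * 3 >= size * 2


# Largest number of entries the embedded table holds without resizing.
max_small_entries = max(
    n for n in range(MINSIZE + 1) if not needs_resize(n, MINSIZE)
)
```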
Tim Peters 91a364df17 Bugfix candidate.
Two exceedingly unlikely errors in dictresize():
1. The loop for finding the new size had an off-by-one error at the
   end (could over-index the polys[] vector).
2. The polys[] vector ended with a 0, apparently intended as a sentinel
   value but never used as such; i.e., it was never checked, so 0 could
   have been used *as* a polynomial.
Neither bug could trigger unless a dict grew to 2**30 slots; since that
would consume at least 12GB of memory just to hold the dict pointers,
I'm betting it's not the cause of the bug Fred's tracking down <wink>.
2001-05-19 07:04:38 +00:00
Tim Peters 1928314ef4 Speed dictresize by collapsing its two passes into one; the reason given
in the comments for using two passes was bogus, as the only object that
can get decref'ed due to the copy is the dummy key, and decref'ing dummy
can't have side effects (for one thing, dummy is immortal!  for another,
it's a string object, not a potentially dangerous user-defined object).
2001-05-17 22:25:34 +00:00
Tim Peters 342c65e19a Aggressive reordering of dict comparisons. In case of collision, it stands
to reason that me_key is much more likely to match the key we're looking
for than to match dummy, and if the key is absent me_key is much more
likely to be NULL than dummy:  most dicts don't even have a dummy entry.
Running instrumented dict code over the test suite and some apps confirmed
that matching dummy was 200-300x less frequent than matching key in
practice.  So this reorders the tests to try the common case first.
It can lose if a large dict with many collisions is mostly deleted, not
resized, and then frequently searched, but that's hardly a case we
should be favoring.
2001-05-13 06:43:53 +00:00
Tim Peters 2f228e75e4 Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".
The comment following used to say:
	/* We use ~hash instead of hash, as degenerate hash functions, such
	   as for ints <sigh>, can have lots of leading zeros. It's not
	   really a performance risk, but better safe than sorry.
	   12-Dec-00 tim:  so ~hash produces lots of leading ones instead --
	   what's the gain? */
That is, there was never a good reason for doing it.  And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collision) the same "too often" across distinct hashes.

Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.

Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items().  A number
of std tests suffered bogus failures as a result.  For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,

>>> d = {}
>>> for i in range(10):
...    d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>

Unfortunately, people may latch on to that in small examples and draw a
bogus conclusion.

test_support.py
    Moved test_extcall's sortdict() into test_support, made it stronger,
    and imported sortdict into other std tests that needed it.
test_unicode.py
    Excluded cp875 from the "roundtrip over range(128)" test, because
    cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
    See Python-Dev for excruciating details.
Cookie.py
    Changed various output functions to sort dicts before building
    strings from them.
test_extcall
    Fiddled the expected-result file.  This remains sensitive to native
    dict ordering, because, e.g., if there are multiple errors in a
    keyword-arg dict (and test_extcall sets up many cases like that), the
    specific error Python complains about first depends on native dict
    ordering.
2001-05-13 00:19:31 +00:00
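The ordering change noted in commit 2f228e75e4 follows directly from the initial slot computation. The sketch below assumes an 8-slot table and relies on small ints hashing to themselves; it shows only the first probe, not collision handling.

```python
mask = 7  # 8-slot table

# After the change: i = hash & mask, so small ints land in increasing slots.
slots_plain = [hash(i) & mask for i in range(8)]

# Before the change: i = (~hash) & mask, so the same keys landed in reverse.
slots_inverted = [(~hash(i)) & mask for i in range(8)]
```

Since .keys() at that time walked the table in slot order, small-int keys came out in increasing order after the change, which is the accidental pattern the message warns against relying on.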
Tim Peters 4fa58bfac2 Restore dicts' tp_compare slot, and change dict_richcompare to say it
doesn't know how to do LE, LT, GE, GT.  dict_richcompare can't do the
latter any faster than dict_compare can.  More importantly, for
cmp(dict1, dict2), Python *first* tries rich compares with EQ, LT, and
GT one at a time, even if the tp_compare slot is defined, and
dict_richcompare called dict_compare for the latter two because
it couldn't do them itself.  The result was a lot of wasted calls to
dict_compare.  Now dict_richcompare gives up at once the times Python
calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare
is called only once (when Python gets around to trying the tp_compare
slot).
Continued mystery:  despite that this cut the number of calls to
dict_compare approximately in half in test_mutants.py, the latter still
runs amazingly slowly.  Running under the debugger doesn't show excessive
activity in the dict comparison code anymore, so I'm guessing the culprit
is somewhere else -- but where?  Perhaps in the element (key/value)
comparison code?  We clearly spend a lot of time figuring out how to
compare things.
2001-05-10 21:45:19 +00:00
Tim Peters 3918fb2549 Repair typo in comment. 2001-05-10 18:58:31 +00:00
Tim Peters 95bf9390a4 SF bug #422121 Insecurities in dict comparison.
Fixed a half dozen ways in which general dict comparison could crash
Python (even cause Win98SE to reboot) in the presence of key and/or
value comparison routines that mutate the dict during dict comparison.
Bugfix candidate.
2001-05-10 08:32:44 +00:00
Tim Peters e63415ead8 SF patch #421922: Implement rich comparison for dicts.
d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2
don't support comparisons other than ==, and testing dicts for equality
is faster now (especially when inequality obtains).
2001-05-08 04:38:29 +00:00
Guido van Rossum b1f35bffe5 Michael Hudson pointed out that the code for detecting changes in
dictionary size was comparing ma_size, the hash table size, which is
always a power of two, rather than ma_used, which changes on each
insertion or deletion.  Fixed this.
2001-05-02 15:13:44 +00:00
Guido van Rossum 09e563abb4 Add experimental iterkeys(), itervalues(), iteritems() to dict
objects.

Tests show that iteritems() is 5-10% faster than iterating over the
dict and extracting the value with dict[key].
2001-05-01 12:10:21 +00:00
Guido van Rossum 213c7a6aa5 Mondo changes to the iterator stuff, without changing how Python code
sees it (test_iter.py is unchanged).

- Added a tp_iternext slot, which calls the iterator's next() method;
  this is much faster for built-in iterators over built-in types
  such as lists and dicts, speeding up pybench's ForLoop with about
  25% compared to Python 2.1.  (Now there's a good argument for
  iterators. ;-)

- Renamed the built-in sequence iterator SeqIter, affecting the C API
  functions for it.  (This frees up the PyIter prefix for generic
  iterator operations.)

- Added PyIter_Check(obj), which checks that obj's type has a
  tp_iternext slot and that the proper feature flag is set.

- Added PyIter_Next(obj) which calls the tp_iternext slot.  It has a
  somewhat complex return condition due to the need for speed: when it
  returns NULL, it may not have set an exception condition, meaning
  the iterator is exhausted; when the exception StopIteration is set
  (or a derived exception class), it means the same thing; any other
  exception means some other error occurred.
2001-04-23 14:08:49 +00:00
Guido van Rossum 59d1d2b434 Iterators phase 1. This comprises:
new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
new C API PyObject_GetIter(), calls tp_iter
new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
new internal object types iterobject and calliterobject
new exception StopIteration
new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
new magic number for .pyc files
new special method for instances: __iter__() returns an iterator
iteration over dictionaries: "for x in dict" iterates over the keys
iteration over files: "for x in file" iterates over lines

TODO:

documentation
test suite
decide whether to use a different way to spell iter(function, sentinel)
decide whether "for key in dict" is a good idea
use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
speed tuning (make next() a slot tp_next???)
2001-04-20 19:13:02 +00:00
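Two of the features listed in commit 59d1d2b434, the iter(function, sentinel) form and iteration over dictionary keys, can be tried as below. The sketch uses a current Python 3 interpreter and an in-memory stream instead of a real file.

```python
import io

stream = io.StringIO("first\nsecond\n")

# iter(function, sentinel): call stream.readline() repeatedly until it
# returns the sentinel "" (end of input).
lines = list(iter(stream.readline, ""))

# "for x in dict" iterates over the keys.
ages = {"ann": 31, "bob": 27}
names = sorted(name for name in ages)
```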
Guido van Rossum 55ad67d74d Oops. Removed dictiter_new decl that wasn't supposed to go in yet. 2001-04-20 16:52:06 +00:00
Guido van Rossum 0dbb4fba4c Implement, test and document "key in dict" and "key not in dict".
I know some people don't like this -- if it's really controversial,
I'll take it out again.  (If it's only Alex Martelli who doesn't like
it, that doesn't count as "real controversial" though. :-)

That's why this is a separate checkin from the iterators stuff I'm
about to check in next.
2001-04-20 16:50:40 +00:00
Guido van Rossum e04eaec5b6 Tim pointed out a remaining vulnerability in popitem(): the
PyTuple_New() could *conceivably* clear the dict, so move the test for
an empty dict after the tuple allocation.  It means that we waste time
allocating and deallocating a 2-tuple when the dict is empty, but who
cares.  It also means that when the dict is empty *and* there's no
memory to allocate a 2-tuple, we raise MemoryError, not KeyError --
but that may actually be a good idea: if there's no room for a lousy
2-tuple, what are the chances that there's room for a KeyError
instance?
2001-04-16 00:02:32 +00:00
Guido van Rossum a4dd011259 Tentative fix for a problem that Tim discovered at the last moment,
and reported to python-dev: because we were calling dict_resize() in
PyDict_Next(), and because GC's dict_traverse() uses PyDict_Next(),
and because PyTuple_New() can cause GC, and because dict_items() calls
PyTuple_New(), it was possible for dict_items() to have the dict
resized right under its nose.

The solution is convoluted, and touches several places: keys(),
values(), items(), popitem(), PyDict_Next(), and PyDict_SetItem().

There are two parts to it. First, we no longer call dict_resize() in
PyDict_Next(), which seems to solve the immediate problem.  But then
PyDict_SetItem() must have a different policy about when *it* calls
dict_resize(), because we want to guarantee (e.g. for an algorithm
that Jeremy uses in the compiler) that you can loop over a dict using
PyDict_Next() and make changes to the dict as long as those changes
are only value replacements for existing keys using PyDict_SetItem().
This is done by resizing *after* the insertion instead of before, and
by remembering the size before we insert the item, and if the size is
still the same, we don't bother to even check if we might need to
resize.  An additional detail is that if the dict starts out empty, we
must still resize it before the insertion.

That was the first part. :-)

The second part is to make keys(), values(), items(), and popitem()
safe against side effects on the dict caused by allocations, under the
assumption that if the GC can cause arbitrary Python code to run, it
can cause other threads to run, and it's not inconceivable that our
dict could be resized -- it would be insane to write code that relies
on this, but not all code is sane.

Now, I have this nagging feeling that the loops in lookdict probably
are blissfully assuming that doing a simple key comparison does not
change the dict's size.  This is not necessarily true (the keys could
be class instances after all).  But that's a battle for another day.
2001-04-15 22:16:26 +00:00
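
A minimal sketch of the first part of the fix described above, written as a toy Python hash table rather than the actual C code. The class `ToyDict`, its linear probing, the 2/3 load threshold, and the `resizes` counter are all assumptions made for illustration; only the policy (remember the size before inserting, and check for a resize only if the size changed) comes from the commit message. The toy table is preallocated, so the "empty dict must still be resized before insertion" detail is not modeled.

```python
class ToyDict:
    """Toy open-addressing table illustrating 'resize after insertion'."""

    def __init__(self):
        self.size = 8                  # number of slots (hypothetical start size)
        self.slots = [None] * self.size
        self.used = 0                  # number of occupied slots
        self.resizes = 0               # counts resizes, for demonstration only

    def _find(self, key):
        # Linear probing: stop at the slot holding key or at the first empty slot.
        i = hash(key) % self.size
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.size
        return i

    def _resize(self):
        # Double the table and reinsert every existing entry.
        entries = [s for s in self.slots if s is not None]
        self.size *= 2
        self.slots = [None] * self.size
        for k, v in entries:
            self.slots[self._find(k)] = (k, v)
        self.resizes += 1

    def set(self, key, value):
        used_before = self.used        # remember the size before inserting
        i = self._find(key)
        if self.slots[i] is None:
            self.used += 1             # a genuinely new key
        self.slots[i] = (key, value)
        # Only consider resizing if the insertion added a key; replacing the
        # value of an existing key therefore never moves any entries.
        if self.used > used_before and self.used * 3 >= self.size * 2:
            self._resize()

    def get(self, key):
        slot = self.slots[self._find(key)]
        return None if slot is None else slot[1]
```

Under these assumptions, a loop that only replaces values for existing keys leaves the table layout untouched, which is the guarantee the commit message wants for code that iterates with PyDict_Next().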
Tim Peters 6783070ebf Make PyDict_Next safe to use for loops that merely modify the values
associated with existing dict keys.
This is a variant of part of Michael Hudson's patch #409864 "lazy fix for
Pings bizarre scoping crash".
2001-03-21 19:23:56 +00:00
Guido van Rossum b932420cc7 Rich comparisons:
- Use PyObject_RichCompareBool() when comparing keys; this makes the
  error handling cleaner.

- There were two implementations for dictionary comparison, an old one
  (#ifdef'ed out) and a new one.  Got rid of the old one, which was
  abandoned years ago.

- In the characterize() function, part of dictionary comparison, use
  PyObject_RichCompareBool() to compare keys and values instead.  But
  continue to use PyObject_Compare() for comparing the final
  (deciding) elements.

- Align the comments in the type struct initializer.

Note: I don't implement rich comparison for dictionaries -- there
doesn't seem to be much to be gained.  (The existing comparison
already decides that shorter dicts are always smaller than longer
dicts.)
2001-01-18 00:39:02 +00:00
Jeremy Hylton 1fb6088e86 dict_update has two boundary conditions: a.update(a) and a.update({})
Added test for second one.
2001-01-03 22:34:59 +00:00
Tim Peters f7f88b11e4 Add long-overdue docstrings to dict methods. 2000-12-13 23:18:45 +00:00
Tim Peters f1c7c884b3 Typo repair in comments. Fell for GregS's .popitem() poke. 2000-12-13 19:58:25 +00:00
Tim Peters ea8f2bf9ca Bring comments up to date (e.g., they still said the table had to be
a prime size, which is in fact never true anymore ...).
2000-12-13 01:02:46 +00:00
Guido van Rossum ba6ab84e73 Add popitem() -- SF patch #102733. 2000-12-12 22:02:18 +00:00
Moshe Zadka 5725d1eb03 Backing out my changes.
Improved version coming soon to a Source Forge near you!
2000-11-30 19:30:21 +00:00
Moshe Zadka 1a62750eda Added .first{item,value,key}() to dictionaries.
Complete with docos and tests.
OKed by Guido.
2000-11-30 12:31:03 +00:00
Guido van Rossum 8586991099 REMOVED all CWI, CNRI and BeOpen copyright markings.
This should match the situation in the 1.6b1 tree.
2000-09-01 23:29:29 +00:00
Fred Drake 1bff34ab96 Slight performance hack that also avoids requiring the existence of thread
state for dictionaries that have only been indexed by string keys.

See the comments in SourceForge for more.

This closes SourceForge patch #101309.
2000-08-31 19:31:38 +00:00
Fred Drake c88b99ce06 Clear errors raised by PyObject_Compare() without losing any existing
exception context.  This avoids improperly propagating errors raised by
a user-defined __cmp__() by a subsequent lookup operation.

This patch does *not* include the performance enhancement patch for
dictionaries with string keys only; that will be checked in separately.

This closes SourceForge patch #101277 and bug #112558.
2000-08-31 19:04:07 +00:00
Guido van Rossum 164452cec4 Barry's patch to implement the new setdefault() method. 2000-08-08 16:12:54 +00:00
Thomas Wouters 7889010731 Miscellaneous ANSIfications. I'm assuming here 'main' should take (int,
char**) and return an int even on PC platforms. If not, please fix
PC/utils/makesrc.c ;-P
2000-07-22 19:25:51 +00:00
Thomas Wouters 7e47402264 Spelling fixes supplied by Rob W. W. Hooft. All these are fixes in either
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").

There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
2000-07-16 12:04:32 +00:00
Tim Peters 1f5871e834 Removed Py_PROTO and switched to ANSI C declarations in the dict
implementation.  This was really to test whether my new CVS+SSH
setup is more usable than the old one -- and turns out it is (for
whatever reason, it was impossible to do a commit before that
involved more than one directory).
2000-07-04 17:44:48 +00:00
Guido van Rossum 4cc6ac7b87 Neil Schemenauer: small fixes for GC 2000-07-01 01:00:38 +00:00
Guido van Rossum ffcc3813d8 Change copyright notice - 2nd try. 2000-06-30 23:58:06 +00:00
Guido van Rossum fd71b9e9d4 Change copyright notice. 2000-06-30 23:50:40 +00:00
Jeremy Hylton c5007aa5c3 final patches from Neil Schemenauer for garbage collection 2000-06-30 05:02:53 +00:00
Jeremy Hylton d08b4c4524 part 2 of Neil Schemenauer's GC patches:
This patch modifies the type structures of objects that
participate in GC.  The object's tp_basicsize is increased when
GC is enabled.  GC information is prefixed to the object to
maintain binary compatibility.  GC objects also define the
tp_flag Py_TPFLAGS_GC.
2000-06-23 19:37:02 +00:00
Jeremy Hylton 8caad49c30 Round 1 of Neil Schemenauer's GC patches:
This patch adds the type methods traverse and clear necessary for GC
implementation.
2000-06-23 14:18:11 +00:00
Guido van Rossum b18618dab7 Vladimir Marangozov's long-awaited malloc restructuring.
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.

(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode.  I'm also holding back on his
change to main.c, which seems unnecessary to me.)
2000-05-03 23:44:39 +00:00
Jeremy Hylton a12c7a7620 Add PyDict_Copy() function to C API for dicts. It returns a new
dictionary that contains the same key/value pairs as p.
2000-03-30 22:27:31 +00:00
Guido van Rossum d724b23420 Christian Tismer's "trashcan" patch:
Added wrapping macros to dictobject.c, listobject.c, tupleobject.c,
frameobject.c, traceback.c that safely prevents core dumps
on stack overflow. Macros and functions in object.c, object.h.
The method is an "elevator destructor" that turns cascading
deletes into tail recursive behavior when some limit is hit.
2000-03-13 16:01:29 +00:00
Fred Drake 52fccfda5b dict_has_key(): Accept only one parameter. PR#210 reported by
Andreas Jung <ajung@sz-sb.de>.
2000-02-23 15:47:16 +00:00
Guido van Rossum 2bc137909d Vladimir Marangozov contributed updated comments. 1999-03-24 19:06:42 +00:00
Guido van Rossum f05fc716d1 Remove dead code discovered by Vladimir Marangozov. 1998-11-16 22:46:30 +00:00
Guido van Rossum c1c7b1a699 Slight rearrangement of code in lookdict() by Vladimir Marangozov, to
make a common case slightly faster.
1998-10-06 16:01:14 +00:00
Guido van Rossum 0fd00334c6 Avoid using calloc(). This triggered an obscure bug on multiprocessor
Sparc Solaris 2.6 (fully patched!) that I don't want to dig into, but
which I suspect is a bug in the multithreaded malloc library that only
shows up when run on a multiprocessor.  (The program wasn't using
threads, it was just using the multithreaded C library.)
1998-07-16 15:06:13 +00:00
Guido van Rossum 474b19e2ab Make sure that PyDict_GetItem[String]() *never* raises an exception.
If the argument is not a dictionary, simply return NULL.  If the
hash() on the key fails, clear the error.
1998-05-14 01:00:51 +00:00
Guido van Rossum 255443b720 Use Py_Repr{Enter,Leave} to display recursive dictionaries in finite space.
(Jeremy will hardly recognize his patch :-)
1998-04-10 22:47:14 +00:00
Guido van Rossum 6fcfa72c63 Correct Barry's fix -- take care of {}.get(0). 1997-10-20 20:10:00 +00:00
Barry Warsaw 320ac331d1 dict_get(): Fixed a couple of stupid mistakes which caused crashes.
Also got rid of some unnecessary code.
1997-10-20 17:26:25 +00:00
Barry Warsaw c38c5da5d0 dict_get(): New method for item access with different semantics than
__getitem__().  This method never raises an exception; if the key is
not in the dictionary, the second (optional) argument is returned.  If
the second argument is not provided and the key is missing, None is
returned.

mapp_methods: added "get" method.
1997-10-06 17:49:20 +00:00
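
The get() semantics described in the commit above can be sketched in Python as follows. The function name `dict_get_sketch` is hypothetical and this is not the C implementation; it only models the missing-key behavior stated in the message.

```python
def dict_get_sketch(d, key, default=None):
    """Return d[key] if key is present, otherwise the optional default."""
    try:
        return d[key]
    except KeyError:
        # Missing key: fall back to the second argument (None if omitted).
        return default
```

For example, `dict_get_sketch({}, 0)` returns None, which is the `{}.get(0)` case that the follow-up fixes address.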
Guido van Rossum 4f3bf1e383 Don't intern the key string for getitem and delitem. 1997-09-29 23:31:11 +00:00
Guido van Rossum fd7a0b871f Made lookdict nearly twice as fast, resulting in a 5% overall
improvement of pystone.  Vladimir Marangozov.
1997-08-18 21:52:47 +00:00
Guido van Rossum 5d8123f34a Reordered list of methods to hopefully put the most frequently used
ones near the front.
1997-07-13 03:58:01 +00:00
Guido van Rossum a8d5131d57 Renamed dict.absorb() (too spungy) to dict.update(). 1997-06-02 17:13:37 +00:00
Guido van Rossum e3f5b9c8d1 Added dict.absorb() and dict.copy(). 1997-05-28 19:15:28 +00:00
Guido van Rossum 5b2121b25f PyObject_Compare can now return an error. Unfortunately, there are a
few places where we don't know how to test for them without losing
speed; don't know yet how to handle that.
1997-05-23 00:01:41 +00:00
Guido van Rossum 037b2205e8 Moved PyObject_{Get,Set}Attr to object.c.
Fixed two 'return NULL' that should be 'return -1'.
1997-05-20 18:35:19 +00:00
Guido van Rossum 3cca24570e Got rid of all the last_name_* bogosities. I don't think the
complexity saved much any more.  A simple benchmark (grail) showed
that there were 3 times as many misses as hits, and the same number of
times again the code was bypassed altogether due to the existence of
setattro/getattro.
1997-05-16 14:23:33 +00:00
Guido van Rossum a9e7a81137 Renamed from mappingobject.c to dictobject.c.
(Sorry Jack, all your projects will have to be changed again. :-( )
1997-05-13 21:02:11 +00:00
Guido van Rossum c0b618a2cc Quickly renamed the last directory. 1997-05-02 03:12:38 +00:00
Guido van Rossum 3648884490 (Jack:) Align mapping entries to 4-words if USE_CACHE_ALIGNED is defined. 1997-04-11 19:14:07 +00:00
Guido van Rossum 2095d24842 Tweaks to keep the Microsoft compiler quiet. 1997-04-09 19:41:24 +00:00
Guido van Rossum fb8f1cadb2 Add clear() method to dictionary objects. 1997-03-21 21:55:12 +00:00
Guido van Rossum efb4609c4a Small lookmapping nits:
- remove bogus initialization using uninitialized i
- derive initial incr from hash
- copy mp->ma_table into a local variable
1997-01-29 15:53:56 +00:00
Guido van Rossum 9e5656ca3f Final three poly table entries corrected by Tim Peters.
Reindented the whole table.
1997-01-29 04:45:16 +00:00
Guido van Rossum 16e93a8d59 Changed the lookup algorithm again, based on Reimer Behrends's post.
The table size is now constrained to be a power of two, and we use a
variable increment based on GF(2^n)-{0} (not that I have the faintest
idea what that is :-) which helps avoid the expensive '%' operation.

Some of the entries in the table of polynomials have been modified
according to a post by Tim Peters.
1997-01-28 00:00:11 +00:00
Guido van Rossum ca756f2a1d Forget keeping track of whether a dictionary contains all interned
string keys.  Just doing a pointer compare before the string compare
(in fact before the hash compare!) is just as fast.
1997-01-23 19:39:29 +00:00
Guido van Rossum 2a61e7428d String interning. 1997-01-18 07:55:05 +00:00
Guido van Rossum 7d18159614 Rewrote lookmapping() according to suggestions by Jyrki Alakuijala. 1997-01-16 21:06:45 +00:00
Guido van Rossum a0a69b8b42 Experimental new implementation of dictionary comparison. This
defines that a shorter dictionary is always smaller than a longer one.
For dictionaries of the same size, the smallest differing element
determines the outcome (which yields the same results as before,
without explicit sorting).
1996-12-05 21:55:55 +00:00
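
One way to read the comparison rule described above, sketched in Python. The helper names `characterize_sketch`, `cmp_sketch` and `dict_compare_sketch` are assumptions for illustration, and the sketch assumes keys and values are mutually orderable; it is not the C code from this commit.

```python
_MISSING = object()  # sentinel so that None or falsy keys are handled correctly


def characterize_sketch(a, b):
    """Return (key, value) for the smallest key of a whose value differs in b
    or that is absent from b; return None if there is no such key."""
    best_key = _MISSING
    for key, value in a.items():
        if key not in b or b[key] != value:
            if best_key is _MISSING or key < best_key:
                best_key = key
    if best_key is _MISSING:
        return None
    return best_key, a[best_key]


def cmp_sketch(x, y):
    """Three-way comparison returning -1, 0 or 1."""
    return (x > y) - (x < y)


def dict_compare_sketch(a, b):
    # A shorter dictionary is always smaller than a longer one.
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    a_diff = characterize_sketch(a, b)
    if a_diff is None:
        return 0  # same size and no differing element: equal
    # Same size and a differs from b, so b must differ from a as well.
    b_diff = characterize_sketch(b, a)
    # The smallest differing key decides; if the keys tie, the values decide.
    return cmp_sketch(a_diff[0], b_diff[0]) or cmp_sketch(a_diff[1], b_diff[1])
```

Because only the smallest differing element is inspected, no explicit sort of the items is needed, which matches the "without explicit sorting" remark in the message.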
Guido van Rossum d266eb460e New permission notice, includes CNRI. 1996-10-25 14:44:06 +00:00
Guido van Rossum d8eb1b340f Support for tp_getattro, tp_setattro (Sjoerd) 1996-08-09 20:52:03 +00:00
Guido van Rossum 310968dc06 Speedup suggested by Sjoerd 1996-07-30 16:45:31 +00:00
Guido van Rossum 992ded8f12 fix free memory reads in dictlookup et al 1995-12-08 01:16:31 +00:00
Guido van Rossum 5fe605889a a few peephole optimizations 1995-03-09 12:12:50 +00:00
Guido van Rossum 6610ad9d6b Added 1995 to copyright message.
floatobject.c: fix hash().
methodobject.c: support METH_FREENAME flag bit.
1995-01-04 19:07:38 +00:00
Guido van Rossum efc8713428 * Objects/mappingobject.c (mappingremove): don't call
lookmapping() for empty dictionary
1995-01-02 19:42:39 +00:00
Guido van Rossum d7047b395e Lots of minor changes. Note for mappingobject.c: the hash table pointer
can now be NULL.
1995-01-02 19:07:15 +00:00
Guido van Rossum 1d5735e846 Merge back to main trunk 1994-08-30 08:27:36 +00:00
Guido van Rossum 8732d6aeea Fix lay-out of previous fix. 1993-11-23 17:54:03 +00:00
Guido van Rossum b376a4ad18 * timemodule.c: Add hack for Solaris 2.
* posixmodule.c: don't prototype getcwd() -- it's not portable...
* mappingobject.c: double-check validity of last_name_char in
  dict{lookup,insert,remove}.
* arraymodule.c: need memmove only for non-STDC Suns.
* Makefile: comment out HTML_LIBS and XT_USE by default
* pythonmain.c: don't prototype getopt() -- it's not standardized
* socketmodule.c: cast flags arg to {get,set}sockopt() and addrbuf arg to
  recvfrom() to (ANY*).
* pythonrun.c (initsigs): fix prototype, make it static
* intobject.c (LONG_BIT): only #define it if not already defined
* classobject.[ch]: remove all references to unused instance_convert()
* mappingobject.c (getmappingsize): Don't return NULL in int function.
1993-11-23 17:53:17 +00:00
Guido van Rossum 52f2c05401 * parsermodule.c, Makefile, config.c: rudimentary interface to the Python
parser.
* mappingobject.c (lookmapping): 'freeslot' was never used due to a bug in
  the code.
1993-11-10 12:53:24 +00:00
Guido van Rossum a3d78fb268 * posixmodule.c: added set{uid,gid}.
* {tuple,list,mapping,array}object.c: call printobject with 0 for flags
* compile.c (parsestr): use quote instead of '\'' at one crucial point
* arraymodule.c (array_getattr): Added __members__ attribute
1993-11-10 09:23:53 +00:00
Guido van Rossum 4199facacd Added getmappingsize(). (Needed by previous checkin of posixmodule.c) 1993-11-05 10:18:44 +00:00
Sjoerd Mullender 3bb8a05947 Several optimizations and speed improvements.
cstubs: Use Matrix type instead of float[4][4].
1993-10-22 12:04:32 +00:00
Guido van Rossum 1fc238a813 Minor fixes / changes for Mac compatibility. 1993-07-29 08:25:09 +00:00
Guido van Rossum 25831652fd Several changes in one:
(1) dictionaries/mappings now have attributes values() and items() as
well as keys(); at the C level, use the new function mappinggetnext()
to iterate over a dictionary.

(2) "class C(): ..." is now illegal; you must write "class C: ...".

(3) Class objects now know their own name (finally!); and minor
improvements to the way classes, functions and methods are
represented as strings.

(4) Added an "access" statement and semantics.  (This is still
experimental -- as long as you don't use the keyword 'access' nothing
should be changed.)
1993-05-19 14:50:45 +00:00
Guido van Rossum 9bfef44d97 * Changed all copyright messages to include 1993.
* Stubs for faster implementation of local variables (not yet finished)
* Added function name to code object.  Print it for code and function
  objects.  THIS MAKES THE .PYC FILE FORMAT INCOMPATIBLE (the version
  number has changed accordingly)
* Print address of self for built-in methods
* New internal functions getattro and setattro (getattr/setattr with
  string object arg)
* Replaced "dictobject" with more powerful "mappingobject"
* New per-type function tp_hash to implement arbitrary object hashing,
  and hashobject() to interface to it
* Added built-in functions hash(v) and hasattr(v, 'name')
* classobject: made some functions static that accidentally weren't;
  added __hash__ special instance method to implement hash()
* Added proper comparison for built-in methods and functions
1993-03-29 10:43:31 +00:00
Guido van Rossum 4b1302bd1d Generalized version of dictionaries, with compatibility hacks. 1993-03-27 18:11:32 +00:00