Commit Graph

155 Commits

Author SHA1 Message Date
Raymond Hettinger 019a148c72 Optimize dictionary iterators.
* Split into three separate types that share everything except the
  code for iternext.  Saves run time decision making and allows
  each iternext function to be specialized.

* Inlined PyDict_Next().  In addition to saving a function call, this
  allows a redundant test to be eliminated and further specialization
  of the code for the unique needs of each iterator type.

* Created a reusable result tuple for iteritems().  Saves the malloc
  time for tuples when the previous result was not kept by client code
  (this is the typical use case for iteritems).  If the client code
  does keep the reference, then a new tuple is created.

Results in a 20% to 30% speedup depending on the size and sparsity
of the dictionary.
2004-03-18 02:41:19 +00:00
Raymond Hettinger 4344278250 Dictionary optimizations:
* Factored constant structure references out of the inner loops for
  PyDict_Next(), dict_keys(), dict_values(), and dict_items().
  Gave measurable speedups to each (the improvement varies depending
  on the sparseness of the dictionary being measured).

* Added a freelist scheme styled after that for tuples.  Saves around
  80% of the calls to malloc and free.  About 10% of the time, the
  previous dictionary was completely empty; in those cases, the
  dictionary initialization with memset() can be skipped.
2004-03-17 21:55:03 +00:00
Raymond Hettinger ebedb2f773 Factor out code common to PyDict_Copy() and PyDict_Merge(). 2004-03-08 04:19:01 +00:00
Raymond Hettinger 31017aed36 SF #904720: dict.update should take a 2-tuple sequence like dict.__init_
(Championed by Bob Ippolito.)

The update() method for mappings now accepts all the same argument forms
as the dict() constructor.  This includes item lists and/or keyword
arguments.
2004-03-04 08:25:44 +00:00
Jeremy Hylton 7083bb744a Oops. Return -1 to distinguish error from empty dict.
This change probably isn't work a bug fix.  It's unlikely that anyone
was calling this method without passing it a real dict.
2004-02-17 20:10:11 +00:00
Raymond Hettinger 0c66967e3d Simplify previous checkin -- a new function was not needed. 2003-12-13 13:31:55 +00:00
Raymond Hettinger 8f5cdaa784 * Added a new method flag, METH_COEXIST.
* Used the flag to optimize set.__contains__(), dict.__contains__(),
  dict.__getitem__(), and list.__getitem__().
2003-12-13 11:26:12 +00:00
Raymond Hettinger bc0f2ab9bb Expose dict_contains() and PyDict_Contains() with is about 10% faster
than PySequence_Contains() and more clearly applicable to dicts.

Apply the new function in setobject.c where __contains__ checking is
ubiquitous.
2003-11-25 21:12:14 +00:00
Raymond Hettinger 574aa32578 SF patch #798467: Update docstring of has_key for bool changes
(Contributed by George Yoshida.)
2003-09-01 22:12:08 +00:00
Raymond Hettinger c8d2290c8c SF patch #729395: Dictionary tuning
Adjust resize argument for dict.update() and dict.copy().
Extends the previous change to dict.__setitem__().
2003-05-07 00:49:40 +00:00
Raymond Hettinger 3539f6b895 SF patch #729395: Dictionary tuning
* Increase dictionary growth rate resulting in more sparse dictionaries,
  fewer lookup collisions, increased memory use, and better cache
  performance.  For dicts with over 50k entries, keep the current
  growth rate in case an application is suffering from tight memory
  constraints.

* Set the most common case (no resize) to fall-through the test.
2003-05-05 22:22:10 +00:00
Raymond Hettinger 930427b892 Add a reference to dictnotes.txt. It does no good if you don't know it's
there or where to find it.
2003-05-03 06:51:59 +00:00
Raymond Hettinger 1da1dbf458 Renamed PyObject_GenericGetIter to PyObject_SelfIter
to more accurately describe what the function does.

Suggested by Thomas Wouters.
2003-03-17 19:46:11 +00:00
Raymond Hettinger 0153826964 Created PyObject_GenericGetIter().
Factors out the common case of returning self.
2003-03-17 08:24:35 +00:00
Raymond Hettinger a3e1e4cd79 SF patch #693753: fix for bug 639806: default for dict.pop
(contributed by Michael Stone.)
2003-03-06 23:54:28 +00:00
Neal Norwitz 0732301738 Add closing ) in comment 2003-02-15 14:45:12 +00:00
Tim Peters 080c88b912 cPickle.c, load_build(): Taught cPickle how to pick apart
the optional proto 2 slot state.

pickle.py, load_build():  CAUTION:  Noted that cPickle's
load_build and pickle's load_build really don't do the same
things with the state, and didn't before this patch either.
cPickle never tries to do .update(), and has no backoff if
instance.__dict__ can't be retrieved.  There are no tests
that can tell the difference, and part of what cPickle's
load_build() did looked accidental to me, so I don't know
what the true intent is here.

pickletester.py, test_pickle.py:  Got rid of the hack for
exempting cPickle from running some of the proto 2 tests.

dictobject.c, PyDict_Next():  documented intended use.
2003-02-15 03:01:11 +00:00
Raymond Hettinger ea3fdf44a2 SF patch #659536: Use PyArg_UnpackTuple where possible.
Obtain cleaner coding and a system wide
performance boost by using the fast, pre-parsed
PyArg_Unpack function instead of PyArg_ParseTuple
function which is driven by a format string.
2002-12-29 16:33:45 +00:00
Martin v. Löwis 32b4a1ba62 Constify char* API. Fixes #651363. 2.2 candidate. 2002-12-11 13:21:12 +00:00
Tim Peters bca1cbc6f8 SF 548651: Fix the METH_CLASS implementation.
Most of these patches are from Thomas Heller, with long lines folded
by Tim.  The change to test_descr.py is from Guido.  See the bug report.

Not a bugfix candidate -- METH_CLASS is new in 2.3.
2002-12-09 22:56:13 +00:00
Raymond Hettinger e03e5b1f91 Remove assumption that cls is a subclass of dict.
Simplifies the code and gets Just van Rossum's example to work.
2002-12-07 08:10:51 +00:00
Raymond Hettinger b02bb5ed0a Replace BadInternalCall with TypeError. Add a test case. Fix whitespace.
Just van Rossum showed a weird, but clever way for pure python code to
trigger the BadInternalCall.  The C code had assumed that calling a class
constructor would return an instance of that class; however, classes that
abuse __new__ can invalidate that assumption.
2002-12-04 07:32:25 +00:00
Neal Norwitz ef786ae1a5 Add missing decref 2002-11-27 19:38:00 +00:00
Raymond Hettinger e33d3df030 SF Patch 643443. Added dict.fromkeys(iterable, value=None), a class
method for constructing new dictionaries from sequences of keys.
2002-11-27 07:29:33 +00:00
Just van Rossum a797d8150d Patch #642500 with slight modifications: allow keyword arguments in
dict() constructor. Example:
  >>> dict(a=1, b=2)
  {'a': 1, 'b': 2}
  >>>
2002-11-23 09:45:04 +00:00
Guido van Rossum efae8862fe In doc strings, use 'k in D' rather than D.has_key(k). 2002-09-04 11:29:45 +00:00
Guido van Rossum 45ec02aed1 SF patch 576101, by Oren Tirosh: alternative implementation of
interning.  I modified Oren's patch significantly, but the basic idea
and most of the implementation is unchanged.  Interned strings created
with PyString_InternInPlace() are now mortal, and you must keep a
reference to the resulting string around; use the new function
PyString_InternImmortal() to create immortal interned strings.
2002-08-19 21:43:18 +00:00
Jeremy Hylton 938ace69a0 staticforward bites the dust.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) botched the
static keyword when it was used with a forward declaration of a static
initialized structure.  Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers.  (In
fact, we expect that the compilers are all fixed eight years later.)

I'm leaving staticforward and statichere defined in object.h as
static.  This is only for backwards compatibility with C extensions
that might still use it.

XXX I haven't updated the documentation.
2002-07-17 16:30:39 +00:00
Guido van Rossum 2147df748f Make StopIteration a sink state. This is done by clearing out the
di_dict field when the end of the list is reached.  Also make the
error ("dictionary changed size during iteration") a sticky state.

Also remove the next() method -- one is supplied automatically by
PyType_Ready() because the tp_iternext slot is set.  That's a good
thing, because the implementation given here was buggy (it never
raised StopIteration).
2002-07-16 20:30:22 +00:00
Martin v. Löwis 14f8b4cfcb Patch #568124: Add doc string macros. 2002-06-13 20:33:02 +00:00
Guido van Rossum e027d9818f Add Raymond Hettinger's d.pop(). See SF patch 539949. 2002-04-12 15:11:59 +00:00
Neil Schemenauer 6189b89cc5 PyObject_GC_Del and PyObject_Del can now be used as a function
designators.

Remove PyMalloc_New.
2002-04-12 02:43:00 +00:00
Guido van Rossum 77f6a65eb0 Add the 'bool' type and its values 'False' and 'True', as described in
PEP 285.  Everything described in the PEP is here, and there is even
some documentation.  I had to fix 12 unit tests; all but one of these
were printing Boolean outcomes that changed from 0/1 to False/True.
(The exception is test_unicode.py, which did a type(x) == type(y)
style comparison.  I could've fixed that with a single line using
issubtype(x, type(y)), but instead chose to be explicit about those
places where a bool is expected.

Still to do: perhaps more documentation; change standard library
modules to return False/True from predicates.
2002-04-03 22:41:51 +00:00
Tim Peters 1f7df3595a Remove the CACHE_HASH and INTERN_STRINGS preprocessor symbols. 2002-03-29 03:29:08 +00:00
Guido van Rossum ff413af605 This is Neil's fix for SF bug 535905 (Evil Trashcan and GC interaction).
The fix makes it possible to call PyObject_GC_UnTrack() more than once
on the same object, and then move the PyObject_GC_UnTrack() call to
*before* the trashcan code is invoked.

BUGFIX CANDIDATE!
2002-03-28 20:34:59 +00:00
Neil Schemenauer dcc819a5c9 Use pymalloc if it's enabled. 2002-03-22 15:33:15 +00:00
Tim Peters f582b82fe9 SF bug #491415 PyDict_UpdateFromSeq2() unused
PyDict_UpdateFromSeq2():  removed it.
PyDict_MergeFromSeq2():  made it public and documented it.
PyDict_Merge() docs:  updated to reveal <wink> that the second
argument can be any mapping object.
2001-12-11 18:51:08 +00:00
Guido van Rossum dbb53d9918 Fix of SF bug #475877 (Mutable subtype instances are hashable).
Rather than tweaking the inheritance of type object slots (which turns
out to be too messy to try), this fix adds a __hash__ to the list and
dict types (the only mutable types I'm aware of) that explicitly
raises an error.  This has the advantage that list.__hash__([]) also
raises an error (previously, this would invoke object.__hash__([]),
returning the argument's address); ditto for dict.__hash__.

The disadvantage for this fix is that 3rd party mutable types aren't
automatically fixed.  This should be added to the rules for creating
subclassable extension types: if you don't want your object to be
hashable, add a tp_hash function that raises an exception.

Also, it's possible that I've forgotten about other mutable types for
which this should be done.
2001-12-03 16:32:18 +00:00
Tim Peters a427a2b8d0 Rename "dictionary" (type and constructor) to "dict". 2001-10-29 22:25:45 +00:00
Tim Peters 4d85953fe6 dictionary() constructor:
+ Change keyword arg name from "x" to "items".  People passing a mapping
  object can stretch their imaginations <wink>.
+ Simplify the docstring text.
2001-10-27 18:27:48 +00:00
Tim Peters 1fc240e851 Generalize dictionary() to accept a sequence of 2-sequences. At the
outer level, the iterator protocol is used for memory-efficiency (the
outer sequence may be very large if fully materialized); at the inner
level, PySequence_Fast() is used for time-efficiency (these should
always be sequences of length 2).

dictobject.c, new functions PyDict_{Merge,Update}FromSeq2.  These are
wholly analogous to PyDict_{Merge,Update}, but process a sequence-of-2-
sequences argument instead of a mapping object.  For now, I left these
functions file static, so no corresponding doc changes.  It's tempting
to change dict.update() to allow a sequence-of-2-seqs argument too.

Also changed the name of dictionary's keyword argument from "mapping"
to "x".  Got a better name?  "mapping_or_sequence_of_pairs" isn't
attractive, although more so than "mosop" <wink>.

abstract.h, abstract.tex:  Added new PySequence_Fast_GET_SIZE function,
much faster than going thru the all-purpose PySequence_Size.

libfuncs.tex:
- Document dictionary().
- Fiddle tuple() and list() to admit that their argument is optional.
- The long-winded repetitions of "a sequence, a container that supports
  iteration, or an iterator object" is getting to be a PITA.  Many
  months ago I suggested factoring this out into "iterable object",
  where the definition of that could include being explicit about
  generators too (as is, I'm not sure a reader outside of PythonLabs
  could guess that "an iterator object" includes a generator call).
- Please check my curly braces -- I'm going blind <0.9 wink>.

abstract.c, PySequence_Tuple():  When PyObject_GetIter() fails, leave
its error msg alone now (the msg it produces has improved since
PySequence_Tuple was generalized to accept iterable objects, and
PySequence_Tuple was also stomping on the msg in cases it shouldn't
have even before PyObject_GetIter grew a better msg).
2001-10-26 05:06:50 +00:00
Guido van Rossum 9475a2310d Enable GC for new-style instances. This touches lots of files, since
many types were subclassable but had a xxx_dealloc function that
called PyObject_DEL(self) directly instead of deferring to
self->ob_type->tp_free(self).  It is permissible to set tp_free in the
type object directly to _PyObject_Del, for non-GC types, or to
_PyObject_GC_Del, for GC types.  Still, PyObject_DEL was a tad faster,
so I'm fearing that our pystone rating is going down again.  I'm not
sure if doing something like

void xxx_dealloc(PyObject *self)
{
	if (PyXxxCheckExact(self))
		PyObject_DEL(self);
	else
		self->ob_type->tp_free(self);
}

is any faster than always calling the else branch, so I haven't
attempted that -- however those types whose own dealloc is fancier
(int, float, unicode) do use this pattern.
2001-10-05 20:51:39 +00:00
Tim Peters 0ab085c4cb Changed the dict implementation to take "string shortcuts" only when
keys are true strings -- no subclasses need apply.  This may be debatable.

The problem is that a str subclass may very well want to override __eq__
and/or __hash__ (see the new example of case-insensitive strings in
test_descr), but go-fast shortcuts for strings are ubiquitous in our dicts
(and subclass overrides aren't even looked for then).  Another go-fast
reason for the change is that PyCheck_StringExact() is a quicker test
than PyCheck_String(), and we make such a test on virtually every access
to every dict.

OTOH, a str subclass may also be perfectly happy using the base str eq
and hash, and this change slows them a lot.  But those cases are still
hypothetical, while Python's own reliance on true-string dicts is not.
2001-09-14 00:25:33 +00:00
Tim Peters b95ec09a44 Repair typo in comment. 2001-09-02 18:35:54 +00:00
Tim Peters 25786c0851 Make dictionary() a real constructor. Accepts at most one argument, "a
mapping object", in the same sense dict.update(x) requires of x (that x
has a keys() method and a getitem).
Questionable:  The other type constructors accept a keyword argument, so I
did that here too (e.g., dictionary(mapping={1:2}) works).  But type_call
doesn't pass the keyword args to the tp_new slot (it passes NULL), it only
passes them to the tp_init slot, so getting at them required adding a
tp_init slot to dicts.  Looks like that makes the normal case (i.e., no
args at all) a little slower (the time it takes to call dict.tp_init and
have it figure out there's nothing to do).
2001-09-02 08:22:48 +00:00
Neil Schemenauer e83c00efd0 Use new GC API. 2001-08-29 23:54:21 +00:00
Martin v. Löwis e3eb1f2b23 Patch #427190: Implement and use METH_NOARGS and METH_O. 2001-08-16 13:15:00 +00:00
Guido van Rossum 05ac6de2d5 Add PyDict_Merge(a, b, override):
PyDict_Merge(a, b, 1) is the same as PyDict_Update(a, b).
PyDict_Merge(a, b, 0) does something similar but leaves existing items
unchanged.
2001-08-10 20:28:28 +00:00
Tim Peters 6d6c1a35e0 Merge of descr-branch back into trunk. 2001-08-02 04:15:00 +00:00
Barry Warsaw 66a0d1d9b9 dict_update(): Generalize this method so {}.update() accepts any
"mapping" object, specifically one that supports PyMapping_Keys() and
PyObject_GetItem().  This allows you to say e.g. {}.update(UserDict())

We keep the special case for concrete dict objects, although that
seems moderately questionable.  OTOH, the code exists and works, so
why change that?

.update()'s docstring already claims that D.update(E) implies calling
E.keys() so it's appropriate not to transform AttributeErrors in
PyMapping_Keys() to TypeErrors.

Patch eyeballed by Tim.
2001-06-26 20:08:32 +00:00