cpython

Commit Graph

Author	SHA1	Message	Date
Armin Rigo	e170937af6	Ignore the references to the dummy objects used as deleted keys in dicts and sets when computing the total number of references.	2006-04-12 17:06:05 +00:00
Georg Brandl	347b30042b	Remove unnecessary casts in type object initializers.	2006-03-30 11:57:00 +00:00
Guido van Rossum	4b92a82504	Oops. Fix syntax for C89 compilers.	2006-02-25 23:32:30 +00:00
Guido van Rossum	1968ad32cd	- Patch 1433928: - The copy module now "copies" function objects (as atomic objects). - dict.__getitem__ now looks for a __missing__ hook before raising KeyError. - Added a new type, defaultdict, to the collections module. This uses the new __missing__ hook behavior added to dict (see above).	2006-02-25 22:38:04 +00:00
Martin v. Löwis	e0e89f7920	Revert 42400.	2006-02-16 06:59:22 +00:00
Martin v. Löwis	2c95cc6d72	Support %zd in PyErr_Format and PyString_FromFormat.	2006-02-16 06:54:25 +00:00
Neal Norwitz	26efe402c2	Get rid of compiler warnings (gcc 3.3.4 on x86)	2006-02-16 06:21:57 +00:00
Martin v. Löwis	18e165558b	Merge ssize_t branch.	2006-02-15 17:27:45 +00:00
Armin Rigo	f5b3e36493	Renamed _length_cue() to __length_hint__(). See: http://mail.python.org/pipermail/python-dev/2006-February/060524.html	2006-02-11 21:32:43 +00:00
Tim Peters	60b29961dc	Fixed English in a comment; trimmed trailing whitespace; no code changes.	2006-01-01 01:19:23 +00:00
Raymond Hettinger	6b27cda643	Convert iterator __len__() methods to a private API.	2005-09-24 21:23:05 +00:00
Raymond Hettinger	f81e45023e	Fix nits.	2005-08-17 02:19:36 +00:00
Raymond Hettinger	186e739d29	SF patch #1200051: Small optimization for PyDict_Merge() (Contributed by Barry Warsaw and Matt Messier.)	2005-05-14 18:08:25 +00:00
Raymond Hettinger	1356f785c1	SF bug #1183742: PyDict_Copy() can return non-NULL value on error	2005-04-15 15:58:42 +00:00
Raymond Hettinger	07ead17318	Code simplification -- eliminate lookup when value is known in advance.	2005-02-05 23:42:57 +00:00
Nicholas Bastin	9ba301e589	Moved SunPro warning suppression into pyport.h and out of individual modules and objects.	2004-07-15 15:54:05 +00:00
Nicholas Bastin	9e1bfe7dd9	Disabling end-of-loop code not reached warning on SunPro	2004-06-18 19:57:13 +00:00
Walter Dörwald	d70ad8a9d9	Update docstring for dict.update() to match the new realities.	2004-05-28 20:59:21 +00:00
Raymond Hettinger	7892b1c651	* Add unittests for iterators that report their length * Document the differences between them * Fix corner cases covered by the unittests * Use Py_RETURN_NONE where possible for dictionaries	2004-04-12 18:10:01 +00:00
Guido van Rossum	09240f65f8	GCC was complaining that 'value' in dictiter_iternextvalue() wasn't necessarily always set before used. Between Tim, Armin & me we couldn't prove GCC wrong, so we decided to fix the algorithm. This version is Armin's.	2004-03-20 19:11:58 +00:00
Raymond Hettinger	0690512a7d	Factor out a double lookup.	2004-03-19 10:30:00 +00:00
Raymond Hettinger	0ce6dc8530	Make the new dictionary iterators transparent with respect to length. This gives another 30% speedup for operations such as map(func, d.iteritems()) or list(d.iteritems()) which can both take advantage of length information when provided.	2004-03-18 08:38:00 +00:00
Raymond Hettinger	019a148c72	Optimize dictionary iterators. * Split into three separate types that share everything except the code for iternext. Saves run time decision making and allows each iternext function to be specialized. * Inlined PyDict_Next(). In addition to saving a function call, this allows a redundant test to be eliminated and further specialization of the code for the unique needs of each iterator type. * Created a reusable result tuple for iteritems(). Saves the malloc time for tuples when the previous result was not kept by client code (this is the typical use case for iteritems). If the client code does keep the reference, then a new tuple is created. Results in a 20% to 30% speedup depending on the size and sparsity of the dictionary.	2004-03-18 02:41:19 +00:00
Raymond Hettinger	4344278250	Dictionary optimizations: * Factored constant structure references out of the inner loops for PyDict_Next(), dict_keys(), dict_values(), and dict_items(). Gave measurable speedups to each (the improvement varies depending on the sparseness of the dictionary being measured). * Added a freelist scheme styled after that for tuples. Saves around 80% of the calls to malloc and free. About 10% of the time, the previous dictionary was completely empty; in those cases, the dictionary initialization with memset() can be skipped.	2004-03-17 21:55:03 +00:00
Raymond Hettinger	ebedb2f773	Factor out code common to PyDict_Copy() and PyDict_Merge().	2004-03-08 04:19:01 +00:00
Raymond Hettinger	31017aed36	SF #904720: dict.update should take a 2-tuple sequence like dict.__init__ (Championed by Bob Ippolito.) The update() method for mappings now accepts all the same argument forms as the dict() constructor. This includes item lists and/or keyword arguments.	2004-03-04 08:25:44 +00:00
Jeremy Hylton	7083bb744a	Oops. Return -1 to distinguish error from empty dict. This change probably isn't worth a bug fix. It's unlikely that anyone was calling this method without passing it a real dict.	2004-02-17 20:10:11 +00:00
Raymond Hettinger	0c66967e3d	Simplify previous checkin -- a new function was not needed.	2003-12-13 13:31:55 +00:00
Raymond Hettinger	8f5cdaa784	* Added a new method flag, METH_COEXIST. * Used the flag to optimize set.__contains__(), dict.__contains__(), dict.__getitem__(), and list.__getitem__().	2003-12-13 11:26:12 +00:00
Raymond Hettinger	bc0f2ab9bb	Expose dict_contains() and PyDict_Contains(), which are about 10% faster than PySequence_Contains() and more clearly applicable to dicts. Apply the new function in setobject.c where __contains__ checking is ubiquitous.	2003-11-25 21:12:14 +00:00
Raymond Hettinger	574aa32578	SF patch #798467: Update docstring of has_key for bool changes (Contributed by George Yoshida.)	2003-09-01 22:12:08 +00:00
Raymond Hettinger	c8d2290c8c	SF patch #729395: Dictionary tuning. Adjust resize argument for dict.update() and dict.copy(). Extends the previous change to dict.__setitem__().	2003-05-07 00:49:40 +00:00
Raymond Hettinger	3539f6b895	SF patch #729395: Dictionary tuning * Increase dictionary growth rate resulting in more sparse dictionaries, fewer lookup collisions, increased memory use, and better cache performance. For dicts with over 50k entries, keep the current growth rate in case an application is suffering from tight memory constraints. * Set the most common case (no resize) to fall-through the test.	2003-05-05 22:22:10 +00:00
Raymond Hettinger	930427b892	Add a reference to dictnotes.txt. It does no good if you don't know it's there or where to find it.	2003-05-03 06:51:59 +00:00
Raymond Hettinger	1da1dbf458	Renamed PyObject_GenericGetIter to PyObject_SelfIter to more accurately describe what the function does. Suggested by Thomas Wouters.	2003-03-17 19:46:11 +00:00
Raymond Hettinger	0153826964	Created PyObject_GenericGetIter(). Factors out the common case of returning self.	2003-03-17 08:24:35 +00:00
Raymond Hettinger	a3e1e4cd79	SF patch #693753: fix for bug 639806: default for dict.pop (contributed by Michael Stone.)	2003-03-06 23:54:28 +00:00
Neal Norwitz	0732301738	Add closing ) in comment	2003-02-15 14:45:12 +00:00
Tim Peters	080c88b912	cPickle.c, load_build(): Taught cPickle how to pick apart the optional proto 2 slot state. pickle.py, load_build(): CAUTION: Noted that cPickle's load_build and pickle's load_build really don't do the same things with the state, and didn't before this patch either. cPickle never tries to do .update(), and has no backoff if instance.__dict__ can't be retrieved. There are no tests that can tell the difference, and part of what cPickle's load_build() did looked accidental to me, so I don't know what the true intent is here. pickletester.py, test_pickle.py: Got rid of the hack for exempting cPickle from running some of the proto 2 tests. dictobject.c, PyDict_Next(): documented intended use.	2003-02-15 03:01:11 +00:00
Raymond Hettinger	ea3fdf44a2	SF patch #659536: Use PyArg_UnpackTuple where possible. Obtain cleaner coding and a system wide performance boost by using the fast, pre-parsed PyArg_Unpack function instead of PyArg_ParseTuple function which is driven by a format string.	2002-12-29 16:33:45 +00:00
Martin v. Löwis	32b4a1ba62	Constify char* API. Fixes #651363. 2.2 candidate.	2002-12-11 13:21:12 +00:00
Tim Peters	bca1cbc6f8	SF 548651: Fix the METH_CLASS implementation. Most of these patches are from Thomas Heller, with long lines folded by Tim. The change to test_descr.py is from Guido. See the bug report. Not a bugfix candidate -- METH_CLASS is new in 2.3.	2002-12-09 22:56:13 +00:00
Raymond Hettinger	e03e5b1f91	Remove assumption that cls is a subclass of dict. Simplifies the code and gets Just van Rossum's example to work.	2002-12-07 08:10:51 +00:00
Raymond Hettinger	b02bb5ed0a	Replace BadInternalCall with TypeError. Add a test case. Fix whitespace. Just van Rossum showed a weird, but clever way for pure python code to trigger the BadInternalCall. The C code had assumed that calling a class constructor would return an instance of that class; however, classes that abuse __new__ can invalidate that assumption.	2002-12-04 07:32:25 +00:00
Neal Norwitz	ef786ae1a5	Add missing decref	2002-11-27 19:38:00 +00:00
Raymond Hettinger	e33d3df030	SF Patch 643443. Added dict.fromkeys(iterable, value=None), a class method for constructing new dictionaries from sequences of keys.	2002-11-27 07:29:33 +00:00
Just van Rossum	a797d8150d	Patch #642500 with slight modifications: allow keyword arguments in dict() constructor. Example: >>> dict(a=1, b=2) {'a': 1, 'b': 2} >>>	2002-11-23 09:45:04 +00:00
Guido van Rossum	efae8862fe	In doc strings, use 'k in D' rather than D.has_key(k).	2002-09-04 11:29:45 +00:00
Guido van Rossum	45ec02aed1	SF patch 576101, by Oren Tirosh: alternative implementation of interning. I modified Oren's patch significantly, but the basic idea and most of the implementation is unchanged. Interned strings created with PyString_InternInPlace() are now mortal, and you must keep a reference to the resulting string around; use the new function PyString_InternImmortal() to create immortal interned strings.	2002-08-19 21:43:18 +00:00
Jeremy Hylton	938ace69a0	staticforward bites the dust. The staticforward define was needed to support certain broken C compilers (notably SCO ODT 3.0, perhaps early AIX as well) that botched the static keyword when it was used with a forward declaration of a static initialized structure. Standard C allows the forward declaration with static, and we've decided to stop catering to broken C compilers. (In fact, we expect that the compilers are all fixed eight years later.) I'm leaving staticforward and statichere defined in object.h as static. This is only for backwards compatibility with C extensions that might still use it. XXX I haven't updated the documentation.	2002-07-17 16:30:39 +00:00
Guido van Rossum	2147df748f	Make StopIteration a sink state. This is done by clearing out the di_dict field when the end of the list is reached. Also make the error ("dictionary changed size during iteration") a sticky state. Also remove the next() method -- one is supplied automatically by PyType_Ready() because the tp_iternext slot is set. That's a good thing, because the implementation given here was buggy (it never raised StopIteration).	2002-07-16 20:30:22 +00:00
Martin v. Löwis	14f8b4cfcb	Patch #568124: Add doc string macros.	2002-06-13 20:33:02 +00:00
Guido van Rossum	e027d9818f	Add Raymond Hettinger's d.pop(). See SF patch 539949.	2002-04-12 15:11:59 +00:00
Neil Schemenauer	6189b89cc5	PyObject_GC_Del and PyObject_Del can now be used as function designators. Remove PyMalloc_New.	2002-04-12 02:43:00 +00:00
Guido van Rossum	77f6a65eb0	Add the 'bool' type and its values 'False' and 'True', as described in PEP 285. Everything described in the PEP is here, and there is even some documentation. I had to fix 12 unit tests; all but one of these were printing Boolean outcomes that changed from 0/1 to False/True. (The exception is test_unicode.py, which did a type(x) == type(y) style comparison. I could've fixed that with a single line using issubtype(x, type(y)), but instead chose to be explicit about those places where a bool is expected. Still to do: perhaps more documentation; change standard library modules to return False/True from predicates.	2002-04-03 22:41:51 +00:00
Tim Peters	1f7df3595a	Remove the CACHE_HASH and INTERN_STRINGS preprocessor symbols.	2002-03-29 03:29:08 +00:00
Guido van Rossum	ff413af605	This is Neil's fix for SF bug 535905 (Evil Trashcan and GC interaction). The fix makes it possible to call PyObject_GC_UnTrack() more than once on the same object, and then move the PyObject_GC_UnTrack() call to before the trashcan code is invoked. BUGFIX CANDIDATE!	2002-03-28 20:34:59 +00:00
Neil Schemenauer	dcc819a5c9	Use pymalloc if it's enabled.	2002-03-22 15:33:15 +00:00
Tim Peters	f582b82fe9	SF bug #491415 PyDict_UpdateFromSeq2() unused PyDict_UpdateFromSeq2(): removed it. PyDict_MergeFromSeq2(): made it public and documented it. PyDict_Merge() docs: updated to reveal <wink> that the second argument can be any mapping object.	2001-12-11 18:51:08 +00:00
Guido van Rossum	dbb53d9918	Fix of SF bug #475877 (Mutable subtype instances are hashable). Rather than tweaking the inheritance of type object slots (which turns out to be too messy to try), this fix adds a __hash__ to the list and dict types (the only mutable types I'm aware of) that explicitly raises an error. This has the advantage that list.__hash__([]) also raises an error (previously, this would invoke object.__hash__([]), returning the argument's address); ditto for dict.__hash__. The disadvantage for this fix is that 3rd party mutable types aren't automatically fixed. This should be added to the rules for creating subclassable extension types: if you don't want your object to be hashable, add a tp_hash function that raises an exception. Also, it's possible that I've forgotten about other mutable types for which this should be done.	2001-12-03 16:32:18 +00:00
Tim Peters	a427a2b8d0	Rename "dictionary" (type and constructor) to "dict".	2001-10-29 22:25:45 +00:00
Tim Peters	4d85953fe6	dictionary() constructor: + Change keyword arg name from "x" to "items". People passing a mapping object can stretch their imaginations <wink>. + Simplify the docstring text.	2001-10-27 18:27:48 +00:00
Tim Peters	1fc240e851	Generalize dictionary() to accept a sequence of 2-sequences. At the outer level, the iterator protocol is used for memory-efficiency (the outer sequence may be very large if fully materialized); at the inner level, PySequence_Fast() is used for time-efficiency (these should always be sequences of length 2). dictobject.c, new functions PyDict_{Merge,Update}FromSeq2. These are wholly analogous to PyDict_{Merge,Update}, but process a sequence-of-2- sequences argument instead of a mapping object. For now, I left these functions file static, so no corresponding doc changes. It's tempting to change dict.update() to allow a sequence-of-2-seqs argument too. Also changed the name of dictionary's keyword argument from "mapping" to "x". Got a better name? "mapping_or_sequence_of_pairs" isn't attractive, although more so than "mosop" <wink>. abstract.h, abstract.tex: Added new PySequence_Fast_GET_SIZE function, much faster than going thru the all-purpose PySequence_Size. libfuncs.tex: - Document dictionary(). - Fiddle tuple() and list() to admit that their argument is optional. - The long-winded repetitions of "a sequence, a container that supports iteration, or an iterator object" is getting to be a PITA. Many months ago I suggested factoring this out into "iterable object", where the definition of that could include being explicit about generators too (as is, I'm not sure a reader outside of PythonLabs could guess that "an iterator object" includes a generator call). - Please check my curly braces -- I'm going blind <0.9 wink>. abstract.c, PySequence_Tuple(): When PyObject_GetIter() fails, leave its error msg alone now (the msg it produces has improved since PySequence_Tuple was generalized to accept iterable objects, and PySequence_Tuple was also stomping on the msg in cases it shouldn't have even before PyObject_GetIter grew a better msg).	2001-10-26 05:06:50 +00:00
Guido van Rossum	9475a2310d	Enable GC for new-style instances. This touches lots of files, since many types were subclassable but had a xxx_dealloc function that called PyObject_DEL(self) directly instead of deferring to self->ob_type->tp_free(self). It is permissible to set tp_free in the type object directly to _PyObject_Del, for non-GC types, or to _PyObject_GC_Del, for GC types. Still, PyObject_DEL was a tad faster, so I'm fearing that our pystone rating is going down again. I'm not sure if doing something like void xxx_dealloc(PyObject *self) { if (PyXxxCheckExact(self)) PyObject_DEL(self); else self->ob_type->tp_free(self); } is any faster than always calling the else branch, so I haven't attempted that -- however those types whose own dealloc is fancier (int, float, unicode) do use this pattern.	2001-10-05 20:51:39 +00:00
Tim Peters	0ab085c4cb	Changed the dict implementation to take "string shortcuts" only when keys are true strings -- no subclasses need apply. This may be debatable. The problem is that a str subclass may very well want to override __eq__ and/or __hash__ (see the new example of case-insensitive strings in test_descr), but go-fast shortcuts for strings are ubiquitous in our dicts (and subclass overrides aren't even looked for then). Another go-fast reason for the change is that PyCheck_StringExact() is a quicker test than PyCheck_String(), and we make such a test on virtually every access to every dict. OTOH, a str subclass may also be perfectly happy using the base str eq and hash, and this change slows them a lot. But those cases are still hypothetical, while Python's own reliance on true-string dicts is not.	2001-09-14 00:25:33 +00:00
Tim Peters	b95ec09a44	Repair typo in comment.	2001-09-02 18:35:54 +00:00
Tim Peters	25786c0851	Make dictionary() a real constructor. Accepts at most one argument, "a mapping object", in the same sense dict.update(x) requires of x (that x has a keys() method and a getitem). Questionable: The other type constructors accept a keyword argument, so I did that here too (e.g., dictionary(mapping={1:2}) works). But type_call doesn't pass the keyword args to the tp_new slot (it passes NULL), it only passes them to the tp_init slot, so getting at them required adding a tp_init slot to dicts. Looks like that makes the normal case (i.e., no args at all) a little slower (the time it takes to call dict.tp_init and have it figure out there's nothing to do).	2001-09-02 08:22:48 +00:00
Neil Schemenauer	e83c00efd0	Use new GC API.	2001-08-29 23:54:21 +00:00
Martin v. Löwis	e3eb1f2b23	Patch #427190: Implement and use METH_NOARGS and METH_O.	2001-08-16 13:15:00 +00:00
Guido van Rossum	05ac6de2d5	Add PyDict_Merge(a, b, override): PyDict_Merge(a, b, 1) is the same as PyDict_Update(a, b). PyDict_Merge(a, b, 0) does something similar but leaves existing items unchanged.	2001-08-10 20:28:28 +00:00
Tim Peters	6d6c1a35e0	Merge of descr-branch back into trunk.	2001-08-02 04:15:00 +00:00
Barry Warsaw	66a0d1d9b9	dict_update(): Generalize this method so {}.update() accepts any "mapping" object, specifically one that supports PyMapping_Keys() and PyObject_GetItem(). This allows you to say e.g. {}.update(UserDict()) We keep the special case for concrete dict objects, although that seems moderately questionable. OTOH, the code exists and works, so why change that? .update()'s docstring already claims that D.update(E) implies calling E.keys() so it's appropriate not to transform AttributeErrors in PyMapping_Keys() to TypeErrors. Patch eyeballed by Tim.	2001-06-26 20:08:32 +00:00
Tim Peters	c605784174	dict_repr: Reuse one of the int vars (minor code simplification).	2001-06-16 07:52:53 +00:00
Tim Peters	a7259597f1	SF bug 433228: repr(list) woes when len(list) big. Gave Python linear-time repr() implementations for dicts, lists, strings. This means, e.g., that repr(range(50000)) is no longer 50x slower than pprint.pprint() in 2.2 <wink>. I don't consider this a bugfix candidate, as it's a performance boost. Added _PyString_Join() to the internal string API. If we want that in the public API, fine, but then it requires runtime error checks instead of asserts.	2001-06-16 05:11:17 +00:00
Tim Peters	afb6ae8452	Store the mask instead of the size in dictobjects. The mask is more frequently used, and in particular this allows to drop the last remaining obvious time-waster in the crucial lookdict() and lookdict_string() functions. Other changes consist mostly of changing "i < ma_size" to "i <= ma_mask" everywhere.	2001-06-04 21:00:21 +00:00
Tim Peters	453163d842	lookdict: stop more insane core-dump mutating comparison cases. Should be possible to provoke unbounded recursion now, but leaving that to someone else to provoke and repair. Bugfix candidate -- although this is getting harder to backstitch, and the cases it's protecting against are mondo contrived.	2001-06-03 04:54:32 +00:00
Tim Peters	7b5d0afb1e	lookdict: Reduce obfuscating code duplication with a judicious goto. This code is likely to get even hairier to squash core dumps due to mutating comparisons, and it's hard enough to follow without that.	2001-06-03 04:14:43 +00:00
Tim Peters	19b77cfc4b	Finish the dict->string coredump fix. Need sleep. Bugfix candidate.	2001-06-02 08:27:39 +00:00
Tim Peters	23cf6be23c	Coredumpers from Michael Hudson, mutating dicts while printing or converting to string. Critical bugfix candidate -- if you take this seriously <wink>.	2001-06-02 08:02:56 +00:00
Tim Peters	f4b33f61fb	dict_popitem(): Repaired last-second 2.1 comment, which misidentified the true reason for allocating the tuple before checking the dict size.	2001-06-02 05:42:29 +00:00
Tim Peters	eb28ef209e	New collision resolution scheme: no polynomials, simpler, faster, less code, less memory. Tests have uncovered no drawbacks. Christian and Vladimir are the other two people who have burned many brain cells on the dict code in recent years, and they like the approach too, so I'm checking it in without further ado.	2001-06-02 05:27:19 +00:00
Tim Peters	15d4929ae4	Implement an old idea of Christian Tismer's: use polynomial division instead of multiplication to generate the probe sequence. The idea is recorded in Python-Dev for Dec 2000, but that version is prone to rare infinite loops. The value is in getting all the bits of the hash code to participate; and, e.g., this speeds up querying every key in a dict with keys [i << 16 for i in range(20000)] by a factor of 500. Should be equally valuable in any bad case where the high-order hash bits were getting ignored. Also wrote up some of the motivations behind Python's ever-more-subtle hash table strategy.	2001-05-27 07:39:22 +00:00
Martin v. Löwis	cd35306a25	Patch #424335: Implement string_richcompare, remove string_compare. Use new _PyString_Eq in lookdict_string.	2001-05-24 16:56:35 +00:00
Tim Peters	f8a548c23c	dictresize(): Rebuild small tables if there are any dummies, not just if they're entirely full. Not a question of correctness, but of temporarily misplaced common sense.	2001-05-24 16:26:40 +00:00
Tim Peters	0c6010be75	Jack Jansen hit a bug in the new dict code, reported on python-dev. dictresize() was too aggressive about never ever resizing small dicts. If a small dict is entirely full, it needs to rebuild it despite that it won't actually resize it, in order to purge old dummy entries thus creating at least one virgin slot (lookdict assumes at least one such exists). Also took the opportunity to add some high-level comments to dictresize.	2001-05-23 23:33:57 +00:00
Fred Drake	0c23231f6e	Remove unused variable.	2001-05-22 22:36:52 +00:00
Tim Peters	dea48ec581	SF patch #425242: Patch which "inlines" small dictionaries. The idea is Marc-Andre Lemburg's, the implementation is Tim's. Add a new ma_smalltable member to dictobjects, an embedded vector of MINSIZE (8) dictentry structs. Short course is that this lets us avoid additional malloc(s) for dicts with no more than 5 entries. The changes are widespread but mostly small. Long course: WRT speed, all scalar operations (getitem, setitem, delitem) on non-empty dicts benefit from no longer needing NULL-pointer checks (ma_table is never NULL anymore). Bulk operations (copy, update, resize, clearing slots during dealloc) benefit in some cases from now looping on the ma_fill count rather than on ma_size, but that was an unexpected benefit: the original reason to loop on ma_fill was to let bulk operations on empty dicts end quickly (since the NULL-pointer checks went away, empty dicts aren't special-cased any more). Special considerations: For dicts that remain empty, this change is a lose on two counts: the dict object contains 8 new dictentry slots now that weren't needed before, and dict object creation also spends time memset'ing these doomed-to-be-unused slots to NULLs. For dicts with one or two entries that never get larger than 2, it's a mix: a malloc()/free() pair is no longer needed, and the 2-entry case gets to use 8 slots (instead of 4) thus decreasing the chance of collision. Against that, dict object creation spends time memset'ing 4 slots that aren't strictly needed in this case. For dicts with 3 through 5 entries that never get larger than 5, it's a pure win: the dict is created with all the space they need, and they never need to resize. Before they suffered two malloc()/free() calls, plus 1 dict resize, to get enough space. In addition, the 8-slot table they ended with consumed more memory overall, because of the hidden overhead due to the additional malloc. For dicts with 6 or more entries, the ma_smalltable member is wasted space, but then these are large(r) dicts so 8 slots more or less doesn't make much difference. They still benefit all the time from removing ubiquitous dynamic null-pointer checks, and get a small benefit (but relatively smaller the larger the dict) from not having to do two mallocs, two frees, and a resize on the way to getting their sixth entry. All in all it appears a small but definite general win, with larger benefits in specific cases. It's especially nice that it allowed to get rid of several branches, gotos and labels, and overall made the code smaller.	2001-05-22 20:40:22 +00:00
Tim Peters	91a364df17	Bugfix candidate. Two exceedingly unlikely errors in dictresize(): 1. The loop for finding the new size had an off-by-one error at the end (could over-index the polys[] vector). 2. The polys[] vector ended with a 0, apparently intended as a sentinel value but never used as such; i.e., it was never checked, so 0 could have been used as a polynomial. Neither bug could trigger unless a dict grew to 2**30 slots; since that would consume at least 12GB of memory just to hold the dict pointers, I'm betting it's not the cause of the bug Fred's tracking down <wink>.	2001-05-19 07:04:38 +00:00
Tim Peters	1928314ef4	Speed dictresize by collapsing its two passes into one; the reason given in the comments for using two passes was bogus, as the only object that can get decref'ed due to the copy is the dummy key, and decref'ing dummy can't have side effects (for one thing, dummy is immortal! for another, it's a string object, not a potentially dangerous user-defined object).	2001-05-17 22:25:34 +00:00
Tim Peters	342c65e19a	Aggressive reordering of dict comparisons. In case of collision, it stands to reason that me_key is much more likely to match the key we're looking for than to match dummy, and if the key is absent me_key is much more likely to be NULL than dummy: most dicts don't even have a dummy entry. Running instrumented dict code over the test suite and some apps confirmed that matching dummy was 200-300x less frequent than matching key in practice. So this reorders the tests to try the common case first. It can lose if a large dict with many collisions is mostly deleted, not resized, and then frequently searched, but that's hardly a case we should be favoring.	2001-05-13 06:43:53 +00:00
Tim Peters	2f228e75e4	Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask". The comment following used to say: /* We use ~hash instead of hash, as degenerate hash functions, such as for ints <sigh>, can have lots of leading zeros. It's not really a performance risk, but better safe than sorry. 12-Dec-00 tim: so ~hash produces lots of leading ones instead -- what's the gain? */ That is, there was never a good reason for doing it. And to the contrary, as explained on Python-Dev last December, it tended to make the *sum* (i + incr) & mask (which is the first table index examined in case of collision) the same "too often" across distinct hashes. Changing to the simpler "i = hash & mask" reduced the number of string-dict collisions (== # number of times we go around the lookup for-loop) from about 6 million to 5 million during a full run of the test suite (these are approximate because the test suite does some random stuff from run to run). The number of collisions in non-string dicts also decreased, but not as dramatically. Note that this may, for a given dict, change the order (wrt previous releases) of entries exposed by .keys(), .values() and .items(). A number of std tests suffered bogus failures as a result. For dicts keyed by small ints, or (less so) by characters, the order is much more likely to be in increasing order of key now; e.g., >>> d = {} >>> for i in range(10): ... d[i] = i ... >>> d {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} >>> Unfortunately, people may latch on to that in small examples and draw a bogus conclusion. test_support.py: Moved test_extcall's sortdict() into test_support, made it stronger, and imported sortdict into other std tests that needed it. test_unicode.py: Excluded cp875 from the "roundtrip over range(128)" test, because cp875 doesn't have a well-defined inverse for unicode("?", "cp875"). See Python-Dev for excruciating details. Cookie.py: Changed various output functions to sort dicts before building strings from them. test_extcall: Fiddled the expected-result file. This remains sensitive to native dict ordering, because, e.g., if there are multiple errors in a keyword-arg dict (and test_extcall sets up many cases like that), the specific error Python complains about first depends on native dict ordering.	2001-05-13 00:19:31 +00:00
Tim Peters	4fa58bfac2	Restore dicts' tp_compare slot, and change dict_richcompare to say it doesn't know how to do LE, LT, GE, GT. dict_richcompare can't do the latter any faster than dict_compare can. More importantly, for cmp(dict1, dict2), Python first tries rich compares with EQ, LT, and GT one at a time, even if the tp_compare slot is defined, and dict_richcompare called dict_compare for the latter two because it couldn't do them itself. The result was a lot of wasted calls to dict_compare. Now dict_richcompare gives up at once the times Python calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare is called only once (when Python gets around to trying the tp_compare slot). Continued mystery: despite that this cut the number of calls to dict_compare approximately in half in test_mutants.py, the latter still runs amazingly slowly. Running under the debugger doesn't show excessive activity in the dict comparison code anymore, so I'm guessing the culprit is somewhere else -- but where? Perhaps in the element (key/value) comparison code? We clearly spend a lot of time figuring out how to compare things.	2001-05-10 21:45:19 +00:00
Tim Peters	3918fb2549	Repair typo in comment.	2001-05-10 18:58:31 +00:00
Tim Peters	95bf9390a4	SF bug #422121: Insecurities in dict comparison. Fixed a half dozen ways in which general dict comparison could crash Python (even cause Win98SE to reboot) in the presence of key and/or value comparison routines that mutate the dict during dict comparison. Bugfix candidate.	2001-05-10 08:32:44 +00:00
Tim Peters	e63415ead8	SF patch #421922: Implement rich comparison for dicts. d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2 don't support comparisons other than ==, and testing dicts for equality is faster now (especially when inequality obtains).	2001-05-08 04:38:29 +00:00
Guido van Rossum	b1f35bffe5	Michael Hudson pointed out that the code for detecting changes in dictionary size was comparing ma_size, the hash table size, which is always a power of two, rather than ma_used, which changes on each insertion or deletion. Fixed this.	2001-05-02 15:13:44 +00:00
Guido van Rossum	09e563abb4	Add experimental iterkeys(), itervalues(), iteritems() to dict objects. Tests show that iteritems() is 5-10% faster than iterating over the dict and extracting the value with dict[key].	2001-05-01 12:10:21 +00:00
Guido van Rossum	213c7a6aa5	Mondo changes to the iterator stuff, without changing how Python code sees it (test_iter.py is unchanged). - Added a tp_iternext slot, which calls the iterator's next() method; this is much faster for built-in iterators over built-in types such as lists and dicts, speeding up pybench's ForLoop with about 25% compared to Python 2.1. (Now there's a good argument for iterators. ;-) - Renamed the built-in sequence iterator SeqIter, affecting the C API functions for it. (This frees up the PyIter prefix for generic iterator operations.) - Added PyIter_Check(obj), which checks that obj's type has a tp_iternext slot and that the proper feature flag is set. - Added PyIter_Next(obj) which calls the tp_iternext slot. It has a somewhat complex return condition due to the need for speed: when it returns NULL, it may not have set an exception condition, meaning the iterator is exhausted; when the exception StopIteration is set (or a derived exception class), it means the same thing; any other exception means some other error occurred.	2001-04-23 14:08:49 +00:00
Guido van Rossum	59d1d2b434	Iterators phase 1. This comprises: new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER; new C API PyObject_GetIter(), calls tp_iter; new builtin iter(), with two forms: iter(obj), and iter(function, sentinel); new internal object types iterobject and calliterobject; new exception StopIteration; new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py); new magic number for .pyc files; new special method for instances: __iter__() returns an iterator; iteration over dictionaries: "for x in dict" iterates over the keys; iteration over files: "for x in file" iterates over lines. TODO: documentation; test suite; decide whether to use a different way to spell iter(function, sentinel); decide whether "for key in dict" is a good idea; use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?); speed tuning (make next() a slot tp_next???)	2001-04-20 19:13:02 +00:00
Guido van Rossum	55ad67d74d	Oops. Removed dictiter_new decl that wasn't supposed to go in yet.	2001-04-20 16:52:06 +00:00

227 Commits