cpython


Author	SHA1	Message	Date
Fred Drake	0c23231f6e	Remove unused variable.	2001-05-22 22:36:52 +00:00
Tim Peters	dea48ec581	SF patch #425242: Patch which "inlines" small dictionaries. The idea is Marc-Andre Lemburg's, the implementation is Tim's. Add a new ma_smalltable member to dict objects, an embedded vector of MINSIZE (8) dictentry structs. Short course is that this lets us avoid additional malloc(s) for dicts with no more than 5 entries. The changes are widespread but mostly small. Long course: WRT speed, all scalar operations (getitem, setitem, delitem) on non-empty dicts benefit from no longer needing NULL-pointer checks (ma_table is never NULL anymore). Bulk operations (copy, update, resize, clearing slots during dealloc) benefit in some cases from now looping on the ma_fill count rather than on ma_size, but that was an unexpected benefit: the original reason to loop on ma_fill was to let bulk operations on empty dicts end quickly (since the NULL-pointer checks went away, empty dicts aren't special-cased any more). Special considerations: For dicts that remain empty, this change is a loss on two counts: the dict object contains 8 new dictentry slots now that weren't needed before, and dict object creation also spends time memset'ing these doomed-to-be-unused slots to NULLs. For dicts with one or two entries that never get larger than 2, it's a mix: a malloc()/free() pair is no longer needed, and the 2-entry case gets to use 8 slots (instead of 4), thus decreasing the chance of collision. Against that, dict object creation spends time memset'ing 4 slots that aren't strictly needed in this case. For dicts with 3 through 5 entries that never get larger than 5, it's a pure win: they are created with all the space they need, and they never need to resize. Before, they suffered two malloc()/free() calls, plus 1 dict resize, to get enough space. In addition, the 8-slot table they ended with consumed more memory overall, because of the hidden overhead due to the additional malloc.
For dicts with 6 or more entries, the ma_smalltable member is wasted space, but then these are large(r) dicts so 8 slots more or less doesn't make much difference. They still benefit all the time from removing ubiquitous dynamic null-pointer checks, and get a small benefit (but relatively smaller the larger the dict) from not having to do two mallocs, two frees, and a resize on the way to getting their sixth entry. All in all it appears a small but definite general win, with larger benefits in specific cases. It's especially nice that it allowed us to get rid of several branches, gotos and labels, and overall made the code smaller.	2001-05-22 20:40:22 +00:00
Guido van Rossum	5b021848ac	file_getiter(): make iter(file) be equivalent to file.xreadlines(). This should be faster. This means: (1) "for line in file:" won't work if the xreadlines module can't be imported. (2) The body of "for line in file:" shouldn't use the file directly; the effects (e.g. of file.readline(), file.seek() or even file.tell()) would be undefined because of the buffering that goes on in the xreadlines module.	2001-05-22 16:48:37 +00:00
Guido van Rossum	0ba9e3ac27	init_name_op(): add (void) to the argument list to make it a valid prototype, for gcc -Wstrict-prototypes.	2001-05-22 02:33:08 +00:00
Marc-André Lemburg	489b56e044	This patch changes the behaviour of the UTF-16 codec family. Only the UTF-16 codec will now interpret and remove a leading BOM mark. Subsequent BOM characters are no longer interpreted and removed. UTF-16-LE and -BE pass through all BOM mark characters. These changes should get the UTF-16 codec more in line with what the Unicode FAQ recommends w/r to BOM marks.	2001-05-21 20:30:15 +00:00
Tim Peters	91a364df17	Bugfix candidate. Two exceedingly unlikely errors in dictresize(): 1. The loop for finding the new size had an off-by-one error at the end (could over-index the polys[] vector). 2. The polys[] vector ended with a 0, apparently intended as a sentinel value but never used as such; i.e., it was never checked, so 0 could have been used as a polynomial. Neither bug could trigger unless a dict grew to 2**30 slots; since that would consume at least 12GB of memory just to hold the dict pointers, I'm betting it's not the cause of the bug Fred's tracking down <wink>.	2001-05-19 07:04:38 +00:00
Tim Peters	1928314ef4	Speed dictresize by collapsing its two passes into one; the reason given in the comments for using two passes was bogus, as the only object that can get decref'ed due to the copy is the dummy key, and decref'ing dummy can't have side effects (for one thing, dummy is immortal! for another, it's a string object, not a potentially dangerous user-defined object).	2001-05-17 22:25:34 +00:00
Tim Peters	d7ed3bf552	Speed tuple comparisons in two ways: 1. Omit the early-out EQ/NE "lengths different?" test. Was unable to find any real code where it triggered, but it always costs. The same is not true of list richcmps, where different-size lists appeared to get compared about half the time. 2. Because tuples are immutable, there's no need to refetch the lengths of both tuples from memory again on each loop trip. BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong, because it won't believe there's any difference unless Py_EQ returns false for some corresponding elements: >>> class C: ... def __lt__(x, y): return 1 ... __eq__ = __lt__ ... >>> C() < C() 1 >>> (C(),) < (C(),) 0 >>> That doesn't make sense -- provided you believe the defn. of C makes sense.	2001-05-15 20:12:59 +00:00
Marc-André Lemburg	2d9204199f	This patch changes the way the string .encode() method works slightly and introduces a new method .decode(). The major change is that strg.encode() will no longer try to convert Unicode returns from the codec into a string, but instead pass along the Unicode object as-is. The same is now true for all other codec return types. The underlying C APIs were changed accordingly. Note that even though this does have the potential of breaking existing code, the chances are low since conversion from Unicode previously took place using the default encoding which is normally set to ASCII rendering this auto-conversion mechanism useless for most Unicode encodings. The good news is that you can now use .encode() and .decode() with much greater ease and that the door was opened for better accessibility of the builtin codecs. As demonstration of the new feature, the patch includes a few new codecs which allow string to string encoding and decoding (rot13, hex, zip, uu, base64). Written by Marc-Andre Lemburg. Copyright assigned to the PSF.	2001-05-15 12:00:02 +00:00
Tim Peters	342c65e19a	Aggressive reordering of dict comparisons. In case of collision, it stands to reason that me_key is much more likely to match the key we're looking for than to match dummy, and if the key is absent me_key is much more likely to be NULL than dummy: most dicts don't even have a dummy entry. Running instrumented dict code over the test suite and some apps confirmed that matching dummy was 200-300x less frequent than matching key in practice. So this reorders the tests to try the common case first. It can lose if a large dict with many collisions is mostly deleted, not resized, and then frequently searched, but that's hardly a case we should be favoring.	2001-05-13 06:43:53 +00:00
Tim Peters	2f228e75e4	Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask". The comment following used to say: /* We use ~hash instead of hash, as degenerate hash functions, such as for ints <sigh>, can have lots of leading zeros. It's not really a performance risk, but better safe than sorry. 12-Dec-00 tim: so ~hash produces lots of leading ones instead -- what's the gain? */ That is, there was never a good reason for doing it. And to the contrary, as explained on Python-Dev last December, it tended to make the sum (i + incr) & mask (which is the first table index examined in case of collision) the same "too often" across distinct hashes. Changing to the simpler "i = hash & mask" reduced the number of string-dict collisions (== number of times we go around the lookup for-loop) from about 6 million to 5 million during a full run of the test suite (these are approximate because the test suite does some random stuff from run to run). The number of collisions in non-string dicts also decreased, but not as dramatically. Note that this may, for a given dict, change the order (wrt previous releases) of entries exposed by .keys(), .values() and .items(). A number of std tests suffered bogus failures as a result. For dicts keyed by small ints, or (less so) by characters, the order is much more likely to be in increasing order of key now; e.g., >>> d = {} >>> for i in range(10): ... d[i] = i ... >>> d {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} >>> Unfortunately, people may latch on to that in small examples and draw a bogus conclusion. test_support.py: Moved test_extcall's sortdict() into test_support, made it stronger, and imported sortdict into other std tests that needed it. test_unicode.py: Excluded cp875 from the "roundtrip over range(128)" test, because cp875 doesn't have a well-defined inverse for unicode("?", "cp875"). See Python-Dev for excruciating details.
Cookie.py: Changed various output functions to sort dicts before building strings from them. test_extcall: Fiddled the expected-result file. This remains sensitive to native dict ordering, because, e.g., if there are multiple errors in a keyword-arg dict (and test_extcall sets up many cases like that), the specific error Python complains about first depends on native dict ordering.	2001-05-13 00:19:31 +00:00
Tim Peters	16cabc0a3d	Repair "module has no attribute xxx" error msg; bug introduced when switching from tp_getattr to tp_getattro.	2001-05-12 20:24:22 +00:00
Tim Peters	d85e102337	Variant of patch #423262: Change module attribute get & set. Allow module getattr and setattr to exploit string interning, via the previously null module object tp_getattro and tp_setattro slots. Yields a very nice speedup for things like random.random and os.path etc.	2001-05-11 21:51:48 +00:00
Jeremy Hylton	1b0feb4ada	Variant of SF patch 423181. For rich comparisons, use instance_getattr2() when possible to avoid the expense of setting an AttributeError. Also intern the name_op[] table and use the interned strings rather than creating a new string and interning it each time through.	2001-05-11 14:48:41 +00:00
Tim Peters	5acbfcc164	Cosmetic: code under "else" clause was missing indent.	2001-05-11 03:36:45 +00:00
Tim Peters	4fa58bfac2	Restore dicts' tp_compare slot, and change dict_richcompare to say it doesn't know how to do LE, LT, GE, GT. dict_richcompare can't do the latter any faster than dict_compare can. More importantly, for cmp(dict1, dict2), Python first tries rich compares with EQ, LT, and GT one at a time, even if the tp_compare slot is defined, and dict_richcompare called dict_compare for the latter two because it couldn't do them itself. The result was a lot of wasted calls to dict_compare. Now dict_richcompare gives up at once the times Python calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare is called only once (when Python gets around to trying the tp_compare slot). Continued mystery: despite that this cut the number of calls to dict_compare approximately in half in test_mutants.py, the latter still runs amazingly slowly. Running under the debugger doesn't show excessive activity in the dict comparison code anymore, so I'm guessing the culprit is somewhere else -- but where? Perhaps in the element (key/value) comparison code? We clearly spend a lot of time figuring out how to compare things.	2001-05-10 21:45:19 +00:00
Tim Peters	3918fb2549	Repair typo in comment.	2001-05-10 18:58:31 +00:00
Tim Peters	95bf9390a4	SF bug #422121: Insecurities in dict comparison. Fixed a half dozen ways in which general dict comparison could crash Python (even cause Win98SE to reboot) in the presence of key and/or value comparison routines that mutate the dict during dict comparison. Bugfix candidate.	2001-05-10 08:32:44 +00:00
Tim Peters	9c012af3c3	Heh. I need a break. After this: stropmodule & stringobject were more out of synch than I realized, and I managed to break replace's "count" argument when it was 0. All is well again. Maybe. Bugfix candidate.	2001-05-10 00:32:57 +00:00
Tim Peters	4cd44ef4bf	Fudge. stropmodule and stringobject both had copies of the buggy mymemXXX stuff, and they were already out of synch. Fix the remaining bugs in both and get them back in synch. Bugfix release candidate.	2001-05-10 00:05:33 +00:00
Tim Peters	1a97d5f098	SF patch #416247 2.1c1 stringobject: unused vrbl cleanup. Thanks to Mark Favas.	2001-05-09 20:06:00 +00:00
Tim Peters	4862ab7bf4	Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to restore correct semantics.	2001-05-09 08:43:21 +00:00
Tim Peters	9e897f41db	Mark Favas reported that gcc caught me using casts as lvalues. Dodge it.	2001-05-09 07:37:07 +00:00
Tim Peters	b4bbcd76ea	Ack! Restore the COUNT_ALLOCS one_strings code.	2001-05-09 00:31:40 +00:00
Tim Peters	cf5ad5d6f6	My change to string_item() left an extra reference to each 1-character interned string created by "string"[i]. Since they're immortal anyway, this was hard to notice, but it was still wrong <wink>.	2001-05-09 00:24:55 +00:00
Tim Peters	5b4d477568	Intern 1-character strings as soon as they're created. As-is, they aren't interned when created, so the cached versions generally aren't ever interned. With the patch, the Py_INCREF(t); *p = t; Py_DECREF(s); return; indirection block in PyString_InternInPlace() is never executed during a full run of the test suite, but was executed very many times before. So I'm trading more work when creating one-character strings for doing less work later. Note that the "more work" here can happen at most 256 times per program run, so it's trivial. The same reasoning accounts for the patch's simplification of string_item (the new version can call PyString_FromStringAndSize() no more than 256 times per run, so there's no point to inlining that stuff -- if we were serious about saving time here, we'd pre-initialize the characters vector so that no runtime testing at all was needed!).	2001-05-08 22:33:50 +00:00
Tim Peters	72f98e9b83	SF bug #422177: Results from .pyc differs from .py. Store floats and doubles to full precision in marshal. Test that floats read from .pyc/.pyo closely match those read from .py. Declare PyFloat_AsString() in floatobject header file. Add new PyFloat_AsReprString() API function. Document the functions declared in floatobject.h.	2001-05-08 15:19:57 +00:00
Tim Peters	e63415ead8	SF patch #421922: Implement rich comparison for dicts. d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2 don't support comparisons other than ==, and testing dicts for equality is faster now (especially when inequality obtains).	2001-05-08 04:38:29 +00:00
Jeremy Hylton	4c889011db	SF patch 419176 from MvL; fixed bug 418977. Two errors in dict_to_map() helper used by PyFrame_LocalsToFast().	2001-05-08 04:08:59 +00:00
Jeremy Hylton	d37292bb8d	Remove unused variable	2001-05-08 04:00:45 +00:00
Tim Peters	6d60b2e762	SF bug #422108 - Error in rich comparisons. 2.1.1 bugfix candidate too. Fix a bad (albeit unlikely) return value in try_rich_to_3way_compare(). Also document do_cmp()'s return values.	2001-05-07 20:53:51 +00:00
Tim Peters	cb8d368b82	Reimplement PySequence_Contains() and instance_contains(), so they work safely together and don't duplicate logic (the common logic was factored out into new private API function _PySequence_IterContains()). Visible change: some_complex_number in some_instance no longer blows up if some_instance has __getitem__ but neither __contains__ nor __iter__. test_iter changed to ensure that remains true.	2001-05-05 21:05:01 +00:00
Tim Peters	75f8e35ef4	Generalize PySequence_Count() (operator.countOf) to work with iterators.	2001-05-05 11:33:43 +00:00
Tim Peters	de9725f135	Make 'x in y' and 'x not in y' (PySequence_Contains) play nice w/ iterators. NEEDS DOC CHANGES. A few more AttributeErrors turned into TypeErrors, but in test_contains this time. The full story for instance objects is pretty much unexplainable, because instance_contains() tries its own flavor of iteration-based containment testing first, and PySequence_Contains doesn't get a chance at it unless instance_contains() blows up. A consequence is that some_complex_number in some_instance dies with a TypeError unless some_instance.__class__ defines __iter__ but does not define __getitem__.	2001-05-05 10:06:17 +00:00
Tim Peters	2cfe368283	Make unicode.join() work nice with iterators. This also required a change to string.join(), so that when the latter figures out in midstream that it really needs unicode.join() instead, unicode.join() can actually get all the sequence elements (i.e., there's no guarantee that the sequence passed to string.join() can be iterated over again by unicode.join(), so string.join() must not pass on the original sequence object anymore).	2001-05-05 05:36:48 +00:00
Tim Peters	12d0a6c78a	Fix a tiny and unlikely memory leak. Was there before too, and actually several of these turned up and got fixed during the iteration crusade.	2001-05-05 04:10:25 +00:00
Tim Peters	6912d4ddf0	Generalize tuple() to work nicely with iterators. NEEDS DOC CHANGES. This one surprised me! While I expected tuple() to be a no-brainer, it turns out it's actually dripping with consequences: 1. It will allow the popular PySequence_Fast() to work with any iterable object (code for that not yet checked in, but should be trivial). 2. It caused two std tests to fail. This is because some places used PySequence_Tuple() (the C spelling of tuple()) as an indirect way to test whether something is a sequence. But tuple() code only looked for the existence of sq->item to determine that, and e.g. an instance passed that test whether or not it supported the other operations tuple() needed (e.g., __len__). So some things the tests expected to fail with an AttributeError now fail with a TypeError instead. This looks like an improvement to me; e.g., test_coercion used to produce 559 TypeErrors and 2 AttributeErrors, and now they're all TypeErrors. The error details are more informative too, because the places calling this were looking for TypeErrors in order to replace the generic tuple() "not a sequence" msg with their own more specific text, and AttributeErrors snuck by that.	2001-05-05 03:56:37 +00:00
Tim Peters	f4848dac41	Make PyIter_Next() a little smarter (wrt its knowledge of iterator internals) so clients can be a lot dumber (wrt their knowledge).	2001-05-05 00:14:56 +00:00
Fred Drake	6aebded915	Remove the weakref support from PyObject_InitVar() as well; this should have come out at the same time as it did from PyObject_Init().	2001-05-03 20:04:33 +00:00
Fred Drake	ba40ec42c8	Remove unnecessary initialization for the case of weakly-referenceable objects; the code necessary to accomplish this is simpler and faster if confined to the object implementations, so we only do this there. This causes no behavioral changes beyond a (very slight) speedup.	2001-05-03 19:44:50 +00:00
Fred Drake	4dcb85b817	Since Py_TPFLAGS_HAVE_WEAKREFS is set in Py_TPFLAGS_DEFAULT, it does not need to be specified in the type structures independently. The flag exists only for binary compatibility. This is a "source cleanliness" issue and introduces no behavioral changes.	2001-05-03 16:04:13 +00:00
Guido van Rossum	b1f35bffe5	Michael Hudson pointed out that the code for detecting changes in dictionary size was comparing ma_size, the hash table size, which is always a power of two, rather than ma_used, which changes on each insertion or deletion. Fixed this.	2001-05-02 15:13:44 +00:00
Marc-André Lemburg	542fe56cb9	Fix for bug #417030: "print '%*s' fails for unicode string"	2001-05-02 14:21:53 +00:00
Tim Peters	6ad22c41c2	Plug a memory leak in list(), when appending to the result list.	2001-05-02 07:12:39 +00:00
Tim Peters	f553f89d45	Generalize list(seq) to work with iterators. This also generalizes list() to no longer insist that len(seq) be defined. NEEDS DOC CHANGES. This is meant to be a model for how other functions of this ilk (max, filter, etc) can be generalized similarly. Feel encouraged to grab your favorite and convert it! Note some cute consequences: list(file) == file.readlines() == list(file.xreadlines()) list(dict) == dict.keys() list(dict.iteritems()) == dict.items() list(xrange(i, j, k)) == range(i, j, k)	2001-05-01 20:45:31 +00:00
Guido van Rossum	47668928e6	Discard a misleading comment about iter_iternext().	2001-05-01 17:01:25 +00:00
Guido van Rossum	4f288ab7d6	Printing objects to a real file still wasn't done right: if the object's type didn't define tp_print, there were still cases where the full "print uses str() which falls back to repr()" semantics weren't honored. This resulted in >>> print None <None object at 0x80bd674> >>> print type(u'') <type object at 0x80c0a80> Fixed this by always using the appropriate PyObject_Repr() or PyObject_Str() call, rather than trying to emulate what they would do. Also simplified PyObject_Str() to always fall back on PyObject_Repr() when tp_str is not defined (rather than making an extra check for instances with a __str__ method). And got rid of the special case for strings.	2001-05-01 16:53:37 +00:00
Guido van Rossum	189f1df301	Add a proper implementation for the tp_str slot (returning self, of course), so I can get rid of the special case for strings in PyObject_Str().	2001-05-01 16:51:53 +00:00
Guido van Rossum	09e563abb4	Add experimental iterkeys(), itervalues(), iteritems() to dict objects. Tests show that iteritems() is 5-10% faster than iterating over the dict and extracting the value with dict[key].	2001-05-01 12:10:21 +00:00
Guido van Rossum	82c690f11a	Well darnit! The innocuous fix I made to PyObject_Print() caused printing of instances not to look for __str__(). Fix this.	2001-04-30 14:39:18 +00:00
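To make the ordering remark in commit 2f228e75e4 concrete, here is a minimal Python 3 sketch of how the first probed slot changes when `i = (~hash) & mask` becomes `i = hash & mask`. It assumes an 8-slot table (the MINSIZE mentioned in commit dea48ec581) and relies on CPython's `hash(n) == n` for small non-negative ints; `first_slot_old` and `first_slot_new` are hypothetical helper names, not CPython functions. Modern CPython dicts (3.7+) iterate in insertion order, so this models only the 2001 slot layout, not what `.keys()` returns today.

```python
# Sketch (hypothetical helpers) of the initial probe index before and after
# commit 2f228e75e4. Assumes CPython, where hash(n) == n for small n >= 0.

MASK = 7  # 8-slot table: mask = size - 1


def first_slot_old(key, mask=MASK):
    """Initial slot under the old rule: (~hash) & mask."""
    return (~hash(key)) & mask


def first_slot_new(key, mask=MASK):
    """Initial slot under the new rule: hash & mask."""
    return hash(key) & mask


if __name__ == "__main__":
    keys = range(8)
    # Old rule: key 0 lands in slot 7, key 7 in slot 0 (descending).
    print("old:", [first_slot_old(k) for k in keys])
    # New rule: key k lands in slot k, so a slot-order scan sees keys ascending.
    print("new:", [first_slot_new(k) for k in keys])
```

Running it prints `old: [7, 6, 5, 4, 3, 2, 1, 0]` and `new: [0, 1, 2, 3, 4, 5, 6, 7]`. With no collisions, a scan of slots 0 through 7 under the new rule meets small int keys in increasing order, which is why the commit warns that small examples may suggest an ordering that dicts never guaranteed.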

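The BUG ALERT in commit d7ed3bf552 can still be reproduced in Python 3: tuple `<` first looks for the leftmost pair of elements that are not equal (per `==`), applies `<` only to that pair, and falls back to comparing lengths if every pair claims equality. The sketch below restates that lexicographic rule as a hypothetical pure-Python helper `tuple_lt` (not a CPython API) alongside the commit's class `C`, assuming Python 3 semantics where `<` on two `C` instances simply calls `__lt__`.

```python
class C:
    # As in the commit message: both "<" and "==" always answer True.
    def __lt__(self, other):
        return True

    __eq__ = __lt__


def tuple_lt(a, b):
    """Hypothetical pure-Python restatement of lexicographic tuple '<'.

    Skip leading pairs that compare equal (identity short-circuits, as in
    CPython), decide on the first pair that differs, and otherwise compare
    lengths.
    """
    for x, y in zip(a, b):
        if not (x is y or x == y):
            return x < y
    return len(a) < len(b)


if __name__ == "__main__":
    print(C() < C())                 # True: __lt__ answers directly
    print((C(),) < (C(),))           # False: __eq__ says the only pair is equal
    print(tuple_lt((C(),), (C(),)))  # False: the sketch agrees with the builtin
```

Because `__eq__` reports the lone pair as equal, the tuple comparison never consults `__lt__` and decides on the (equal) lengths, which is the inconsistency the commit points out.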

959 Commits