Commit Graph

1150 Commits

Author SHA1 Message Date
Jeremy Hylton 910d7d46dc Remove much dead code from ceval.c
The descr changes moved the dispatch for calling objects from
call_object() in ceval.c to PyObject_Call() in abstract.c.
call_object() and the many functions it used in ceval.c were no longer
used, but were not removed.

Rename meth_call() as PyCFunction_Call() so that it can be called by
the CALL_FUNCTION opcode in ceval.c.

Also, fix error message that referred to PyEval_EvalCodeEx() by its
old name eval_code2().  (I'll probably refer to it by its old name,
too.)
2001-08-12 21:52:24 +00:00
Guido van Rossum 8e24818cf4 Make dynamic types work as intended. Or at least more so.
XXX There are still some loose ends: repr(), str(), hash() and
comparisons don't inherit a default implementation from object.  This
must be resolved similarly to the way it's resolved for classic
instances.
2001-08-12 05:17:56 +00:00
Guido van Rossum 8de8680d07 Temporary stop-gap fix for dynamic classes, so they pass the test.
XXX This is not sufficient: if a dynamic class has no __repr__ method
(for instance), but later one is added, that doesn't add a tp_repr
slot, so repr() doesn't call the __repr__ method.  To make this work,
I'll have to add default implementations of several slots to 'object'.

XXX Also, dynamic types currently only inherit slots from their
dominant base.
2001-08-12 03:43:35 +00:00
Guido van Rossum 13d52f0b32 - Big changes to fix SF bug #442833 (a nasty multiple inheritance
problem).  inherit_slots() is split in two parts: inherit_special()
  which inherits the flags and a few very special members from the
  dominant base; inherit_slots() which inherits only regular slots,
  and is now called for each base in the MRO in turn.  These are now
  both void functions since they don't have error returns.

- Added object.__setitem__() back -- for the same reason as
  object.__new__(): a subclass of object should be able to call
  object.__new__().

- add_wrappers() was moved around to be closer to where it is used (it
  was defined together with add_methods() etc., but has nothing to do
  with these).
2001-08-10 21:24:08 +00:00
Guido van Rossum 05ac6de2d5 Add PyDict_Merge(a, b, override):
PyDict_Merge(a, b, 1) is the same as PyDict_Update(a, b).
PyDict_Merge(a, b, 0) does something similar but leaves existing items
unchanged.
2001-08-10 20:28:28 +00:00
Guido van Rossum d614f97733 Change PyType_Ready() to use the READY and READYING flags. This makes
it possible to detect recursive calls early (as opposed to when the
stack overflows :-).
2001-08-10 17:39:49 +00:00
Tim Peters 772747b3f1 SF patch #438013 Remove 2-byte Py_UCS2 assumptions
Removed all instances of Py_UCS2 from the codebase, and so also (I hope)
the last remaining reliance on the platform having an integral type
with exactly 16 bits.
PyUnicode_DecodeUTF16() and PyUnicode_EncodeUTF16() now read and write
one byte at a time.
2001-08-09 22:21:55 +00:00
Guido van Rossum 29687cd211 Sigh. Strengthen the resriction of the previous checkin: tp_new is
inherited unless *both*: (a) the base type is 'object', and (b) the
subtype is not a "heap" type.
2001-08-09 19:43:37 +00:00
Guido van Rossum c11e192d41 Thinking back to the 2.22 revision, I didn't like what I did there one
bit.  For one, this class:

    class C(object):
        def __new__(myclass, ...): ...

would have no way to call the __new__ method of its base class, and
the workaround (to create an intermediate base class whose __new__ you
can call) is ugly.

So, I've come up with a better solution that restores object.__new__,
but still solves the original problem, which is that built-in and
extension types shouldn't inherit object.__new__.  The solution is
simple: only "heap types" inherit tp_new.  Simpler, less code,
perfect!
2001-08-09 19:38:15 +00:00
Guido van Rossum 29206bc8a3 Apply anonymous SF patch #441229.
Previously, f.read() and f.readlines() checked for
  errors on their file object and possibly raised an
  IOError, but f.readline() didn't. This patch makes
  f.readline() behave like the others.

Note that I've added a call to clearerr() since the other calls to
ferror() include that too.

I have no way to test this code. :-)
2001-08-09 18:14:59 +00:00
Guido van Rossum dc91b99f23 Proper support for binary operators, including true division and floor
division.  The basic binary operators now all correctly call the
__rxxx__ variant when they should.

In type_new(), I now make the new type a new-style number unless it
inherits from an old-style number that has numeric methods.

By way of cosmetics, I've changed the signatures of the SLOT<i> macros
to take actual function names and operator names as strings, rather
than rely on C preprocessor symbol manipulations.  This makes the
calls slightly more verbose, but greatly helps simple searches through
the file: you can now find out where "__radd__" is used or where the
function slot_nb_power() is defined and where it is used.
2001-08-08 22:26:22 +00:00
Jack Jansen 8e938b4257 Removed extraneous semicolons that caused a gazzilion "empty declaration" warnings in the MetroWerks compiler. 2001-08-08 15:29:49 +00:00
Guido van Rossum 4668b000a1 Implement PEP 238 in its (almost) full glory.
This introduces:

- A new operator // that means floor division (the kind of division
  where 1/2 is 0).

- The "future division" statement ("from __future__ import division)
  which changes the meaning of the / operator to implement "true
  division" (where 1/2 is 0.5).

- New overloadable operators __truediv__ and __floordiv__.

- New slots in the PyNumberMethods struct for true and floor division,
  new abstract APIs for them, new opcodes, and so on.

I emphasize that without the future division statement, the semantics
of / will remain unchanged until Python 3.0.

Not yet implemented are warnings (default off) when / is used with int
or long arguments.

This has been on display since 7/31 as SF patch #443474.

Flames to /dev/null.
2001-08-08 05:00:18 +00:00
Guido van Rossum 528b7eb0b0 - Rename PyType_InitDict() to PyType_Ready().
- Add an explicit call to PyType_Ready(&PyList_Type) to pythonrun.c
  (just for the heck of it, really -- we should either explicitly
  ready all types, or none).
2001-08-07 17:24:28 +00:00
Guido van Rossum f040ede6e8 Cosmetics:
- Add comment blocks explaining add_operators() and override_slots().
  (This file could use some more explaining, but this is all I had
  breath for today. :)

- Renamed the argument 'base' of add_wrappers() to 'wraps' because
  it's not a base class (which is what the 'base' identifier is used
  for elsewhere).

Small nits:

- Fix add_tp_new_wrapper() to avoid overwriting an existing __new__
  descriptor in tp_defined.

- In add_operators(), check the return value of add_tp_new_wrapper().

Functional change:

- Remove the tp_new functionality from PyBaseObject_Type; this means
  you can no longer instantiate the 'object' type.  It's only useful
  as a base class.

- To make up for the above loss, add tp_new to dynamic types.  This
  has to be done in a hackish way (after override_slots() has been
  called, with an explicit call to add_tp_new_wrapper() at the very
  end) because otherwise I ran into recursive calls of slot_tp_new().
  Sigh.
2001-08-07 16:40:56 +00:00
Guido van Rossum 63e0a64562 Remove spurious "closed" attribute definition from the memberlist
table.  (reported as an aside in SF #446049).
2001-08-06 18:51:38 +00:00
Guido van Rossum 0d231eda52 A totally new way to do the __new__ wrapper. This should address the
problem brought up in SF bug #444229.
2001-08-06 16:50:37 +00:00
Guido van Rossum 2b8d7bdd77 Fix SF #442791 (revisited): No __delitem__ wrapper was defined. 2001-08-02 15:31:58 +00:00
Tim Peters 5962cbf5ba Fix the test_weakref.py failure. Introduced by resolving "a conflict"
(which didn't actually exist!) incorrectly.
2001-08-02 04:45:20 +00:00
Tim Peters 6d6c1a35e0 Merge of descr-branch back into trunk. 2001-08-02 04:15:00 +00:00
Jeremy Hylton 3ce45389bd Add _PyUnicode_AsDefaultEncodedString to unicodeobject.h.
And remove all the extern decls in the middle of .c files.
Apparently, it was excluded from the header file because it is
intended for internal use by the interpreter.  It's still intended for
internal use and documented as such in the header file.
2001-07-30 22:34:24 +00:00
Tim Peters 7321ec437b SF bug #444510: int() should guarantee truncation.
It's guaranteed now, assuming the platform modf() works correctly.
2001-07-26 20:02:17 +00:00
Marc-André Lemburg 80d1dd5f3b Fix for bug #444493: u'\U00010001' segfaults with current CVS on
wide builds.
2001-07-25 16:05:59 +00:00
Marc-André Lemburg 6c6bfb7c70 Make the unicode-escape and the UTF-16 codecs handle surrogates
correctly and thus roundtrip-safe.

Some minor cleanups of the code.

Added tests for the roundtrip-safety.
2001-07-20 17:39:11 +00:00
Guido van Rossum 0d42e0c54a #ifdef out generation of \U escapes unless Py_UNICODE_WIDE. This
#caused warnings with the VMS C compiler.  (SF bug #442998, in part.)
On a narrow system the current code should never be executed since ch
will always be < 0x10000.

Marc-Andre: you may end up fixing this a different way, since I
believe you have plans to generate \U for surrogate pairs.  I'll leave
that to you.
2001-07-20 16:36:21 +00:00
Fred Drake 1bc8fab0e7 Kill more warnings from the SGI compiler.
Part of SF patch #434992.
2001-07-19 21:49:38 +00:00
Tim Peters 0d5dd68692 Python.h: Don't attempt to redefine NDEBUG if it's already defined.
Others:  Remove redundant includes of assert.h.
2001-07-15 18:38:47 +00:00
Tim Peters 586b2e3f3d long_format: Simplify the overly elaborate base-is-a-power-of-2 code. 2001-07-15 09:11:14 +00:00
Guido van Rossum 3c29602d74 _Py_GetObjects(): GCC suggests to add () around && within || for some
code only compiled in debug mode, and I dutifully comply.
2001-07-14 17:58:00 +00:00
Tim Peters 212e614f60 divrem1 & long_format: found a clean way to factor divrem1 so that
long_format can reuse a scratch area for its repeated divisions (instead
of malloc/free for every digit produced); speeds str(long)/repr(long).
2001-07-14 12:23:19 +00:00
Tim Peters c8a6b9b6d6 long_format(): Simplify new code a bit. 2001-07-14 11:01:28 +00:00
Tim Peters fad225f1ba long_format(): Easy speedup for output bases that aren't a power of 2 (in
particular, str(long) and repr(long) use base 10, and that gets a factor
of 4 speedup).  Another factor of 2 can be gotten by refactoring divrem1 to
support in-place division, but that started getting messy so I'm leaving
that out.
2001-07-13 02:59:26 +00:00
Neil Schemenauer 10c6692774 GC for method objects. 2001-07-12 13:27:35 +00:00
Neil Schemenauer 7eac9b72d4 GC for iterator objects. 2001-07-12 13:27:25 +00:00
Neil Schemenauer 19cd292bbc GC for frame objects. 2001-07-12 13:27:11 +00:00
Guido van Rossum 0ec9abaa2b On long to the negative long power, let float handle it instead of
raising an error.  This was one of the two issues that the VPython
folks were particularly problematic for their students.  (The other
one was integer division...)  This implements (my) SF patch #440487.
2001-07-12 11:21:17 +00:00
Guido van Rossum b82fedc7d8 On int to the negative integral power, let float handle it instead of
raising an error.  This was one of the two issues that the VPython
folks were particularly problematic for their students.  (The other
one was integer division...)  This implements (my) SF patch #440487.
2001-07-12 11:19:45 +00:00
Thomas Wouters efafcea280 Re-add 'advanced' xrange features, adding DeprecationWarnings as discussed
on python-dev. The features will still vanish, however, just one release
later.
2001-07-09 12:30:54 +00:00
Tim Peters 6ee4234802 SF bug #439104: Tuple richcompares has code-typo.
Symptom:  (1, 2, 3) <= (1, 2) returned 1.
This was already fixed in CVS for tuples, but an isomorphic error was in
the list richcompare code.
2001-07-06 17:45:43 +00:00
Guido van Rossum 3f56166b1a Rip out the fancy behaviors of xrange that nobody uses: repeat, slice,
contains, tolist(), and the start/stop/step attributes.  This includes
removing the 4th ('repeat') argument to PyRange_New().
2001-07-05 13:27:48 +00:00
Fredrik Lundh 72b068566a removed "register const" from scalar arguments to the unicode
predicates
2001-06-27 22:08:26 +00:00
Fredrik Lundh 8f4558583f use Py_UNICODE_WIDE instead of USE_UCS4_STORAGE and Py_UNICODE_SIZE
tests.
2001-06-27 18:59:43 +00:00
Martin v. Löwis ce9b5a55e1 Encode surrogates in UTF-8 even for a wide Py_UNICODE.
Implement sys.maxunicode.
Explicitly wrap around upper/lower computations for wide Py_UNICODE.
When decoding large characters with UTF-8, represent expected test
results using the \U notation.
2001-06-27 06:28:56 +00:00
Martin v. Löwis ac93bc2501 When decoding UTF-16, don't assume that the buffer is in native endianness
when checking surrogates.
2001-06-26 22:43:40 +00:00
Martin v. Löwis 0ba70cc3c8 Support using UCS-4 as the Py_UNICODE type:
Add configure option --enable-unicode.
Add config.h macros Py_USING_UNICODE, PY_UNICODE_TYPE, Py_UNICODE_SIZE,
                    SIZEOF_WCHAR_T.
Define Py_UCS2.
Encode and decode large UTF-8 characters into single Py_UNICODE values
for wide Unicode types; likewise for UTF-16.
Remove test whether sizeof Py_UNICODE is two.
2001-06-26 22:22:37 +00:00
Fredrik Lundh ee13dba1aa more unicode tweaks: fix unicodectype for sizeof(Py_UNICODE) >
sizeof(int)
2001-06-26 20:36:12 +00:00
Barry Warsaw 66a0d1d9b9 dict_update(): Generalize this method so {}.update() accepts any
"mapping" object, specifically one that supports PyMapping_Keys() and
PyObject_GetItem().  This allows you to say e.g. {}.update(UserDict())

We keep the special case for concrete dict objects, although that
seems moderately questionable.  OTOH, the code exists and works, so
why change that?

.update()'s docstring already claims that D.update(E) implies calling
E.keys() so it's appropriate not to transform AttributeErrors in
PyMapping_Keys() to TypeErrors.

Patch eyeballed by Tim.
2001-06-26 20:08:32 +00:00
Fredrik Lundh 1294ad0c59 experimental UCS-4 support: added USE_UCS4_STORAGE define to
unicodeobject.h, which forces sizeof(Py_UNICODE) == sizeof(Py_UCS4).
(this may be good enough for platforms that doesn't have a 16-bit
type.  the UTF-16 codecs don't work, though)
2001-06-26 17:17:07 +00:00
Fredrik Lundh 45714e9ecb experimental UCS-4 support: made compare a bit more robust, in case
sizeof(Py_UNICODE) >= sizeof(long).  also changed surrogate expansion
to work if sizeof(Py_UNICODE) > 2.
2001-06-26 16:39:36 +00:00
Fredrik Lundh 3083163dc1 experimental UCS-4 support: don't assume that MS_WIN32 implies
HAVE_USABLE_WCHAR_T
2001-06-26 15:11:00 +00:00
Tim Peters 8c96369513 PyFrameObject: rename f_stackbottom to f_stacktop, since it points to
the next free valuestack slot, not to the base (in America, stacks push
and pop at the top -- they mutate at the bottom in Australia <winK>).
eval_frame():  assert that f_stacktop isn't NULL upon entry.
frame_delloc():  avoid ordered pointer comparisons involving f_stacktop
when f_stacktop is NULL.
2001-06-23 05:26:56 +00:00
Tim Peters 5ca576ed0a Merging the gen-branch into the main line, at Guido's direction. Yay!
Bugfix candidate in inspect.py:  it was referencing "self" outside of
a method.
2001-06-18 22:08:13 +00:00
Tim Peters 1dad6a86de SF bug 434186: 0x80000000/2 != 0x80000000>>1
i_divmod:  New and simpler algorithm.  Old one returned gibberish on most
boxes when the numerator was -sys.maxint-1.  Oddly enough, it worked in the
release (not debug) build on Windows, because the compiler optimized away
some tricky sign manipulations that were incorrect in this case.
Makes you wonder <wink> ...
Bugfix candidate.
2001-06-18 19:21:11 +00:00
Tim Peters 70128a1ba6 PyLong_{As, From}VoidPtr: cleanup, replacing assumptions in comments with
#if/#error constructs.
2001-06-16 08:48:40 +00:00
Tim Peters c605784174 dict_repr: Reuse one of the int vars (minor code simplification). 2001-06-16 07:52:53 +00:00
Tim Peters 52e155e31b Reformat decl of new _PyString_Join. Add NEWS blurb about repr() speedup. 2001-06-16 05:42:57 +00:00
Tim Peters a7259597f1 SF bug 433228: repr(list) woes when len(list) big.
Gave Python linear-time repr() implementations for dicts, lists, strings.
This means, e.g., that repr(range(50000)) is no longer 50x slower than
pprint.pprint() in 2.2 <wink>.

I don't consider this a bugfix candidate, as it's a performance boost.

Added _PyString_Join() to the internal string API.  If we want that in the
public API, fine, but then it requires runtime error checks instead of
asserts.
2001-06-16 05:11:17 +00:00
Tim Peters cf37dfc3db Change IS_LITTLE_ENDIAN macro -- a little faster now. 2001-06-14 18:42:50 +00:00
Guido van Rossum ad98db1d9e Fix a mis-indentation in _PyUnicode_New() that caused me to stare at
some code for longer than needed.
2001-06-14 17:52:02 +00:00
Tim Peters ede0509111 _PyLong_AsByteArray: simplify the logic for dealing with the most-
significant digits sign bits.  Again no change in semantics.
2001-06-14 08:53:38 +00:00
Tim Peters ce9de2f79a PyLong_From{Unsigned,}Long: count the # of digits first, so no more space
is allocated than needed (used to allocate 80 bytes of digit space no
matter how small the long input).  This also runs faster, at least on 32-
bit boxes.
2001-06-14 04:56:19 +00:00
Tim Peters f251d06a66 _PyLong_FromByteArray: changed decl of "carry" to match "thisbyte". No
semantic change, but a bit clearer and may help a really stupid compiler
avoid pointless runtime length conversions.
2001-06-13 21:09:15 +00:00
Tim Peters 05607ad4fd _PyLong_AsByteArray: Don't do the "delicate overflow" check unless it's
truly needed; usually saves a little time, but no change in semantics.
2001-06-13 21:01:27 +00:00
Tim Peters 898cf85c25 _PyLong_AsByteArray: added assert that the input is normalized. This is
outside the function's control, but is crucial to correct operation.
2001-06-13 20:50:08 +00:00
Tim Peters 9cb0c38fff PyLong_As{Unsigned,}LongLong: fiddled final result casting. 2001-06-13 20:45:17 +00:00
Tim Peters d1a7da6c0d longobject.c:
Replaced PyLong_{As,From}{Unsigned,}LongLong guts with calls
    to _PyLong_{As,From}ByteArray.
_testcapimodule.c:
    Added strong tests of PyLong_{As,From}{Unsigned,}LongLong.

Fixes SF bug #432552 PyLong_AsLongLong() problems.
Possible bugfix candidate, but the fix relies on code added to longobject
to support the new q/Q structmodule format codes.
2001-06-13 00:35:57 +00:00
Tim Peters 8bc84b4391 _PyLong_{As,From}ByteArray: Minor code rearrangement aimed at improving
clarity.  Should have no effect visible to callers.
2001-06-12 19:17:03 +00:00
Marc-André Lemburg 8c2133da7b Fix for bug #432384: Recursion in PyString_AsEncodedString? 2001-06-12 13:14:10 +00:00
Tim Peters 7a3bfc3a47 Added q/Q standard (x-platform 8-byte ints) mode in struct module.
This completes the q/Q project.

longobject.c _PyLong_AsByteArray:  The original code had a gross bug:
the most-significant Python digit doesn't necessarily have SHIFT
significant bits, and you really need to count how many copies of the sign
bit it has else spurious overflow errors result.

test_struct.py:  This now does exhaustive std q/Q testing at, and on both
sides of, all relevant power-of-2 boundaries, both positive and negative.

NEWS:  Added brief dict news while I was at it.
2001-06-12 01:22:22 +00:00
Tim Peters 2a9b367385 Two new private longobject API functions,
_PyLong_FromByteArray
    _PyLong_AsByteArray
Untested and probably buggy -- they compile OK, but nothing calls them
yet.  Will soon be called by the struct module, to implement x-platform
'q' and 'Q'.
If other people have uses for them, we could move them into the public API.
See longobject.h for usage details.
2001-06-11 21:23:58 +00:00
Jack Jansen fcc54cab10 Added a missing cast to the hashfunc initializer. 2001-06-10 21:43:28 +00:00
Martin v. Löwis 0163d6d6ef Patch #424475: Speed-up tp_compare usage, by special-casing the common
case of objects with equal types which support tp_compare. Give
type objects a tp_compare function.
Also add c<0 tests before a few PyErr_Occurred tests.
2001-06-09 07:34:05 +00:00
Marc-André Lemburg 8879a33613 Fixes [ #430986 ] Buglet in PyUnicode_FromUnicode. 2001-06-07 12:26:56 +00:00
Tim Peters afb6ae8452 Store the mask instead of the size in dictobjects. The mask is more
frequently used, and in particular this allows to drop the last
remaining obvious time-waster in the crucial lookdict() and
lookdict_string() functions.  Other changes consist mostly of changing
"i < ma_size" to "i <= ma_mask" everywhere.
2001-06-04 21:00:21 +00:00
Tim Peters 453163d842 lookdict: stop more insane core-dump mutating comparison cases. Should
be possible to provoke unbounded recursion now, but leaving that to someone
else to provoke and repair.
Bugfix candidate -- although this is getting harder to backstitch, and the
cases it's protecting against are mondo contrived.
2001-06-03 04:54:32 +00:00
Tim Peters 7b5d0afb1e lookdict: Reduce obfuscating code duplication with a judicious goto.
This code is likely to get even hairier to squash core dumps due to
mutating comparisons, and it's hard enough to follow without that.
2001-06-03 04:14:43 +00:00
Tim Peters 19b77cfc4b Finish the dict->string coredump fix. Need sleep.
Bugfix candidate.
2001-06-02 08:27:39 +00:00
Tim Peters 23cf6be23c Coredumpers from Michael Hudson, mutating dicts while printing or
converting to string.
Critical bugfix candidate -- if you take this seriously <wink>.
2001-06-02 08:02:56 +00:00
Tim Peters f4b33f61fb dict_popitem(): Repaired last-second 2.1 comment, which misidentified the
true reason for allocating the tuple before checking the dict size.
2001-06-02 05:42:29 +00:00
Tim Peters eb28ef209e New collision resolution scheme: no polynomials, simpler, faster, less
code, less memory.  Tests have uncovered no drawbacks.  Christian and
Vladimir are the other two people who have burned many brain cells on the
dict code in recent years, and they like the approach too, so I'm checking
it in without further ado.
2001-06-02 05:27:19 +00:00
Jeremy Hylton 9cea41c195 fix bogus indentation 2001-05-29 17:13:15 +00:00
Thomas Wouters 0dcea5973d _PyTuple_Resize: guard against PyTuple_New() returning NULL, using Tim's
suggestion (modulo style).
2001-05-29 07:58:45 +00:00
Tim Peters 4324aa3572 Cruft cleanup: Removed the unused last_is_sticky argument from the internal
_PyTuple_Resize().
2001-05-28 22:30:08 +00:00
Thomas Wouters 6a922372ad _PyTuple_Resize: take into account the empty tuple. There can be only one.
Instead of raising a SystemError, just create a new tuple of the desired
size.

This fixes (at least) SF bug #420343.
2001-05-28 13:11:02 +00:00
Tim Peters 15d4929ae4 Implement an old idea of Christian Tismer's: use polynomial division
instead of multiplication to generate the probe sequence.  The idea is
recorded in Python-Dev for Dec 2000, but that version is prone to rare
infinite loops.

The value is in getting *all* the bits of the hash code to participate;
and, e.g., this speeds up querying every key in a dict with keys
 [i << 16 for i in range(20000)] by a factor of 500.  Should be equally
valuable in any bad case where the high-order hash bits were getting
ignored.

Also wrote up some of the motivations behind Python's ever-more-subtle
hash table strategy.
2001-05-27 07:39:22 +00:00
Tim Peters 1af03e98d9 Change list.extend() error msgs and NEWS to reflect that list.extend()
now takes any iterable argument, not only sequences.

NEEDS DOC CHANGES -- but I don't think we settled on a concise way to
say this stuff.
2001-05-26 19:37:54 +00:00
Tim Peters 442914d265 Cruft cleanup: removed the #ifdef'ery in support of compiling to allow
multi-argument list.append(1, 2, 3) (as opposed to .append((1,2,3))).
2001-05-26 05:50:03 +00:00
Tim Peters 65b8b84839 roundupsize() and friends: fiddle over-allocation strategy for list
resizing.

Accurate timings are impossible on my Win98SE box, but this is obviously
faster even on this box for reasonable list.append() cases.  I give
credit for this not to the resizing strategy but to getting rid of integer
multiplication and divsion (in favor of shifting) when computing the
rounded-up size.

For unreasonable list.append() cases, Win98SE now displays linear behavior
for one-at-time appends up to a list with about 35 million elements.  Then
it dies with a MemoryError, due to fatally fragmented *address space*
(there's plenty of VM available, but by this point Win9X has broken user
space into many distinct heaps none of which has enough contiguous space
left to resize the list, and for whatever reason Win9x isn't coalescing
the dead heaps).  Before the patch it got a MemoryError for the same
reason, but once the list reached about 2 million elements.

Haven't yet tried on Win2K but have high hopes extreme list.append()
will be much better behaved now (NT & Win2K didn't fragment address space,
but suffered obvious quadratic-time behavior before as lists got large).

For other systems I'm relying on common sense:  replacing integer * and /
by << and >> can't plausibly hurt, the number of function calls hasn't
changed, and the total operation count for reasonably small lists is about
the same (while the operations are cheaper now).
2001-05-26 05:28:40 +00:00
Martin v. Löwis cd35306a25 Patch #424335: Implement string_richcompare, remove string_compare.
Use new _PyString_Eq in lookdict_string.
2001-05-24 16:56:35 +00:00
Tim Peters f8a548c23c dictresize(): Rebuild small tables if there are any dummies, not just if
they're entirely full.  Not a question of correctness, but of temporarily
misplaced common sense.
2001-05-24 16:26:40 +00:00
Tim Peters 0c6010be75 Jack Jansen hit a bug in the new dict code, reported on python-dev.
dictresize() was too aggressive about never ever resizing small dicts.
If a small dict is entirely full, it needs to rebuild it despite that
it won't actually resize it, in order to purge old dummy entries thus
creating at least one virgin slot (lookdict assumes at least one such
exists).

Also took the opportunity to add some high-level comments to dictresize.
2001-05-23 23:33:57 +00:00
Fred Drake 0c23231f6e Remove unused variable. 2001-05-22 22:36:52 +00:00
Tim Peters dea48ec581 SF patch #425242: Patch which "inlines" small dictionaries.
The idea is Marc-Andre Lemburg's, the implementation is Tim's.
Add a new ma_smalltable member to dictobjects, an embedded vector of
MINSIZE (8) dictentry structs.  Short course is that this lets us avoid
additional malloc(s) for dicts with no more than 5 entries.

The changes are widespread but mostly small.

Long course:  WRT speed, all scalar operations (getitem, setitem, delitem)
on non-empty dicts benefit from no longer needing NULL-pointer checks
(ma_table is never NULL anymore).  Bulk operations (copy, update, resize,
clearing slots during dealloc) benefit in some cases from now looping
on the ma_fill count rather than on ma_size, but that was an unexpected
benefit:  the original reason to loop on ma_fill was to let bulk
operations on empty dicts end quickly (since the NULL-pointer checks
went away, empty dicts aren't special-cased any more).

Special considerations:

For dicts that remain empty, this change is a lose on two counts:
the dict object contains 8 new dictentry slots now that weren't
needed before, and dict object creation also spends time memset'ing
these doomed-to-be-unsused slots to NULLs.

For dicts with one or two entries that never get larger than 2, it's
a mix:  a malloc()/free() pair is no longer needed, and the 2-entry case
gets to use 8 slots (instead of 4) thus decreasing the chance of
collision.  Against that, dict object creation spends time memset'ing
4 slots that aren't strictly needed in this case.

For dicts with 3 through 5 entries that never get larger than 5, it's a
pure win:  the dict is created with all the space they need, and they
never need to resize.  Before they suffered two malloc()/free() calls,
plus 1 dict resize, to get enough space.  In addition, the 8-slot
table they ended with consumed more memory overall, because of the
hidden overhead due to the additional malloc.

For dicts with 6 or more entries, the ma_smalltable member is wasted
space, but then these are large(r) dicts so 8 slots more or less doesn't
make much difference.  They still benefit all the time from removing
ubiquitous dynamic null-pointer checks, and get a small benefit (but
relatively smaller the larger the dict) from not having to do two
mallocs, two frees, and a resize on the way *to* getting their sixth
entry.

All in all it appears a small but definite general win, with larger
benefits in specific cases.  It's especially nice that it allowed to
get rid of several branches, gotos and labels, and overall made the
code smaller.
2001-05-22 20:40:22 +00:00
Guido van Rossum 5b021848ac file_getiter(): make iter(file) be equivalent to file.xreadlines().
This should be faster.

This means:

(1) "for line in file:" won't work if the xreadlines module can't be
    imported.

(2) The body of "for line in file:" shouldn't use the file directly;
    the effects (e.g. of file.readline(), file.seek() or even
    file.tell()) would be undefined because of the buffering that goes
    on in the xreadlines module.
2001-05-22 16:48:37 +00:00
Guido van Rossum 0ba9e3ac27 init_name_op(): add (void) to the argument list to make it a valid
prototype, for gcc -Wstrict-prototypes.
2001-05-22 02:33:08 +00:00
Marc-André Lemburg 489b56e044 This patch changes the behaviour of the UTF-16 codec family. Only the
UTF-16 codec will now interpret and remove a *leading* BOM mark. Sub-
sequent BOM characters are no longer interpreted and removed.
UTF-16-LE and -BE pass through all BOM mark characters.

These changes should get the UTF-16 codec more in line with what
the Unicode FAQ recommends w/r to BOM marks.
2001-05-21 20:30:15 +00:00
Tim Peters 91a364df17 Bugfix candidate.
Two exceedingly unlikely errors in dictresize():
1. The loop for finding the new size had an off-by-one error at the
   end (could over-index the polys[] vector).
2. The polys[] vector ended with a 0, apparently intended as a sentinel
   value but never used as such; i.e., it was never checked, so 0 could
   have been used *as* a polynomial.
Neither bug could trigger unless a dict grew to 2**30 slots; since that
would consume at least 12GB of memory just to hold the dict pointers,
I'm betting it's not the cause of the bug Fred's tracking down <wink>.
2001-05-19 07:04:38 +00:00
Tim Peters 1928314ef4 Speed dictresize by collapsing its two passes into one; the reason given
in the comments for using two passes was bogus, as the only object that
can get decref'ed due to the copy is the dummy key, and decref'ing dummy
can't have side effects (for one thing, dummy is immortal!  for another,
it's a string object, not a potentially dangerous user-defined object).
2001-05-17 22:25:34 +00:00
Tim Peters d7ed3bf552 Speed tuple comparisons in two ways:
1. Omit the early-out EQ/NE "lengths different?" test.  Was unable to find
   any real code where it triggered, but it always costs.  The same is not
   true of list richcmps, where different-size lists appeared to get
   compared about half the time.
2. Because tuples are immutable, there's no need to refetch the lengths of
   both tuples from memory again on each loop trip.

BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
because it won't believe there's any difference unless Py_EQ returns false
for some corresponding elements:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> C() < C()
1
>>> (C(),) < (C(),)
0
>>>

That doesn't make sense -- provided you believe the defn. of C makes sense.
2001-05-15 20:12:59 +00:00
Marc-André Lemburg 2d9204199f This patch changes the way the string .encode() method works slightly
and introduces a new method .decode().

The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.

Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII rendering this auto-conversion mechanism useless for
most Unicode encodings.

The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.

As demonstration of the new feature, the patch includes a few new
codecs which allow string to string encoding and decoding (rot13,
hex, zip, uu, base64).

Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
2001-05-15 12:00:02 +00:00
Tim Peters 342c65e19a Aggressive reordering of dict comparisons. In case of collision, it stands
to reason that me_key is much more likely to match the key we're looking
for than to match dummy, and if the key is absent me_key is much more
likely to be NULL than dummy:  most dicts don't even have a dummy entry.
Running instrumented dict code over the test suite and some apps confirmed
that matching dummy was 200-300x less frequent than matching key in
practice.  So this reorders the tests to try the common case first.
It can lose if a large dict with many collisions is mostly deleted, not
resized, and then frequently searched, but that's hardly a case we
should be favoring.
2001-05-13 06:43:53 +00:00
Tim Peters 2f228e75e4 Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".
The comment following used to say:
	/* We use ~hash instead of hash, as degenerate hash functions, such
	   as for ints <sigh>, can have lots of leading zeros. It's not
	   really a performance risk, but better safe than sorry.
	   12-Dec-00 tim:  so ~hash produces lots of leading ones instead --
	   what's the gain? */
That is, there was never a good reason for doing it.  And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collison) the same "too often" across distinct hashes.

Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== # number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.

Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items().  A number
of std tests suffered bogus failures as a result.  For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,

>>> d = {}
>>> for i in range(10):
...    d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>

Unfortunately. people may latch on to that in small examples and draw a
bogus conclusion.

test_support.py
    Moved test_extcall's sortdict() into test_support, made it stronger,
    and imported sortdict into other std tests that needed it.
test_unicode.py
    Excluced cp875 from the "roundtrip over range(128)" test, because
    cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
    See Python-Dev for excruciating details.
Cookie.py
    Chaged various output functions to sort dicts before building
    strings from them.
test_extcall
    Fiddled the expected-result file.  This remains sensitive to native
    dict ordering, because, e.g., if there are multiple errors in a
    keyword-arg dict (and test_extcall sets up many cases like that), the
    specific error Python complains about first depends on native dict
    ordering.
2001-05-13 00:19:31 +00:00
Tim Peters 16cabc0a3d Repair "module has no attribute xxx" error msg; bug introduced when
switching from tp_getattr to tp_getattro.
2001-05-12 20:24:22 +00:00
Tim Peters d85e102337 Variant of patch #423262: Change module attribute get & set
Allow module getattr and setattr to exploit string interning, via the
previously null module object tp_getattro and tp_setattro slots.   Yields
a very nice speedup for things like random.random and os.path etc.
2001-05-11 21:51:48 +00:00
Jeremy Hylton 1b0feb4ada Variant of SF patch 423181
For rich comparisons, use instance_getattr2() when possible to avoid
the expense of setting an AttributeError.  Also intern the name_op[]
table and use the interned strings rather than creating a new string
and interning it each time through.
2001-05-11 14:48:41 +00:00
Tim Peters 5acbfcc164 Cosmetic: code under "else" clause was missing indent. 2001-05-11 03:36:45 +00:00
Tim Peters 4fa58bfac2 Restore dicts' tp_compare slot, and change dict_richcompare to say it
doesn't know how to do LE, LT, GE, GT.  dict_richcompare can't do the
latter any faster than dict_compare can.  More importantly, for
cmp(dict1, dict2), Python *first* tries rich compares with EQ, LT, and
GT one at a time, even if the tp_compare slot is defined, and
dict_richcompare called dict_compare for the latter two because
it couldn't do them itself.  The result was a lot of wasted calls to
dict_compare.  Now dict_richcompare gives up at once the times Python
calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare
is called only once (when Python gets around to trying the tp_compare
slot).
Continued mystery:  despite that this cut the number of calls to
dict_compare approximately in half in test_mutants.py, the latter still
runs amazingly slowly.  Running under the debugger doesn't show excessive
activity in the dict comparison code anymore, so I'm guessing the culprit
is somewhere else -- but where?  Perhaps in the element (key/value)
comparison code?  We clearly spend a lot of time figuring out how to
compare things.
2001-05-10 21:45:19 +00:00
Tim Peters 3918fb2549 Repair typo in comment. 2001-05-10 18:58:31 +00:00
Tim Peters 95bf9390a4 SF bug #422121 Insecurities in dict comparison.
Fixed a half dozen ways in which general dict comparison could crash
Python (even cause Win98SE to reboot) in the presence of kay and/or
value comparison routines that mutate the dict during dict comparison.
Bugfix candidate.
2001-05-10 08:32:44 +00:00
Tim Peters 9c012af3c3 Heh. I need a break. After this: stropmodule & stringobject were more
out of synch than I realized, and I managed to break replace's "count"
argument when it was 0.  All is well again.  Maybe.
Bugfix candidate.
2001-05-10 00:32:57 +00:00
Tim Peters 4cd44ef4bf Fudge. stropmodule and stringobject both had copies of the buggy
mymemXXX stuff, and they were already out of synch.  Fix the remaining
bugs in both and get them back in synch.
Bugfix release candidate.
2001-05-10 00:05:33 +00:00
Tim Peters 1a97d5f098 SF patch #416247 2.1c1 stringobject: unused vrbl cleanup.
Thanks to Mark Favas.
2001-05-09 20:06:00 +00:00
Tim Peters 4862ab7bf4 Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to
restore correct semantics.
2001-05-09 08:43:21 +00:00
Tim Peters 9e897f41db Mark Favas reported that gcc caught me using casts as lvalues. Dodge it. 2001-05-09 07:37:07 +00:00
Tim Peters b4bbcd76ea Ack! Restore the COUNT_ALLOCS one_strings code. 2001-05-09 00:31:40 +00:00
Tim Peters cf5ad5d6f6 My change to string_item() left an extra reference to each 1-character
interned string created by "string"[i].  Since they're immortal anyway,
this was hard to notice, but it was still wrong <wink>.
2001-05-09 00:24:55 +00:00
Tim Peters 5b4d477568 Intern 1-character strings as soon as they're created. As-is, they aren't
interned when created, so the cached versions generally aren't ever
interned.  With the patch, the
		Py_INCREF(t);
		*p = t;
		Py_DECREF(s);
		return;
indirection block in PyString_InternInPlace() is never executed during a
full run of the test suite, but was executed very many times before.  So
I'm trading more work when creating one-character strings for doing less
work later.  Note that the "more work" here can happen at most 256 times
per program run, so it's trivial.  The same reasoning accounts for the
patch's simplification of string_item (the new version can call
PyString_FromStringAndSize() no more than 256 times per run, so there's
no point to inlining that stuff -- if we were serious about saving time
here, we'd pre-initialize the characters vector so that no runtime testing
at all was needed!).
2001-05-08 22:33:50 +00:00
Tim Peters 72f98e9b83 SF bug #422177: Results from .pyc differs from .py
Store floats and doubles to full precision in marshal.
Test that floats read from .pyc/.pyo closely match those read from .py.
Declare PyFloat_AsString() in floatobject header file.
Add new PyFloat_AsReprString() API function.
Document the functions declared in floatobject.h.
2001-05-08 15:19:57 +00:00
Tim Peters e63415ead8 SF patch #421922: Implement rich comparison for dicts.
d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2
don't support comparisons other than ==, and testing dicts for equality
is faster now (especially when inequality obtains).
2001-05-08 04:38:29 +00:00
Jeremy Hylton 4c889011db SF patch 419176 from MvL; fixed bug 418977
Two errors in dict_to_map() helper used by PyFrame_LocalsToFast().
2001-05-08 04:08:59 +00:00
Jeremy Hylton d37292bb8d Remove unused variable 2001-05-08 04:00:45 +00:00
Tim Peters 6d60b2e762 SF bug #422108 - Error in rich comparisons.
2.1.1 bugfix candidate too.
Fix a bad (albeit unlikely) return value in try_rich_to_3way_compare().
Also document do_cmp()'s return values.
2001-05-07 20:53:51 +00:00
Tim Peters cb8d368b82 Reimplement PySequence_Contains() and instance_contains(), so they work
safely together and don't duplicate logic (the common logic was factored
out into new private API function _PySequence_IterContains()).
Visible change:
    some_complex_number  in  some_instance
no longer blows up if some_instance has __getitem__ but neither
__contains__ nor __iter__.  test_iter changed to ensure that remains true.
2001-05-05 21:05:01 +00:00
Tim Peters 75f8e35ef4 Generalize PySequence_Count() (operator.countOf) to work with iterators. 2001-05-05 11:33:43 +00:00
Tim Peters de9725f135 Make 'x in y' and 'x not in y' (PySequence_Contains) play nice w/ iterators.
NEEDS DOC CHANGES
A few more AttributeErrors turned into TypeErrors, but in test_contains
this time.
The full story for instance objects is pretty much unexplainable, because
instance_contains() tries its own flavor of iteration-based containment
testing first, and PySequence_Contains doesn't get a chance at it unless
instance_contains() blows up.  A consequence is that
    some_complex_number in some_instance
dies with a TypeError unless some_instance.__class__ defines __iter__ but
does not define __getitem__.
2001-05-05 10:06:17 +00:00
Tim Peters 2cfe368283 Make unicode.join() work nice with iterators. This also required a change
to string.join(), so that when the latter figures out in midstream that
it really needs unicode.join() instead, unicode.join() can actually get
all the sequence elements (i.e., there's no guarantee that the sequence
passed to string.join() can be iterated over *again* by unicode.join(),
so string.join() must not pass on the original sequence object anymore).
2001-05-05 05:36:48 +00:00
Tim Peters 12d0a6c78a Fix a tiny and unlikely memory leak. Was there before too, and actually
several of these turned up and got fixed during the iteration crusade.
2001-05-05 04:10:25 +00:00
Tim Peters 6912d4ddf0 Generalize tuple() to work nicely with iterators.
NEEDS DOC CHANGES.
This one surprised me!  While I expected tuple() to be a no-brainer, turns
out it's actually dripping with consequences:
1. It will *allow* the popular PySequence_Fast() to work with any iterable
   object (code for that not yet checked in, but should be trivial).
2. It caused two std tests to fail.  This because some places used
   PyTuple_Sequence() (the C spelling of tuple()) as an indirect way to test
   whether something *is* a sequence.  But tuple() code only looked for the
   existence of sq->item to determine that, and e.g. an instance passed
   that test whether or not it supported the other operations tuple()
   needed (e.g., __len__).  So some things the tests *expected* to fail
   with an AttributeError now fail with a TypeError instead.  This looks
   like an improvement to me; e.g., test_coercion used to produce 559
   TypeErrors and 2 AttributeErrors, and now they're all TypeErrors.  The
   error details are more informative too, because the places calling this
   were *looking* for TypeErrors in order to replace the generic tuple()
   "not a sequence" msg with their own more specific text, and
   AttributeErrors snuck by that.
2001-05-05 03:56:37 +00:00
Tim Peters f4848dac41 Make PyIter_Next() a little smarter (wrt its knowledge of iterator
internals) so clients can be a lot dumber (wrt their knowledge).
2001-05-05 00:14:56 +00:00
Fred Drake 6aebded915 The weakref support in PyObject_InitVar() as well; this should have come out
at the same time as it did from PyObject_Init() .
2001-05-03 20:04:33 +00:00
Fred Drake ba40ec42c8 Remove unnecessary intialization for the case of weakly-referencable objects;
the code necessary to accomplish this is simpler and faster if confined to
the object implementations, so we only do this there.

This causes no behaviorial changes beyond a (very slight) speedup.
2001-05-03 19:44:50 +00:00
Fred Drake 4dcb85b817 Since Py_TPFLAGS_HAVE_WEAKREFS is set in Py_TPFLAGS_DEFAULT, it does not
need to be specified in the type structures independently.  The flag
exists only for binary compatibility.

This is a "source cleanliness" issue and introduces no behavioral changes.
2001-05-03 16:04:13 +00:00
Guido van Rossum b1f35bffe5 Mchael Hudson pointed out that the code for detecting changes in
dictionary size was comparing ma_size, the hash table size, which is
always a power of two, rather than ma_used, wich changes on each
insertion or deletion.  Fixed this.
2001-05-02 15:13:44 +00:00
Marc-André Lemburg 542fe56cb9 Fix for bug #417030: "print '%*s' fails for unicode string" 2001-05-02 14:21:53 +00:00
Tim Peters 6ad22c41c2 Plug a memory leak in list(), when appending to the result list. 2001-05-02 07:12:39 +00:00
Tim Peters f553f89d45 Generalize list(seq) to work with iterators. This also generalizes list()
to no longer insist that len(seq) be defined.
NEEDS DOC CHANGES.
This is meant to be a model for how other functions of this ilk (max,
filter, etc) can be generalized similarly.  Feel encouraged to grab your
favorite and convert it!
Note some cute consequences:
    list(file) == file.readlines() == list(file.xreadlines())
    list(dict) == dict.keys()
    list(dict.iteritems()) = dict.items()
    list(xrange(i, j, k)) == range(i, j, k)
2001-05-01 20:45:31 +00:00
Guido van Rossum 47668928e6 Discard a misleading comment about iter_iternext(). 2001-05-01 17:01:25 +00:00
Guido van Rossum 4f288ab7d6 Printing objects to a real file still wasn't done right: if the
object's type didn't define tp_print, there were still cases where the
full "print uses str() which falls back to repr()" semantics weren't
honored.  This resulted in

    >>> print None
    <None object at 0x80bd674>
    >>> print type(u'')
    <type object at 0x80c0a80>

Fixed this by always using the appropriate PyObject_Repr() or
PyObject_Str() call, rather than trying to emulate what they would do.

Also simplified PyObject_Str() to always fall back on PyObject_Repr()
when tp_str is not defined (rather than making an extra check for
instances with a __str__ method).  And got rid of the special case for
strings.
2001-05-01 16:53:37 +00:00
Guido van Rossum 189f1df301 Add a proper implementation for the tp_str slot (returning self, of
course), so I can get rid of the special case for strings in
PyObject_Str().
2001-05-01 16:51:53 +00:00
Guido van Rossum 09e563abb4 Add experimental iterkeys(), itervalues(), iteritems() to dict
objects.

Tests show that iteritems() is 5-10% faster than iterating over the
dict and extracting the value with dict[key].
2001-05-01 12:10:21 +00:00
Guido van Rossum 82c690f11a Well darnit! The innocuous fix I made to PyObject_Print() caused
printing of instances not to look for __str__().  Fix this.
2001-04-30 14:39:18 +00:00
Tim Peters b3d8d1f76c A different approach to the problem reported in
Patch #419651: Metrowerks on Mac adds 0x itself
C std says %#x and %#X conversion of 0 do not add the 0x/0X base marker.
Metrowerks apparently does.  Mark Favas reported the same bug under a
Compaq compiler on Tru64 Unix, but no other libc broken in this respect
is known (known to be OK under MSVC and gcc).
So just try the damn thing at runtime and see what the platform does.
Note that we've always had bugs here, but never knew it before because
a relevant test case didn't exist before 2.1.
2001-04-28 05:38:26 +00:00
Guido van Rossum 3a80c4a29c (Adding this to the trunk as well.)
Fix a very old flaw in PyObject_Print().  Amazing!  When an object
type defines tp_str but not tp_repr, 'print x' to a real file
object would not call the tp_str slot but rather print a default style
representation: <foo object at 0x....>.  This even though 'print x' to
a file-like-object would correctly call the tp_str slot.
2001-04-27 21:35:01 +00:00
Marc-André Lemburg 8155e0e541 This patch originated from an idea by Martin v. Loewis who submitted a
patch for sharing single character Unicode objects.

Martin's patch had to be reworked in a number of ways to take Unicode
resizing into consideration as well. Here's what the updated patch
implements:

* Single character Unicode strings in the Latin-1 range are shared
  (not only ASCII chars as in Martin's original patch).

* The ASCII and Latin-1 codecs make use of this optimization,
  providing a noticable speedup for single character strings. Most
  Unicode methods can use the optimization as well (by virtue
  of using PyUnicode_FromUnicode()).

* Some code cleanup was done (replacing memcpy with Py_UNICODE_COPY)

* The PyUnicode_Resize() can now also handle the case of resizing
  unicode_empty which previously resulted in an error.

* Modified the internal API _PyUnicode_Resize() and
  the public PyUnicode_Resize() API to handle references to
  shared objects correctly. The _PyUnicode_Resize() signature
  changed due to this.

* Callers of PyUnicode_FromUnicode() may now only modify the Unicode
  object contents of the returned object in case they called the API
  with NULL as content template.

Note that even though this patch passes the regression tests, there
may still be subtle bugs in the sharing code.
2001-04-23 14:44:21 +00:00
Guido van Rossum 213c7a6aa5 Mondo changes to the iterator stuff, without changing how Python code
sees it (test_iter.py is unchanged).

- Added a tp_iternext slot, which calls the iterator's next() method;
  this is much faster for built-in iterators over built-in types
  such as lists and dicts, speeding up pybench's ForLoop with about
  25% compared to Python 2.1.  (Now there's a good argument for
  iterators. ;-)

- Renamed the built-in sequence iterator SeqIter, affecting the C API
  functions for it.  (This frees up the PyIter prefix for generic
  iterator operations.)

- Added PyIter_Check(obj), which checks that obj's type has a
  tp_iternext slot and that the proper feature flag is set.

- Added PyIter_Next(obj) which calls the tp_iternext slot.  It has a
  somewhat complex return condition due to the need for speed: when it
  returns NULL, it may not have set an exception condition, meaning
  the iterator is exhausted; when the exception StopIteration is set
  (or a derived exception class), it means the same thing; any other
  exception means some other error occurred.
2001-04-23 14:08:49 +00:00
Guido van Rossum 65967259f2 Oops, forgot to merge this from the iter-branch to the trunk.
This adds "for line in file" iteration, as promised.
2001-04-21 13:20:18 +00:00
Tim Peters cf96de052f SF but #417587: compiler warnings compiling 2.1.
Repaired *some* of the SGI compiler warnings Sjoerd Mullender reported.
2001-04-21 02:46:11 +00:00
Guido van Rossum 05311481d4 Adding iterobject.[ch], which were accidentally not added. Sorry\! 2001-04-20 21:06:46 +00:00
Guido van Rossum 59d1d2b434 Iterators phase 1. This comprises:
new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
new C API PyObject_GetIter(), calls tp_iter
new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
new internal object types iterobject and calliterobject
new exception StopIteration
new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
new magic number for .pyc files
new special method for instances: __iter__() returns an iterator
iteration over dictionaries: "for x in dict" iterates over the keys
iteration over files: "for x in file" iterates over lines

TODO:

documentation
test suite
decide whether to use a different way to spell iter(function, sentinal)
decide whether "for key in dict" is a good idea
use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
speed tuning (make next() a slot tp_next???)
2001-04-20 19:13:02 +00:00
Guido van Rossum 55ad67d74d Oops. Removed dictiter_new decl that wasn't supposed to go in yet. 2001-04-20 16:52:06 +00:00