Commit Graph

840 Commits

Author SHA1 Message Date
Tim Peters 19fe14e76a Derivative of patch #102549, "simpler, faster(!) implementation of string.join".
Also fixes two long-standing bugs (present in 2.0):
1. .join() didn't check that the result size fit in an int.
2. string.join(s) when len(s)==1 returned s[0] regardless of s[0]'s
   type; e.g., "".join([3]) returned 3 (overly optimistic optimization).
I resisted a keen temptation to make .join() apply str() automagically.
2001-01-19 03:03:47 +00:00
Guido van Rossum 65e8bd7fd5 Rich comparisons fallout: instance_hash() should check for both
__cmp__ and __eq__ absent before deciding to do a quickie
based on the object address.  (Tim Peters discovered this.)
2001-01-18 23:46:31 +00:00
Guido van Rossum 41c3244875 Rich comparisons fallout: PyObject_Hash() should check for both
tp_compare and tp_richcompare NULL before deciding to do a quickie
based on the object address.  (Tim Peters discovered this.)
2001-01-18 23:33:37 +00:00
Guido van Rossum a3af41d564 Changes to recursive-object comparisons, having to do with a test case
I found where rich comparison of unequal recursive objects gave
unintuituve results.  In a discussion with Tim, where we discovered
that our intuition on when a<=b should be true was failing, we decided
to outlaw ordering comparisons on recursive objects.  (Once we have
fixed our intuition and designed a matching algorithm that's practical
and reasonable to implement, we can allow such orderings again.)

- Refactored the recursive-object comparison framework; more is now
  done in the support routines so less needs to be done in the calling
  routines (even at the expense of slowing it down a bit -- this
  should normally never be invoked, it's mostly just there to avoid
  blowing up the interpreter).

- Changed the framework so that the comparison operator used is also
  stored.  (The dictionary now stores triples (v, w, op) instead of
  pairs (v, w).)

- Changed the nesting limit to a more reasonable small 20; this only
  slows down comparisons of very deeply nested objects (unlikely to
  occur in practice), while speeding up comparisons of recursive
  objects (previously, this would first waste time and space on 500
  nested comparisons before it would start detecting recursion).

- Changed rich comparisons for recursive objects to raise a ValueError
  exception when recursion is detected for ordering oprators (<, <=,
  >, >=).

Unrelated change:

- Moved PyObject_Unicode() to just under PyObject_Str(), where it
  belongs.  MAL's patch must've inserted in a random spot between two
  functions in the file -- between two helpers for rich comparison...
2001-01-18 22:07:06 +00:00
Tim Peters 60f42b50d8 Move distributed and duplicated config for stat() and fstat() into pyport.h. 2001-01-18 03:03:16 +00:00
Guido van Rossum be4cbb1668 Use rich comparisons to fulfill an old wish: complex numbers now raise
exceptions when compared using <, <=, > or >=.

NOTE: This is a tentative change: this means that cmp() involving
complex numbers will raise an exception when the numbers differ, and
that in turn means that e.g. dictionaries and certain other compounds
(e.g. UserLists) containing complex numbers can't be compared either.
So we'll have to decide whether this is acceptable.  The alpha test
cycle is a good time to keep an eye on this!
2001-01-18 01:12:39 +00:00
Guido van Rossum b932420cc7 Rich comparisons:
- Use PyObject_RichCompareBool() when comparing keys; this makes the
  error handling cleaner.

- There were two implementations for dictionary comparison, an old one
  (#ifdef'ed out) and a new one.  Got rid of the old one, which was
  abandoned years ago.

- In the characterize() function, part of dictionary comparison, use
  PyObject_RichCompareBool() to compare keys and values instead.  But
  continue to use PyObject_Compare() for comparing the final
  (deciding) elements.

- Align the comments in the type struct initializer.

Note: I don't implement rich comparison for dictionaries -- there
doesn't seem to be much to be gained.  (The existing comparison
already decides that shorter dicts are always smaller than longer
dicts.)
2001-01-18 00:39:02 +00:00
Guido van Rossum f77bc62e73 Same treatment as listobject.c:
- tuplecontains(): call RichCompare(Py_EQ).

- Get rid of tuplecompare(), in favor of new tuplerichcompare() (a
  clone of list_compare()).

- Aligned the comments for large struct initializers.
2001-01-18 00:00:53 +00:00
Guido van Rossum 24f67d568c Fix a leak in instance_coerce(). This was introduced by Neil's
earlier coercion changes, not by rich comparisons.  When a coercion
function returns 1 (meaning it cannot do it), it should not INCREF the
arguments.  When no __coerce__() method was found, instance_coerce()
originally returned 0, pretending it did it.  Neil changed the return
value to 1, more accurately reflecting that it didn't do anything, but
forgot to take out the two INCREF calls.
2001-01-17 23:43:43 +00:00
Guido van Rossum 65e1cea6e3 Convert to rich comparisons:
- sort's docompare() calls RichCompare(Py_LT).

- list_contains(), list_index(), listcount(), listremove() call
  RichCompare(Py_EQ).

- Get rid of list_compare(), in favor of new list_richcompare().  The
  latter does some nice shortcuts, like when == or != is requested, it
  first compares the lengths for trivial accept/reject.  Then it goes
  over the items until it finds an index where the items differe; then
  it does more shortcut magic to minimize the number of additional
  comparisons.

- Aligned the comments for large struct initializers.
2001-01-17 22:11:59 +00:00
Guido van Rossum 2ffbf6b112 Deal properly (?) with comparing recursive datastructures.
- Use the compare nesting level and in-progress dictionary properly in
  PyObject_RichCompare().

- Change the in-progress code to use static variables instead of
  globals (both the nesting level and the key for the thread dict were
  globals but have no reason to be globals; the key can even be a
  function-static variable in get_inprogress_dict()).

- Rewrote try_rich_to_3way_compare() to benefit from the similarity of
  the three cases, making it table-driven.

- In try_rich_to_3way_compare(), test for EQ before LT and GT.  This
  turns out essential when comparing recursive UserList instances;
  with the old code, these would recurse into rich comparison three
  times for each nesting level up to NESTING_LIMIT/2, making the total
  number of calls in the order of 3**(NESTING_LIMIT/2)!

NOTE: I'm not 100% comfortable with this.  It works for the standard
test suite (which compares a few trivial recursive data structures
only), but I'm not sure that the in-progress dictionary is used
properly by the rich comparison code.  Jeremy suggested that maybe the
operation should be included in the dict.  Currently I presume that
objects in the dict are equal unless proven otherwise, and I set the
outcome for the rich comparison accordingly: true for operators EQ,
LE, GE, and false for the other three.  But Jeremy seems to think that
there may be counter-examples where this doesn't do the right thing.
2001-01-17 21:27:02 +00:00
Marc-André Lemburg ad7c98e264 This patch adds a new builtin unistr() which behaves like str()
except that it always returns Unicode objects.

A new C API PyObject_Unicode() is also provided.

This closes patch #101664.

Written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.
2001-01-17 17:09:53 +00:00
Guido van Rossum f916e7aa62 Rich comparisons fall-out:
- Get rid of float_cmp().

- Renamed Py_TPFLAGS_NEWSTYLENUMBER to Py_TPFLAGS_CHECKTYPES.
2001-01-17 15:33:42 +00:00
Guido van Rossum 6fd867b04d Rich comparisons fall-out:
- Get rid of long_cmp().

- Renamed Py_TPFLAGS_NEWSTYLENUMBER to Py_TPFLAGS_CHECKTYPES.
2001-01-17 15:33:18 +00:00
Guido van Rossum 3968e4c0f5 Rich comparisons fall-out:
- Get rid of int_cmp().

- Renamed Py_TPFLAGS_NEWSTYLENUMBER to Py_TPFLAGS_CHECKTYPES.
2001-01-17 15:32:23 +00:00
Guido van Rossum c31896960a Rich comparisons fall-out:
- Renamed Py_TPFLAGS_NEWSTYLENUMBER to Py_TPFLAGS_CHECKTYPES.

- Use PyObject_RichCompareBool() in PySequence_Contains().
2001-01-17 15:29:42 +00:00
Guido van Rossum 8998b4f691 Rich comparisons.
- Got rid of instance_cmp(); refactored instance_compare().

- Added instance_richcompare() which calls __lt__() etc.

Some unrelated stuff mixed in:

- Aligned comments in various large struct initializers.

- Better test to avoid recursion if __coerce__ returns self as the
  first argument (this is an unrelated fix by Neil Schemenauer!).

- Style nit: don't use Py_DECREF(Py_NotImplemented); use
  Py_DECREF(result) -- it just looks better. :-)
2001-01-17 15:28:20 +00:00
Guido van Rossum e797ec1cb8 Rich comparisons. Refactored internal routine do_cmp() and added APIs
PyObject_RichCompare() and PyObject_RichCompareBool().

XXX Note: the code that checks for deeply nested rich comparisons is
bogus -- it assumes the two objects are always identical, rather than
using the same logic as PyObject_Compare().  I'll fix that later.
2001-01-17 15:24:28 +00:00
Guido van Rossum e54e0be3b6 Rationalizing the fallback code for portable fseek -- this is all much
simpler if we use fgetpos and fsetpos, rather than trying to mess with
platform-specific TELL64 alternatives.

Of course, this hasn't been tested on a 64-bit platform, so I may have
to withdraw this -- but I'm hopeful, and Trent Mick supports this
patch!
2001-01-16 20:53:31 +00:00
Marc-André Lemburg 3a645e4dd4 Added checks to prevent PyUnicode_Count() from dumping core
in case the parameters are out of bounds and fixes error handling
for .count(), .startswith() and .endswith() for the case of
mixed string/Unicode objects.

This patch adds Python style index semantics to PyUnicode_Count()
indices (including the special handling of negative indices).

The patch is an extended version of patch #103249 submitted
by Michael Hudson (mwh) on SF. It also includes new test cases.
2001-01-16 11:54:12 +00:00
Barry Warsaw d6a9e84c81 Committing PEP 232, function attribute feature, approved by Guido.
Closes SF patch #103123.

funcobject.h:

    PyFunctionObject: add the func_dict slot.

funcobject.c:

    PyFunction_New(): Initialize the func_dict slot to NULL.

    func_getattr(): Rename to func_getattro() and change the
    signature.  It's more efficient to use attro methods and dig the C
    string out than it is to re-convert a C string to a PyString.

    Also, add support for getting the __dict__ (a.k.a. func_dict)
    attribute, and for getting an arbitrary function attribute.

    func_setattr(): Rename to func_setattro() and change the signature
    for the same reason.  Also add support for setting __dict__
    (a.k.a. func_dict) and any arbitrary function attribute.

    func_dealloc(): Be sure to DECREF the func_dict slot.

    func_traverse(): Be sure to traverse func_dict too.

    PyFunction_Type: make the necessary func_?etattro() changes.

classobject.c:

    instancemethod_memberlist: Add __dict__

    instancemethod_setattro(): New method to set arbitrary attributes
    on methods (really the underlying im_func).  Raise TypeError when
    the instance is bound or when you're trying to set one of the
    reserved im_* attributes.

    instancemethod_getattr(): Renamed to instancemethod_getattro()
    since that's what it really is.  Also, added support fo getting
    arbitrary attributes through the im_func.

    PyMethod_Type: Do the ?etattr{,o} dance.
2001-01-15 20:40:19 +00:00
Guido van Rossum 65e0b99b61 SF patch #103158 by Greg Ball: Don't do unsafe arithmetic in xrange
object.

This fixes potential overflows in xrange()'s internal calculations on
64-bit platforms.  The fix is complicated because the sq_length slot
function can only return an int; we want to support
xrange(sys.maxint), which is a 64-bit quantity on most 64-bit
platforms (except Win64).  The solution is hacky but the best
possible: when the range is that long, we can use it in a for loop but
we can't ask for its length (nor can we actually iterate beyond
2**31-1, because the sq_item slot function has the same restrictions
on its arguments.  Fixing those restrictions is a project for another
day...
2001-01-15 18:58:56 +00:00
Tim Peters 142297ac92 Speed getline_via_fgets(), by supplying two "fast paths", although one is
faster than the other.  Should be faster for Mark Favas's 254-character
mail log lines, and *is* 3-4% quicker for my test case with much shorter
lines (but they're typical of *my* text files, and I'm tired of optimizing
for everyone else at my expense <wink> -- in fact, the only one who loses
here is Guido ...).
2001-01-15 10:36:56 +00:00
Tim Peters f29b64d243 Use the "MS" getline hack (fgets()) by default on non-get_unlocked
platforms.  See NEWS for details.
2001-01-15 06:33:19 +00:00
Guido van Rossum e07d5cf966 Jeff Epler's patch adding an xreadlines() method. (It just imports
the xreadlines module and lets it do its thing.)
2001-01-09 21:50:24 +00:00
Guido van Rossum dcf5715db1 Tsk, tsk, tsk. Treat FreeBSD the same as the other BSDs when defining
a fallback for TELL64.  Fixes SF Bug #128119.
2001-01-09 02:00:11 +00:00
Neil Schemenauer 010b0cc218 Fix a silly bug in float_pow. Sorry Tim. 2001-01-08 06:29:50 +00:00
Tim Peters 1c73323d6f A few reformats; no logic changes. 2001-01-08 04:02:07 +00:00
Guido van Rossum 8628206b95 Let's hope that three time's a charm...
Tim discovered another "bug" in my get_line() code: while the comments
said that n<0 was invalid, it was in fact still called with n<0 (when
PyFile_GetLine() was called with n<0).  In that case fortunately
executed the same code as for n==0.

Changed the comment to admit this fact, and changed Tim's MS speed
hack code to use 'n <= 0' as the criteria for the speed hack.
2001-01-08 01:26:47 +00:00
Tim Peters 15b838521f Fiddled ms_getline_hack after talking w/ Guido: made clearer that the
code duplication is to let us get away without a realloc whenever possible;
boosted the init buf size (the cutoff at which we *can* get away without
a realloc) from 100 to 200 so that more files can enjoy this boost; and
allowed other threads to run in all cases.  The last two cost something,
but not significantly:  in my fat test case, less than a 1% slowdown total.
Since my test case has a great many short lines, that's probably the worst
slowdown, too.  While the logic barely changed, there were lots of edits.
This also gets rid of the reference to fp->_cnt, so the last platform
assumption being made here is that fgets doesn't overwrite bytes
capriciously (== beyond the terminating null byte it must write).
2001-01-08 00:53:12 +00:00
Tim Peters 86821b2563 MS Win32 .readline() speedup, as discussed on Python-Dev. This is a tricky
variant that never needs to "search from the right".
Also fixed unlikely memory leak in get_line, if string size overflows INTMAX.
Also new std test test_bufio to make sure .readline() works.
2001-01-07 21:19:34 +00:00
Guido van Rossum 4ddf0a01f7 Tim noticed that I had botched get_line_raw(). Looking again, I
realized that this behavior is already present in PyFile_GetLine(),
which is the only place that needs it.  A little refactoring of that
function made get_line_raw() redundant.
2001-01-07 20:51:39 +00:00
Marc-André Lemburg ec233e5803 This patch adds a new feature to the builtin charmap codec:
The mapping dictionaries can now contain 1-n mappings, meaning
that character ordinals may be mapped to strings or Unicode object,
e.g. 0x0078 ('x') -> u"abc", causing the ordinal to be replaced by
the complete string or Unicode object instead of just one character.

Another feature introduced by the patch is that of mapping oridnals to
the emtpy string. This allows removing characters.

The patch is different from patch #103100 in that it does not cause a
performance hit for the normal use case of 1-1 mappings.

Written by Marc-Andre Lemburg, copyright assigned to Guido van Rossum.
2001-01-06 14:59:58 +00:00
Guido van Rossum 1187aa4d33 Restructured get_line() for clarity and speed.
- The raw_input() functionality is moved to a separate function.

- Drop GNU getline() in favor of getc_unlocked(), which exists on more
  platforms (and is even a tad faster on my system).
2001-01-05 14:43:05 +00:00
Neil Schemenauer 5ed85ec0c0 Changes for PEP 208. PyObject_Compare has been rewritten. Instances no
longer get special treatment.  The Py_NotImplemented type is here as well.
2001-01-04 01:48:10 +00:00
Neil Schemenauer ba872e2534 Make long a new style number type. Sequence repeat is now done here
now as well.
2001-01-04 01:46:03 +00:00
Neil Schemenauer 139e72ad1a Make int a new style number type. Sequence repeat is now done here
now as well.
2001-01-04 01:45:33 +00:00
Neil Schemenauer 32117e5c29 Make float a new style number type. 2001-01-04 01:44:34 +00:00
Neil Schemenauer 29bfc07183 Make instances a new style number type. See PEP 208 for details. Instance
types no longer get special treatment from abstract.c so more number number
methods have to be implemented.
2001-01-04 01:43:46 +00:00
Neil Schemenauer 5a1f015bee Massive changes as per PEP 208. Read it for details. 2001-01-04 01:39:06 +00:00
Jeremy Hylton 1fb6088e86 dict_update has two boundary conditions: a.update(a) and a.update({})
Added test for second one.
2001-01-03 22:34:59 +00:00
Jeremy Hylton db60bb5aad fix leak 2001-01-03 22:32:16 +00:00
Marc-André Lemburg a866df806d This patch changes the default behaviour of the builtin charmap
codec to not apply Latin-1 mappings for keys which are not found
in the mapping dictionaries, but instead treat them as undefined
mappings.

The patch was originally written by Martin v. Loewis with some
additional (cosmetic) changes and an updated test script
by Marc-Andre Lemburg.

The standard codecs were recreated from the most current files
available at the Unicode.org site using the Tools/scripts/gencodec.py
tool.

This patch closes the bugs #116285 and #119960.
2001-01-03 21:29:14 +00:00
Neil Schemenauer 10e31cf82e Add garbage collection for module objects. Closes patch #102939 and
fixes bug #126345.
2001-01-02 15:58:27 +00:00
Fred Drake e7e190ef97 Make the indentation consistently use tabs instead of using spaces just
in one place.
2000-12-20 00:55:07 +00:00
Andrew M. Kuchling f947ffe951 Patch #102940: use only printable Unicode chars in reporting
incorrect % characters; characters outside the printable range are
 replaced with '?'
2000-12-19 22:49:06 +00:00
Andrew M. Kuchling 932af110d3 Patch #102868 from cgw: fix memory leak when an EOF is encountered
using GNU libc's getline()
2000-12-19 20:59:04 +00:00
Guido van Rossum cda4f9a8dc Fix off-by-one error in split_substring(). Fixes SF bug #122162. 2000-12-19 02:23:19 +00:00
Andrew M. Kuchling 6ca8917758 [ Patch #102852 ] Make % error a bit more informative by indicates the
index at which an unknown %-escape was found
2000-12-15 13:07:46 +00:00
Guido van Rossum adf5410dc4 Test for NULL returned from PyObject_NEW(). 2000-12-14 15:09:46 +00:00