Commit Graph

340 Commits

Author SHA1 Message Date
Michael W. Hudson b2308bb9be Fix bug:
[ 1327110 ] wrong TypeError traceback in generator expressions

by removing the code that can stomp on the users' TypeError raised by the
iterable argument to ''.join() -- PySequence_Fast (now?) gives a perfectly
reasonable message itself.  Also, a couple of tests.
2005-10-21 11:45:01 +00:00
Neal Norwitz 95c1e5065c SF bug #1331563 ] string_subscript doesn't check for failed PyMem_Malloc. Will backport 2005-10-20 04:15:52 +00:00
Georg Brandl d45014b236 Fix PyString_Format so that the "%s" format works again when Unicode is not
enabled.
2005-10-01 17:06:00 +00:00
Neil Schemenauer ab61923637 Fix bug in last checkin (2.231). To match previous behavior, unicode
subclasses should be substituted as-is and not have tp_str called on
them.
2005-08-31 23:02:05 +00:00
Neil Schemenauer cf52c07843 Change the %s format specifier for str objects so that it returns a
unicode instance if the argument is not an instance of basestring and
calling __str__ on the argument returns a unicode instance.
2005-08-12 17:34:58 +00:00
Raymond Hettinger 3296e696db SF bug #1224347: int/long unification and hex()
Hex longs now print with lowercase letters like their int counterparts.
2005-06-29 23:29:56 +00:00
Raymond Hettinger 57e7447c44 * Beef-up tests for str.count().
* Speed-up str.count() by using memchr() to fly between first char matches.
2005-02-20 09:54:53 +00:00
Raymond Hettinger 7cbf1bcb3e * Beef-up testing of str.__contains__() and str.find().
* Speed-up "x in y" where x has more than one character.

The existing code made excessive calls to the expensive memcmp() function.
The new code uses memchr() to rapidly find a start point for memcmp().
In addition to knowing that the first character is a match, the new code
also checks that the last character is a match.  This significantly reduces
the incidence of false starts (saving memcmp() calls and making quadratic
behavior less likely).

Improves the timings on:
    python -m timeit -r7 -s"x='a'*1000" "'ab' in x"
    python -m timeit -r7 -s"x='a'*1000" "'bc' in x"

Once this code has proven itself, then string_find_internal() should refer
to it rather than running its own version.  Also, something similar may
apply to unicode objects.
2005-02-20 04:07:08 +00:00
Michael W. Hudson faa7648ffe More bug #1077106 stuff, sorry -- modem induced impatiece!
This should go on whatever bugfix branches the other fetches up on.
2005-01-31 17:09:25 +00:00
Raymond Hettinger 561fbf138d SF bug #1054139: serious string hashing error in 2.4b1
_PyString_Resize() readied strings for mutation but did not invalidate
the cached hash value.
2004-10-26 01:52:37 +00:00
Raymond Hettinger 674f241e9c SF Patch #1007087: Return new string for single subclass joins (Bug #1001011)
(Patch contributed by Nick Coghlan.)

Now joining string subtypes will always return a string.
Formerly, if there were only one item, it was returned unchanged.
2004-08-23 23:23:54 +00:00
Armin Rigo 618fbf5469 This was quite a dark bug in my recent in-place string concatenation
hack: it would resize *interned* strings in-place!  This occurred because
their reference counts do not have their expected value -- stringobject.c
hacks them.  Mea culpa.
2004-08-07 20:58:32 +00:00
Armin Rigo 79f7ad228b Fixed some compiler warnings. 2004-08-07 19:27:39 +00:00
Jeremy Hylton 4c989ddc9c Subclasses of string can no longer be interned. The semantics of
interning were not clear here -- a subclass could be mutable, for
example -- and had bugs.  Explicitly interning a subclass of string
via intern() will raise a TypeError.  Internal operations that attempt
to intern a string subclass will have no effect.

Added a few tests to test_builtin that includes the old buggy code and
verifies that calls like PyObject_SetAttr() don't fail.  Perhaps these
tests should have gone in test_string.
2004-08-07 19:20:05 +00:00
Marc-André Lemburg 1dffb120b7 .encode()/.decode() patch part 2. 2004-07-08 19:13:55 +00:00
Marc-André Lemburg d2d4598ec2 Allow string and unicode return types from .encode()/.decode()
methods on string and unicode objects. Added unicode.decode()
which was missing for no apparent reason.
2004-07-08 17:57:32 +00:00
Tim Peters e7c053233f sizeof(char) is 1, by definition, so get rid of that expression in
places it's just noise.
2004-06-27 17:24:49 +00:00
Martin v. Löwis 737ea82a5a Patch #774665: Make Python LC_NUMERIC agnostic. 2004-06-08 18:52:54 +00:00
Hye-Shik Chang 75c00efcc7 [SF #866875] Add a specialized routine for one character
separaters on str.split() and str.rsplit().
2004-01-05 00:29:51 +00:00
Skip Montanaro ac4ea13a3a There are places in Python which assume bytes have 8-bits. Formalize that a
bit by checking the value of UCHAR_MAX in Include/Python.h.  There was a
check in Objects/stringobject.c.  Remove that.  (Note that we don't define
UCHAR_MAX if it's not defined as the old test did.)
2003-12-22 16:31:41 +00:00
Hye-Shik Chang 3ae811b57d Add rsplit method for str and unicode builtin types.
SF feature request #801847.
Original patch is written by Sean Reifschneider.
2003-12-15 18:49:53 +00:00
Guido van Rossum 6c9e130524 - Removed FutureWarnings related to hex/oct literals and conversions
and left shifts.  (Thanks to Kalle Svensson for SF patch 849227.)
  This addresses most of the remaining semantic changes promised by
  PEP 237, except for repr() of a long, which still shows the trailing
  'L'.  The PEP appears to promise warnings for operations that
  changed semantics compared to Python 2.3, but this is not
  implemented; we've suffered through enough warnings related to
  hex/oct literals and I think it's best to be silent now.
2003-11-29 23:52:13 +00:00
Raymond Hettinger 4f8f976576 Add optional fillchar argument to ljust(), rjust(), and center() string methods. 2003-11-26 08:21:35 +00:00
Fred Drake d22bb6584d Avoid confusing name for the 3rd argument to str.replace().
This closes SF bug #827260.
2003-10-22 02:56:40 +00:00
Martin v. Löwis 6828e18a6a Patch #825679: Clarify semantics of .isfoo on empty strings.
Backported to 2.3.
2003-10-18 09:55:08 +00:00
Raymond Hettinger 9bfe533c69 SF bug #795506: Wrong handling of string format code for float values.
Adding missing support for '%F'.

Will backport to 2.3.1.
2003-08-27 04:55:52 +00:00
Walter Dörwald 9ff3f03c3e Fix whitespace. 2003-06-18 14:17:01 +00:00
Neal Norwitz ffe33b7f24 Attempt to make all the various string *strip methods the same.
* Doc - add doc for when functions were added
 * UserString
 * string object methods
 * string module functions
'chars' is used for the last parameter everywhere.

These changes will be backported, since part of the changes
have already been made, but they were inconsistent.
2003-04-10 22:35:32 +00:00
Guido van Rossum a7132189d2 Reformat a few docstrings that caused line wraps in help() output. 2003-04-09 19:32:45 +00:00
Walter Dörwald 43440a621e Fix PyString_Format() so that '%c' % u'a' returns u'a'
instead of raising a TypeError. (From SF patch #710127)

Add tests to verify this is fixed.

Add various tests for '%c' % int.
2003-03-31 18:07:50 +00:00
Guido van Rossum 5d9113d8be Implement appropriate __getnewargs__ for all immutable subclassable builtin
types.  The special handling for these can now be removed from save_newobj().
Add some testing for this.

Also add support for setting the 'fast' flag on the Python Pickler class,
which suppresses use of the memo.
2003-01-29 17:58:45 +00:00
Raymond Hettinger 5d5e7c0e34 SF patch #664192 bug #661913: inconsistent error messages between string
and unicode

Patch by Christopher Blunck.
2003-01-15 05:32:57 +00:00
Raymond Hettinger 0a2f849b79 GvR's idea to use memset() for the most common special case of repeating
a single character.  Shaves another 10% off the running time by avoiding
the lg2(N) loops and cache effects for the other cases.
2003-01-06 22:42:41 +00:00
Raymond Hettinger 698258a199 Optimize string_repeat.
Christian Tismer pointed out the high cost of the loop overhead and
function call overhead for 'c' * n where n is large.  Accordingly,
the new code only makes lg2(n) loops.

Interestingly, 'c' * 1000 * 1000 ran a bit faster with old code.  At some
point, the loop and function call overhead became cheaper than invalidating
the cache with lengthy memcpys.  But for more typical sizes of n, the new
code runs much faster and for larger values of n it runs only a bit slower.
2003-01-06 10:33:56 +00:00
Marc-André Lemburg 79f57833f3 Patch for bug #659709: bogus computation of float length
Python 2.2.x backport candidate. (This bug has been around since
Python 1.6.)
2002-12-29 19:44:06 +00:00
Raymond Hettinger ea3fdf44a2 SF patch #659536: Use PyArg_UnpackTuple where possible.
Obtain cleaner coding and a system wide
performance boost by using the fast, pre-parsed
PyArg_Unpack function instead of PyArg_ParseTuple
function which is driven by a format string.
2002-12-29 16:33:45 +00:00
Martin v. Löwis 00b6127097 Patch #650653: Raise always value error if the table is not 256 bytes long. 2002-12-12 20:03:19 +00:00
Martin v. Löwis 79acb9edfa Patch #614055: Support OpenVMS. 2002-12-06 12:48:53 +00:00
Neil Schemenauer a6cd4e65d7 Add nb_remainder (i.e. __mod__) slot to str type. Fixes SF bug #615506. 2002-11-18 16:09:38 +00:00
Neal Norwitz 80a1bf4b5d Fix SF # 635969, No error "not all arguments converted"
When mwh added extended slicing, strings and unicode became mappings.
Thus, dict was set which prevented an error when doing:
	newstr = 'format without a percent' % string_value

This fix raises an exception again when there are no formats
and % with a string value.
2002-11-12 23:01:12 +00:00
Martin v. Löwis a5f0907d79 Back out #479898. 2002-10-11 05:37:59 +00:00
Guido van Rossum 049cd6b563 Fix a nasty endcase reported by Armin Rigo in SF bug 618623:
'%2147483647d' % -123 segfaults.  This was because an integer overflow
in a comparison caused the string resize to be skipped.  After fixing
the overflow, this could call _PyString_Resize() with a negative size,
so I (1) test for that and raise MemoryError instead; (2) also added a
test for negative newsize to _PyString_Resize(), raising SystemError
as for all bad arguments.

An identical bug existed in unicodeobject.c, of course.

Will backport to 2.2.2.
2002-10-11 00:43:48 +00:00
Guido van Rossum 8052f8921e Undo this part of the previous checkin:
Also fixed an error message -- %s argument has non-string str()
  doesn't make sense for %r, so the error message now differentiates
  between %s and %r.

because PyObject_Repr() and PyObject_Str() ensure that this can never
happen.  Added a helpful comment instead.
2002-10-09 19:14:30 +00:00
Guido van Rossum b00c07f038 The string formatting code has a test to switch to Unicode when %s
sees a Unicode argument.  Unfortunately this test was also executed
for %r, because %s and %r share almost all of their code.  This meant
that, if u is a unicode object while repr(u) is an 8-bit string
containing ASCII characters, '%r' % u is a *unicode* string containing
only ASCII characters!

Fixed by executing the test only for %s.

Also fixed an error message -- %s argument has non-string str()
doesn't make sense for %r, so the error message now differentiates
between %s and %r.
2002-10-09 19:07:53 +00:00
Martin v. Löwis bab9559d12 Include wctype.h. 2002-10-07 18:26:16 +00:00
Martin v. Löwis fed2405cb5 Patch #479898: Use multibyte C library for printing strings if available. 2002-10-07 13:55:50 +00:00
Guido van Rossum efc1188239 Fix warnings on 64-bit platforms about casts from pointers to ints.
Two of these were real bugs.
2002-09-12 14:43:41 +00:00
Martin v. Löwis 2412853f8e Fix escaping of non-ASCII characters. 2002-09-09 06:17:05 +00:00
Walter Dörwald 8709a420c4 Check whether a string resize is necessary at the end
of PyString_DecodeEscape(). This prevents a call to
_PyString_Resize() for the empty string, which would
result in a PyErr_BadInternalCall(), because the
empty string has more than one reference.

This closes SF bug http://www.python.org/sf/603937
2002-09-03 13:53:40 +00:00
Walter Dörwald 3aeb632c31 PEP 293 implemention (from SF patch http://www.python.org/sf/432401) 2002-09-02 13:14:32 +00:00
Guido van Rossum bf935fde15 string_contains(): speed up by avoiding function calls where
possible.  This always called PyUnicode_Check() and PyString_Check(),
at least one of which would call PyType_IsSubtype().  Also, this would
call PyString_Size() on known string objects.
2002-08-24 06:57:49 +00:00
Guido van Rossum 8b1a6d694f Code by Inyeol Lee, submitted to SF bug 595350, to implement
the string/unicode method .replace() with a zero-lengt first argument.
Inyeol contributed tests for this too.
2002-08-23 18:21:28 +00:00
Guido van Rossum 76afbd9aa4 Fix some endcase bugs in unicode rfind()/rindex() and endswith().
These were reported and fixed by Inyeol Lee in SF bug 595350.  The
endswith() bug was already fixed in 2.3, but this adds some more test
cases.
2002-08-20 17:29:29 +00:00
Guido van Rossum 45ec02aed1 SF patch 576101, by Oren Tirosh: alternative implementation of
interning.  I modified Oren's patch significantly, but the basic idea
and most of the implementation is unchanged.  Interned strings created
with PyString_InternInPlace() are now mortal, and you must keep a
reference to the resulting string around; use the new function
PyString_InternImmortal() to create immortal interned strings.
2002-08-19 21:43:18 +00:00
Guido van Rossum e3a8e7ed1d Call me anal, but there was a particular phrase that was speading to
comments everywhere that bugged me: /* Foo is inlined */ instead of
/* Inline Foo */.  Somehow the "is inlined" phrase always confused me
for half a second (thinking, "No it isn't" until I added the missing
"here").  The new phrase is hopefully unambiguous.
2002-08-19 19:26:42 +00:00
Neal Norwitz b898d9fc9a Get this to compile again if Py_USING_UNICODE is not defined.
com_error() is static in Python/compile.c.
2002-08-16 23:20:39 +00:00
Guido van Rossum 54df53a352 More changes of DeprecationWarning to FutureWarning. 2002-08-14 18:38:27 +00:00
Martin v. Löwis eb3f00aeeb Check for trailing backslash. Fixes #593656. 2002-08-14 08:22:50 +00:00
Martin v. Löwis 8a8da798a5 Patch #505705: Remove eval in pickle and cPickle. 2002-08-14 07:46:28 +00:00
Guido van Rossum 078151da90 Implement stage B0 of PEP 237: add warnings for operations that
currently return inconsistent results for ints and longs; in
particular: hex/oct/%u/%o/%x/%X of negative short ints, and x<<n that
either loses bits or changes sign.  (No warnings for repr() of a long,
though that will also change to lose the trailing 'L' eventually.)

This introduces some warnings in the test suite; I'll take care of
those later.
2002-08-11 04:24:12 +00:00
Barry Warsaw 817918cc3c Committing patch #591250 which provides "str1 in str2" when str1 is a
string of longer than 1 character.
2002-08-06 16:58:21 +00:00
Raymond Hettinger bc552ce1b8 SF 582071 clarified the .split() method's docstring to note that sep=None
will trigger splitting on any whitespace.
2002-08-05 06:28:21 +00:00
Neal Norwitz 88fe4ff5a9 Fix the problem of not raising a TypeError exception when doing:
'%g' % '1'
    '%d' % '1'

Add a test for these conditions
Fix the test so that if not exception is raise, this is a failure
2002-07-28 16:44:23 +00:00
Neal Norwitz 7beeed5dfd SF patch #577031, remove PyArg_Parse() since it's deprecated 2002-07-28 15:19:47 +00:00
Martin v. Löwis 75d2d94e0f Patch #554716: Use __va_copy where available. 2002-07-28 10:23:27 +00:00
Jeremy Hylton 938ace69a0 staticforward bites the dust.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) botched the
static keyword when it was used with a forward declaration of a static
initialized structure.  Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers.  (In
fact, we expect that the compilers are all fixed eight years later.)

I'm leaving staticforward and statichere defined in object.h as
static.  This is only for backwards compatibility with C extensions
that might still use it.

XXX I haven't updated the documentation.
2002-07-17 16:30:39 +00:00
Tim Peters 3459251d5a object.h special-build macro minefield: renamed all the new lexical
helper macros to something saner, and used them appropriately in other
files too, to reduce #ifdef blocks.

classobject.c, instance_dealloc():  One of my worst Python Memories is
trying to fix this routine a few years ago when COUNT_ALLOCS was defined
but Py_TRACE_REFS wasn't.  The special-build code here is way too
complicated.  Now it's much simpler.  Difference:  in a Py_TRACE_REFS
build, the instance is no longer in the doubly-linked list of live
objects while its __del__ method is executing, and that may be visible
via sys.getobjects() called from a __del__ method.  Tough -- the object
is presumed dead while its __del__ is executing anyway, and not calling
_Py_NewReference() at the start allows enormous code simplification.

typeobject.c, call_finalizer():  The special-build instance_dealloc()
pain apparently spread to here too via cut-'n-paste, and this is much
simpler now too.  In addition, I didn't understand why this routine
was calling _PyObject_GC_TRACK() after a resurrection, since there's no
plausible way _PyObject_GC_UNTRACK() could have been called on the
object by this point.  I suspect it was left over from pasting the
instance_delloc() code.  Instead asserted that the object is still
tracked.  Caution:  I suspect we don't have a test that actually
exercises the subtype_dealloc() __del__-resurrected-me code.
2002-07-11 06:23:50 +00:00
Neal Norwitz 1f68fc7fa5 SF bug # 493951 string.{starts,ends}with vs slices
Handle negative indices similar to slices.
2002-06-14 00:50:42 +00:00
Martin v. Löwis 14f8b4cfcb Patch #568124: Add doc string macros. 2002-06-13 20:33:02 +00:00
Michael W. Hudson 5efaf7eac8 This is my nearly two year old patch
[ 400998 ] experimental support for extended slicing on lists

somewhat spruced up and better tested than it was when I wrote it.

Includes docs & tests.  The whatsnew section needs expanding, and arrays
should support extended slices -- later.
2002-06-11 10:55:12 +00:00
Neal Norwitz 32a7e7f6b6 Change name from string to basestring 2002-05-31 19:58:02 +00:00
Guido van Rossum cacfc07d08 - A new type object, 'string', is added. This is a common base type
for 'str' and 'unicode', and can be used instead of
  types.StringTypes, e.g. to test whether something is "a string":
  isinstance(x, string) is True for Unicode and 8-bit strings.  This
  is an abstract base class and cannot be instantiated directly.
2002-05-24 19:01:59 +00:00
Raymond Hettinger 0ebac97058 Patch 549187. Improve string formatting error message. 2002-05-21 15:14:57 +00:00
Walter Dörwald 775c11f07a Add #ifdef PY_USING_UNICODE sections, so that
stringobject.c compiles again with --disable-unicode.

Fixes SF bug http://www.python.org/sf/554912
2002-05-13 09:00:41 +00:00
Tim Peters 5de9842b34 Repair widespread misuse of _PyString_Resize. Since it's clear people
don't understand how this function works, also beefed up the docs.  The
most common usage error is of this form (often spread out across gotos):

	if (_PyString_Resize(&s, n) < 0) {
		Py_DECREF(s);
		s = NULL;
		goto outtahere;
	}

The error is that if _PyString_Resize runs out of memory, it automatically
decrefs the input string object s (which also deallocates it, since its
refcount must be 1 upon entry), and sets s to NULL.  So if the "if"
branch ever triggers, it's an error to call Py_DECREF(s):  s is already
NULL!  A correct way to write the above is the simpler (and intended)

	if (_PyString_Resize(&s, n) < 0)
		goto outtahere;

Bugfix candidate.
2002-04-27 18:44:32 +00:00
Walter Dörwald de02bcb265 Apply patch diff.txt from SF feature request
http://www.python.org/sf/444708

This adds the optional argument for str.strip
to unicode.strip too and makes it possible
to call str.strip with a unicode argument
and unicode.strip with a str argument.
2002-04-22 17:42:37 +00:00
Walter Dörwald 0fe940c862 Return the orginal string only if it's a real str or unicode
instance, otherwise make a copy.
2002-04-15 18:42:15 +00:00
Guido van Rossum 3aa3fc46c8 Remove 'const' from local variable declaration in string_zfill() -- it
isn't constant, so why bother.

Folded long lines.

Whitespace normalization.
2002-04-15 13:48:52 +00:00
Walter Dörwald 068325ef92 Apply the second version of SF patch http://www.python.org/sf/536241
Add a method zfill to str, unicode and UserString and change
Lib/string.py accordingly.

This activates the zfill version in unicodeobject.c that was
commented out and implements the same in stringobject.c. It also
adds the test for unicode support in Lib/string.py back in and
uses repr() instead() of str() (as it was before Lib/string.py 1.62)
2002-04-15 13:36:47 +00:00
Guido van Rossum 018b0eb0f5 Partially implement SF feature request 444708.
Add optional arg to string methods strip(), lstrip(), rstrip().
The optional arg specifies characters to delete.

Also for UserString.

Still to do:

- Misc/NEWS
- LaTeX docs (I did the docstrings though)
- Unicode methods, and Unicode support in the string methods.
2002-04-13 00:56:08 +00:00
Neil Schemenauer 510492e985 Remove PyMalloc_New, _PyMalloc_MALLOC, and PyMalloc_Del. 2002-04-12 03:05:19 +00:00
Guido van Rossum 77f6a65eb0 Add the 'bool' type and its values 'False' and 'True', as described in
PEP 285.  Everything described in the PEP is here, and there is even
some documentation.  I had to fix 12 unit tests; all but one of these
were printing Boolean outcomes that changed from 0/1 to False/True.
(The exception is test_unicode.py, which did a type(x) == type(y)
style comparison.  I could've fixed that with a single line using
issubtype(x, type(y)), but instead chose to be explicit about those
places where a bool is expected.

Still to do: perhaps more documentation; change standard library
modules to return False/True from predicates.
2002-04-03 22:41:51 +00:00
Tim Peters 8deda70b16 Eliminate DONT_SHARE_SHORT_STRINGS. 2002-03-30 10:06:07 +00:00
Tim Peters 1f7df3595a Remove the CACHE_HASH and INTERN_STRINGS preprocessor symbols. 2002-03-29 03:29:08 +00:00
Neil Schemenauer dcc819a5c9 Use pymalloc if it's enabled. 2002-03-22 15:33:15 +00:00
Andrew MacIntyre 5e9c80d906 %#x/%#X format conversion cleanup (see patch #450267):
Objects/
    stringobject.c
    unicodeobject.c
2002-02-28 11:38:24 +00:00
Andrew MacIntyre c487439aa7 OS/2 EMX port changes (Objects part of patch #450267):
Objects/
    fileobject.c
    stringobject.c
    unicodeobject.c

This commit doesn't include the cleanup patches for stringobject.c and
unicodeobject.c which are shown separately in the patch manager.  Those
patches will be regenerated and applied in a subsequent commit, so as
to preserve a fallback position (this commit to those files).
2002-02-26 11:36:35 +00:00
Martin v. Löwis 1f803f782c Updated patch #487906: Revise inline docs. 2002-01-16 10:53:24 +00:00
Guido van Rossum 169192e818 SF patch #491049 (David Jacobs): Small PyString_FromString optimization
PyString_FromString():
  Since the length of the string is already being stored in size,
  changed the strcpy() to a memcpy() for a small speed improvement.
2001-12-10 15:45:54 +00:00
Tim Peters 62de65b25e PyString_FromString: this requires its argument be non-NULL, but doesn't
check it.  Added an assert() to that effect.
2001-12-06 20:29:32 +00:00
Jeremy Hylton 7802a53e38 Little stuff.
Add a missing DECREF in an obscure corner.  If the str() or repr() of
an object passed to a string interpolation -- e.g. "%s" % obj --
returns a non-string, the returned object was leaked.

Repair an indentation glitch.

Replace a bunch of PyString_AsString() calls (and their ilk) with
macros.
2001-12-06 15:18:48 +00:00
Martin v. Löwis 8f1ea71eab Add more inline documentation, as contributed in #487906. 2001-12-03 08:24:52 +00:00
Tim Peters 9161c8b0a1 PyString_FromFormatV, string_repr: document why these use sprintf
instead of PyOS_snprintf; add some relevant comments and asserts.
2001-12-03 01:55:38 +00:00
Martin v. Löwis d132750206 Patch 487906: update inline docs. 2001-12-02 18:09:41 +00:00
Tim Peters 885d457709 sprintf -> PyOS_snprintf in some "obviously safe" cases.
Also changed <>-style #includes to ""-style in some places where the
former didn't make sense.
2001-11-28 20:27:42 +00:00
Guido van Rossum 5c66a26dee Make the error message for unsupported operand types cleaner, in
response to a message by Laura Creighton on c.l.py.  E.g.

    >>> 0+''
    TypeError: unsupported operand types for +: 'int' and 'str'

(previously this did not mention the operand types)

    >>> ''+0
    TypeError: cannot concatenate 'str' and 'int' objects
2001-10-22 04:12:44 +00:00
Tim Peters c993315b18 SF bug [#468061] __str__ ignored in str subclass.
object.c, PyObject_Str:  Don't try to optimize anything except exact
string objects here; in particular, let str subclasses go thru tp_str,
same as non-str objects.  This allows overrides of tp_str to take
effect.

stringobject.c:
+ string_print (str's tp_print):  If the argument isn't an exact string
  object, get one from PyObject_Str.

+ string_str (str's tp_str):  Make a genuine-string copy of the object if
  it's of a proper str subclass type.  str() applied to a str subclass
  that doesn't override __str__ ends up here.

test_descr.py:  New str_of_str_subclass() test.
2001-10-16 20:18:24 +00:00
Fred Drake 2bae4face2 Remove extra "]" in splitlines() docstring.
Reported by Neal Norwitz.
2001-10-13 15:57:55 +00:00
Guido van Rossum 9475a2310d Enable GC for new-style instances. This touches lots of files, since
many types were subclassable but had a xxx_dealloc function that
called PyObject_DEL(self) directly instead of deferring to
self->ob_type->tp_free(self).  It is permissible to set tp_free in the
type object directly to _PyObject_Del, for non-GC types, or to
_PyObject_GC_Del, for GC types.  Still, PyObject_DEL was a tad faster,
so I'm fearing that our pystone rating is going down again.  I'm not
sure if doing something like

void xxx_dealloc(PyObject *self)
{
	if (PyXxxCheckExact(self))
		PyObject_DEL(self);
	else
		self->ob_type->tp_free(self);
}

is any faster than always calling the else branch, so I haven't
attempted that -- however those types whose own dealloc is fancier
(int, float, unicode) do use this pattern.
2001-10-05 20:51:39 +00:00
Tim Peters c15c4f1f39 SF bug [#467265] Compile errors on SuSe Linux on IBM/s390.
Unknown whether this fixes it.
- stringobject.c, PyString_FromFormatV:  don't assume that va_list is of
  a type that can be copied via an initializer.
- errors.c, PyErr_Format:  add a va_end() to balance the va_start().
2001-10-02 21:32:07 +00:00
Guido van Rossum 2ed6bf87c9 Merge branch changes (coercion, rich comparisons) into trunk. 2001-09-27 20:30:07 +00:00
Guido van Rossum bb77e6801e Change string comparison so that it applies even when one (or both)
arguments are subclasses of str, as long as they don't override rich
comparison.
2001-09-24 16:51:54 +00:00
Tim Peters 111f60964e If interning an instance of a string subclass, intern a real string object
with the same value instead.  This ensures that a string (or string
subclass) object's ob_sinterned pointer is always a str (or NULL), and
that the dict of interned strings only has strs as keys.
2001-09-12 07:54:51 +00:00
Tim Peters af90b3e610 str_subtype_new, unicode_subtype_new:
+ These were leaving the hash fields at 0, which all string and unicode
  routines believe is a legitimate hash code.  As a result, hash() applied
  to str and unicode subclass instances always returned 0, which in turn
  confused dict operations, etc.
+ Changed local names "new"; no point to antagonizing C++ compilers.
2001-09-12 05:18:58 +00:00
Tim Peters 8fa5dd0601 More bug 460020: lots of string optimizations inhibited for string
subclasses, all "the usual" ones (slicing etc), plus replace, translate,
ljust, rjust, center and strip.  I don't know how to be sure they've all
been caught.

Question:  Should we complain if someone tries to intern an instance of
a string subclass?  I hate to slow any code on those paths.
2001-09-12 02:18:30 +00:00
Tim Peters 5a49ade70e More on SF bug [#460020] bug or feature: unicode() and subclasses.
Repaired str(i) to return a genuine string when i is an instance of a str
subclass.  New PyString_CheckExact() macro.
2001-09-11 01:41:59 +00:00
Guido van Rossum 29d55a38ce Fix a memory leak in str_subtype_new(). (All the other
xxx_subtype_new() functions are OK, but I goofed up in this one. :-( )
2001-08-31 16:11:15 +00:00
Guido van Rossum ae960afb5e Make str and tuple types subclassable. 2001-08-30 03:11:59 +00:00
Barry Warsaw 7c47beb860 Two improvements suggested by Greg Stein:
PyString_FromFormatV(): In the final resize at the end, we can use
    PyString_AS_STRING() since we know the object is a string and can
    avoid the typechecking.

PyString_FromFormat(): GS sez: "For safety/propriety, you should call
    va_end() on the vargs variable."
2001-08-27 03:11:09 +00:00
Tim Peters 6af5bbb565 PyString_FromFormatV: Massage platform %p output to match what gcc does,
at least in the first two characters.  %p is ill-defined, and people will
forever commit bad tests otherwise ("bad" in the sense that they fall
over (at least on Windows) for lack of a leading '0x'; 5 of the 7 tests
in test_repr.py failed on Windows for that reason this time around).
2001-08-25 03:02:28 +00:00
Barry Warsaw dadace004b PyString_FromFormat() and PyString_FromFormatV(): Largely ripped from
PyErr_Format() these new C API methods can be used instead of
    sprintf()'s into hardcoded char* buffers.  This allows us to fix
    many situation where long package, module, or class names get
    truncated in reprs.

    PyString_FromFormat() is the varargs variety.
    PyString_FromFormatV() is the va_list variety

    Original PyErr_Format() code was modified to allow %p and %ld
    expansions.

    Many reprs were converted to this, checkins coming soo.  Not
    changed: complex_repr(), float_repr(), float_print(), float_str(),
    int_repr().  There may be other candidates not yet converted.

    Closes patch #454743.
2001-08-24 18:32:06 +00:00
Martin v. Löwis 339d0f720e Patch #445762: Support --disable-unicode
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
2001-08-17 18:39:25 +00:00
Martin v. Löwis e3eb1f2b23 Patch #427190: Implement and use METH_NOARGS and METH_O. 2001-08-16 13:15:00 +00:00
Tim Peters 6d6c1a35e0 Merge of descr-branch back into trunk. 2001-08-02 04:15:00 +00:00
Jeremy Hylton 3ce45389bd Add _PyUnicode_AsDefaultEncodedString to unicodeobject.h.
And remove all the extern decls in the middle of .c files.
Apparently, it was excluded from the header file because it is
intended for internal use by the interpreter.  It's still intended for
internal use and documented as such in the header file.
2001-07-30 22:34:24 +00:00
Tim Peters 52e155e31b Reformat decl of new _PyString_Join. Add NEWS blurb about repr() speedup. 2001-06-16 05:42:57 +00:00
Tim Peters a7259597f1 SF bug 433228: repr(list) woes when len(list) big.
Gave Python linear-time repr() implementations for dicts, lists, strings.
This means, e.g., that repr(range(50000)) is no longer 50x slower than
pprint.pprint() in 2.2 <wink>.

I don't consider this a bugfix candidate, as it's a performance boost.

Added _PyString_Join() to the internal string API.  If we want that in the
public API, fine, but then it requires runtime error checks instead of
asserts.
2001-06-16 05:11:17 +00:00
Marc-André Lemburg 8c2133da7b Fix for bug #432384: Recursion in PyString_AsEncodedString? 2001-06-12 13:14:10 +00:00
Martin v. Löwis cd35306a25 Patch #424335: Implement string_richcompare, remove string_compare.
Use new _PyString_Eq in lookdict_string.
2001-05-24 16:56:35 +00:00
Marc-André Lemburg 2d9204199f This patch changes the way the string .encode() method works slightly
and introduces a new method .decode().

The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.

Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII rendering this auto-conversion mechanism useless for
most Unicode encodings.

The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.

As demonstration of the new feature, the patch includes a few new
codecs which allow string to string encoding and decoding (rot13,
hex, zip, uu, base64).

Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
2001-05-15 12:00:02 +00:00
Tim Peters 9c012af3c3 Heh. I need a break. After this: stropmodule & stringobject were more
out of synch than I realized, and I managed to break replace's "count"
argument when it was 0.  All is well again.  Maybe.
Bugfix candidate.
2001-05-10 00:32:57 +00:00
Tim Peters 4cd44ef4bf Fudge. stropmodule and stringobject both had copies of the buggy
mymemXXX stuff, and they were already out of synch.  Fix the remaining
bugs in both and get them back in synch.
Bugfix release candidate.
2001-05-10 00:05:33 +00:00
Tim Peters 1a97d5f098 SF patch #416247 2.1c1 stringobject: unused vrbl cleanup.
Thanks to Mark Favas.
2001-05-09 20:06:00 +00:00
Tim Peters 4862ab7bf4 Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to
restore correct semantics.
2001-05-09 08:43:21 +00:00
Tim Peters 9e897f41db Mark Favas reported that gcc caught me using casts as lvalues. Dodge it. 2001-05-09 07:37:07 +00:00
Tim Peters b4bbcd76ea Ack! Restore the COUNT_ALLOCS one_strings code. 2001-05-09 00:31:40 +00:00
Tim Peters cf5ad5d6f6 My change to string_item() left an extra reference to each 1-character
interned string created by "string"[i].  Since they're immortal anyway,
this was hard to notice, but it was still wrong <wink>.
2001-05-09 00:24:55 +00:00
Tim Peters 5b4d477568 Intern 1-character strings as soon as they're created. As-is, they aren't
interned when created, so the cached versions generally aren't ever
interned.  With the patch, the
		Py_INCREF(t);
		*p = t;
		Py_DECREF(s);
		return;
indirection block in PyString_InternInPlace() is never executed during a
full run of the test suite, but was executed very many times before.  So
I'm trading more work when creating one-character strings for doing less
work later.  Note that the "more work" here can happen at most 256 times
per program run, so it's trivial.  The same reasoning accounts for the
patch's simplification of string_item (the new version can call
PyString_FromStringAndSize() no more than 256 times per run, so there's
no point to inlining that stuff -- if we were serious about saving time
here, we'd pre-initialize the characters vector so that no runtime testing
at all was needed!).
2001-05-08 22:33:50 +00:00
Tim Peters 2cfe368283 Make unicode.join() work nice with iterators. This also required a change
to string.join(), so that when the latter figures out in midstream that
it really needs unicode.join() instead, unicode.join() can actually get
all the sequence elements (i.e., there's no guarantee that the sequence
passed to string.join() can be iterated over *again* by unicode.join(),
so string.join() must not pass on the original sequence object anymore).
2001-05-05 05:36:48 +00:00
Marc-André Lemburg 542fe56cb9 Fix for bug #417030: "print '%*s' fails for unicode string" 2001-05-02 14:21:53 +00:00
Guido van Rossum 189f1df301 Add a proper implementation for the tp_str slot (returning self, of
course), so I can get rid of the special case for strings in
PyObject_Str().
2001-05-01 16:51:53 +00:00
Tim Peters b3d8d1f76c A different approach to the problem reported in
Patch #419651: Metrowerks on Mac adds 0x itself
C std says %#x and %#X conversion of 0 do not add the 0x/0X base marker.
Metrowerks apparently does.  Mark Favas reported the same bug under a
Compaq compiler on Tru64 Unix, but no other libc broken in this respect
is known (known to be OK under MSVC and gcc).
So just try the damn thing at runtime and see what the platform does.
Note that we've always had bugs here, but never knew it before because
a relevant test case didn't exist before 2.1.
2001-04-28 05:38:26 +00:00
Guido van Rossum 59d1d2b434 Iterators phase 1. This comprises:
new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
new C API PyObject_GetIter(), calls tp_iter
new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
new internal object types iterobject and calliterobject
new exception StopIteration
new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
new magic number for .pyc files
new special method for instances: __iter__() returns an iterator
iteration over dictionaries: "for x in dict" iterates over the keys
iteration over files: "for x in file" iterates over lines

TODO:

documentation
test suite
decide whether to use a different way to spell iter(function, sentinal)
decide whether "for key in dict" is a good idea
use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
speed tuning (make next() a slot tp_next???)
2001-04-20 19:13:02 +00:00
Tim Peters fff5325078 Bug 415514 reported that e.g.
"%#x" % 0
blew up, at heart because C sprintf supplies a base marker if and only if
the value is not 0.  I then fixed that, by tolerating C's inconsistency
when it does %#x, and taking away that *Python* produced 0x0 when
formatting 0L (the "long" flavor of 0) under %#x itself.  But after talking
with Guido, we agreed it would be better to supply 0x for the short int
case too, despite that it's inconsistent with C, because C is inconsistent
with itself and with Python's hex(0) (plus, while "%#x" % 0 didn't work
before, "%#x" % 0L *did*, and returned "0x0").  Similarly for %#X conversion.
2001-04-12 18:38:48 +00:00
Tim Peters 711088d9b8 Fix for SF bug #415514: "%#x" % 0 caused assertion failure/abort.
http://sourceforge.net/tracker/index.php?func=detail&aid=415514&group_id=5470&atid=105470
For short ints, Python defers to the platform C library to figure out what
%#x should do.  The code asserted that the platform C returned a string
beginning with "0x".  However, that's not true when-- and only when --the
*value* being formatted is 0.  Changed the code to live with C's inconsistency
here.  In the meantime, the problem does not arise if you format a long 0 (0L)
instead.  However, that's because the code *we* wrote to do %#x conversions on
longs produces a leading "0x" regardless of value.  That's probably wrong too:
we should drop leading "0x", for consistency with C, when (& only when) formatting
0L.  So I changed the long formatting code to do that too.
2001-04-12 00:35:51 +00:00
Barry Warsaw a903ad9855 _Py_ReleaseInternedStrings(): Private API function to decref and
release the interned string dictionary.  This is useful for memory
use debugging because it eliminates a huge source of noise from the
reports.  Only defined when INTERN_STRINGS is defined.
2001-02-23 16:40:48 +00:00
Ka-Ping Yee fa004ad36c Show '\011', '\012', and '\015' as '\t', '\n', '\r' in strings.
Switch from octal escapes to hex escapes for other nonprintable characters.
2001-01-24 17:19:08 +00:00
Tim Peters 19fe14e76a Derivative of patch #102549, "simpler, faster(!) implementation of string.join".
Also fixes two long-standing bugs (present in 2.0):
1. .join() didn't check that the result size fit in an int.
2. string.join(s) when len(s)==1 returned s[0] regardless of s[0]'s
   type; e.g., "".join([3]) returned 3 (overly optimistic optimization).
I resisted a keen temptation to make .join() apply str() automagically.
2001-01-19 03:03:47 +00:00
Marc-André Lemburg 3a645e4dd4 Added checks to prevent PyUnicode_Count() from dumping core
in case the parameters are out of bounds and fixes error handling
for .count(), .startswith() and .endswith() for the case of
mixed string/Unicode objects.

This patch adds Python style index semantics to PyUnicode_Count()
indices (including the special handling of negative indices).

The patch is an extended version of patch #103249 submitted
by Michael Hudson (mwh) on SF. It also includes new test cases.
2001-01-16 11:54:12 +00:00
Andrew M. Kuchling 6ca8917758 [ Patch #102852 ] Make % error a bit more informative by indicates the
index at which an unknown %-escape was found
2000-12-15 13:07:46 +00:00
Fred Drake 49312a52ec Jeffrey D. Collins <tokeneater@users.sourceforge.net>:
Fix type of the self parameter to some string object methods.

This closes patch #102670.
2000-12-06 14:27:49 +00:00
Tim Peters a3a3a030af Fox for SF bug #123859: %[duxXo] long formats inconsistent. 2000-11-30 05:22:44 +00:00
Guido van Rossum 2ccda8a7c4 SF patch #102548, fix for bug #121013, by mwh@users.sourceforge.net.
Fixes a typo that caused "".join(u"this is a test") to dump core.
2000-11-27 18:46:26 +00:00
Fred Drake 661ea26b3d Ka-Ping Yee <ping@lfw.org>:
Changes to error messages to increase consistency & clarity.

This (mostly) closes SourceForge patch #101839.
2000-10-24 19:57:45 +00:00
Marc-André Lemburg 53f3d4ac74 [ Bug #116174 ] using %% in cstrings sometimes fails with unicode paramsFix for the bug reported in Bug #116174: "%% %s" % u"abc" failed due
to the way string formatting delegated work to the Unicode formatting
function.
2000-10-07 08:54:09 +00:00
Fred Drake d5fadf75e4 Rationalize use of limits.h, moving the inclusion to Python.h.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.

This closes SourceForge patch #101659 and bug #115323.
2000-09-26 05:46:01 +00:00
Tim Peters 38fd5b6413 Derived from Martin's SF patch 110609: support unbounded ints in %d,i,u,x,X,o formats.
Note a curious extension to the std C rules:  x, X and o formatting can never produce
a sign character in C, so the '+' and ' ' flags are meaningless for them.  But
unbounded ints *can* produce a sign character under these conversions (no fixed-
width bitstring is wide enough to hold all negative values in 2's-comp form).  So
these flags become meaningful in Python when formatting a Python long which is too
big to fit in a C long.  This required shuffling around existing code, which hacked
x and X conversions to death when both the '#' and '0' flags were specified:  the
hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or
'+' flags too, since signs were always meaningless before for x and X conversions.
Isomorphic shuffling was required in unicodeobject.c.
Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
2000-09-21 05:43:11 +00:00
Marc-André Lemburg d1ba443206 This patch adds a new Python C API called PyString_AsStringAndSize()
which implements the automatic conversion from Unicode to a string
object using the default encoding.

The new API is then put to use to have eval() and exec accept
Unicode objects as code parameter. This closes bugs #110924
and #113890.

As side-effect, the traditional C APIs PyString_Size() and
PyString_AsString() will also accept Unicode objects as
parameters.
2000-09-19 21:04:18 +00:00
Tim Peters 8f422461b4 Fix for bug 113934. string*n and unicode*n did no overflow checking at
all, either to see whether the # of chars fit in an int, or that the
amount of memory needed fit in a size_t.  Checking these is expensive, but
the alternative is silently wrong answers (as in the bug report) or
core dumps (which were easy to provoke using Unicode strings).
2000-09-09 06:13:41 +00:00
Guido van Rossum 8586991099 REMOVED all CWI, CNRI and BeOpen copyright markings.
This should match the situation in the 1.6b1 tree.
2000-09-01 23:29:29 +00:00