object.c, PyObject_Str: Don't try to optimize anything except exact
string objects here; in particular, let str subclasses go thru tp_str,
same as non-str objects. This allows overrides of tp_str to take
effect.
stringobject.c:
+ string_print (str's tp_print): If the argument isn't an exact string
object, get one from PyObject_Str.
+ string_str (str's tp_str): Make a genuine-string copy of the object if
it's of a proper str subclass type. str() applied to a str subclass
that doesn't override __str__ ends up here.
test_descr.py: New str_of_str_subclass() test.
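The rule above (shortcut only for the exact base type, dispatch everything else through its own slot) can be illustrated with a toy object model. This is a minimal sketch, not CPython code: `object_t`, `type_t`, `object_str`, and the three type objects are hypothetical stand-ins. The real string_str also makes a genuine exact-str copy for subclasses, which this sketch omits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature object model: each object points at its type,
   and each type carries a tp_str slot returning a C string. */
typedef struct type_s type_t;
typedef struct {
    const type_t *ob_type;
    const char *value;          /* payload for string-like objects */
} object_t;

struct type_s {
    const char *tp_name;
    const char *(*tp_str)(const object_t *self);
};

/* Base "str" slot: str() of a string is its own value. */
static const char *base_str(const object_t *self) { return self->value; }
static const type_t StrType = { "str", base_str };

/* A subclass that does not override tp_str inherits the base slot. */
static const type_t PlainSubType = { "plain_sub", base_str };

/* A subclass that overrides tp_str. */
static const char *override_str(const object_t *self)
{
    (void)self;
    return "OVERRIDDEN";
}
static const type_t OverrideSubType = { "override_sub", override_str };

/* Only an exact StrType instance takes the shortcut; everything else,
   subclasses included, goes through its own tp_str slot. */
static const char *object_str(const object_t *obj)
{
    assert(obj != NULL);
    if (obj->ob_type == &StrType)
        return obj->value;               /* fast path: exact type only */
    return obj->ob_type->tp_str(obj);    /* lets overrides take effect */
}
```

If the fast path had tested "is a str or subclass" instead of the exact type, the override in `OverrideSubType` would never be reached.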
many types were subclassable but had an xxx_dealloc function that
called PyObject_DEL(self) directly instead of deferring to
self->ob_type->tp_free(self). It is permissible to set tp_free in the
type object directly to _PyObject_Del, for non-GC types, or to
_PyObject_GC_Del, for GC types. Still, PyObject_DEL was a tad faster,
so I fear that our pystone rating is going down again. I'm not
sure if doing something like
    static void
    xxx_dealloc(PyObject *self)
    {
        if (PyXxx_CheckExact(self))
            PyObject_DEL(self);              /* exact type: free directly */
        else
            self->ob_type->tp_free(self);    /* subclass: defer to tp_free */
    }
is any faster than always calling the else branch, so I haven't
attempted that; however, those types whose own dealloc is fancier
(int, float, unicode) do use this pattern.
Unknown whether this fixes it.
- stringobject.c, PyString_FromFormatV: don't assume that va_list is of
a type that can be copied via an initializer.
- errors.c, PyErr_Format: add a va_end() to balance the va_start().
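The two fixes above amount to: copy a va_list only with va_copy(), and balance every va_start()/va_copy() with a va_end(). A minimal sketch of a two-pass formatter (measure, then write), assuming C99; `format_alloc_v` and `format_alloc` are hypothetical names, not the PyString_FromFormatV/PyString_FromFormat implementations.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a freshly malloc'd buffer. The argument list is walked
   twice: once to measure, once to write. va_list may be an array type
   on some ABIs, so "va_list b = a;" is not portable; va_copy() is, and
   each va_copy() needs its own va_end(). */
static char *format_alloc_v(const char *fmt, va_list ap)
{
    va_list measure;
    va_copy(measure, ap);
    int needed = vsnprintf(NULL, 0, fmt, measure);   /* length only */
    va_end(measure);
    if (needed < 0)
        return NULL;

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return NULL;
    vsnprintf(buf, (size_t)needed + 1, fmt, ap);     /* second pass */
    return buf;
}

/* Varargs front end: the va_start() is balanced by a va_end(). */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    char *result = format_alloc_v(fmt, ap);
    va_end(ap);
    return result;
}
```

Usage: `format_alloc("%s-%d", "abc", 42)` returns a heap string `"abc-42"` that the caller frees. The pairing of a `...` function with a `va_list` function mirrors the varargs/va_list split described further below.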
with the same value instead. This ensures that a string (or string
subclass) object's ob_sinterned pointer is always a str (or NULL), and
that the dict of interned strings only has strs as keys.
+ These were leaving the hash fields at 0, which all string and unicode
routines believe is a legitimate hash code. As a result, hash() applied
to str and unicode subclass instances always returned 0, which in turn
confused dict operations, etc.
+ Changed local names "new"; no point in antagonizing C++ compilers.
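The hash bug above comes from the cached-hash convention: one value has to mean "not computed yet", and it must not be a value a real hash can take. A minimal sketch, assuming -1 is the reserved marker (as in the string object's cached hash field); `strobj`, `strobj_init`, `strobj_hash`, and the djb2-style hash are hypothetical illustrations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string object with a cached hash. -1 means "not computed
   yet"; 0 is a legitimate hash, so a constructor that leaves the field at
   0 makes every later hash() call return 0 without computing anything. */
typedef struct {
    long cached_hash;
    size_t size;
    const char *data;
} strobj;

/* djb2-style hash, for illustration only; masked to 31 bits so it is
   never negative and therefore never collides with the -1 marker. */
static long compute_hash(const char *p, size_t n)
{
    unsigned long x = 5381;
    for (size_t i = 0; i < n; i++)
        x = x * 33 + (unsigned char)p[i];
    return (long)(x & 0x7fffffffUL);
}

static void strobj_init(strobj *s, const char *data, size_t size)
{
    s->data = data;
    s->size = size;
    s->cached_hash = -1;   /* the fix: start in the "unknown" state */
}

static long strobj_hash(strobj *s)
{
    if (s->cached_hash == -1)
        s->cached_hash = compute_hash(s->data, s->size);
    return s->cached_hash;
}
```

An object built as `{ 0, 4, "spam" }` (hash field left at 0) reports hash 0 forever, which is the subclass-instance symptom described above.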
subclasses, all "the usual" ones (slicing etc), plus replace, translate,
ljust, rjust, center and strip. I don't know how to be sure they've all
been caught.
Question: Should we complain if someone tries to intern an instance of
a string subclass? I hate to slow any code on those paths.
PyString_FromFormatV(): In the final resize at the end, we can use
PyString_AS_STRING() since we know the object is a string and can
avoid the typechecking.
PyString_FromFormat(): GS sez: "For safety/propriety, you should call
va_end() on the vargs variable."
at least in the first two characters. %p is ill-defined, and people will
forever commit bad tests otherwise ("bad" in the sense that they fall
over (at least on Windows) for lack of a leading '0x'; 5 of the 7 tests
in test_repr.py failed on Windows for that reason this time around).
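Guaranteeing a leading "0x" for %p can be done by inspecting what the platform printed and prepending the marker when it is missing. A minimal sketch; `format_pointer` is a hypothetical helper, and the NULL special case is an assumption added here because some C libraries print "(nil)" for a null pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render a pointer so the result always begins with "0x", whatever the
   platform's %p produces (some libcs emit "0x...", others bare hex
   digits). bufsize should leave room for the marker plus the digits. */
static void format_pointer(char *buf, size_t bufsize, const void *p)
{
    char tmp[64];
    if (p == NULL) {                      /* avoid libc-specific "(nil)" */
        snprintf(buf, bufsize, "0x0");
        return;
    }
    snprintf(tmp, sizeof tmp, "%p", (void *)p);
    if (tmp[0] == '0' && (tmp[1] == 'x' || tmp[1] == 'X'))
        snprintf(buf, bufsize, "0x%s", tmp + 2);   /* normalize to "0x" */
    else
        snprintf(buf, bufsize, "0x%s", tmp);       /* add missing marker */
}
```

Tests can then check the first two characters portably instead of depending on one libc's %p output.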
PyErr_Format() these new C API methods can be used instead of
sprintf()'s into hardcoded char* buffers. This allows us to fix
many situations where long package, module, or class names get
truncated in reprs.
PyString_FromFormat() is the varargs variety.
PyString_FromFormatV() is the va_list variety.
Original PyErr_Format() code was modified to allow %p and %ld
expansions.
Many reprs were converted to this, checkins coming soon. Not
changed: complex_repr(), float_repr(), float_print(), float_str(),
int_repr(). There may be other candidates not yet converted.
Closes patch #454743.
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
And remove all the extern decls in the middle of .c files.
Apparently, it was excluded from the header file because it is
intended for internal use by the interpreter. It's still intended for
internal use and documented as such in the header file.
Gave Python linear-time repr() implementations for dicts, lists, strings.
This means, e.g., that repr(range(50000)) is no longer 50x slower than
pprint.pprint() in 2.2 <wink>.
I don't consider this a bugfix candidate, as it's a performance boost.
Added _PyString_Join() to the internal string API. If we want that in the
public API, fine, but then it requires runtime error checks instead of
asserts.
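The linear-time idea is to collect the pieces first and join them once, rather than growing the result by repeated concatenation (which recopies the growing prefix on every step). A minimal sketch of a one-allocation join with an overflow check on the total size; `join_strings` is a hypothetical helper, not the internal _PyString_Join.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Join n C strings with sep in linear time: sum the lengths, allocate
   once, copy each piece once. Returns NULL if the total would exceed
   INT_MAX (mirroring a size kept in a C int) or if malloc fails. */
static char *join_strings(const char *const *pieces, size_t n, const char *sep)
{
    size_t seplen = strlen(sep);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t add = strlen(pieces[i]) + (i > 0 ? seplen : 0);
        if (add > (size_t)INT_MAX - total)
            return NULL;                  /* result would not fit */
        total += add;
    }

    char *result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    char *p = result;
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            memcpy(p, sep, seplen);
            p += seplen;
        }
        size_t len = strlen(pieces[i]);
        memcpy(p, pieces[i], len);
        p += len;
    }
    *p = '\0';
    return result;
}
```

A list repr built this way formats each element once and then joins with ", ", so the total work is proportional to the output size.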
and introduces a new method .decode().
The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.
Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII, rendering this auto-conversion mechanism useless for
most Unicode encodings.
The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.
As demonstration of the new feature, the patch includes a few new
codecs which allow string-to-string encoding and decoding (rot13,
hex, zip, uu, base64).
Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
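To make "string-to-string codec" concrete, here is what a hex codec's two directions compute. This is a minimal C sketch of the transformation only, not the codec module itself; `hex_encode`, `hex_decode`, and `hex_value` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Encode: each byte becomes two lowercase hex digits. Returns a malloc'd
   NUL-terminated string, or NULL if allocation fails. */
static char *hex_encode(const unsigned char *data, size_t n)
{
    static const char digits[] = "0123456789abcdef";
    char *out = malloc(2 * n + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        out[2 * i] = digits[data[i] >> 4];
        out[2 * i + 1] = digits[data[i] & 0x0f];
    }
    out[2 * n] = '\0';
    return out;
}

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode: the inverse. Returns a malloc'd buffer and stores its length
   in *out_len, or returns NULL on odd length, a non-hex digit, or
   allocation failure. */
static unsigned char *hex_decode(const char *text, size_t *out_len)
{
    size_t len = strlen(text);
    if (len % 2 != 0)
        return NULL;
    unsigned char *out = malloc(len / 2 + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_value(text[2 * i]);
        int lo = hex_value(text[2 * i + 1]);
        if (hi < 0 || lo < 0) {
            free(out);
            return NULL;
        }
        out[i] = (unsigned char)(hi << 4 | lo);
    }
    out[len / 2] = '\0';
    *out_len = len / 2;
    return out;
}
```

Both directions map bytes to bytes, which is exactly the case the old encode() could not express cleanly when it forced results through Unicode conversion.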
interned when created, so the cached versions generally aren't ever
interned. With the patch, the
    Py_INCREF(t);
    *p = t;
    Py_DECREF(s);
    return;
indirection block in PyString_InternInPlace() is never executed during a
full run of the test suite, but was executed very many times before. So
I'm trading more work when creating one-character strings for doing less
work later. Note that the "more work" here can happen at most 256 times
per program run, so it's trivial. The same reasoning accounts for the
patch's simplification of string_item (the new version can call
PyString_FromStringAndSize() no more than 256 times per run, so there's
no point in inlining that stuff -- if we were serious about saving time
here, we'd pre-initialize the characters vector so that no runtime testing
at all was needed!).
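The trade described above (bounded extra work when a one-character string is first created, in exchange for never needing the interning indirection later) can be sketched as a lazily filled 256-entry cache. A minimal sketch; `onechar_t`, `characters`, and `get_char_string` are hypothetical, and the `interned` flag only stands in for real interning.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical one-character string object. */
typedef struct {
    char data[2];    /* the character plus a terminating NUL */
    int interned;    /* stands in for "already in the interned dict" */
} onechar_t;

static onechar_t *characters[256];   /* lazily filled; starts all NULL */

/* The first request for a byte builds (and "interns") the object; every
   later request returns the same object. At most 256 objects are ever
   created, so the extra work at creation time is bounded. */
static onechar_t *get_char_string(unsigned char c)
{
    onechar_t *s = characters[c];
    if (s == NULL) {
        s = malloc(sizeof *s);
        if (s == NULL)
            return NULL;
        s->data[0] = (char)c;
        s->data[1] = '\0';
        s->interned = 1;             /* intern once, at creation */
        characters[c] = s;
    }
    return s;
}
```

Because each cached object is interned when it is built, a later intern request on it finds nothing to redirect.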
to string.join(), so that when the latter figures out in midstream that
it really needs unicode.join() instead, unicode.join() can actually get
all the sequence elements (i.e., there's no guarantee that the sequence
passed to string.join() can be iterated over *again* by unicode.join(),
so string.join() must not pass on the original sequence object anymore).
Patch #419651: Metrowerks on Mac adds 0x itself
C std says %#x and %#X conversion of 0 do not add the 0x/0X base marker.
Metrowerks apparently does. Mark Favas reported the same bug under a
Compaq compiler on Tru64 Unix, but no other libc broken in this respect
is known (known to be OK under MSVC and gcc).
So just try the damn thing at runtime and see what the platform does.
Note that we've always had bugs here, but never knew it before because
a relevant test case didn't exist before 2.1.
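"Try it at runtime" can mean: format with the platform's %#x, then add the "0x" marker only if the library left it out. A conforming libc gives "0" for 0, and a library like the one described gives "0x0"; checking the actual output avoids "0x0x0" on the latter. A minimal sketch; `format_hex_alt` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Produce %#x-style output that always carries a "0x" marker, even for
   0, without assuming how the C library treats %#x of 0. */
static void format_hex_alt(char *buf, size_t bufsize, unsigned int value)
{
    char tmp[32];
    snprintf(tmp, sizeof tmp, "%#x", value);
    if (strncmp(tmp, "0x", 2) == 0)
        snprintf(buf, bufsize, "%s", tmp);     /* libc already added it */
    else
        snprintf(buf, bufsize, "0x%s", tmp);   /* e.g. "0" for value 0 */
}
```

The same output results whether or not the platform adds the marker for zero.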
new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
new C API PyObject_GetIter(), calls tp_iter
new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
new internal object types iterobject and calliterobject
new exception StopIteration
new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
new magic number for .pyc files
new special method for instances: __iter__() returns an iterator
iteration over dictionaries: "for x in dict" iterates over the keys
iteration over files: "for x in file" iterates over lines
TODO:
documentation
test suite
decide whether to use a different way to spell iter(function, sentinel)
decide whether "for key in dict" is a good idea
use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
speed tuning (make next() a slot tp_next???)
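The semantics of the two-argument iter(function, sentinel) form can be sketched in C: call the function repeatedly, yield each result, and stop for good once it returns the sentinel. This is a minimal sketch, not the calliterobject code; `calliter_t`, `calliter_next`, and `countdown` are hypothetical, and returning `false` plays the role of StopIteration.

```c
#include <assert.h>
#include <stdbool.h>

typedef int (*producer_fn)(void *state);

/* Hypothetical callable-iterator state. */
typedef struct {
    producer_fn func;
    void *state;
    int sentinel;
    bool exhausted;
} calliter_t;

static void calliter_init(calliter_t *it, producer_fn func, void *state,
                          int sentinel)
{
    it->func = func;
    it->state = state;
    it->sentinel = sentinel;
    it->exhausted = false;
}

/* Returns true and stores the next value in *out, or false once the
   function has returned the sentinel; after that it stays exhausted. */
static bool calliter_next(calliter_t *it, int *out)
{
    if (it->exhausted)
        return false;
    int value = it->func(it->state);
    if (value == it->sentinel) {
        it->exhausted = true;
        return false;
    }
    *out = value;
    return true;
}

/* Example producer: returns *state, then decrements it (3, 2, 1, 0, ...). */
static int countdown(void *state)
{
    int *n = state;
    return (*n)--;
}
```

With `countdown` starting at 3 and sentinel 0, the iterator yields 3, 2, 1 and then stops.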
"%#x" % 0
blew up, at heart because C sprintf supplies a base marker if and only if
the value is not 0. I then fixed that by tolerating C's inconsistency
when it does %#x, and by making *Python* stop producing 0x0 when
formatting 0L (the "long" flavor of 0) under %#x itself. But after talking
with Guido, we agreed it would be better to supply 0x for the short int
case too, despite that it's inconsistent with C, because C is inconsistent
with itself and with Python's hex(0) (plus, while "%#x" % 0 didn't work
before, "%#x" % 0L *did*, and returned "0x0"). Similarly for %#X conversion.
http://sourceforge.net/tracker/index.php?func=detail&aid=415514&group_id=5470&atid=105470
For short ints, Python defers to the platform C library to figure out what
%#x should do. The code asserted that the platform C returned a string
beginning with "0x". However, that's not true when (and only when) the
*value* being formatted is 0. Changed the code to live with C's inconsistency
here. In the meantime, the problem does not arise if you format a long 0 (0L)
instead. However, that's because the code *we* wrote to do %#x conversions on
longs produces a leading "0x" regardless of value. That's probably wrong too:
we should drop leading "0x", for consistency with C, when (and only when) formatting
0L. So I changed the long formatting code to do that too.
release the interned string dictionary. This is useful for memory
use debugging because it eliminates a huge source of noise from the
reports. Only defined when INTERN_STRINGS is defined.
Also fixes two long-standing bugs (present in 2.0):
1. .join() didn't check that the result size fit in an int.
2. string.join(s) when len(s)==1 returned s[0] regardless of s[0]'s
type; e.g., "".join([3]) returned 3 (overly optimistic optimization).
I resisted a keen temptation to make .join() apply str() automagically.
in case the parameters are out of bounds and fixes error handling
for .count(), .startswith() and .endswith() for the case of
mixed string/Unicode objects.
This patch adds Python-style index semantics to PyUnicode_Count()
indices (including the special handling of negative indices).
The patch is an extended version of patch #103249 submitted
by Michael Hudson (mwh) on SF. It also includes new test cases.
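"Python-style index semantics" means negative indices count from the end, and out-of-range values are clipped rather than rejected. A minimal sketch of that clamping applied to a count; `adjust_indices` and `count_char` are hypothetical helpers, not the PyUnicode_Count code.

```c
#include <assert.h>

/* Clamp a [start, end) pair the way Python slicing does: negative
   indices count from the end, and anything still out of range is clipped
   to [0, len]. A start past end simply yields an empty range. */
static void adjust_indices(long *start, long *end, long len)
{
    if (*end > len) {
        *end = len;
    } else if (*end < 0) {
        *end += len;
        if (*end < 0)
            *end = 0;
    }
    if (*start < 0) {
        *start += len;
        if (*start < 0)
            *start = 0;
    }
}

/* Count occurrences of c in s[start:end] with the clamping above. */
static long count_char(const char *s, long len, char c, long start, long end)
{
    adjust_indices(&start, &end, len);
    long n = 0;
    for (long i = start; i < end; i++)
        if (s[i] == c)
            n++;
    return n;
}
```

For example, `count_char("banana", 6, 'a', -3, 100)` examines "ana" and returns 2, just as `"banana".count("a", -3, 100)` does.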
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.
This closes SourceForge patch #101659 and bug #115323.