for 'str' and 'unicode', and can be used instead of
types.StringTypes, e.g. to test whether something is "a string":
isinstance(x, string) is True for Unicode and 8-bit strings. This
is an abstract base class and cannot be instantiated directly.
PyErr_Format() these new C API methods can be used instead of
sprintf()'s into hardcoded char* buffers. This allows us to fix
many situation where long package, module, or class names get
truncated in reprs.
PyString_FromFormat() is the varargs variety.
PyString_FromFormatV() is the va_list variety
Original PyErr_Format() code was modified to allow %p and %ld
expansions.
Many reprs were converted to this, checkins coming soo. Not
changed: complex_repr(), float_repr(), float_print(), float_str(),
int_repr(). There may be other candidates not yet converted.
Closes patch #454743.
Gave Python linear-time repr() implementations for dicts, lists, strings.
This means, e.g., that repr(range(50000)) is no longer 50x slower than
pprint.pprint() in 2.2 <wink>.
I don't consider this a bugfix candidate, as it's a performance boost.
Added _PyString_Join() to the internal string API. If we want that in the
public API, fine, but then it requires runtime error checks instead of
asserts.
and introduces a new method .decode().
The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.
Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII rendering this auto-conversion mechanism useless for
most Unicode encodings.
The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.
As demonstration of the new feature, the patch includes a few new
codecs which allow string to string encoding and decoding (rot13,
hex, zip, uu, base64).
Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
release the interned string dictionary. This is useful for memory
use debugging because it eliminates a huge source of noise from the
reports. Only defined when INTERN_STRINGS is defined.
Note a curious extension to the std C rules: x, X and o formatting can never produce
a sign character in C, so the '+' and ' ' flags are meaningless for them. But
unbounded ints *can* produce a sign character under these conversions (no fixed-
width bitstring is wide enough to hold all negative values in 2's-comp form). So
these flags become meaningful in Python when formatting a Python long which is too
big to fit in a C long. This required shuffling around existing code, which hacked
x and X conversions to death when both the '#' and '0' flags were specified: the
hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or
'+' flags too, since signs were always meaningless before for x and X conversions.
Isomorphic shuffling was required in unicodeobject.c.
Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
which implements the automatic conversion from Unicode to a string
object using the default encoding.
The new API is then put to use to have eval() and exec accept
Unicode objects as code parameter. This closes bugs #110924
and #113890.
As side-effect, the traditional C APIs PyString_Size() and
PyString_AsString() will also accept Unicode objects as
parameters.
match the ones in the Unicode implementation, but were extended
to be able to reuse the existing Unicode codecs for string
purposes too.
Conversion from string to Unicode and back are done using the
default encoding.
hash value. Interning strings (which requires hash caching) tries to
ensure that only one string object with a given value exists, so
equality tests are one pointer comparison. Together, these can speed
the interpreter up by as much as 20%. Each costs the size of a long
or pointer per string object. In addition, interned strings live
until the end of times. If you are concerned about memory footprint,
simply comment the #define out here (and rebuild everything!).
use the new names exclusively, and the linker will see the new names.
Files that import "Python.h" also only see the new names. Files that
import "allobjects.h" will continue to be able to use the old names,
due to the inclusion (in allobjects.h) of "rename2.h".
object.h: made sizes and refcnts signed ints.
stringobject.h: make getstrsize() signed int.
methodobject.h: add METH_VARARGS and METH_FREENAME flag bit definitions.
* Makefile: change location of FORMS library.
* posixmodule.c: turn #if 0 into #ifdef MSDOS (stuff in unistd.h or not)
* Almost all .h files: added CPP magic to avoid duplicate inclusions and
to support inclusion from C++.
* Fixcprt.py: added [-y file] option, do only files younger than file.
* modsupport.[ch]: added vmkvalue().
* intobject.c: use mkvalue().
* stringobject.c: added "formatstring"; renamed string* to string_*;
ceval.c: call formatstring for string % value.
* longobject.c: close memory leak in divmod.
* parsetok.c: set result node to NULL when returning an error.