which implements the automatic conversion from Unicode to a string
object using the default encoding.
The new API is then put to use to have eval() and exec accept
Unicode objects as code parameter. This closes bugs #110924
and #113890.
As side-effect, the traditional C APIs PyString_Size() and
PyString_AsString() will also accept Unicode objects as
parameters.
objects for the attribute name. Unicode objects are converted to
a string using the default encoding before trying the lookup.
Note that previously it was allowed to pass arbitrary objects as
attribute name in case the tp_getattro/setattro slots were defined.
This patch fixes this by applying an explicit string check first:
all uses of these slots expect string objects and do not check
for the type resulting in a core dump. The tp_getattro/setattro
are still useful as optimization for lookups using interned
string objects though.
This patch fixes bug #113829.
that Py_INCREF boosts global _Py_RefTotal when Py_REF_DEBUG is defined
but Py_TRACE_REFS isn't.
There are, IMO, way too many preprocessor gimmicks in use for refcount
debugging (at least 3 distinct true/false symbols, but not all 8 combos
are supported by the code, etc etc), and no coherent documentation of
this stuff -- 'twas too painful to track this one down.
all, either to see whether the # of chars fit in an int, or that the
amount of memory needed fit in a size_t. Checking these is expensive, but
the alternative is silently wrong answers (as in the bug report) or
core dumps (which were easy to provoke using Unicode strings).
exception context. This avoids improperly propogating errors raised by
a user-defined __cmp__() by a subsequent lookup operation.
This patch does *not* include the performance enhancement patch for
dictionaries with string keys only; that will be checked in separately.
This closes SourceForge patch #101277 and bug #112558.
file.writelines() now tries to emulate the behaviour of file.write()
as closely as possible. Due to the problems with releasing the
interpreter lock the solution isn't exactly optimal, but still better
than not supporting the file.write() semantics at all.
types (i.e. Py_uintptr_t, our spelling of C9X's uintptr_t). ANSI
specifies that pointer compares other than == and != to non-related
structures are undefined. This quiets an Insure portability warning.
scope. Previously, s_buffer[] was defined inside the
PyUnicode_Check() scope, but referred to in the outer scope via
assignment to s. This quiets an Insure portability warning.
to integer types (i.e. Py_uintptr_t, our spelling of C9X's uintptr_t).
ANSI specifies that pointer compares other than == and != to
non-related structures are undefined. This quiets an Insure
portability warning.
is no __getslice__ available. Also does the same for C extension types.
Includes rudimentary documentation (it could use a cross reference to the
section on slice objects, I couldn't figure out how to do that) and a test
suite for all Python __hooks__ I could think of, including the new
behaviour.
shutdown time, but CVS log entry for revision 2.45 explains why this
is so. Simply include a comment so we don't have to re-figure it out
again 5 years from now.
This was a misleading bug -- the true "bug" was that hash(x) gave an error
return when x is an infinity. Fixed that. Added new Py_IS_INFINITY macro to
pyport.h. Rearranged code to reduce growing duplication in hashing of float and
complex numbers, pushing Trent's earlier stab at that to a logical conclusion.
Fixed exceedingly rare bug where hashing of floats could return -1 even if there
wasn't an error (didn't waste time trying to construct a test case, it was simply
obvious from the code that it *could* happen). Improved complex hash so that
hash(complex(x, y)) doesn't systematically equal hash(complex(y, x)) anymore.
resized after creation. 0-length strings are usually shared
and _PyString_Resize() fails on these shared strings.
Fixes [ Bug #111667 ] unicode core dump.
Properly end a comment block. It was terminated fine later but by a subsequent
block and. It was also in #if 0. This patch is so trivial I can't believe I am
talking about it. :)
function (together with other locale aware ones) should into a new collation
support module. See python-dev for a discussion of this removal.
Note: This patch should also be applied to the 1.6 branch.
the Python Unicode implementation.
The internal buffer used for implementing the buffer protocol
is renamed to defenc to make this change visible. It now holds the
default encoded version of the Unicode object and is calculated
on demand (NULL otherwise).
Since the default encoding defaults to ASCII, this will mean that
Unicode objects which hold non-ASCII characters will no longer
work on C APIs using the "s" or "t" parser markers. C APIs must now
explicitly provide Unicode support via the "u", "U" or "es"/"es#"
parser markers in order to work with non-ASCII Unicode strings.
(Note: this patch will also have to be applied to the 1.6 branch
of the CVS tree.)
This doesn't change the copyright status for these files -- just the
markings! Doing it on the main branch for these three files for which
the HEAD revision was pushed back into 1.6.
The UTF-8 decoder is still buggy (i.e. it doesn't pass Markus Kuhn's
stress test), mainly due to the following construct:
#define UTF8_ERROR(details) do { \
if (utf8_decoding_error(&s, &p, errors, details)) \
goto onError; \
continue; \
} while (0)
(The "continue" statement is supposed to exit from the outer loop,
but of course, it doesn't. Indeed, this is a marvelous example of
the dangers of the C programming language and especially of the C
preprocessor.)
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
use PyString_AS_STRING macro on local string object
when resizing string, make sure resized string will always be big enough
split string containing error message across two lines
add test to string_tests that causes resizing
seqlen==1 clause, before returning item, we need to DECREF seq. In
the res=PyString... failure clause, we need to goto finally to also
decref seq (and the DECREF of res in finally is changed to a
XDECREF). Also, we need to DECREF seq just before the
PyUnicode_Join() return.
implementation -- use PySequence_Fast interface to iterate over elements
interface -- if instance object reports wrong length, ignore it;
previous version raised an IndexError if reported length was too high
value is calculated from the character values, in a way
that makes sure an 8-bit ASCII string and a unicode string
with the same contents get the same hash value.
(as a side effect, this also works for ISO Latin 1 strings).
for more details, see the python-dev discussion.
was cascades of warnings about mismatching const decls. Overall,
I think const creates lots of headaches and solves almost
nothing. Added enough consts to shut up the warnings, but
this did require casting away const in one spot too (another
usual outcome of starting down this path): the function
mymemreplace can't return const char*, but sometimes wants to
return its first argument as-is, which latter must be declared
const char* in order to avoid const warnings at mymemreplace's
call sites. So, in the case the function wants to return the
first arg, that arg's declared constness must be subverted.
This was a convenient excuse to create the pyport.h file recently
discussed!
Please use new Py_ARITHMETIC_RIGHT_SHIFT when right-shifting a
signed int and you *need* sign-extension. This is #define'd in
pyport.h, keying off new config symbol SIGNED_RIGHT_SHIFT_ZERO_FILLS.
If you're running on a platform that needs that symbol #define'd,
the std tests never would have worked for you (in particular,
at least test_long would have failed).
The autoconfig stuff got added to Python after my Unix days, so
I don't know how that works. Would someone please look into doing
& testing an auto-config of the SIGNED_RIGHT_SHIFT_ZERO_FILLS
symbol? It needs to be defined if & only if, e.g., (-1) >> 3 is
not -1.
Stein -- thanks!). Incidentally removed all the Py_PROTO macros
from object.h, as they prevented my editor from magically finding
the definitions of the "coercion", "cmpfunc" and "reprfunc"
typedefs that were being redundantly applied in longobject.c.
works just like the Unicode one. The C APIs match the ones in the Unicode
implementation, but were extended to be able to reuse the existing
Unicode codecs for string purposes too.
Conversions from string to Unicode and back are done using the
default encoding.
implementation. This was really to test whether my new CVS+SSH
setup is more usable than the old one -- and turns out it is (for
whatever reason, it was impossible to do a commit before that
involved more than one directory).
The common technique for printing out a pointer has been to cast to a long
and use the "%lx" printf modifier. This is incorrect on Win64 where casting
to a long truncates the pointer. The "%p" formatter should be used instead.
The problem as stated by Tim:
> Unfortunately, the C committee refused to define what %p conversion "looks
> like" -- they explicitly allowed it to be implementation-defined. Older
> versions of Microsoft C even stuck a colon in the middle of the address (in
> the days of segment+offset addressing)!
The result is that the hex value of a pointer will maybe/maybe not have a 0x
prepended to it.
Notes on the patch:
There are two main classes of changes:
- in the various repr() functions that print out pointers
- debugging printf's in the various thread_*.h files (these are why the
patch is large)
Closes SourceForge patch #100505.
errors in some of the hash algorithms. For exmaple, in float_hash and
complex_hash a certain part of the value is not included in the hash
calculation. See Tim's, Guido's, and my discussion of this on
python-dev in May under the title "fix float_hash and complex_hash for
64-bit *nix"
(2) The hash algorithms that use pointers (e.g. func_hash, code_hash)
are universally not correct on Win64 (they assume that sizeof(long) ==
sizeof(void*))
As well, this patch significantly cleans up the hash code. It adds the
two function _Py_HashDouble and _PyHash_VoidPtr that the various
hashing routine are changed to use.
These help maintain the hash function invariant: (a==b) =>
(hash(a)==hash(b))) I have added Lib/test/test_hash.py and
Lib/test/output/test_hash to test this for some cases.
Avoid calling the dealloc function, previously triggered with
DECREF(inst). This caused a segfault in PyDict_GetItem, called with a
NULL dict, whenever inst->in_dict fails under low-memory conditions.
This patch modifies the type structures of objects that
participate in GC. The object's tp_basicsize is increased when
GC is enabled. GC information is prefixed to the object to
maintain binary compatibility. GC objects also define the
tp_flag Py_TPFLAGS_GC.
Fixed a bug in PyUnicode_Count() which would have caused a
core dump in case of substring coercion failure.
Synchronized .count() with the string method of the same name
to return len(s)+1 for s.count('').
The following patch adds "sq_contains" support to rangeobject, and enables
the already-written support for sq_contains in listobject and tupleobject.
The rangeobject "contains" code should be a bit more efficient than the
current default "in" implementation ;-) It might not get used much, but it's
not that much to add.
listobject.c and tupleobject.c already had code for sq_contains, and the
proper struct member was set, but the PyType structure was not extended to
include tp_flags, so the object-specific code was not getting called (Go
ahead, test it ;-). I also did this for the immutable_list_type in
listobject.c, eventhough it is probably never used. Symmetry and all that.
Fixed %c formatting to check for one character arguments. Thanks
to Finn Bock for finding this bug.
Added a fix for bug PR#348 which originated from not resetting
the globals correctly in _PyUnicode_Fini().
Change the default encoding to 'ascii' (it was previously
defined as UTF-8).
Note: The implementation still uses UTF-8 to implement
the buffer protocol, so C APIs will still see UTF-8. This
is on purpose: rather than fixing the Unicode implementation,
the C APIs should be made Unicode aware.
This patch correct bounds checking in PyLong_FromLongLong. Currently, it does
not check properly for negative values when checking to see if the incoming
value fits in a long or unsigned long. This results in possible silent
truncation of the value for very large negative values.