The UTF-8 decoder is still buggy (i.e. it doesn't pass Markus Kuhn's
stress test), mainly due to the following construct:
#define UTF8_ERROR(details) do { \
if (utf8_decoding_error(&s, &p, errors, details)) \
goto onError; \
continue; \
} while (0)
(The "continue" statement is supposed to exit from the outer loop,
but of course, it doesn't. Indeed, this is a marvelous example of
the dangers of the C programming language and especially of the C
preprocessor.)
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
use PyString_AS_STRING macro on local string object
when resizing string, make sure resized string will always be big enough
split string containing error message across two lines
add test to string_tests that causes resizing
seqlen==1 clause, before returning item, we need to DECREF seq. In
the res=PyString... failure clause, we need to goto finally to also
decref seq (and the DECREF of res in finally is changed to a
XDECREF). Also, we need to DECREF seq just before the
PyUnicode_Join() return.
implementation -- use PySequence_Fast interface to iterate over elements
interface -- if instance object reports wrong length, ignore it;
previous version raised an IndexError if reported length was too high
value is calculated from the character values, in a way
that makes sure an 8-bit ASCII string and a unicode string
with the same contents get the same hash value.
(as a side effect, this also works for ISO Latin 1 strings).
for more details, see the python-dev discussion.
was cascades of warnings about mismatching const decls. Overall,
I think const creates lots of headaches and solves almost
nothing. Added enough consts to shut up the warnings, but
this did require casting away const in one spot too (another
usual outcome of starting down this path): the function
mymemreplace can't return const char*, but sometimes wants to
return its first argument as-is, which latter must be declared
const char* in order to avoid const warnings at mymemreplace's
call sites. So, in the case the function wants to return the
first arg, that arg's declared constness must be subverted.
This was a convenient excuse to create the pyport.h file recently
discussed!
Please use new Py_ARITHMETIC_RIGHT_SHIFT when right-shifting a
signed int and you *need* sign-extension. This is #define'd in
pyport.h, keying off new config symbol SIGNED_RIGHT_SHIFT_ZERO_FILLS.
If you're running on a platform that needs that symbol #define'd,
the std tests never would have worked for you (in particular,
at least test_long would have failed).
The autoconfig stuff got added to Python after my Unix days, so
I don't know how that works. Would someone please look into doing
& testing an auto-config of the SIGNED_RIGHT_SHIFT_ZERO_FILLS
symbol? It needs to be defined if & only if, e.g., (-1) >> 3 is
not -1.
Stein -- thanks!). Incidentally removed all the Py_PROTO macros
from object.h, as they prevented my editor from magically finding
the definitions of the "coercion", "cmpfunc" and "reprfunc"
typedefs that were being redundantly applied in longobject.c.
works just like the Unicode one. The C APIs match the ones in the Unicode
implementation, but were extended to be able to reuse the existing
Unicode codecs for string purposes too.
Conversions from string to Unicode and back are done using the
default encoding.
implementation. This was really to test whether my new CVS+SSH
setup is more usable than the old one -- and turns out it is (for
whatever reason, it was impossible to do a commit before that
involved more than one directory).
The common technique for printing out a pointer has been to cast to a long
and use the "%lx" printf modifier. This is incorrect on Win64 where casting
to a long truncates the pointer. The "%p" formatter should be used instead.
The problem as stated by Tim:
> Unfortunately, the C committee refused to define what %p conversion "looks
> like" -- they explicitly allowed it to be implementation-defined. Older
> versions of Microsoft C even stuck a colon in the middle of the address (in
> the days of segment+offset addressing)!
The result is that the hex value of a pointer will maybe/maybe not have a 0x
prepended to it.
Notes on the patch:
There are two main classes of changes:
- in the various repr() functions that print out pointers
- debugging printf's in the various thread_*.h files (these are why the
patch is large)
Closes SourceForge patch #100505.
errors in some of the hash algorithms. For exmaple, in float_hash and
complex_hash a certain part of the value is not included in the hash
calculation. See Tim's, Guido's, and my discussion of this on
python-dev in May under the title "fix float_hash and complex_hash for
64-bit *nix"
(2) The hash algorithms that use pointers (e.g. func_hash, code_hash)
are universally not correct on Win64 (they assume that sizeof(long) ==
sizeof(void*))
As well, this patch significantly cleans up the hash code. It adds the
two function _Py_HashDouble and _PyHash_VoidPtr that the various
hashing routine are changed to use.
These help maintain the hash function invariant: (a==b) =>
(hash(a)==hash(b))) I have added Lib/test/test_hash.py and
Lib/test/output/test_hash to test this for some cases.
Avoid calling the dealloc function, previously triggered with
DECREF(inst). This caused a segfault in PyDict_GetItem, called with a
NULL dict, whenever inst->in_dict fails under low-memory conditions.
This patch modifies the type structures of objects that
participate in GC. The object's tp_basicsize is increased when
GC is enabled. GC information is prefixed to the object to
maintain binary compatibility. GC objects also define the
tp_flag Py_TPFLAGS_GC.
Fixed a bug in PyUnicode_Count() which would have caused a
core dump in case of substring coercion failure.
Synchronized .count() with the string method of the same name
to return len(s)+1 for s.count('').
The following patch adds "sq_contains" support to rangeobject, and enables
the already-written support for sq_contains in listobject and tupleobject.
The rangeobject "contains" code should be a bit more efficient than the
current default "in" implementation ;-) It might not get used much, but it's
not that much to add.
listobject.c and tupleobject.c already had code for sq_contains, and the
proper struct member was set, but the PyType structure was not extended to
include tp_flags, so the object-specific code was not getting called (Go
ahead, test it ;-). I also did this for the immutable_list_type in
listobject.c, eventhough it is probably never used. Symmetry and all that.
Fixed %c formatting to check for one character arguments. Thanks
to Finn Bock for finding this bug.
Added a fix for bug PR#348 which originated from not resetting
the globals correctly in _PyUnicode_Fini().
Change the default encoding to 'ascii' (it was previously
defined as UTF-8).
Note: The implementation still uses UTF-8 to implement
the buffer protocol, so C APIs will still see UTF-8. This
is on purpose: rather than fixing the Unicode implementation,
the C APIs should be made Unicode aware.
This patch correct bounds checking in PyLong_FromLongLong. Currently, it does
not check properly for negative values when checking to see if the incoming
value fits in a long or unsigned long. This results in possible silent
truncation of the value for very large negative values.
Added support for user settable default encodings. The
current implementation uses a per-process global which
defines the value of the encoding parameter in case it
is set to NULL (meaning: use the default encoding).
Fix the string methods that implement slice-like semantics with
optional args (count, find, endswith, etc.) to properly handle
indeces outside [INT_MIN, INT_MAX]. Previously the "i" formatter
for PyArg_ParseTuple was used to get the indices. These could overflow.
This patch changes the string methods to use the "O&" formatter with
the slice_index() function from ceval.c which is used to do the same
job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]).
Fix the string methods that implement slice-like semantics with
optional args (count, find, endswith, etc.) to properly handle
indeces outside [INT_MIN, INT_MAX]. Previously the "i" formatter
for PyArg_ParseTuple was used to get the indices. These could overflow.
This patch changes the string methods to use the "O&" formatter with
the slice_index() function from ceval.c which is used to do the same
job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]). slice_index()
is renamed _PyEval_SliceIndex() and is now exported. As well, the return
values for success/fail were changed to make slice_index directly
usable as required by the "O&" formatter.
[GvR: shouldn't a similar patch be applied to unicodeobject.c?]
gave bogus results for chars in the range 128-255, because their
implementation was using signed characters. Fixed this by using
unsigned character pointers (as opposed to using Py_CHARMASK()).