match the ones in the Unicode implementation, but were extended
to be able to reuse the existing Unicode codecs for string
purposes too.
Conversion from string to Unicode and back are done using the
default encoding.
to switch on support for BSD and SysV on platforms which use glibc
such as Linux.
These #defines are documented in e.g. the file /usr/include/features.h
on Linux platforms and the SUSv2 docs.
implementation. This was really to test whether my new CVS+SSH
setup is more usable than the old one -- and turns out it is (for
whatever reason, it was impossible to do a commit before that
involved more than one directory).
which are true for alphabetic and alphanumeric characters resp.
The macros are currently implemented using the existing is* tables
but will have to be updated to meet the Unicode standard definitions
(add tables for non-cased letters and letter modifiers).
errors in some of the hash algorithms. For exmaple, in float_hash and
complex_hash a certain part of the value is not included in the hash
calculation. See Tim's, Guido's, and my discussion of this on
python-dev in May under the title "fix float_hash and complex_hash for
64-bit *nix"
(2) The hash algorithms that use pointers (e.g. func_hash, code_hash)
are universally not correct on Win64 (they assume that sizeof(long) ==
sizeof(void*))
As well, this patch significantly cleans up the hash code. It adds the
two function _Py_HashDouble and _PyHash_VoidPtr that the various
hashing routine are changed to use.
These help maintain the hash function invariant: (a==b) =>
(hash(a)==hash(b))) I have added Lib/test/test_hash.py and
Lib/test/output/test_hash to test this for some cases.
This patch modifies the type structures of objects that
participate in GC. The object's tp_basicsize is increased when
GC is enabled. GC information is prefixed to the object to
maintain binary compatibility. GC objects also define the
tp_flag Py_TPFLAGS_GC.
the number of children of a node exceeds the max possible value for
the short that is used to count them. The Python runtime converts
this parser error into the SyntaxError "expression too long."
or fini of the builtin module.
_PyBuiltin_Init_1 => _PyBuiltin_Init
_PyBuiltin_Init_2 removed
_PyBuiltin_Fini_1 removed
_PyBuiltin_Fini_2 removed
These functions are used to initialize the _exceptions module.
init_exceptions added
fini_exceptions added
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.
(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode. I'm also holding back on his
change to main.c, which seems unnecessary to me.)
Improvements:
- does no longer need any extra memory
- has no relationship to tstate
- works in debug mode
- can easily be modified for free threading (hi Greg:)
Side effects:
Trashcan does change the order of object destruction.
Prevending that would be quite an immense effort, as
my attempts have shown. This version works always
the same, with debug mode or not. The slightly
changed destruction order should therefore be no problem.
Algorithm:
While the old idea of delaying the destruction of some
obejcts at a certain recursion level was kept, we now
no longer aloocate an object to hold these objects.
The delayed objects are instead chained together
via their ob_type field. The type is encoded via
ob_refcnt. When it comes to the destruction of the
chain of waiting objects, the topmost object is popped
off the chain and revived with type and refcount 1,
then it gets a normal Py_DECREF.
I am confident that this solution is near optimum
for minimizing side effects and code bloat.
Changed PyUnicode_Splitlines() maxsplit argument to keepends.
The maxsplit functionality was replaced by the keepends
functionality which allows keeping the line end markers together
with the string.
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent). Checkin messages:
New Unicode support for int(), float(), complex() and long().
- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
Unicode to a decimal char* string (used in the above new
APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above
Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
are masked, all other errors such as ValueError during
Unicode coercion are passed through (note that PyUnicode_Compare
does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
masked and 0 returned; all other errors are passed through
Better testing support for the standard codecs.
Misc minor enhancements, such as an alias dbcs for the mbcs codec.
Changes:
- PyLong_FromString() now applies the same error checks as
does PyInt_FromString(): trailing garbage is reported
as error and not longer silently ignored. The only characters
which may be trailing the digits are 'L' and 'l' -- these
are still silently ignored.
- string.ato?() now directly interface to int(), long() and
float(). The error strings are now a little different, but
the type still remains the same. These functions are now
ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
in the input string; PyNumber_Long() already did this (and
still does)
Followed by:
Looks like I've gone a step too far there... (and test_contains.py
seem to have a bug too).
I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
executive summary:
Instead of typing 'apply(f, args, kwargs)' you can type 'f(*arg, **kwargs)'.
Some file-by-file details follow.
Grammar/Grammar:
simplify varargslist, replacing '*' '*' with '**'
add * & ** options to arglist
Include/opcode.h & Lib/dis.py:
define three new opcodes
CALL_FUNCTION_VAR
CALL_FUNCTION_KW
CALL_FUNCTION_VAR_KW
Python/ceval.c:
extend TypeError "keyword parameter redefined" message to include
the name of the offending keyword
reindent CALL_FUNCTION using four spaces
add handling of sequences and dictionaries using extend calls
fix function import_from to use PyErr_Format
The attached patch set includes a workaround to get Python with
Unicode compile on BSDI 4.x (courtesy Thomas Wouters; the cause
is a bug in the BSDI wchar.h header file) and Python interfaces
for the MBCS codec donated by Mark Hammond.
Also included are some minor corrections w/r to the docs of
the new "es" and "es#" parser markers (use PyMem_Free() instead
of free(); thanks to Mark Hammond for finding these).
The unicodedata tests are now in a separate file
(test_unicodedata.py) to avoid problems if the module cannot
be found.