implementation details inside the ucnhash module.
also cleaned up the unicode copyright blurb a little; Secret Labs'
internal revision history isn't that interesting...
except that it always returns Unicode objects.
A new C API PyObject_Unicode() is also provided.
This closes patch #101664.
Written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.
Closes SF patch #103123.
funcobject.h:
PyFunctionObject: add the func_dict slot.
funcobject.c:
PyFunction_New(): Initialize the func_dict slot to NULL.
func_getattr(): Rename to func_getattro() and change the
signature. It's more efficient to use attro methods and dig the C
string out than it is to re-convert a C string to a PyString.
Also, add support for getting the __dict__ (a.k.a. func_dict)
attribute, and for getting an arbitrary function attribute.
func_setattr(): Rename to func_setattro() and change the signature
for the same reason. Also add support for setting __dict__
(a.k.a. func_dict) and any arbitrary function attribute.
func_dealloc(): Be sure to DECREF the func_dict slot.
func_traverse(): Be sure to traverse func_dict too.
PyFunction_Type: make the necessary func_?etattro() changes.
classobject.c:
instancemethod_memberlist: Add __dict__
instancemethod_setattro(): New method to set arbitrary attributes
on methods (really the underlying im_func). Raise TypeError when
the instance is bound or when you're trying to set one of the
reserved im_* attributes.
instancemethod_getattr(): Renamed to instancemethod_getattro()
since that's what it really is. Also, added support fo getting
arbitrary attributes through the im_func.
PyMethod_Type: Do the ?etattr{,o} dance.
regardless of whether the system getopt() does what we want. This avoids the
hassle with prototypes and externs, and the check to see if the system
getopt() does what we want. Prefix optind, optarg and opterr with _PyOS_ to
avoid name clashes. Add new include file to define the right symbols. Fix
Demo/pyserv/pyserv.c to include getopt.h itself, instead of relying on
Python to provide it.
#define'd to an unreasonable value (several recent gcc systems have
misdefined it, causing bogus overflows in integer multiplication). Nuke
CHAR_BIT entirely.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.
This closes SourceForge patch #101659 and bug #115323.
Add three new convenience functions to the PyModule_*() family:
PyModule_AddObject(), PyModule_AddIntConstant(), PyModule_AddStringConstant().
This closes SourceForge patch #101233.
Note a curious extension to the std C rules: x, X and o formatting can never produce
a sign character in C, so the '+' and ' ' flags are meaningless for them. But
unbounded ints *can* produce a sign character under these conversions (no fixed-
width bitstring is wide enough to hold all negative values in 2's-comp form). So
these flags become meaningful in Python when formatting a Python long which is too
big to fit in a C long. This required shuffling around existing code, which hacked
x and X conversions to death when both the '#' and '0' flags were specified: the
hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or
'+' flags too, since signs were always meaningless before for x and X conversions.
Isomorphic shuffling was required in unicodeobject.c.
Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
which implements the automatic conversion from Unicode to a string
object using the default encoding.
The new API is then put to use to have eval() and exec accept
Unicode objects as code parameter. This closes bugs #110924
and #113890.
As side-effect, the traditional C APIs PyString_Size() and
PyString_AsString() will also accept Unicode objects as
parameters.
I can't test this, so I'm just checking it in with blind faith in Andy.
I've tested that it doesn't broeak a non-Pth build on Linux.
Changes include:
- There's a --with-pth configure option.
- Instead of _GNU_PTH, we test for HAVE_PTH.
- Better signal handling.
- (The config.h.in file is regenerated in a slightly different order.)
It's hard to sort out what the bug was, exactly. So, Big Hammer:
1. Python shouldn't be in the business of #define'ing NULL, period.
2. Users of the Python C API shouldn't be in the business of not including
Python.h, period.
Hence:
1. Removed all #define's of NULL in Python source code (pyport.h and
object.h).
2. Since we're *relying* on stdio.h defining NULL, put an #error in
Python.h after its #include of stdio.h if NULL isn't defined then.
PyRun_FileEx(). These are the same as their non-Ex counterparts but
have an extra argument, a flag telling them to close the file when
done.
Then this is used by Py_Main() and execfile() to close the file after
it is parsed but before it is executed.
Adding APIs seems strange given the feature freeze but it's the only
way I see to close the bug report without incompatible changes.
[ Bug #110616 ] source file stays open after parsing is done (PR#209)
Add the EXTENDED_ARG opcode to the virtual machine, allowing 32-bit
arguments to opcodes instead of being forced to stick to the 16-bit
limit. This is especially useful for machine-generated code, which
can be too long for the SET_LINENO parameter to fit into 16 bits.
This closes the implementation portion of SourceForge patch #100893.
by the following.
typedef in a portable way the Python name for the C9X uintptr_t type.
This latter is the most portable way to spell an integral type to
which a void* can be cast to and back again without losing
information. Parallel checkin hacks configure to check if the
platform/compiler supports the C9X name.
name as n'. By doing some twists and turns, "as" is not a reserved word.
There is a slight change in semantics for 'from module import name' (it will
now honour the 'global' keyword) but only in cases that are explicitly
undocumented.
This was a misleading bug -- the true "bug" was that hash(x) gave an error
return when x is an infinity. Fixed that. Added new Py_IS_INFINITY macro to
pyport.h. Rearranged code to reduce growing duplication in hashing of float and
complex numbers, pushing Trent's earlier stab at that to a logical conclusion.
Fixed exceedingly rare bug where hashing of floats could return -1 even if there
wasn't an error (didn't waste time trying to construct a test case, it was simply
obvious from the code that it *could* happen). Improved complex hash so that
hash(complex(x, y)) doesn't systematically equal hash(complex(y, x)) anymore.
did the same anyway.
I'm not sure what to do with Tools/compiler/compiler/* -- that isn't part of
distutils, is it ? Should it try to be compatible with old bytecode version ?
the Python Unicode implementation.
The internal buffer used for implementing the buffer protocol
is renamed to defenc to make this change visible. It now holds the
default encoded version of the Unicode object and is calculated
on demand (NULL otherwise).
Since the default encoding defaults to ASCII, this will mean that
Unicode objects which hold non-ASCII characters will no longer
work on C APIs using the "s" or "t" parser markers. C APIs must now
explicitly provide Unicode support via the "u", "U" or "es"/"es#"
parser markers in order to work with non-ASCII Unicode strings.
(Note: this patch will also have to be applied to the 1.6 branch
of the CVS tree.)
This doesn't change the copyright status for these files -- just the
markings! Doing it on the main branch for these three files for which
the HEAD revision was pushed back into 1.6.
for systems that are missing those declarations from system include files.
Start by moving a pointy-haired ones from their previous locations to the
new section.
(The gethostname() one, for instance, breaks on several systems, because
some define it as (char *, size_t) and some as (char *, int).)
I purposely decided not to include the summary of used #defines like Tim did
in the first section of pyport.h. In my opinion, the number of #defines
likedly to be used by this section would make such an overview unwieldy. I
would suggest documenting the non-obvious ones, though.
handlers "return void", according to ANSI C.
Removed the new Py_RETURN_FROM_SIGNAL_HANDLER macro.
Left RETSIGTYPE in the config stuff, because it's not clear to
me that others aren't relying on it (e.g., extension modules).
good C practice hasn't been available to everything all along.
Added Py_SAFE_DOWNCAST(VALUE, WIDE, NARROW) macro to pyport.h; this
just casts VALUE from type WIDE to type NARROW, but assert-fails if
Py_DEBUG is defined and info is lost due to casting.
Replaced a line in Fredrik's fix to marshal.c to use the new macro.
#if RETSIGTYPE != void
That isn't C, and MSVC properly refuses to compile it.
Introduced new Py_RETURN_FROM_SIGNAL_HANDLER macro in pyport.h
to expand to the correct thing based on RETSIGTYPE. However,
only void is ANSI! Do we still have platforms that return int?
The Unix config mess appears to #define RETSIGTYPE by magic
without being asked to, so I assume it's "a problem" across
Unices still.
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
used for indentation related errors. This patch includes Ping's
improvements for indentation-related error messages.
Closes SourceForge patches #100734 and #100856.
This was a convenient excuse to create the pyport.h file recently
discussed!
Please use new Py_ARITHMETIC_RIGHT_SHIFT when right-shifting a
signed int and you *need* sign-extension. This is #define'd in
pyport.h, keying off new config symbol SIGNED_RIGHT_SHIFT_ZERO_FILLS.
If you're running on a platform that needs that symbol #define'd,
the std tests never would have worked for you (in particular,
at least test_long would have failed).
The autoconfig stuff got added to Python after my Unix days, so
I don't know how that works. Would someone please look into doing
& testing an auto-config of the SIGNED_RIGHT_SHIFT_ZERO_FILLS
symbol? It needs to be defined if & only if, e.g., (-1) >> 3 is
not -1.
Stein -- thanks!). Incidentally removed all the Py_PROTO macros
from object.h, as they prevented my editor from magically finding
the definitions of the "coercion", "cmpfunc" and "reprfunc"
typedefs that were being redundantly applied in longobject.c.
match the ones in the Unicode implementation, but were extended
to be able to reuse the existing Unicode codecs for string
purposes too.
Conversion from string to Unicode and back are done using the
default encoding.
to switch on support for BSD and SysV on platforms which use glibc
such as Linux.
These #defines are documented in e.g. the file /usr/include/features.h
on Linux platforms and the SUSv2 docs.
implementation. This was really to test whether my new CVS+SSH
setup is more usable than the old one -- and turns out it is (for
whatever reason, it was impossible to do a commit before that
involved more than one directory).
which are true for alphabetic and alphanumeric characters resp.
The macros are currently implemented using the existing is* tables
but will have to be updated to meet the Unicode standard definitions
(add tables for non-cased letters and letter modifiers).
errors in some of the hash algorithms. For exmaple, in float_hash and
complex_hash a certain part of the value is not included in the hash
calculation. See Tim's, Guido's, and my discussion of this on
python-dev in May under the title "fix float_hash and complex_hash for
64-bit *nix"
(2) The hash algorithms that use pointers (e.g. func_hash, code_hash)
are universally not correct on Win64 (they assume that sizeof(long) ==
sizeof(void*))
As well, this patch significantly cleans up the hash code. It adds the
two function _Py_HashDouble and _PyHash_VoidPtr that the various
hashing routine are changed to use.
These help maintain the hash function invariant: (a==b) =>
(hash(a)==hash(b))) I have added Lib/test/test_hash.py and
Lib/test/output/test_hash to test this for some cases.
This patch modifies the type structures of objects that
participate in GC. The object's tp_basicsize is increased when
GC is enabled. GC information is prefixed to the object to
maintain binary compatibility. GC objects also define the
tp_flag Py_TPFLAGS_GC.
the number of children of a node exceeds the max possible value for
the short that is used to count them. The Python runtime converts
this parser error into the SyntaxError "expression too long."
or fini of the builtin module.
_PyBuiltin_Init_1 => _PyBuiltin_Init
_PyBuiltin_Init_2 removed
_PyBuiltin_Fini_1 removed
_PyBuiltin_Fini_2 removed
These functions are used to initialize the _exceptions module.
init_exceptions added
fini_exceptions added
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.
(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode. I'm also holding back on his
change to main.c, which seems unnecessary to me.)
Improvements:
- does no longer need any extra memory
- has no relationship to tstate
- works in debug mode
- can easily be modified for free threading (hi Greg:)
Side effects:
Trashcan does change the order of object destruction.
Prevending that would be quite an immense effort, as
my attempts have shown. This version works always
the same, with debug mode or not. The slightly
changed destruction order should therefore be no problem.
Algorithm:
While the old idea of delaying the destruction of some
obejcts at a certain recursion level was kept, we now
no longer aloocate an object to hold these objects.
The delayed objects are instead chained together
via their ob_type field. The type is encoded via
ob_refcnt. When it comes to the destruction of the
chain of waiting objects, the topmost object is popped
off the chain and revived with type and refcount 1,
then it gets a normal Py_DECREF.
I am confident that this solution is near optimum
for minimizing side effects and code bloat.
Changed PyUnicode_Splitlines() maxsplit argument to keepends.
The maxsplit functionality was replaced by the keepends
functionality which allows keeping the line end markers together
with the string.
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent). Checkin messages:
New Unicode support for int(), float(), complex() and long().
- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
Unicode to a decimal char* string (used in the above new
APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above
Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
are masked, all other errors such as ValueError during
Unicode coercion are passed through (note that PyUnicode_Compare
does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
masked and 0 returned; all other errors are passed through
Better testing support for the standard codecs.
Misc minor enhancements, such as an alias dbcs for the mbcs codec.
Changes:
- PyLong_FromString() now applies the same error checks as
does PyInt_FromString(): trailing garbage is reported
as error and not longer silently ignored. The only characters
which may be trailing the digits are 'L' and 'l' -- these
are still silently ignored.
- string.ato?() now directly interface to int(), long() and
float(). The error strings are now a little different, but
the type still remains the same. These functions are now
ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
in the input string; PyNumber_Long() already did this (and
still does)
Followed by:
Looks like I've gone a step too far there... (and test_contains.py
seem to have a bug too).
I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
executive summary:
Instead of typing 'apply(f, args, kwargs)' you can type 'f(*arg, **kwargs)'.
Some file-by-file details follow.
Grammar/Grammar:
simplify varargslist, replacing '*' '*' with '**'
add * & ** options to arglist
Include/opcode.h & Lib/dis.py:
define three new opcodes
CALL_FUNCTION_VAR
CALL_FUNCTION_KW
CALL_FUNCTION_VAR_KW
Python/ceval.c:
extend TypeError "keyword parameter redefined" message to include
the name of the offending keyword
reindent CALL_FUNCTION using four spaces
add handling of sequences and dictionaries using extend calls
fix function import_from to use PyErr_Format
The attached patch set includes a workaround to get Python with
Unicode compile on BSDI 4.x (courtesy Thomas Wouters; the cause
is a bug in the BSDI wchar.h header file) and Python interfaces
for the MBCS codec donated by Mark Hammond.
Also included are some minor corrections w/r to the docs of
the new "es" and "es#" parser markers (use PyMem_Free() instead
of free(); thanks to Mark Hammond for finding these).
The unicodedata tests are now in a separate file
(test_unicodedata.py) to avoid problems if the module cannot
be found.
/* More standard operations (at end for binary compatibility) */
should now be:
/* More standard operations (here for binary compatibility) */
since they're no longer at the end!
Attached you find an update of the Unicode implementation.
The patch is against the current CVS version. I would appreciate
if someone with CVS checkin permissions could check the changes
in.
The patch contains all bugs and patches sent this week and also
fixes a leak in the codecs code and a bug in the free list code
for Unicode objects (which only shows up when compiling Python
with Py_DEBUG; thanks to MarkH for spotting this one).
Added wrapping macros to dictobject.c, listobject.c, tupleobject.c,
frameobject.c, traceback.c that safely prevends core dumps
on stack overflow. Macros and functions in object.c, object.h.
The method is an "elevator destructor" that turns cascading
deletes into tail recursive behavior when some limit is hit.
a new proc type (objobjproc), a new slot sq_contains to
PySequenceMethods, and a new flag Py_TPFLAGS_HAVE_SEQUENCE_IN to
Py_TPFLAGS_DEFAULT. More to follow.
Introduce a new builtin exception, UnboundLocalError, raised when ceval.c
tries to retrieve or delete a local name that isn't bound to a value.
Currently raises NameError, which makes this behavior a FAQ since the same
error is raised for "missing" global names too: when the user has a global
of the same name as the unbound local, NameError makes no sense to them.
Even in the absence of shadowing, knowing whether a bogus name is local or
global is a real aid to quick understanding.
Example:
D:\src\PCbuild>type local.py
x = 42
def f():
print x
x = 13
return x
f()
D:\src\PCbuild>python local.py
Traceback (innermost last):
File "local.py", line 8, in ?
f()
File "local.py", line 4, in f
print x
UnboundLocalError: x
D:\src\PCbuild>
Note that UnboundLocalError is a subclass of NameError, for compatibility
with existing class-exception code that may be trying to catch this as a
NameError. Unfortunately, I see no way to make this wholly compatible
with -X (see comments in bltinmodule.c): under -X, [UnboundLocalError
is an alias for NameError --GvR].
[The ceval.c patch differs slightly from the second version that Tim
submitted; I decided not to raise UnboundLocalError for DELETE_NAME,
only for DELETE_LOCAL. DELETE_NAME is only generated at the module
level, and since at that level a NameError is raised for referencing
an undefined name, it should also be raised for deleting one.]
indicate to those that are using the CVS access that they are using a
newer-than-1.2.5 version, without committing to a particular version
number or patch level.
PycStringIO_IMPORT. While arguably the name used in the documentation
is more consistent, I think it's probably safer not to change the
macro definition and instead fix the doco.
Add a new member to the PyBufferProcs struct, bf_getcharbuffer. For
backward compatibility, this member should only be used (this includes
testing for NULL!) when the flag Py_TPFLAGS_HAVE_GETCHARBUFFER is set
in the type structure, below. Note that if its flag is not set, we
may be looking at an extension module compiled for 1.5.1, which will
have garbage at the bf_getcharbuffer member (because the struct wasn't
as long then). If the flag is one, the pointer may still be NULL.
The function found at this member is used in a similar manner as
bf_getreadbuffer, but it is known to point to 8-bit character data.
(See discussion in getargs.c checked in later.)
As a general feature for extending the type structure and the various
structures that (may) hang off it in a backwards compatible way, we
rename the tp_xxx4 "spare" slot to tp_flags. In 1.5.1 and before,
this slot was always zero. In 1.5.1, it may contain various flags
indicating extra fields that weren't present in 1.5.1. The only flag
defined so far is for the bf_getcharbuffer member of the PyBufferProcs
struct.
Note that the new spares (tp_xxx5 - tp_xxx8), once they become used,
should also be protected by a flag (or flags) in tp_flags.
The MS compiler doesn't call it 'long long', it uses __int64,
so a new #define, LONG_LONG, has been added and all occurrences
of 'long long' are replaced with it.
clear_carefully() used to do in import.c. Differences: leave only
__builtins__ alone in the 2nd pass; and don't clear the dictionary (on
the theory that as long as there are references left to the
dictionary, those might be destructors that might expect __builtins__
to be alive when they run; and __builtins__ can't normally be part of
a cycle).
embedders to force a different PYTHONHOME.
- Add new interface PyErr_PrintEx(flag); same as PyErr_Print() but
flag determines whether sys.last_* are set or not. PyErr_Print()
now simply calls PyErr_PrintEx(1).
signal handlers in a fork()ed child process when Python is compiled with
thread support. The bug was reported by Scott <scott@chronis.icgroup.com>.
What happens is that after a fork(), the variables used by the signal
module to determine whether this is the main thread or not are bogus,
and it decides that no thread is the main thread, so no signals will
be delivered.
The solution is the addition of PyOS_AfterFork(), which fixes the signal
module's variables. A dummy version of the function is present in the
intrcheck.c source file which is linked when the signal module is not
used.
is like PyImport_ImporModule(name) but receives the globals and locals
dict and the fromlist arguments as well. (The name is a char*; the
others are PyObject*s).
PyExc_NumberError, and PyExc_LookupError. Also added extern for
pre-instantiated exception instance PyExc_MemoryErrorInst.
Removed extern of obsolete exception PyExc_AccessError.
- int PyErr_GivenExceptionMatches(obj1, obj2)
Returns 1 if obj1 and obj2 are the same object, or if obj1 is an
instance of type obj2, or of a class derived from obj2
- int PyErr_ExceptionMatches(obj)
Higher level wrapper around PyErr_GivenExceptionMatches() which uses
PyErr_Occurred() as obj1. This will be the more commonly called
function.
- void PyErr_NormalizeException(typeptr, valptr, tbptr)
Normalizes exceptions, and places the normalized values in the
arguments. If type is not a class, this does nothing. If type is a
class, then it makes sure that value is an instance of the class by:
1. if instance is of the type, or a class derived from type, it does
nothing.
2. otherwise it instantiates the class, using the value as an
argument. If value is None, it uses an empty arg tuple, and if
the value is a tuple, it uses just that.