-32768..65535 is acceptable. Added B specifier (with values from
-128..255). No L added (which would have completed the set) because l
already accepts any value (and the letter L is taken for quadwords).
python-dev discussion.
This should catch future version incompatibilities on Windows. Alas,
this doesn't help for 1.5 vs. 1.6; but it will help for 1.6 vs. 2.0.
the Python Unicode implementation.
The internal buffer used for implementing the buffer protocol
is renamed to defenc to make this change visible. It now holds the
default encoded version of the Unicode object and is calculated
on demand (NULL otherwise).
Since the default encoding defaults to ASCII, this will mean that
Unicode objects which hold non-ASCII characters will no longer
work on C APIs using the "s" or "t" parser markers. C APIs must now
explicitly provide Unicode support via the "u", "U" or "es"/"es#"
parser markers in order to work with non-ASCII Unicode strings.
(Note: this patch will also have to be applied to the 1.6 branch
of the CVS tree.)
accepted by the BDFL.
builtin_zip(): New function to implement the zip() function described
in the above proposal.
zip_doc[]: Docstring for zip().
builtin_methods[]: added entry for zip()
for systems that are missing those declarations from system include files.
Start by moving a pointy-haired ones from their previous locations to the
new section.
(The gethostname() one, for instance, breaks on several systems, because
some define it as (char *, size_t) and some as (char *, int).)
I purposely decided not to include the summary of used #defines like Tim did
in the first section of pyport.h. In my opinion, the number of #defines
likedly to be used by this section would make such an overview unwieldy. I
would suggest documenting the non-obvious ones, though.
good C practice hasn't been available to everything all along.
Added Py_SAFE_DOWNCAST(VALUE, WIDE, NARROW) macro to pyport.h; this
just casts VALUE from type WIDE to type NARROW, but assert-fails if
Py_DEBUG is defined and info is lost due to casting.
Replaced a line in Fredrik's fix to marshal.c to use the new macro.
MAGIC number. When updating it next time, be sure it's higher than 50715 *
constants. (Shouldn't be a problem if everyone keeps to the proper
algorithm.)
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
used for indentation related errors. This patch includes Ping's
improvements for indentation-related error messages.
Closes SourceForge patches #100734 and #100856.
This adds support for instance to the constructor (instances
have to define __str__ and can return Unicode objects via that
hook; string return values are decoded into Unicode using the
current default encoding).
The common technique for printing out a pointer has been to cast to a long
and use the "%lx" printf modifier. This is incorrect on Win64 where casting
to a long truncates the pointer. The "%p" formatter should be used instead.
The problem as stated by Tim:
> Unfortunately, the C committee refused to define what %p conversion "looks
> like" -- they explicitly allowed it to be implementation-defined. Older
> versions of Microsoft C even stuck a colon in the middle of the address (in
> the days of segment+offset addressing)!
The result is that the hex value of a pointer will maybe/maybe not have a 0x
prepended to it.
Notes on the patch:
There are two main classes of changes:
- in the various repr() functions that print out pointers
- debugging printf's in the various thread_*.h files (these are why the
patch is large)
Closes SourceForge patch #100505.
This patch fixes possible overflow in the use of
PyOS_GetLastModificationTime in getmtime.c and Python/import.c.
Currently PyOS_GetLastModificationTime returns a C long. This can
overflow on Win64 where sizeof(time_t) > sizeof(long). Besides it
should logically return a time_t anyway (this patch changes this).
As well, import.c uses PyOS_GetLastModificationTime for .pyc
timestamping. There has been recent discussion about the .pyc header
format on python-dev. This patch adds oveflow checking to import.c so
that an exception will be raised if the modification time
overflows. There are a few other minor 64-bit readiness changes made
to the module as well:
- size_t instead of int or long for function-local buffer and string
length variables
- one buffer overflow check was added (raises an exception on possible
overflow, this overflow chance exists on 32-bit platforms as well), no
other possible buffer overflows existed (from my analysis anyway)
Closes SourceForge patch #100509.
The common technique for printing out a pointer has been to cast to a long
and use the "%lx" printf modifier. This is incorrect on Win64 where casting
to a long truncates the pointer. The "%p" formatter should be used instead.
The problem as stated by Tim:
> Unfortunately, the C committee refused to define what %p conversion "looks
> like" -- they explicitly allowed it to be implementation-defined. Older
> versions of Microsoft C even stuck a colon in the middle of the address (in
> the days of segment+offset addressing)!
The result is that the hex value of a pointer will maybe/maybe not have a 0x
prepended to it.
Notes on the patch:
There are two main classes of changes:
- in the various repr() functions that print out pointers
- debugging printf's in the various thread_*.h files (these are why the
patch is large)
Closes SourceForge patch #100505.
This patch fixes a problem on AIX with the signed int case code in
getargs.c, after Trent Mick's intervention about MIN/MAX overflow
checks. The AIX compiler/optimizer generates bogus code with the
default flags "-g -O" causing test_builtin to fail: int("10", 16) <>
16L. Swapping the two checks in the signed int code makes the problem
go away.
Also, make the error messages fit in 80 char lines in the
source.
The depth field was never decremented inside w_object(), and it was
never initialized in PyMarshal_WriteObjectToFile().
This caused imports from .pyc files to fil mysteriously when the .pyc
file was written by the broken code -- w_object() would bail out
early, but PyMarshal_WriteObjectToFile() doesn't check the error or
return an error code, and apparently the marshalling code doesn't call
PyErr_Check() either. (That's a separate patch if I feel like it.)
Various small fixes to the builtin module to ensure no buffer
overflows.
- chunk #1:
Proper casting to ensure no truncation, and hence no surprises, in the
comparison.
- chunk #2:
The id() function guarantees a unique return value for different
objects. It does this by returning the pointer to the object. By
returning a PyInt, on Win64 (sizeof(long) < sizeof(void*)) the pointer
is truncated and the guarantee may be proven false. The appropriate
return function is PyLong_FromVoidPtr, this returns a PyLong if that
is necessary to return the pointer without truncation.
[GvR: note that this means that id() can now return a long on Win32
platforms. This *might* break some code...]
- chunk #3:
Ensure no overflow in raw_input(). Granted the user would have to pass
in >2GB of data but it *is* a possible buffer overflow condition.
As I really do not have anything better to do at the moment, I have written
a patch to Python/marshal.c that prevents Python dumping core when trying
to marshal stack bustingly deep (or recursive) data structure.
It just throws an exception; even slightly clever handling of recursive
data is what pickle is for...
[Fred Drake:] Moved magic constant 5000 to a #define.
This closes SourceForge patch #100645.
the number of children of a node exceeds the max possible value for
the short that is used to count them. The Python runtime converts
this parser error into the SyntaxError "expression too long."
module and into _exceptions.c. This includes all the PyExc_* globals,
the bltin_exc table, init_class_exc(), fini_instances(),
finierrors().
Renamed _PyBuiltin_Init_1() to _PyBuiltin_Init() since the two phase
initializations are necessary any more.
Removed as obsolete _PyBuiltin_Init_2(), _PyBuiltin_Fini_1() and
_PyBuiltin_Fini_2().
need two phase init or fini of the builtin module. Change the call of
_PyBuiltin_Init_1() to _PyBuiltin_Init(). Add a call to
init_exceptions().
Py_Finalize(): Don't call _PyBuiltin_Fini_1(). Instead call
fini_exceptions() but move this to before the thread state is
cleared.
Limit the 'b' formatter of PyArg_ParseTuple to valid values of an unsigned
char, i.e. [0,UCHAR_MAX]. It is expected that this is the common usage of 'b'.
An OverflowError is raised if the parsed value is outside this range.
Changes the 'b', 'h', and 'i' formatters in PyArg_ParseTuple to raise an
Overflow exception if they overflow (previously they just silently
overflowed).
Changes by Guido: always accept values [0..255] (in addition to
[CHAR_MIN..CHAR_MAX]) for 'b' format; changed some spaces into tabs in
other code.
who wrote:
Here's the new version of thread_nt.h. More particular, there is a
new version of thread lock that uses kernel object (e.g. semaphore)
only in case of contention; in other case it simply uses interlocked
functions, which are faster by the order of magnitude. It doesn't
make much difference without threads present, but as soon as thread
machinery initialised and (mostly) the interpreter global lock is on,
difference becomes tremendous. I've included a small script, which
initialises threads and launches pystone. With original thread_nt.h,
Pystone results with initialised threads are twofold worse then w/o
threads. With the new version, only 10% worse. I have used this
patch for about 6 months (with threaded and non-threaded
applications). It works remarkably well (though I'd desperately
prefer Python was free-threaded; I hope, it will soon).
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.
(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode. I'm also holding back on his
change to main.c, which seems unnecessary to me.)
- When 'import exceptions' fails, don't suggest to use -v to print the traceback;
this doesn't actually work.
- Remove comment about fallback to string exceptions.
- Remove a PyErr_Occurred() check after all is said and done that can
never trigger.
- Remove static function newstdexception() which is no longer called.
Added 'u' and 'u#' tags for PyArg_ParseTuple - these turn a
PyUnicodeObject argument into a Py_UNICODE * buffer, or a Py_UNICODE *
buffer plus a length with the '#'. Also added an analog to 'U'
for Py_BuildValue.
return 0 (exceptions don't match). This means that if an ImportError
is raised because exceptions.py can't be imported, the interpreter
will exit "cleanly" with an error message instead of just core
dumping.
PyErr_SetFromErrnoWithFilename(), PyErr_SetFromWindowsErrWithFilename():
Don't test on Py_UseClassExceptionsFlag.
are no longer supported (i.e. -X option is removed).
_PyBuiltin_Init_1(): Don't call initerrors(). This does mean that it
is possible to raise an ImportError before that exception has been
initialized, say because exceptions.py can't be found, or contains
bogosity. See changes to errors.c for how this is handled.
_PyBuiltin_Init_2(): Don't test Py_UseClassExceptionsFlag, just go
ahead and initialize the class-based standard exceptions. If this
fails, we throw a Py_FatalError.
Changed all references to the MAGIC constant to use a global
pyc_magic instead. This global is initially set to MAGIC, but can be
changed by the _PyImport_Init() function to provide for
special features implemented in the compiler which are settable
using command line switches and affect the way PYC files are
generated.
Currently this change is only done for the -U flag.
Support for the new -U command line option option:
with the option enabled the Python compiler
interprets all "..." strings as u"..." (same with r"..." and
ur"...").
Follow a suggestion in an /*XXX*/ comment [in com_add()] to speed up
compilation by using supplemental dictionaries to keep track of names
and constants, eliminating quadratic behavior. With this patch in
place, the time to import a 5000-line file with lots of constants [at
the global level] is reduced from 20 seconds to under 3 on my system.
Here's a patch which changes modsupport to add 'u' and 'u#',
to support building Unicode objects from a null-terminated
Py_UNICODE *, and a Py_UNICODE * with length, respectively.
[Conversion from 'U' to 'u' by Fred, based on python-dev comments.]
Note that the use of None for NULL values of the Py_UNICODE* value is
still in; I'm not sure of the conclusion on that issue.
remaining object references if the environment variable PYTHONDUMPREFS
exists. The default behaviour caused problems for background or
otherwise invisible processes that use the debug build of Python.
Fixed a memory leak found by Fredrik Lundh. Instead of
PyUnicode_AsUTF8String() we now use _PyUnicode_AsUTF8String() which
returns the string object without incremented refcount (and assures
that the so obtained object remains alive until the Unicode object is
garbage collected).
"""
Running "test_extcall" repeatedly results in memory leaks.
One of these can't be fixed (at least not easily!), it happens since
this code:
def saboteur(**kw):
kw['x'] = locals()
d = {}
saboteur(a=1, **d)
creates a circular reference - d['x']['d']==d
The others are due to some missing decrefs in ceval.c, fixed by the
patch attached below.
Note: I originally wrote this without the "goto", just adding the
missing decref's where needed. But I think the goto is justified in
keeping the executable code size of ceval as small as possible.
"""
[I think the circular reference is more like kw['x']['kw'] == kw. --GvR]
Added special case to unicode(): when being passed a
Unicode object as first argument, return the object as-is.
Raises an exception when given a Unicode object *and* an
encoding name.
comparing code objects. This give sless surprising results in
-Optimized code. It also sorts code objects by name, now.
[I changed the patch to hash() slightly to touch fewer lines.]
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent). Checkin messages:
New Unicode support for int(), float(), complex() and long().
- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
Unicode to a decimal char* string (used in the above new
APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above
Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
are masked, all other errors such as ValueError during
Unicode coercion are passed through (note that PyUnicode_Compare
does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
masked and 0 returned; all other errors are passed through
Better testing support for the standard codecs.
Misc minor enhancements, such as an alias dbcs for the mbcs codec.
Changes:
- PyLong_FromString() now applies the same error checks as
does PyInt_FromString(): trailing garbage is reported
as error and not longer silently ignored. The only characters
which may be trailing the digits are 'L' and 'l' -- these
are still silently ignored.
- string.ato?() now directly interface to int(), long() and
float(). The error strings are now a little different, but
the type still remains the same. These functions are now
ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
in the input string; PyNumber_Long() already did this (and
still does)
Followed by:
Looks like I've gone a step too far there... (and test_contains.py
seem to have a bug too).
I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
If a non-tuple sequence is passed as the *arg, convert it to a tuple
before checking its length.
If named keyword arguments are used in combination with **kwargs, make
a copy of kwargs before inserting the new keys.
the return value of PySequence_Length(). If an exception occurred,
the returned length will be -1. Make sure this doesn't get obscurred,
and that the bogus length isn't used.
executive summary:
Instead of typing 'apply(f, args, kwargs)' you can type 'f(*arg, **kwargs)'.
Some file-by-file details follow.
Grammar/Grammar:
simplify varargslist, replacing '*' '*' with '**'
add * & ** options to arglist
Include/opcode.h & Lib/dis.py:
define three new opcodes
CALL_FUNCTION_VAR
CALL_FUNCTION_KW
CALL_FUNCTION_VAR_KW
Python/ceval.c:
extend TypeError "keyword parameter redefined" message to include
the name of the offending keyword
reindent CALL_FUNCTION using four spaces
add handling of sequences and dictionaries using extend calls
fix function import_from to use PyErr_Format
The attached patch set includes a workaround to get Python with
Unicode compile on BSDI 4.x (courtesy Thomas Wouters; the cause
is a bug in the BSDI wchar.h header file) and Python interfaces
for the MBCS codec donated by Mark Hammond.
Also included are some minor corrections w/r to the docs of
the new "es" and "es#" parser markers (use PyMem_Free() instead
of free(); thanks to Mark Hammond for finding these).
The unicodedata tests are now in a separate file
(test_unicodedata.py) to avoid problems if the module cannot
be found.
Attached you find the latest update of the Unicode implementation.
The patch is against the current CVS version.
It includes the fix I posted yesterday for the core dump problem
in codecs.c (was introduced by my previous patch set -- sorry),
adds more tests for the codecs and two new parser markers
"es" and "es#".
Andy Robinson noted a core dump in the codecs.c file. This
was introduced by my latest patch which fixed a memory leak
in codecs.c. The bug causes all successful codec lookups to fail.
Attached you find an update of the Unicode implementation.
The patch is against the current CVS version. I would appreciate
if someone with CVS checkin permissions could check the changes
in.
The patch contains all bugs and patches sent this week and also
fixes a leak in the codecs code and a bug in the free list code
for Unicode objects (which only shows up when compiling Python
with Py_DEBUG; thanks to MarkH for spotting this one).
Added wrapping macros to dictobject.c, listobject.c, tupleobject.c,
frameobject.c, traceback.c that safely prevends core dumps
on stack overflow. Macros and functions in object.c, object.h.
The method is an "elevator destructor" that turns cascading
deletes into tail recursive behavior when some limit is hit.
* Changes to a recent patch by Chris Tismer to errors.c. Chris' patch
always used FormatMessage() to get the error message passing the error code
from errno - but errno and FormatMessage use a different numbering scheme.
The main reason the patch looked OK was that ENOFILE==ERROR_FILE_NOT_FOUND -
but that is about the only shared error code :-). The MS CRT docs tell you
to use _sys_errlist()/_sys_nerr. My patch does also this, and adds a very
similar function specifically for win32 error codes.
PR#175 -- when exec is passed a code object, it didn't sync the locals
from the dictionary back into their fast representation.
Also took the time to remove some repetitive code there and to do the
syncing even when an exception is raised (since a partial effect
should still be synced).
* in import.c, #ifdef out references to dynamic loading based on
HAVE_DYNAMIC_LOADING
* clean out the platform-specific crud from importdl.c.
[ maybe fold this function into import.c and drop the importdl.c file? Greg.]
* change GetDynLoadFunc's "funcname" parameter to "shortname". change
"name" to "fqname" for clarification.
* each GetDynLoadFunc now creates its own funcname value.
WARNING: as I mentioned previously, we may run into an issue with a
missing "_" on some platforms. Testing will show this pretty quickly,
however.
* move pathname munging into dynload_shlib.c
Here's a patch that avoids a warning caused by the "const char* pathname"
declaration for _PyImport_GetDynLoadFunc (in dynload_aix). The "aix_load"
function's 1st arg is prototyped as "char *pathname".