In C++, it's an error to pass a string literal to a char* function
without a const_cast(). Rather than require every C++ extension
module to put a cast around string literals, fix the API to state the
const-ness.
I focused on parts of the API where people usually pass literals:
PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type
slots, etc. Predictably, there were a large set of functions that
needed to be fixed as a result of these changes. The most pervasive
change was to make the keyword args list passed to
PyArg_ParseTupleAndKewords() to be a const char *kwlist[].
One cast was required as a result of the changes: A type object
mallocs the memory for its tp_doc slot and later frees it.
PyTypeObject says that tp_doc is const char *; but if the type was
created by type_new(), we know it is safe to cast to char *.
This change implements a new bytecode compiler, based on a
transformation of the parse tree to an abstract syntax defined in
Parser/Python.asdl.
The compiler implementation is not complete, but it is in stable
enough shape to run the entire test suite excepting two disabled
tests.
Fix over-aggressive PyErr_Clear(). The same code fragment appears in
various guises in list.extend(), map(), filter(), zip(), and internally
in PySequence_Tuple().
interning were not clear here -- a subclass could be mutable, for
example -- and had bugs. Explicitly interning a subclass of string
via intern() will raise a TypeError. Internal operations that attempt
to intern a string subclass will have no effect.
Added a few tests to test_builtin that includes the old buggy code and
verifies that calls like PyObject_SetAttr() don't fail. Perhaps these
tests should have gone in test_string.
* Fixes an incorrect variable in a PyDict_CheckExact.
* Allow general mapping locals arguments for the execfile() function
and exec statement.
* Add tests.
__oct__, and __hex__. Raise TypeError if an invalid type is
returned. Note that PyNumber_Int and PyNumber_Long can still
return ints or longs. Fixes SF bug #966618.
[ 960406 ] unblock signals in threads
although the changes do not correspond exactly to any patch attached to
that report.
Non-main threads no longer have all signals masked.
A different interface to readline is used.
The handling of signals inside calls to PyOS_Readline is now rather
different.
These changes are all a bit scary! Review and cross-platform testing
much appreciated.
The builtin eval() function now accepts any mapping for the locals argument.
Time sensitive steps guarded by PyDict_CheckExact() to keep from slowing
down the normal case. My timings so no measurable impact.
* Install the unittests, docs, newsitem, include file, and makefile update.
* Exercise the new functions whereever sets.py was being used.
Includes the docs for libfuncs.tex. Separate docs for the types are
forthcoming.
the code erroneously decrefed the istep argument in an error case. This
caused a co_consts tuple to lose a float constant prematurely, which
eventually caused gc to try executing static data in floatobject.c (don't
ask <wink>). So reworked this extensively to ensure refcount correctness.
- range() now works even if the arguments are longs with magnitude
larger than sys.maxint, as long as the total length of the sequence
fits. E.g., range(2**100, 2**101, 2**100) is the following list:
[1267650600228229401496703205376L]. (SF patch #707427.)
Arranged that all the objects exposed by __builtin__ appear in the list
of all objects. I basically peed away two days tracking down a mystery
leak in sys.gettotalrefcount() in a ZODB app (== tons of code), because
the object leaking the references didn't appear in the sys.getobjects(0)
list. The object happened to be False. Now False is in the list, along
with other popular & previously missing leak candidates (like None).
Alas, we still don't have a choke point covering *all* Python objects,
so the list of all objects may still be incomplete.
with an indented code block but no newline would raise SyntaxError.
This would have been a four-line change in parsetok.c... Except
codeop.py depends on this behavior, so a compilation flag had to be
invented that causes the tokenizer to revert to the old behavior;
this required extra changes to 2 .h files, 2 .c files, and 2 .py
files. (Fixes SF bug #501622.)
object is not a real str or unicode but an instance
of a subclass, construct the output via looping
over __getitem__. This guarantees that the result
is the same for function==None and function==lambda x:x
This doesn't happen for tuples, because filtertuple()
uses PyTuple_GetItem().
(This was discussed on SF bug #665835).
blindly assumed that tp_as_sequence->sq_item always returns
a str or unicode object. This might fail with str or unicode
subclasses.
This patch checks whether the object returned from __getitem__
is a str/unicode object and raises a TypeError if not (and
the filter function returned true).
Furthermore the result for __getitem__ can be more than one
character long, so checks for enough memory have to be done.
Obtain cleaner coding and a system wide
performance boost by using the fast, pre-parsed
PyArg_Unpack function instead of PyArg_ParseTuple
function which is driven by a format string.
supported as the second argument. This has the same meaning as
for isinstance(), i.e. issubclass(X, (A, B)) is equivalent
to issubclass(X, A) or issubclass(X, B). Compared to isinstance(),
this patch does not search the tuple recursively for classes, i.e.
any entry in the tuple that is not a class, will result in a
TypeError.
This closes SF patch #649608.
- Use PyObject_Call() instead of PyEval_CallObject(), saves several
layers of calls and checks.
- Pre-allocate the argument tuple rather than calling Py_BuildValue()
each time round the loop.
- For filter(None, seq), avoid an INCREF and a DECREF.
These built-in functions are replaced by their (now callable) type:
slice()
buffer()
and these types can also be called (but have no built-in named
function named after them)
classobj (type name used to be "class")
code
function
instance
instancemethod (type name used to be "instance method")
The module "new" has been replaced with a small backward compatibility
placeholder in Python.
A large portion of the patch simply removes the new module from
various platform-specific build recipes. The following binary Mac
project files still have references to it:
Mac/Build/PythonCore.mcp
Mac/Build/PythonStandSmall.mcp
Mac/Build/PythonStandalone.mcp
[I've tweaked the code layout and the doc strings here and there, and
added a comment to types.py about StringTypes vs. basestring. --Guido]
for 'str' and 'unicode', and can be used instead of
types.StringTypes, e.g. to test whether something is "a string":
isinstance(x, string) is True for Unicode and 8-bit strings. This
is an abstract base class and cannot be instantiated directly.
don't understand how this function works, also beefed up the docs. The
most common usage error is of this form (often spread out across gotos):
if (_PyString_Resize(&s, n) < 0) {
Py_DECREF(s);
s = NULL;
goto outtahere;
}
The error is that if _PyString_Resize runs out of memory, it automatically
decrefs the input string object s (which also deallocates it, since its
refcount must be 1 upon entry), and sets s to NULL. So if the "if"
branch ever triggers, it's an error to call Py_DECREF(s): s is already
NULL! A correct way to write the above is the simpler (and intended)
if (_PyString_Resize(&s, n) < 0)
goto outtahere;
Bugfix candidate.
Highlights: import and friends will understand any of \r, \n and \r\n
as end of line. Python file input will do the same if you use mode 'U'.
Everything can be disabled by configuring with --without-universal-newlines.
See PEP278 for details.
PEP 285. Everything described in the PEP is here, and there is even
some documentation. I had to fix 12 unit tests; all but one of these
were printing Boolean outcomes that changed from 0/1 to False/True.
(The exception is test_unicode.py, which did a type(x) == type(y)
style comparison. I could've fixed that with a single line using
issubtype(x, type(y)), but instead chose to be explicit about those
places where a bool is expected.
Still to do: perhaps more documentation; change standard library
modules to return False/True from predicates.
Based on the patch from Danny Yoo. The fix is in exec_statement() in
ceval.c.
There are also changes to introduce use of PyCode_GetNumFree() in
several places.
of PyMapping_Keys because we know we have a real dict. Tolerate that
objects may have an attr named "__dict__" that's not a dict (Py_None
popped up during testing).
test_descr.py, test_dir(): Test the new classic-class behavior; beef up
the new-style class test similarly.
test_pyclbr.py, checkModule(): dir(C) is no longer a synonym for
C.__dict__.keys() when C is a classic class (looks like the same thing
that burned distutils! -- should it be *made* a synoym again? Then it
would be inconsistent with new-style class behavior.).
bag. It's clearly wrong for classic classes, at heart because a classic
class doesn't have a __class__ attribute, and I'm unclear on whether
that's feature or bug. I'll repair this once I find out (in the
meantime, dir() applied to classic classes won't find the base classes,
while dir() applied to a classic-class instance *will* find the base
classes but not *their* base classes).
Please give the new dir() a try and see whether you love it or hate it.
The new dir([]) behavior is something I could come to love. Here's
something to hate:
>>> class C:
... pass
...
>>> c = C()
>>> dir(c)
['__doc__', '__module__']
>>>
The idea that an instance has a __doc__ attribute is jarring (of course
it's really c.__class__.__doc__ == C.__doc__; likewise for __module__).
OTOH, the code already has too many special cases, and dir(x) doesn't
have a compelling or clear purpose when x isn't a module.
builtin_eval wasn't merging in the compiler flags from the current frame;
I suppose we never noticed this before because future division is the
first future-feature that can affect expressions (nested_scopes and
generators had only statement-level effects).
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
Fix suggested by Michael Hudson: Raise TypeError if attribute name
passed to getattr() is not a string or Unicode. There is some
unfortunate duplication of code between builtin_getattr() and
PyObject_GetAttr(), but it appears to be unavoidable.
that info to code dynamically compiled *by* code compiled with generators
enabled. Doesn't yet work because there's still no way to tell the parser
that "yield" is OK (unlike nested_scopes, the parser has its fingers in
this too).
Replaced PyEval_GetNestedScopes by a more-general
PyEval_MergeCompilerFlags. Perhaps I should not have? I doubted it was
*intended* to be part of the public API, so just did.
- the correct range for the error message is range(0x110000);
- put the 4-byte Unicode-size code inside the same else branch as the
2-byte code, rather generating unreachable code in the 2-byte case.
- Don't hide the 'else' behine the '}'.
(I would prefer that in 4-byte mode, any value should be accepted, but
reasonable people can argue about that, so I'll put that off.)
Add configure option --enable-unicode.
Add config.h macros Py_USING_UNICODE, PY_UNICODE_TYPE, Py_UNICODE_SIZE,
SIZEOF_WCHAR_T.
Define Py_UCS2.
Encode and decode large UTF-8 characters into single Py_UNICODE values
for wide Unicode types; likewise for UTF-16.
Remove test whether sizeof Py_UNICODE is two.
NEEDS DOC CHANGES.
More AttributeErrors transmuted into TypeErrors, in test_b2.py, and,
again, this strikes me as a good thing.
This checkin completes the iterator generalization work that obviously
needed to be done. Can anyone think of others that should be changed?
NEEDS DOC CHANGES.
Possibly contentious: The first time s.next() yields StopIteration (for
a given map argument s) is the last time map() *tries* s.next(). That
is, if other sequence args are longer, s will never again contribute
anything but None values to the result, even if trying s.next() again
could yield another result. This is the same behavior map() used to have
wrt IndexError, so it's the only way to be wholly backward-compatible.
I'm not a fan of letting StopIteration mean "try again later" anyway.
Also a 2.1 bugfix candidate (am I supposed to do something with those?).
Took away map()'s insistence that sequences support __len__, and cleaned
up the convoluted code that made it *look* like it really cared about
__len__ (in fact the old ->len field was only *used* as a flag bit, as
the main loop only looked at its sign bit, setting the field to -1 when
IndexError got raised; renamed the field to ->saw_IndexError instead).
new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
new C API PyObject_GetIter(), calls tp_iter
new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
new internal object types iterobject and calliterobject
new exception StopIteration
new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
new magic number for .pyc files
new special method for instances: __iter__() returns an iterator
iteration over dictionaries: "for x in dict" iterates over the keys
iteration over files: "for x in file" iterates over lines
TODO:
documentation
test suite
decide whether to use a different way to spell iter(function, sentinal)
decide whether "for key in dict" is a good idea
use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
speed tuning (make next() a slot tp_next???)
Jeffery Collins pointed out that filterstring decrefs a character object
before it's done using it. This works by accident today because another
module always happens to have an active reference too at the time. The
accident doesn't work after his Pippy modifications, and since it *is*
an accident even in the mainline Python, it should work by design there too.
The patch accomplishes that.
If a module has a future statement enabling nested scopes, they are
also enable for the exec statement and the functions compile() and
execfile() if they occur in the module.
If Python is run with the -i option, which enters interactive mode
after executing a script, and the script it runs enables nested
scopes, they are also enabled in interactive mode.
XXX The use of -i with -c "from __future__ import nested_scopes" is
not supported. What's the point?
To support these changes, many function variants have been added to
pythonrun.c. All the variants names end with Flags and they take an
extra PyCompilerFlags * argument. It is possible that this complexity
will be eliminated in a future version of the interpreter in which
nested scopes are not optional.
except that it always returns Unicode objects.
A new C API PyObject_Unicode() is also provided.
This closes patch #101664.
Written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.
- Make error messages from issubclass() and isinstance() a bit more
descriptive (Ping, modified by Guido)
- Couple of tiny fixes to other docstrings (Ping)
- Get rid of trailing whitespace (Guido)
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.
This closes SourceForge patch #101659 and bug #115323.
which implements the automatic conversion from Unicode to a string
object using the default encoding.
The new API is then put to use to have eval() and exec accept
Unicode objects as code parameter. This closes bugs #110924
and #113890.
As side-effect, the traditional C APIs PyString_Size() and
PyString_AsString() will also accept Unicode objects as
parameters.
PyRun_FileEx(). These are the same as their non-Ex counterparts but
have an extra argument, a flag telling them to close the file when
done.
Then this is used by Py_Main() and execfile() to close the file after
it is parsed but before it is executed.
Adding APIs seems strange given the feature freeze but it's the only
way I see to close the bug report without incompatible changes.
[ Bug #110616 ] source file stays open after parsing is done (PR#209)
scope. Previously, s_buffer[] was defined inside the
PyUnicode_Check() scope, but referred to in the outer scope via
assignment to s. This quiets an Insure portability warning.
accepted by the BDFL.
builtin_zip(): New function to implement the zip() function described
in the above proposal.
zip_doc[]: Docstring for zip().
builtin_methods[]: added entry for zip()
This adds support for instance to the constructor (instances
have to define __str__ and can return Unicode objects via that
hook; string return values are decoded into Unicode using the
current default encoding).
Various small fixes to the builtin module to ensure no buffer
overflows.
- chunk #1:
Proper casting to ensure no truncation, and hence no surprises, in the
comparison.
- chunk #2:
The id() function guarantees a unique return value for different
objects. It does this by returning the pointer to the object. By
returning a PyInt, on Win64 (sizeof(long) < sizeof(void*)) the pointer
is truncated and the guarantee may be proven false. The appropriate
return function is PyLong_FromVoidPtr, this returns a PyLong if that
is necessary to return the pointer without truncation.
[GvR: note that this means that id() can now return a long on Win32
platforms. This *might* break some code...]
- chunk #3:
Ensure no overflow in raw_input(). Granted the user would have to pass
in >2GB of data but it *is* a possible buffer overflow condition.
module and into _exceptions.c. This includes all the PyExc_* globals,
the bltin_exc table, init_class_exc(), fini_instances(),
finierrors().
Renamed _PyBuiltin_Init_1() to _PyBuiltin_Init() since the two phase
initializations are necessary any more.
Removed as obsolete _PyBuiltin_Init_2(), _PyBuiltin_Fini_1() and
_PyBuiltin_Fini_2().
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.
(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode. I'm also holding back on his
change to main.c, which seems unnecessary to me.)
- When 'import exceptions' fails, don't suggest to use -v to print the traceback;
this doesn't actually work.
- Remove comment about fallback to string exceptions.
- Remove a PyErr_Occurred() check after all is said and done that can
never trigger.
- Remove static function newstdexception() which is no longer called.
are no longer supported (i.e. -X option is removed).
_PyBuiltin_Init_1(): Don't call initerrors(). This does mean that it
is possible to raise an ImportError before that exception has been
initialized, say because exceptions.py can't be found, or contains
bogosity. See changes to errors.c for how this is handled.
_PyBuiltin_Init_2(): Don't test Py_UseClassExceptionsFlag, just go
ahead and initialize the class-based standard exceptions. If this
fails, we throw a Py_FatalError.
Added special case to unicode(): when being passed a
Unicode object as first argument, return the object as-is.
Raises an exception when given a Unicode object *and* an
encoding name.
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent). Checkin messages:
New Unicode support for int(), float(), complex() and long().
- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
Unicode to a decimal char* string (used in the above new
APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above
Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
are masked, all other errors such as ValueError during
Unicode coercion are passed through (note that PyUnicode_Compare
does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
masked and 0 returned; all other errors are passed through
Better testing support for the standard codecs.
Misc minor enhancements, such as an alias dbcs for the mbcs codec.
Changes:
- PyLong_FromString() now applies the same error checks as
does PyInt_FromString(): trailing garbage is reported
as error and not longer silently ignored. The only characters
which may be trailing the digits are 'L' and 'l' -- these
are still silently ignored.
- string.ato?() now directly interface to int(), long() and
float(). The error strings are now a little different, but
the type still remains the same. These functions are now
ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
in the input string; PyNumber_Long() already did this (and
still does)
Followed by:
Looks like I've gone a step too far there... (and test_contains.py
seem to have a bug too).
I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
Introduce a new builtin exception, UnboundLocalError, raised when ceval.c
tries to retrieve or delete a local name that isn't bound to a value.
Currently raises NameError, which makes this behavior a FAQ since the same
error is raised for "missing" global names too: when the user has a global
of the same name as the unbound local, NameError makes no sense to them.
Even in the absence of shadowing, knowing whether a bogus name is local or
global is a real aid to quick understanding.
Example:
D:\src\PCbuild>type local.py
x = 42
def f():
print x
x = 13
return x
f()
D:\src\PCbuild>python local.py
Traceback (innermost last):
File "local.py", line 8, in ?
f()
File "local.py", line 4, in f
print x
UnboundLocalError: x
D:\src\PCbuild>
Note that UnboundLocalError is a subclass of NameError, for compatibility
with existing class-exception code that may be trying to catch this as a
NameError. Unfortunately, I see no way to make this wholly compatible
with -X (see comments in bltinmodule.c): under -X, [UnboundLocalError
is an alias for NameError --GvR].
[The ceval.c patch differs slightly from the second version that Tim
submitted; I decided not to raise UnboundLocalError for DELETE_NAME,
only for DELETE_LOCAL. DELETE_NAME is only generated at the module
level, and since at that level a NameError is raised for referencing
an undefined name, it should also be raised for deleting one.]
ExtensionClasses in isinstance() and issubclass().
- abstract instance and class protocols are used *only* in those
cases that would generate errors before the patch. That is, there's
no penalty for the normal case.
- instance protocol: an object smells like an instance if it
has a __class__ attribute that smells like a class.
- class protocol: an object smells like a class if it has a
__bases__ attribute that is a tuple with elements that
smell like classes (although not all elements may actually get
sniffed ;).
xrange(), especially for platforms where int and long are different
sizes (so sys.maxint isn't actually the theoretical limit for the
length of a list, but the largest C int is -- sys.maxint is the
largest Python int, which is actually a C long).