This change implements a new bytecode compiler, based on a
transformation of the parse tree to an abstract syntax defined in
Parser/Python.asdl.
The compiler implementation is not complete, but it is in stable
enough shape to run the entire test suite excepting two disabled
tests.
SF patch #1015989
The basic idea of this patch is to compute lineno attributes for all AST nodes. The actual
implementation lead to a lot of restructing and code cleanup.
The generated AST nodes now have an optional lineno argument to constructor. Remove the
top-level asList(), since it didn't seem to serve any purpose. Add an __iter__ to ast nodes.
Use isinstance() instead of explicit type tests.
Change transformer to use the new lineno attribute, which replaces three lines of code with one.
Use universal newlines so that we can get rid of special-case code for line endings. Use
lookup_node() in a few more frequently called, but simple com_xxx methods(). Change string
exception to class exception.
[ 1009560 ] Fix @decorator evaluation order
From the description:
Changes in this patch:
- Change Grammar/Grammar to require
newlines between adjacent decorators.
- Fix order of evaluation of decorators
in the C (compile.c) and python
(Lib/compiler/pycodegen.py) compilers
- Add better order of evaluation check
to test_decorators.py (test_eval_order)
- Update the decorator documentation in
the reference manual (improve description
of evaluation order and update syntax
description)
and the comment:
Used Brett's evaluation order (see
http://mail.python.org/pipermail/python-dev/2004-August/047835.html)
(I'm checking this in for Anthony who was having problems getting SF to
talk to him)
(Code contributed by Jiwon Seo.)
The documentation portion of the patch is being re-worked and will be
checked-in soon. Likewise, PEP 289 will be updated to reflect Guido's
rationale for the design decisions on binding behavior (as described in
in his patch comments and in discussions on python-dev).
The test file, test_genexps.py, is written in doctest format and is
meant to exercise all aspects of the the patch. Further additions are
welcome from everyone. Please stress test this new feature as much as
possible before the alpha release.
Remove broken code in visitDict(). I assume the code was trying to
add set lineno events for each line of a dict constructor, but I think
it was using the wrong object (node instead of k or v).
A variety of changes from Michael Hudson to get the compiler working
with 2.3. The primary change is the handling of SET_LINENO:
# The set_lineno() function and the explicit emit() calls for
# SET_LINENO below are only used to generate the line number table.
# As of Python 2.3, the interpreter does not have a SET_LINENO
# instruction. pyassem treats SET_LINENO opcodes as a special case.
A few other small changes:
- Remove unused code from pycodegen and pyassem.
- Fix error handling in parsermodule. When PyParser_SimplerParseString()
fails, it sets an exception with detailed info. The parsermodule
was clobbering that exception and replacing it was a generic
"could not parse string" exception. Keep the original exception.
[#448679] Left to right
* Python/compile.c
(com_dictmaker): Reordered evaluation of dictionaries to follow strict
LTR evaluation.
* Lib/compiler/pycodegen.py
(CodeGenerator.visitDict): Reordered evaluation of dictionaries to
follow strict LTR evaluation.
* Doc/ref/ref5.tex
Documented the general LTR evaluation order idea.
* Misc/NEWS
Documented change in evaluation order of dictionaries.
SF bug #522264 reported by Evelyn Mitchell.
The code included a comment about "STAR STAR" which was translated
into the code as the bogus attribute token.STARSTAR. This name never
caused an attribute error because it was never retrieved. The code
was based on an old version of the grammar that specified kwargs as
two tokens ('*' '*'). I checked as far back as 2.1 and didn't find
this production.
The fix is simple, because token.DOUBLESTAR is the only token
allowed. Also update the grammar fragment in com_arglist().
XXX I'll bet lots of other grammar fragments in comments are out of
date, probably in this module and in compile.c.
Fix by Neil Schemenauer. Visit the Subscript node when trying to find
the operation for a statement.
XXX Not sure if there are other nodes that should be visited.
There are now no known cases where the compiler package computes a
stack depth lower than the one computed by the builtin compiler. (To
achieve this state, we had to fix bugs in both compilers :-).
The chief change is to do the depth calculations with respect to basic
blocks. The stack effect of block is calculated. Then the flow graph
is traversed using breadth-first search to find the max weight path
through the graph.
Had to fix the StackDepthTracker to calculate the right info for
several opcodes: LOAD_ATTR, CALL_FUNCTION (and friends), MAKE_CLOSURE,
and DUP_TOPX.
XXX Still need to handle free variables in MAKE_CLOSURE.
XXX There are still a lot of places where the computed stack depth is
larger than for the builtin compiler. These won't cause the
interpreter to overflow the frame, but they waste space.
compile() becomes replacement for builtin compile()
compileFile() generates a .pyc from a .py
both are exported in __init__
compiler.parse() gets optional second argument to specify compilation
mode, e.g. single, eval, exec
Add AbstractCompileMode as parent class and Module, Expression, and
Interactive as concrete subclasses. Each corresponds to a compilation
mode.
THe AbstractCompileMode instances in turn delegate to CodeGeneration
subclasses specialized for their particular functions --
ModuleCodeGenerator, ExpressionCodeGeneration,
InteractiveCodeGenerator.
Remove the only test in the syntax module. It ends up that the
transformer must handle this error case.
In the transformer, check for a list compression in com_assign_list()
by looking for a list_for node where a comma is expected.
In pycodegen.compile() re-raise the SyntaxError rather than catching
it and exiting
Invoke compiler.syntax.check() after building AST. If a SyntaxError
occurs, print the error and exit without generating a .pyc file.
Refactor code to use compiler.misc.set_filename() rather than passing
filename argument around to each CodeGenerator instance.
Remove the option to have nested scopes or old LGB scopes. This has a
large impact on the code base, by removing the need for two variants
of each CodeGenerator.
Add a get_module() method to CodeGenerator objects, used to get the
future features for the current module.
Set CO_GENERATOR, CO_GENERATOR_ALLOWED, and CO_FUTURE_DIVISION flags
as appropriate.
Attempt to fix the value of nlocals in newCodeObject(), assuming that
nlocals is 0 if CO_NEWLOCALS is not defined.
Fix list comp code generation -- emit GET_ITER instead of Const(0)
after the list.
Add CO_GENERATOR flag to generators.
Get CO_xxx flags from the new module
try/except or try/finally.
Previous versions had only track SETUP_LOOP blocks and ignored the
exception part. This meant that it allowed continue inside a
try/except but generated buggy code. Now it does the right thing.
As the doc string for _lookupName() explains:
This routine uses a list instead of a dictionary, because a
dictionary can't store two different keys if the keys have the
same value but different types, e.g. 2 and 2L. The compiler
must treat these two separately, so it does an explicit type
comparison before comparing the values.
Avoid if/elif/elif/else tests where the final else is supposed to
handle exactly one case instead of all other cases. When the list of
operators is extended, the catchall else treats all new operators as
the last operator in the set of tests. Instead, raise an exception if
an unexpected operator occurs.
Use a dictionary instead of a list to map objects to their offsets in
a const/name tuple of a code object.
XXX The conversion is perhaps incomplete, in that we shouldn't have to
do the list2dict to start.
Add support for floor division (// and //=)
The implementation of getChildren() and getChildNodes() is intended to
be faster, because it avoids calling flatten() on every return value.
But it's not clear that it is a lot faster, because constructing a
tuple with just the right values ends up being slow. (Too many
attribute lookups probably.)
The ast.txt file is much more complicated, with funny characters at
the ends of names (*, &, !) to indicate the types of each child node.
The astgen script is also much more complex, making me wonder if it's
still useful.
varnames should list all the local variables (with arguments first).
The XXX_NAME ops typically occur at the module level and assignment
ops should create locals.
(Hard to believe these were never handled before)
Add misc.mangle() that mangles based on the rules in compile.c.
XXX Need to test the corner cases
Update CodeGenerator with a class_name attribute bound to None. If a
particular instance is created within a class scope, the instance's
class_name is bound to that class's name.
Add mangle() method to CodeGenerator that mangles if the class_name
has a class_name in it.
Modify the FunctionCodeGenerator family to handle an extra argument--
the class_name.
Wrap all name ops and attrnames in calls to self.mangle()
Make nested scopes enabled by default
Add is_constant_false() helper so that compiled code and symbols are
consistent with builtin compiler's handling of "if 0:"
Fix doc string handling to be consistent with recent change that
eliminates the doc string from the Module's node attribute.
Add fix to print handling from Evan & Shane.
Track change to visitor api by making "verbose" explicit.
Comment out setting CO_NESTED flag (it's unnecessary in 2.2).
Evan Simpson's fix. And his explanation:
If you defined two nested functions in a row that refer to the
same non-global variable, the second one will be generated as
though the variable were global.
The use of com_node() introduces a lot of extra stack frames, enough
to cause a stack overflow compiling test.test_parser with the standard
interpreter recursionlimit. The com_node() is a convenience function
that hides the dispatch details, but comes at a very high cost. It is
more efficient to dispatch directly in the callers. In these cases,
use lookup_node() and call the dispatched node directly.
Also handle yield_stmt in a way that will work with Python 2.1
(suggested by Shane Hathaway)
Remove _preorder as alias for dispatch and call dispatch directly.
Add an extra optional argument to walk()
XXX Also comment out some code that does debugging prints.
Armin Rigo pointed out that the way the line-# table got built didn't work
for lines generating more than 255 bytes of bytecode. Fixed as he
suggested, plus corresponding changes to pyassem.py, plus added some
long overdue docs about this subtle table to compile.c.
Bugfix candidate (line numbers may be off in tracebacks under -O).
Always emit a SET_LINENO 0 at the beginning of the module. The
builtin compiler does this, and it's much easier to compare bytecode
generated by the two compilers if they both do.
Move the SET_LINENO inside the FOR_LOOP block for list
comprehensions. Also for compat. with builtin compiler.
Fix annoying bugs in flow graph layout code. In some cases the
implicit control transfers weren't honored. In other cases,
JUMP_FORWARD instructions jumped backwards.
Remove unused arg from nextBlock().
pycodegen.py
Add optional force kwarg to set_lineno() that will emit a
SET_LINENO even if it is the same as the previous lineno.
Use explicit LOAD_FAST and STORE_FAST to access list comp implicit
variables. (The symbol table doesn't know about them.)