As the doc string for _lookupName() explains:
This routine uses a list instead of a dictionary, because a
dictionary can't store two different keys if the keys have the
same value but different types, e.g. 2 and 2L. The compiler
must treat these two separately, so it does an explicit type
comparison before comparing the values.
Avoid if/elif/elif/else tests where the final else is supposed to
handle exactly one case instead of all other cases. When the list of
operators is extended, the catchall else treats all new operators as
the last operator in the set of tests. Instead, raise an exception if
an unexpected operator occurs.
Use a dictionary instead of a list to map objects to their offsets in
a const/name tuple of a code object.
XXX The conversion is perhaps incomplete, in that we shouldn't have to
do the list2dict to start.
Add support for floor division (// and //=)
The implementation of getChildren() and getChildNodes() is intended to
be faster, because it avoids calling flatten() on every return value.
But it's not clear that it is a lot faster, because constructing a
tuple with just the right values ends up being slow. (Too many
attribute lookups probably.)
The ast.txt file is much more complicated, with funny characters at
the ends of names (*, &, !) to indicate the types of each child node.
The astgen script is also much more complex, making me wonder if it's
still useful.
varnames should list all the local variables (with arguments first).
The XXX_NAME ops typically occur at the module level and assignment
ops should create locals.
(Hard to believe these were never handled before)
Add misc.mangle() that mangles based on the rules in compile.c.
XXX Need to test the corner cases
Update CodeGenerator with a class_name attribute bound to None. If a
particular instance is created within a class scope, the instance's
class_name is bound to that class's name.
Add mangle() method to CodeGenerator that mangles if the class_name
has a class_name in it.
Modify the FunctionCodeGenerator family to handle an extra argument--
the class_name.
Wrap all name ops and attrnames in calls to self.mangle()
Make nested scopes enabled by default
Add is_constant_false() helper so that compiled code and symbols are
consistent with builtin compiler's handling of "if 0:"
Fix doc string handling to be consistent with recent change that
eliminates the doc string from the Module's node attribute.
Add fix to print handling from Evan & Shane.
Track change to visitor api by making "verbose" explicit.
Comment out setting CO_NESTED flag (it's unnecessary in 2.2).
Evan Simpson's fix. And his explanation:
If you defined two nested functions in a row that refer to the
same non-global variable, the second one will be generated as
though the variable were global.
The use of com_node() introduces a lot of extra stack frames, enough
to cause a stack overflow compiling test.test_parser with the standard
interpreter recursionlimit. The com_node() is a convenience function
that hides the dispatch details, but comes at a very high cost. It is
more efficient to dispatch directly in the callers. In these cases,
use lookup_node() and call the dispatched node directly.
Also handle yield_stmt in a way that will work with Python 2.1
(suggested by Shane Hathaway)
Remove _preorder as alias for dispatch and call dispatch directly.
Add an extra optional argument to walk()
XXX Also comment out some code that does debugging prints.
Armin Rigo pointed out that the way the line-# table got built didn't work
for lines generating more than 255 bytes of bytecode. Fixed as he
suggested, plus corresponding changes to pyassem.py, plus added some
long overdue docs about this subtle table to compile.c.
Bugfix candidate (line numbers may be off in tracebacks under -O).
Always emit a SET_LINENO 0 at the beginning of the module. The
builtin compiler does this, and it's much easier to compare bytecode
generated by the two compilers if they both do.
Move the SET_LINENO inside the FOR_LOOP block for list
comprehensions. Also for compat. with builtin compiler.
Fix annoying bugs in flow graph layout code. In some cases the
implicit control transfers weren't honored. In other cases,
JUMP_FORWARD instructions jumped backwards.
Remove unused arg from nextBlock().
pycodegen.py
Add optional force kwarg to set_lineno() that will emit a
SET_LINENO even if it is the same as the previous lineno.
Use explicit LOAD_FAST and STORE_FAST to access list comp implicit
variables. (The symbol table doesn't know about them.)
Add mangling support
Add get_children() and add_child() methods to Scope
Skip nodes when If test is a false constant
Add test code that checks results against symtable module
Fix com_NEWLINE() so that is accepts arguments, which occurs for lines like:
stmt; # note trailing semicolon
Add XXX about checking for assignment to list comps
embedded code objects (e.g. functions) rather than the generated code
object. This change means that the compiler generates code for
everything at the end, rather then generating code for each function
as it finds it. Implementation note: _convert_LOAD_CONST in
pyassem.py must be change to call getCode().
Other changes follow. Several changes creates extra edges between
basic blocks to reflect control flow for loops and exceptions. These
missing edges had gone unnoticed because they do not affect the
current compilation process.
pyassem.py:
Add _enable_debug() and _disable_debug() methods that print
instructions and blocks to stdout as they are generated.
Add edges between blocks for instructions like SETUP_LOOP,
FOR_LOOP, etc.
Add pruneNext to get rid of bogus edges remaining after
unconditional transfer ops (e.g. JUMP_FORWARD)
Change repr of Block to omit block length.
pycodegen.py:
Make sure a new block is started after FOR_LOOP, etc.
Change assert implementation to use RAISE_VARARGS 1 when there is
no user-specified failure output.
misc.py:
Implement __contains__ and copy for Set.
Reformatting -- long lines, "[ ]" -> "[]", a few indentation nits.
Replace calls to Node function (which constructed ast nodes) with
calls to actual constructors imported from ast module.
Optimize com_node (most frequently used method) for the common case --
the appropriate method is found in _dispatch.
Fix com_augassign to use class object's rather than node names
(rendered invalid by recent changes to ast)
Remove expensive tests for sequence-ness in com_stmt and
com_append_stmt. These tests should never fail; if they do, something
is really broken and exception will be raised elsewhere.
Fix com_stmt and com_append_stmt to use isinstance rather than
testing's type slot of ast node (this slot disappeared with recent
changes to ast).
1.5.2. The compiler generates code for the version of the interpreter
it is run under.
ast.py:
Print and Printnl add dest attr for extended print
new node AugAssign for augmented assignments
new nodes ListComp, ListCompFor, and ListCompIf for list
comprehensions
pyassem.py:
add work around for string-Unicode comparison raising UnicodeError
on comparison of two objects in code object's const table
pycodegen.py:
define VERSION, the Python major version number
get magic number using imp.get_magic() instead of hard coding
implement list comprehensions, extended print, and augmented
assignment; augmented assignment uses Delegator classes (see
doc string)
fix import and tuple unpacking for 1.5.2
transformer.py:
various changes to support new 2.0 grammar and old 1.5 grammar
add debug_tree helper than converts and symbol and token numbers
to their names
Fix import support to work with import as variant of Python 2.0. The
grammar for import changed, requiring changes in transformer and code
generator, even to handle compilation of imports with as.
- fix tab space issues (SF patch #101167 by Neil Schemenauer)
- fix co_flags for classes to include CO_NEWLOCALS (SF patch #101145 by Neil)
- fix for merger of UNPACK_LIST and UNPACK_TUPLE into UNPACK_SEQUENCE,
(SF patch #101168 by, well, Neil :)
- Adjust bytecode MAGIC to current bytecode.
TODO: teach compile.py about list comprehensions.
Attached is a set of diffs for the .py compiler that adds support
for the new extended call syntax.
compiler/ast.py:
CallFunc node gets 2 new children to support extended call syntax -
"star_args" (for "*args") and "dstar_args" (for "**args")
compiler/pyassem.py
It appear that self.lnotab is supposed to be responsible for
tracking line numbers, but self.firstlineno was still hanging
around. Removed self.firstlineno completely. NOTE - I didnt
actually test that the generated code has the correct line numbers!!
Stack depth tracking appeared a little broken - the checks never
made it beyond the "self.patterns" check - thus, the custom methods
were never called! Fixed this.
(XXX Jeremy notes: I think this code is still broken because it
doesn't track stack effects across block bounaries.)
Added support for the new extended call syntax opcodes for depth
calculations.
compiler/pycodegen.py
Added support for the new extended call syntax opcodes.
compiler/transformer.py
Added support for the new extended call syntax.
code generator uses flowgraph as intermediate representation. the old
rep uses a list with explicit "StackRefs" to indicate the target
of jumps.
pyassem converts flowgraph to bytecode, breaks up individual steps of
generating bytecode
fix imports
remove parse functions and visitor code
track name change: Classdef to Class
add some comments and tweak order of visitXXX methods
get rid of if __name__ == "__main__ section
- removed now (happily) unused second arg
- need to verify results of [].index are correct; for building consts,
need to have same value and same type, e.g. 2 not the same as 2L
(big surprise). new solution is a little less hackish.
Code gen adds a TupleArg instance in the argument slot. The tuple arg
includes a copy of the names that it is responsble for binding. The
PyAssembler uses this information to calculate the correct argcount.
all fix this wacky case: del (a, ((b,), c)), d
which is the same as: del a, b, c, d
(Can't wait for Guido to tell me why.)
solution uses findOp which walks a tree to find out whether it
contains OP_ASSIGN or OP_DELETE or ...
- added a number of support methods to generate code just before the
body
- hack protocol for communicating number of args to PyAssembler
fix TryExcept generation for case where exception handler has no body
fix visitAssAttr
add comment about incomplete visitAssName
stop using the ExampleASTVisitor
change script invocation to accept a list of .py files (e.g. Lib/*.py)
named OPTIMIZED, which matches compile.c and makes more sense for
classes
revamp call signature for PythonVMCode.__init__; doesn't really matter
'cuz this code is going to be refactored out of existence
add generateClassCode and modify Func & Lambda generation
add support for nodes Classdef, Keyword,
fix CallFunc to generate right op arg when calling w/ keywords
add ugly hack to properly compute offsets when the same stack ref is
used multiple times
change resolution of local name ops (LOAD_FAST). i think it makes
sense now. if it is an argument or a local var name that it used, it
must be in varnames. if it is a local var name that is used, it must
also be in names
FUNCTION_NAMESPACE. initialize in __init__ and reset in
generateFunctionCode.
replace direct issue of STORE_FAST, STORE_GLOBAL, etc. with call to
storeName; same for loadName and deleteName
the new {store,load,delete}Name methods use the namespace attr and the
local variable stack to determine the correct bytecode to issue
* prints out examples of nodes that are handled by visitor. simply a
development convenience
remove NestedCodeGenerator -- it was bogus after all
replace with generateFunctionCode, a method to call to generate code
for a function instead of a top-level module
fix impl of visitDiscard (most pop stack)
emit lineno for pass
handle the following new node types: Import, From, Getattr, Subscript,
Slice, AssAttr, AssTuple, Mod, Not, And, Or, List
LocalNameFinder: remove names declared as globals for locals
PythonVMCode: pass arg names to constructor, force varnames to contain
them all (even if they aren't referenced)
add -q option on command line to disable stdout
VERBOSE setting for the ASTVisitor
add getopt handling for one or more -v args
rename ForwardRef to StackRef, because it isn't necessarily directional
CodeGenerator:
* add assertStackEmpty method. prints warning if stack is not empty
when it should be
* define methods for AssName, UNARY_*, For
PythonVMCode:
* fix mix up between hasjrel and hasjabs for address calculation
language.
CodeGenerator:
* modify to track stack depth
* add emit method that call's PythonVMCode's makeCodeObject
* thread filenames through in hackish way
* set flags for code objects for modules and functions
XXX the docs for the flags seem out of date and/or incomplete
PythonVMCode:
* add doc string describing the elements of a real code object
LineAddrTable:
* creates an lnotab (no quite correctly though)
handle most of the language syntax yet)
create NestedCodeGenerator used to generator the separate code object
that needs to be passed as an argument to MAKE_FUNCTION when a def
stmt is found (probably useful for class too)
change CodeGenerator.visitFunction to use the NestedCG
add CompiledModule class to handle creation of .pyc (pretty minimal
for now)
add makeCodeObject method to PythonVMCode that replaces symbolic names
with indexes into slots of the code code. the design of this
class will probably need to be revised.
compile.py: ASTVisitor framework plus bits of a code generator that
should be bug-for-buf compatible with compile.c
misc.py: Set and Stack helpers
test.py: a bit of simple sample code that compile.py will work on