mirror of https://github.com/python/cpython
gh-119786: copy compiler doc from devguide to InternalDocs and convert to markdown (#120134)
* gh-119786: move compiler doc from devguide to InternalDocs
Copy of 78fc0d7aa9
Co-Authored-By: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-Authored-By: Brett Cannon <brett@python.org>
Co-Authored-By: Carol Willing <carolcode@willingconsulting.com>
Co-Authored-By: Daniel Porteous <danielporteous1@gmail.com>
Co-Authored-By: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
Co-Authored-By: Éric Araujo <merwok@netwok.org>
Co-Authored-By: Erlend Egeberg Aasland <erlend.aasland@innova.no>
Co-Authored-By: Ezio Melotti <ezio.melotti@gmail.com>
Co-Authored-By: Georg Brandl <georg@python.org>
Co-Authored-By: Guido van Rossum <guido@python.org>
Co-Authored-By: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-Authored-By: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Co-Authored-By: Jeff Allen <ja.py@farowl.co.uk>
Co-Authored-By: Jim Fasarakis-Hilliard <d.f.hilliard@gmail.com>
Co-Authored-By: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-Authored-By: Lincoln <71312724+Lincoln-developer@users.noreply.github.com>
Co-Authored-By: Mariatta <Mariatta@users.noreply.github.com>
Co-Authored-By: Muhammad Mahad <mahadpy@gmail.com>
Co-Authored-By: Ned Deily <nad@acm.org>
Co-Authored-By: Pablo Galindo Salgado <Pablogsal@gmail.com>
Co-Authored-By: Serhiy Storchaka <storchaka@gmail.com>
Co-Authored-By: Stéphane Wirtel <stephane@wirtel.be>
Co-Authored-By: Suriyaa ✌️️ <isc.suriyaa@gmail.com>
Co-Authored-By: Zachary Ware <zachary.ware@gmail.com>
Co-Authored-By: psyker156 <242220+psyker156@users.noreply.github.com>
Co-Authored-By: slateny <46876382+slateny@users.noreply.github.com>
Co-Authored-By: svelankar <17737361+svelankar@users.noreply.github.com>
Co-Authored-By: zikcheng <surfingbyte@gmail.com>
* convert to markdown
* add to index
* update more of the out of date stuff
This commit is contained in:
parent
c3b6dbff2c
commit
56c3815ba1
|
@ -12,6 +12,8 @@ it is not, please report that through the
|
|||
[issue tracker](https://github.com/python/cpython/issues).
|
||||
|
||||
|
||||
[Compiler Design](compiler.md)
|
||||
|
||||
[Exception Handling](exception_handling.md)
|
||||
|
||||
[Adaptive Instruction Families](adaptive.md)
|
||||
|
|
|
@ -0,0 +1,651 @@
|
|||
|
||||
Compiler design
|
||||
===============
|
||||
|
||||
Abstract
|
||||
--------
|
||||
|
||||
In CPython, the compilation from source code to bytecode involves several steps:
|
||||
|
||||
1. Tokenize the source code
   [Parser/lexer/](https://github.com/python/cpython/blob/main/Parser/lexer/)
   and [Parser/tokenizer/](https://github.com/python/cpython/blob/main/Parser/tokenizer/).
2. Parse the stream of tokens into an Abstract Syntax Tree
   [Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c).
3. Transform AST into an instruction sequence
   [Python/compile.c](https://github.com/python/cpython/blob/main/Python/compile.c).
4. Construct a Control Flow Graph and apply optimizations to it
   [Python/flowgraph.c](https://github.com/python/cpython/blob/main/Python/flowgraph.c).
5. Emit bytecode based on the Control Flow Graph
   [Python/assemble.c](https://github.com/python/cpython/blob/main/Python/assemble.c).
|
||||
|
||||
This document outlines how these steps of the process work.
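
The later stages can be observed from Python itself. The following is a minimal sketch using the standard `ast` and `dis` modules together with the built-in `compile()`. The source string and the `<example>` filename are made up for illustration, and the exact bytecode printed differs between CPython versions.

```python
import ast
import dis

source = "total = 1 + 2\n"

# Steps 1-2: tokenize and parse the source into an AST.
tree = ast.parse(source, filename="<example>")
print(ast.dump(tree))

# Steps 3-5: compile the AST into a code object that holds bytecode.
code = compile(tree, filename="<example>", mode="exec")
dis.dis(code)

# The resulting code object can be executed directly.
namespace = {}
exec(code, namespace)
print(namespace["total"])
```

Passing the AST (rather than the source string) to `compile()` makes the split between parsing and code generation visible.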
|
||||
|
||||
This document only describes parsing in enough depth to explain what is needed
|
||||
for understanding compilation. This document provides a detailed, though not
|
||||
exhaustive, view of how the entire system works. You will most likely need
|
||||
to read some source code to have an exact understanding of all details.
|
||||
|
||||
|
||||
Parsing
|
||||
=======
|
||||
|
||||
As of Python 3.9, Python's parser is a PEG parser of a somewhat
|
||||
unusual design. It is unusual in the sense that the parser's input is a stream
|
||||
of tokens rather than a stream of characters, which is more common with PEG
|
||||
parsers.
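
To get a feel for what a stream of tokens looks like, the standard `tokenize` module can be used. This is only a sketch: `tokenize` is the Python-level view of tokenization, and its output may differ in detail from the token stream the C parser consumes. The one-line source string is an arbitrary example.

```python
import io
import tokenize

source = "x = 1\n"

# generate_tokens() takes a readline callable and yields TokenInfo tuples.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```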
|
||||
|
||||
The grammar file for Python can be found in
|
||||
[Grammar/python.gram](https://github.com/python/cpython/blob/main/Grammar/python.gram).
|
||||
The definitions for literal tokens (such as ``:``, numbers, etc.) can be found in
|
||||
[Grammar/Tokens](https://github.com/python/cpython/blob/main/Grammar/Tokens).
|
||||
Various C files, including
|
||||
[Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c)
|
||||
are generated from these.
|
||||
|
||||
See Also:
|
||||
|
||||
* [Guide to the parser](https://devguide.python.org/internals/parser/index.html)
|
||||
for a detailed description of the parser.
|
||||
|
||||
* [Changing CPython’s grammar](https://devguide.python.org/developer-workflow/grammar/#grammar)
|
||||
for a detailed description of the grammar.
|
||||
|
||||
|
||||
Abstract syntax trees (AST)
|
||||
===========================
|
||||
|
||||
|
||||
The abstract syntax tree (AST) is a high-level representation of the
|
||||
program structure without the necessity of containing the source code;
|
||||
it can be thought of as an abstract representation of the source code. The
|
||||
AST nodes are specified using the Zephyr Abstract
|
||||
Syntax Definition Language (ASDL) [^1], [^2].
|
||||
|
||||
The definition of the AST nodes for Python is found in the file
|
||||
[Parser/Python.asdl](https://github.com/python/cpython/blob/main/Parser/Python.asdl).
|
||||
|
||||
Each AST node (representing statements, expressions, and several
|
||||
specialized types, like list comprehensions and exception handlers) is
|
||||
defined by the ASDL. Most definitions in the AST correspond to a
|
||||
particular source construct, such as an 'if' statement or an attribute
|
||||
lookup. The definition is independent of its realization in any
|
||||
particular programming language.
|
||||
|
||||
The following fragment of the Python ASDL construct demonstrates the
|
||||
approach and syntax:
|
||||
|
||||
```
module Python
{
    stmt = FunctionDef(identifier name, arguments args, stmt* body,
                       expr* decorators)
          | Return(expr? value) | Yield(expr? value)
          attributes (int lineno)
}
```
|
||||
|
||||
The preceding example describes two different kinds of statements and an
|
||||
expression: function definitions, return statements, and yield expressions.
|
||||
All three kinds are considered of type ``stmt`` as shown by ``|`` separating
|
||||
the various kinds. They all take arguments of various kinds and amounts.
|
||||
|
||||
Modifiers on the argument type specify the number of values needed; ``?``
|
||||
means it is optional, ``*`` means 0 or more, while no modifier means only one
|
||||
value for the argument and it is required. ``FunctionDef``, for instance,
|
||||
takes an ``identifier`` for the *name*, ``arguments`` for *args*, zero or more
|
||||
``stmt`` arguments for *body*, and zero or more ``expr`` arguments for
|
||||
*decorators*.
|
||||
|
||||
Do notice that something like 'arguments', which is a node type, is
|
||||
represented as a single AST node and not as a sequence of nodes as with
|
||||
stmt as one might expect.
|
||||
|
||||
All three kinds also have an 'attributes' argument; this is shown by the
|
||||
fact that 'attributes' lacks a '|' before it.
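
The same modifiers are visible on the node classes of the standard `ast` module. The sketch below is illustrative only: the real `Python.asdl` is richer than the simplified fragment above (for example, the decorator field is named `decorator_list`), and the sample function `f` is made up.

```python
import ast

# Field names come from the ASDL definition of each node type.
print(ast.FunctionDef._fields)
print(ast.Return._fields)

tree = ast.parse("def f(a):\n    return\n")
func = tree.body[0]
ret = func.body[0]

# 'stmt* body' (zero or more) is exposed as a Python list.
print(type(func.body).__name__)
# 'arguments args' (exactly one) is a single node, not a sequence.
print(type(func.args).__name__)
# 'expr? value' (optional) is None when absent.
print(ret.value)
# The 'attributes' of stmt (such as lineno) are present on every kind.
print(func.lineno, ret.lineno)
```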
|
||||
|
||||
The statement definitions above generate the following C structure type:
|
||||
|
||||
|
||||
```
typedef struct _stmt *stmt_ty;

struct _stmt {
    enum { FunctionDef_kind=1, Return_kind=2, Yield_kind=3 } kind;
    union {
        struct {
            identifier name;
            arguments_ty args;
            asdl_seq *body;
        } FunctionDef;

        struct {
            expr_ty value;
        } Return;

        struct {
            expr_ty value;
        } Yield;
    } v;
    int lineno;
};
```
|
||||
|
||||
Also generated are a series of constructor functions that allocate (in
|
||||
this case) a ``stmt_ty`` struct with the appropriate initialization. The
|
||||
``kind`` field specifies which component of the union is initialized. The
|
||||
``FunctionDef()`` constructor function sets 'kind' to ``FunctionDef_kind`` and
|
||||
initializes the *name*, *args*, *body*, and *attributes* fields.
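
The Python-level `ast` module offers a rough analogue: each node class acts as a constructor, and the class of the resulting object plays the role of the ``kind`` field. The following sketch builds a tiny module by hand. The variable name `answer` and the `<example>` filename are arbitrary, and this shows the Python API, not the generated C constructors.

```python
import ast

# Build "answer = 42" by calling node constructors directly.
assign = ast.Assign(
    targets=[ast.Name(id="answer", ctx=ast.Store())],
    value=ast.Constant(value=42),
)
module = ast.Module(body=[assign], type_ignores=[])

# Hand-built nodes lack the lineno/col_offset attributes; fill them in.
ast.fix_missing_locations(module)

namespace = {}
exec(compile(module, "<example>", "exec"), namespace)
print(namespace["answer"])
```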
|
||||
|
||||
See also
|
||||
[Green Tree Snakes - The missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest)
|
||||
by Thomas Kluyver.
|
||||
|
||||
Memory management
|
||||
=================
|
||||
|
||||
Before discussing the actual implementation of the compiler, a discussion of
|
||||
how memory is handled is in order. To make memory management simple, an **arena**
|
||||
is used that pools memory in a single location for easy
|
||||
allocation and removal. This enables the removal of explicit memory
|
||||
deallocation. Because memory allocation for all needed memory in the compiler
|
||||
registers that memory with the arena, a single call to free the arena is all
|
||||
that is needed to completely free all memory used by the compiler.
|
||||
|
||||
In general, unless you are working on the critical core of the compiler, memory
|
||||
management can be completely ignored. But if you are working at either the
|
||||
very beginning of the compiler or the end, you need to care about how the arena
|
||||
works. All code relating to the arena is in either
|
||||
[Include/internal/pycore_pyarena.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_pyarena.h)
|
||||
or [Python/pyarena.c](https://github.com/python/cpython/blob/main/Python/pyarena.c).
|
||||
|
||||
``_PyArena_New()`` will create a new arena. The returned ``PyArena`` structure
will store pointers to all memory given to it. This does the bookkeeping of
what memory needs to be freed when the compiler is finished with the memory it
used. That freeing is done with ``_PyArena_Free()``. This only needs to be
called in strategic areas where the compiler exits.
|
||||
|
||||
As stated above, in general you should not have to worry about memory
|
||||
management when working on the compiler. The technical details of memory
|
||||
management have been designed to be hidden from you for most cases.
|
||||
|
||||
The only exception comes about when managing a PyObject. Since the rest
|
||||
of Python uses reference counting, there is extra support added
|
||||
to the arena to clean up each PyObject that was allocated. These cases
are very rare. However, if you've allocated a PyObject, you must tell
the arena about it by calling ``_PyArena_AddPyObject()``.
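
The arena idea can be pictured with a small conceptual model. This is not how `Python/pyarena.c` is implemented; the `Arena` class and its method names are hypothetical stand-ins that only mimic the bookkeeping described above: everything is registered as it is allocated, and one call releases it all.

```python
class Arena:
    """Toy arena: remembers everything handed to it, releases it all at once."""

    def __init__(self):
        self._blocks = []    # stand-ins for raw memory blocks
        self._objects = []   # stand-ins for registered PyObjects
        self.freed = False

    def malloc(self, size):
        # Allocate a block and remember it so that free() can release it.
        block = bytearray(size)
        self._blocks.append(block)
        return block

    def add_object(self, obj):
        # Analogous in spirit to registering a PyObject with the arena.
        self._objects.append(obj)
        return obj

    def free(self):
        # One call drops every block and every registered object.
        self._blocks.clear()
        self._objects.clear()
        self.freed = True


arena = Arena()
node = arena.malloc(32)
name = arena.add_object("sys")
arena.free()
```

Callers never release `node` or `name` individually, which is the property that lets most compiler code ignore memory management.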
|
||||
|
||||
|
||||
Source code to AST
|
||||
==================
|
||||
|
||||
The AST is generated from source code using the function
``_PyParser_ASTFromString()`` or ``_PyParser_ASTFromFile()``
(from [Parser/peg_api.c](https://github.com/python/cpython/blob/main/Parser/peg_api.c)).
|
||||
|
||||
After some checks, a helper function in
|
||||
[Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c)
|
||||
begins applying production rules on the source code it receives, converting source
code to tokens and matching these tokens recursively to their corresponding rule. The
production rule's corresponding rule function is called on every match. These rule
functions follow the format `xx_rule`, where *xx* is the grammar rule
that the function handles and is automatically derived from
|
||||
[Grammar/python.gram](https://github.com/python/cpython/blob/main/Grammar/python.gram) by
|
||||
[Tools/peg_generator/pegen/c_generator.py](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen/c_generator.py).
|
||||
|
||||
Each rule function in turn creates an AST node as it goes along. It does this
|
||||
by allocating all the new nodes it needs, calling the proper AST node creation
|
||||
functions for any required supporting functions and connecting them as needed.
|
||||
This continues until all nonterminal symbols are replaced with terminals. If an
|
||||
error occurs, the rule functions backtrack and try another rule function. If
|
||||
there are no more rules, an error is set and the parsing ends.
|
||||
|
||||
The AST node creation helper functions have the name `_PyAST_{xx}`
|
||||
where *xx* is the AST node that the function creates. These are defined by the
|
||||
ASDL grammar and contained in
|
||||
[Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c)
|
||||
(which is generated by
|
||||
[Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py)
|
||||
from
|
||||
[Parser/Python.asdl](https://github.com/python/cpython/blob/main/Parser/Python.asdl)).
|
||||
This all leads to a sequence of AST nodes stored in ``asdl_seq`` structs.
|
||||
|
||||
To demonstrate everything explained so far, here's the
|
||||
rule function responsible for a simple named import statement such as
|
||||
``import sys``. Note that error-checking and debugging code has been
|
||||
omitted. Removed parts are represented by ``...``.
|
||||
Furthermore, some comments have been added for explanation. These comments
|
||||
may not be present in the actual code.
|
||||
|
||||
|
||||
```
// This is the production rule (from python.gram) the rule function
// corresponds to:
// import_name: 'import' dotted_as_names
static stmt_ty
import_name_rule(Parser *p)
{
    ...
    stmt_ty _res = NULL;
    { // 'import' dotted_as_names
        ...
        Token * _keyword;
        asdl_alias_seq* a;
        // The tokenizing steps.
        if (
            (_keyword = _PyPegen_expect_token(p, 513))  // token='import'
            &&
            (a = dotted_as_names_rule(p))  // dotted_as_names
        )
        {
            ...
            // Generate an AST for the import statement.
            _res = _PyAST_Import ( a , ...);
            ...
            goto done;
        }
        ...
    }
    _res = NULL;
  done:
    ...
    return _res;
}
```
|
||||
|
||||
|
||||
To improve backtracking performance, some rules (chosen by applying a
|
||||
``(memo)`` flag in the grammar file) are memoized. Each rule function checks if
|
||||
a memoized version exists and returns that if so, else it continues in the
|
||||
manner stated in the previous paragraphs.
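
The interplay of rule functions, backtracking and memoization can be sketched in a few lines of Python. Everything here is hypothetical and heavily simplified: `ToyParser`, `memoize` and the token list are invented for illustration, tokens are plain strings, and the rule only accepts a single dotted name instead of the full ``dotted_as_names``. The generated C parser is considerably more involved.

```python
import functools
import keyword


def memoize(rule):
    """Cache a rule's result per starting position, as a memoized rule would."""

    @functools.wraps(rule)
    def wrapper(self):
        key = (rule.__name__, self.pos)
        if key in self.memo:
            result, self.pos = self.memo[key]
            return result
        result = rule(self)
        self.memo[key] = (result, self.pos)
        return result

    return wrapper


class ToyParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.memo = {}  # (rule name, start position) -> (result, end position)

    def expect(self, text):
        # Consume the next token only if it matches.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    def name(self):
        if self.pos < len(self.tokens):
            token = self.tokens[self.pos]
            if token.isidentifier() and not keyword.iskeyword(token):
                self.pos += 1
                return token
        return None

    @memoize
    def dotted_name_rule(self):
        # dotted_name: NAME ('.' NAME)*
        first = self.name()
        if first is None:
            return None
        parts = [first]
        while True:
            mark = self.pos
            if self.expect(".") and (part := self.name()):
                parts.append(part)
            else:
                self.pos = mark  # backtrack over a dangling '.'
                break
        return ".".join(parts)

    def import_name_rule(self):
        # import_name: 'import' dotted_name   (simplified)
        mark = self.pos
        if self.expect("import") and (module := self.dotted_name_rule()):
            return ("Import", module)
        self.pos = mark  # backtrack so that another alternative could be tried
        return None


parser = ToyParser(["import", "os", ".", "path"])
result = parser.import_name_rule()
print(result)

failed = ToyParser(["import", "."])
print(failed.import_name_rule(), failed.pos)
```

Resetting `self.pos` on failure is what makes backtracking possible, and keying the cache on the starting position is what makes repeated attempts at the same rule cheap.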
|
||||
|
||||
There are macros for creating and using ``asdl_xx_seq *`` types, where *xx* is
|
||||
a type of the ASDL sequence. Three main types are defined
|
||||
manually -- ``generic``, ``identifier`` and ``int``. These types are found in
|
||||
[Python/asdl.c](https://github.com/python/cpython/blob/main/Python/asdl.c)
|
||||
and its corresponding header file
|
||||
[Include/internal/pycore_asdl.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_asdl.h).
|
||||
Functions and macros for creating ``asdl_xx_seq *`` types are as follows:
|
||||
|
||||
* ``_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)``
  Allocate memory for an ``asdl_generic_seq`` of the specified length
* ``_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)``
  Allocate memory for an ``asdl_identifier_seq`` of the specified length
* ``_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)``
  Allocate memory for an ``asdl_int_seq`` of the specified length
|
||||
|
||||
In addition to the three types mentioned above, some ASDL sequence types are
|
||||
automatically generated by
|
||||
[Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py)
|
||||
and found in
|
||||
[Include/internal/pycore_ast.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_ast.h).
|
||||
Macros for using both manually defined and automatically generated ASDL
|
||||
sequence types are as follows:
|
||||
|
||||
* ``asdl_seq_GET(asdl_xx_seq *, int)``
  Get item held at a specific position in an ``asdl_xx_seq``
* ``asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)``
  Set a specific index in an ``asdl_xx_seq`` to the specified value
|
||||
|
||||
Untyped counterparts exist for some of the typed macros. These are useful
|
||||
when a function needs to manipulate a generic ASDL sequence:
|
||||
|
||||
* ``asdl_seq_GET_UNTYPED(asdl_seq *, int)``
  Get item held at a specific position in an ``asdl_seq``
* ``asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)``
  Set a specific index in an ``asdl_seq`` to the specified value
* ``asdl_seq_LEN(asdl_seq *)``
  Return the length of an ``asdl_seq`` or ``asdl_xx_seq``
|
||||
|
||||
Note that typed macros and functions are recommended over their untyped
|
||||
counterparts. Typed macros carry out checks in debug mode and aid
|
||||
debugging errors caused by incorrectly casting from ``void *``.
|
||||
|
||||
If you are working with statements, you must also worry about keeping
|
||||
track of what line number generated the statement. Currently the line
|
||||
number is passed as the last parameter to each ``stmt_ty`` function.
|
||||
|
||||
See also [PEP 617: New PEG parser for CPython](https://peps.python.org/pep-0617/).
|
||||
|
||||
|
||||
Control flow graphs
|
||||
===================
|
||||
|
||||
A **control flow graph** (often referenced by its acronym, **CFG**) is a
|
||||
directed graph that models the flow of a program. A node of a CFG is
|
||||
not an individual bytecode instruction, but instead represents a
|
||||
sequence of bytecode instructions that always execute sequentially.
|
||||
Each node is called a *basic block* and must always execute from
|
||||
start to finish, with a single entry point at the beginning and a
|
||||
single exit point at the end. If some bytecode instruction *a* needs
|
||||
to jump to some other bytecode instruction *b*, then *a* must occur at
|
||||
the end of its basic block, and *b* must occur at the start of its
|
||||
basic block.
|
||||
|
||||
As an example, consider the following code snippet:
|
||||
|
||||
```python
if x < 10:
    f1()
    f2()
else:
    g()
end()
```
|
||||
|
||||
The ``x < 10`` guard is represented by its own basic block that
|
||||
compares ``x`` with ``10`` and then ends in a conditional jump based on
|
||||
the result of the comparison. This conditional jump allows the block
|
||||
to point to both the body of the ``if`` and the body of the ``else``. The
|
||||
``if`` basic block contains the ``f1()`` and ``f2()`` calls and points to
|
||||
the ``end()`` basic block. The ``else`` basic block contains the ``g()``
|
||||
call and similarly points to the ``end()`` block.
|
||||
|
||||
Note that more complex code in the guard, the ``if`` body, or the ``else``
|
||||
body may be represented by multiple basic blocks. For instance,
|
||||
short-circuiting boolean logic in a guard like ``if x or y:``
|
||||
will produce one basic block that tests the truth value of ``x``
|
||||
and then points both (1) to the start of the ``if`` body and (2) to
|
||||
a different basic block that tests the truth value of ``y``.
|
||||
|
||||
CFGs are useful as an intermediate representation of the code because
|
||||
they are a convenient data structure for optimizations.
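
Basic blocks can be approximated from Python by splitting the final bytecode at jumps and jump targets. This sketch reconstructs blocks after the fact with the standard `dis` module. It is not the compiler's internal CFG, the helper `ends_block` is a made-up name, and the number and content of the blocks depend on the CPython version.

```python
import dis

source = """\
if x < 10:
    f1()
    f2()
else:
    g()
end()
"""

code = compile(source, "<example>", "exec")
instructions = list(dis.get_instructions(code))

JUMP_OPCODES = set(dis.hasjrel) | set(dis.hasjabs)


def ends_block(instr):
    # A block ends at any jump, return, or raise.
    return (
        instr.opcode in JUMP_OPCODES
        or instr.opname.startswith("RETURN")
        or instr.opname in ("RAISE_VARARGS", "RERAISE")
    )


# A "leader" starts a basic block: the first instruction, every jump
# target, and every instruction that follows a block-ending instruction.
leaders = {instructions[0].offset}
for index, instr in enumerate(instructions):
    if instr.opcode in JUMP_OPCODES:
        leaders.add(instr.argval)  # for jumps, argval is the target offset
    if ends_block(instr) and index + 1 < len(instructions):
        leaders.add(instructions[index + 1].offset)

blocks = []
for instr in instructions:
    if instr.offset in leaders:
        blocks.append([])
    blocks[-1].append(instr)

for block in blocks:
    print([instr.opname for instr in block])
```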
|
||||
|
||||
AST to CFG to bytecode
|
||||
======================
|
||||
|
||||
The conversion of an ``AST`` to bytecode is initiated by a call to the function
|
||||
``_PyAST_Compile()`` in
|
||||
[Python/compile.c](https://github.com/python/cpython/blob/main/Python/compile.c).
|
||||
|
||||
The first step is to construct the symbol table. This is implemented by
|
||||
``_PySymtable_Build()`` in
|
||||
[Python/symtable.c](https://github.com/python/cpython/blob/main/Python/symtable.c).
|
||||
This function begins by entering the starting code block for the AST that was
passed in and then calling the proper `symtable_visit_{xx}` function (with *xx* being the
AST node type). Next, the AST is walked, and the various code blocks that
delineate the reach of a local variable are entered and exited using
``symtable_enter_block()`` and ``symtable_exit_block()``, respectively.
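
The result of this pass can be inspected with the standard `symtable` module, which exposes the compiler's symbol tables to Python code. The nested functions below are an arbitrary example chosen so that one name is local, one is free and one is global.

```python
import symtable

source = """\
counter = 0
def outer(a):
    b = a + 1
    def inner():
        return b + counter
    return inner
"""

top = symtable.symtable(source, "<example>", "exec")

# Each nested code block gets its own table; look them up by name.
outer = next(t for t in top.get_children() if t.get_name() == "outer")
inner = next(t for t in outer.get_children() if t.get_name() == "inner")

print(outer.lookup("b").is_local())         # assigned inside outer()
print(inner.lookup("b").is_free())          # captured from the enclosing scope
print(inner.lookup("counter").is_global())  # resolved at module level
```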
|
||||
|
||||
Once the symbol table is created, the ``AST`` is transformed by ``compiler_codegen()``
|
||||
in [Python/compile.c](https://github.com/python/cpython/blob/main/Python/compile.c)
|
||||
into a sequence of pseudo instructions. These are similar to bytecode, but
|
||||
in some cases they are more abstract, and are resolved later into actual
|
||||
bytecode. The construction of this instruction sequence is handled by several
|
||||
functions that break the task down by various AST node types. The functions are
|
||||
all named `compiler_visit_{xx}` where *xx* is the name of the node type (such
|
||||
as ``stmt``, ``expr``, etc.). Each function receives a ``struct compiler *``
|
||||
and `{xx}_ty` where *xx* is the AST node type. Typically these functions
|
||||
consist of a large 'switch' statement, branching based on the kind of
|
||||
node type passed to it. Simple things are handled inline in the
|
||||
'switch' statement with more complex transformations farmed out to other
|
||||
functions named `compiler_{xx}` with *xx* being a descriptive name of what is
|
||||
being handled.
|
||||
|
||||
When transforming an arbitrary AST node, use the ``VISIT()`` macro.
|
||||
The appropriate `compiler_visit_{xx}` function is called, based on the value
|
||||
passed in for the node type (so `VISIT({c}, expr, {node})` calls
|
||||
`compiler_visit_expr({c}, {node})`). The ``VISIT_SEQ()`` macro is very similar,
|
||||
but is called on AST node sequences (those values that were created as
|
||||
arguments to a node that used the '*' modifier).
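
The dispatch-by-node-type pattern has a direct Python counterpart in `ast.NodeVisitor`, whose `visit()` method looks up a `visit_{xx}` method from the node's class. The toy code generator below is only a sketch of the pattern: the class `ToyCodegen`, its `addop` helper and the instruction names are invented here and do not correspond to real opcodes or to the functions in `Python/compile.c`.

```python
import ast


class ToyCodegen(ast.NodeVisitor):
    """Emit made-up pseudo-instructions for simple arithmetic expressions."""

    BINOPS = {ast.Add: "ADD", ast.Sub: "SUBTRACT", ast.Mult: "MULTIPLY"}

    def __init__(self):
        self.instructions = []

    def addop(self, opname, arg=None):
        # Hypothetical helper: append one pseudo-instruction.
        self.instructions.append((opname, arg))

    def visit_Expression(self, node):
        self.visit(node.body)
        self.addop("RETURN")

    def visit_BinOp(self, node):
        # Operands first, then the operator (stack-machine order).
        self.visit(node.left)
        self.visit(node.right)
        self.addop(self.BINOPS[type(node.op)])

    def visit_Constant(self, node):
        self.addop("LOAD_CONST", node.value)

    def visit_Name(self, node):
        self.addop("LOAD_NAME", node.id)

    def generic_visit(self, node):
        # Anything without a dedicated visit_{xx} method is unsupported here.
        raise NotImplementedError(type(node).__name__)


codegen = ToyCodegen()
codegen.visit(ast.parse("a + 2 * 3", mode="eval"))
for instruction in codegen.instructions:
    print(instruction)
```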
|
||||
|
||||
Emission of bytecode is handled by the following macros:
|
||||
|
||||
* ``ADDOP(struct compiler *, location, int)``
|
||||
add a specified opcode
|
||||
* ``ADDOP_IN_SCOPE(struct compiler *, location, int)``
|
||||
like ``ADDOP``, but also exits current scope; used for adding return value
|
||||
opcodes in lambdas and closures
|
||||
* ``ADDOP_I(struct compiler *, location, int, Py_ssize_t)``
|
||||
add an opcode that takes an integer argument
|
||||
* ``ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)``
|
||||
add an opcode with the proper argument based on the position of the
specified PyObject in a PyObject sequence object, but with no handling of
mangled names; used when you need to do named lookups of objects such as
globals, consts, or parameters where name mangling is not possible and the
scope of the name is known; *TYPE* is the name of the PyObject sequence
(``names`` or ``varnames``)
|
||||
* ``ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)``
|
||||
just like ``ADDOP_O``, but steals a reference to PyObject
|
||||
* ``ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)``
|
||||
just like ``ADDOP_O``, but name mangling is also handled; used for
|
||||
attribute loading or importing based on name
|
||||
* ``ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)``
|
||||
add the ``LOAD_CONST`` opcode with the proper argument based on the
|
||||
position of the specified PyObject in the consts table.
|
||||
* ``ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)``
|
||||
just like ``ADDOP_LOAD_CONST``, but steals a reference to PyObject
|
||||
* ``ADDOP_JUMP(struct compiler *, location, int, basicblock *)``
|
||||
create a jump to a basic block
|
||||
|
||||
The ``location`` argument is a struct with the source location to be
|
||||
associated with this instruction. It is typically extracted from an
|
||||
``AST`` node with the ``LOC`` macro. ``NO_LOCATION`` can be used
|
||||
for *synthetic* instructions, which we do not associate with a line
|
||||
number at this stage. For example, the implicit ``return None``
|
||||
which is added at the end of a function is not associated with any
|
||||
line in the source code.
|
||||
|
||||
There are several helper functions that will emit pseudo-instructions
|
||||
and are named `compiler_{xx}()` where *xx* is what the function helps
|
||||
with (``list``, ``boolop``, etc.). A rather useful one is ``compiler_nameop()``.
|
||||
This function looks up the scope of a variable and, based on the
|
||||
expression context, emits the proper opcode to load, store, or delete
|
||||
the variable.
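
The effect of this choice is visible from Python: the expression context stored on each `Name` node lines up with the opcode that ends up in the bytecode. The sketch below uses module-level code, where the `*_NAME` opcodes are used. Inside a function the scope lookup would select different opcodes, and the variable `x` is just an example.

```python
import ast
import dis

source = "x = 1\nprint(x)\ndel x\n"

# The expression context recorded on each Name node for x, keyed by line...
contexts = {
    node.lineno: type(node.ctx).__name__
    for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.Name) and node.id == "x"
}
print(contexts)

# ...matches the kind of opcode emitted for the variable on that line.
code = compile(source, "<example>", "exec")
x_ops = [i.opname for i in dis.get_instructions(code) if i.argval == "x"]
print(x_ops)
```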
|
||||
|
||||
Once the instruction sequence is created, it is transformed into a CFG
|
||||
by ``_PyCfg_FromInstructionSequence()``. Then ``_PyCfg_OptimizeCodeUnit()``
|
||||
applies various peephole optimizations, and
|
||||
``_PyCfg_OptimizedCfgToInstructionSequence()`` converts the optimized ``CFG``
|
||||
back into an instruction sequence. These conversions and optimizations are
|
||||
implemented in
|
||||
[Python/flowgraph.c](https://github.com/python/cpython/blob/main/Python/flowgraph.c).
|
||||
|
||||
Finally, the sequence of pseudo-instructions is converted into actual
|
||||
bytecode. This includes transforming pseudo instructions into actual instructions,
|
||||
converting jump targets from logical labels to relative offsets, and
|
||||
construction of the
|
||||
[exception table](exception_handling.md) and
|
||||
[locations table](https://github.com/python/cpython/blob/main/Objects/locations.md).
|
||||
The bytecode and tables are then wrapped into a ``PyCodeObject`` along with additional
|
||||
metadata, including the ``consts`` and ``names`` arrays, information about the function,
and a reference to the source code (filename, etc). All of this is implemented by
|
||||
``_PyAssemble_MakeCodeObject()`` in
|
||||
[Python/assemble.c](https://github.com/python/cpython/blob/main/Python/assemble.c).
|
||||
|
||||
|
||||
Code objects
|
||||
============
|
||||
|
||||
The result of ``_PyAST_Compile()`` is a ``PyCodeObject`` which is defined in
|
||||
[Include/cpython/code.h](https://github.com/python/cpython/blob/main/Include/cpython/code.h).
|
||||
And with that you now have executable Python bytecode!
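
From Python, a code object is what the built-in `compile()` returns, and its `co_*` attributes expose part of the metadata stored in ``PyCodeObject``. A small sketch, with a made-up source line and filename:

```python
source = "greeting = 'hello ' + name\n"
code = compile(source, "<example>", "exec")

print(type(code).__name__)   # the Python-level view of PyCodeObject
print(code.co_filename)      # reference back to the source
print(code.co_names)         # names used by the bytecode
print(code.co_consts)        # constants used by the bytecode
print(len(code.co_code))     # size of the raw bytecode in bytes

# Executing the code object runs its bytecode in the given namespace.
namespace = {"name": "world"}
exec(code, namespace)
print(namespace["greeting"])
```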
|
||||
|
||||
The code objects (byte code) are executed in
|
||||
[Python/ceval.c](https://github.com/python/cpython/blob/main/Python/ceval.c).
|
||||
When a new opcode is added, this file also needs a new case for it in the big switch
statement in ``_PyEval_EvalFrameDefault()``.
|
||||
|
||||
|
||||
Important files
|
||||
===============
|
||||
|
||||
* [Parser/](https://github.com/python/cpython/blob/main/Parser/)
|
||||
|
||||
* [Parser/Python.asdl](https://github.com/python/cpython/blob/main/Parser/Python.asdl):
|
||||
ASDL syntax file.
|
||||
|
||||
* [Parser/asdl.py](https://github.com/python/cpython/blob/main/Parser/asdl.py):
|
||||
Parser for ASDL definition files.
|
||||
Reads in an ASDL description and parses it into an AST that describes it.
|
||||
|
||||
* [Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py):
|
||||
Generate C code from an ASDL description. Generates
|
||||
[Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c)
|
||||
and
|
||||
[Include/internal/pycore_ast.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_ast.h).
|
||||
|
||||
* [Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c):
|
||||
The new PEG parser introduced in Python 3.9.
|
||||
Generated by
|
||||
[Tools/peg_generator/pegen/c_generator.py](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen/c_generator.py)
|
||||
from the grammar [Grammar/python.gram](https://github.com/python/cpython/blob/main/Grammar/python.gram).
|
||||
Creates the AST from source code. Rule functions for their corresponding production
|
||||
rules are found here.
|
||||
|
||||
* [Parser/peg_api.c](https://github.com/python/cpython/blob/main/Parser/peg_api.c):
|
||||
Contains high-level functions which are
|
||||
used by the interpreter to create an AST from source code.
|
||||
|
||||
* [Parser/pegen.c](https://github.com/python/cpython/blob/main/Parser/pegen.c):
|
||||
Contains helper functions which are used by functions in
|
||||
[Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c)
|
||||
to construct the AST. Also contains helper functions which help raise better error messages
|
||||
when parsing source code.
|
||||
|
||||
* [Parser/pegen.h](https://github.com/python/cpython/blob/main/Parser/pegen.h):
|
||||
Header file for the corresponding
|
||||
[Parser/pegen.c](https://github.com/python/cpython/blob/main/Parser/pegen.c).
|
||||
Also contains definitions of the ``Parser`` and ``Token`` structs.
|
||||
|
||||
* [Python/](https://github.com/python/cpython/blob/main/Python)
|
||||
|
||||
* [Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c):
|
||||
Creates C structs corresponding to the ASDL types. Also contains code for
|
||||
marshalling AST nodes (core ASDL types have marshalling code in
|
||||
[Python/asdl.c](https://github.com/python/cpython/blob/main/Python/asdl.c)).
File automatically generated by
|
||||
[Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py).
|
||||
This file must be committed separately after every grammar change
|
||||
is committed since the ``__version__`` value is set to the latest
|
||||
grammar change revision number.

  * [Python/asdl.c](https://github.com/python/cpython/blob/main/Python/asdl.c):
    Contains code to handle the ASDL sequence type.
    Also has code to handle marshalling the core ASDL types, such as number
    and identifier. Used by
    [Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c)
    for marshalling AST nodes.

  * [Python/ast.c](https://github.com/python/cpython/blob/main/Python/ast.c):
    Used for validating the AST.

  * [Python/ast_opt.c](https://github.com/python/cpython/blob/main/Python/ast_opt.c):
    Optimizes the AST.

  * [Python/ast_unparse.c](https://github.com/python/cpython/blob/main/Python/ast_unparse.c):
    Converts the AST expression node back into a string (for string annotations).

  * [Python/ceval.c](https://github.com/python/cpython/blob/main/Python/ceval.c):
    Executes bytecode (also known as the eval loop).

  * [Python/symtable.c](https://github.com/python/cpython/blob/main/Python/symtable.c):
    Generates a symbol table from the AST.

  * [Python/pyarena.c](https://github.com/python/cpython/blob/main/Python/pyarena.c):
    Implementation of the arena memory manager.

  * [Python/compile.c](https://github.com/python/cpython/blob/main/Python/compile.c):
    Emits pseudo bytecode based on the AST.

  * [Python/flowgraph.c](https://github.com/python/cpython/blob/main/Python/flowgraph.c):
    Implements peephole optimizations.

  * [Python/assemble.c](https://github.com/python/cpython/blob/main/Python/assemble.c):
    Constructs a code object from a sequence of pseudo instructions.

  * [Python/instruction_sequence.c](https://github.com/python/cpython/blob/main/Python/instruction_sequence.c):
    A data structure representing a sequence of bytecode-like pseudo-instructions.

* [Include/](https://github.com/python/cpython/blob/main/Include/)

  * [Include/cpython/code.h](https://github.com/python/cpython/blob/main/Include/cpython/code.h):
    Header file for
    [Objects/codeobject.c](https://github.com/python/cpython/blob/main/Objects/codeobject.c);
    contains the definition of ``PyCodeObject``.

  * [Include/opcode.h](https://github.com/python/cpython/blob/main/Include/opcode.h):
    One of the files that must be modified if
    [Lib/opcode.py](https://github.com/python/cpython/blob/main/Lib/opcode.py) is.

  * [Include/internal/pycore_ast.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_ast.h):
    Contains the actual definitions of the C structs as generated by
    [Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c).
    Automatically generated by
    [Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py).

  * [Include/internal/pycore_asdl.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_asdl.h):
    Header for the corresponding
    [Python/asdl.c](https://github.com/python/cpython/blob/main/Python/asdl.c).

  * [Include/internal/pycore_ast.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_ast.h):
    Declares ``_PyAST_Validate()`` external (from
    [Python/ast.c](https://github.com/python/cpython/blob/main/Python/ast.c)).

  * [Include/internal/pycore_symtable.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_symtable.h):
    Header for
    [Python/symtable.c](https://github.com/python/cpython/blob/main/Python/symtable.c).
    ``struct symtable`` and ``PySTEntryObject`` are defined here.

  * [Include/internal/pycore_parser.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_parser.h):
    Header for the corresponding
    [Parser/peg_api.c](https://github.com/python/cpython/blob/main/Parser/peg_api.c).

  * [Include/internal/pycore_pyarena.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_pyarena.h):
    Header file for the corresponding
    [Python/pyarena.c](https://github.com/python/cpython/blob/main/Python/pyarena.c).

  * [Include/opcode_ids.h](https://github.com/python/cpython/blob/main/Include/opcode_ids.h):
    List of opcodes. Generated from
    [Python/bytecodes.c](https://github.com/python/cpython/blob/main/Python/bytecodes.c)
    by
    [Tools/cases_generator/opcode_id_generator.py](https://github.com/python/cpython/blob/main/Tools/cases_generator/opcode_id_generator.py).

* [Objects/](https://github.com/python/cpython/blob/main/Objects/)

  * [Objects/codeobject.c](https://github.com/python/cpython/blob/main/Objects/codeobject.c):
    Contains ``PyCodeObject``-related code.

  * [Objects/frameobject.c](https://github.com/python/cpython/blob/main/Objects/frameobject.c):
    Contains the ``frame_setlineno()`` function, which determines whether a
    jump between two points in the bytecode is allowed.

* [Lib/](https://github.com/python/cpython/blob/main/Lib/)

  * [Lib/opcode.py](https://github.com/python/cpython/blob/main/Lib/opcode.py):
    Opcode utilities exposed to Python.

  * [Lib/importlib/_bootstrap_external.py](https://github.com/python/cpython/blob/main/Lib/importlib/_bootstrap_external.py):
    Home of the magic number (named ``MAGIC_NUMBER``) for bytecode versioning.
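
Several of the files listed above implement stages that can be observed from Python through the standard library. The following is a minimal sketch, not part of the original notes: the sample source and all variable names are illustrative, and the comments that map each call to a file are a rough guide rather than a precise account of the call graph.

```python
import ast
import dis
import importlib.util
import opcode
import symtable
import types

# Illustrative input only; any small function would do.
SOURCE = "def double(x):\n    return x * 2\n"

# Parser/ plus Python/Python-ast.c: source text becomes an AST.
tree = ast.parse(SOURCE, filename="<demo>")

# Python/symtable.c: the symtable module exposes the compiler's symbol tables.
module_table = symtable.symtable(SOURCE, "<demo>", "exec")
function_table = module_table.get_children()[0]

# Python/compile.c, flowgraph.c and assemble.c: the AST becomes a code object.
module_code = compile(tree, "<demo>", "exec")

# Objects/codeobject.c: the function body is a nested code object in co_consts.
function_code = next(
    const for const in module_code.co_consts if isinstance(const, types.CodeType)
)

# Lib/opcode.py: opmap maps opcode names to numbers; opname is indexed by number.
instruction_names = [ins.opname for ins in dis.get_instructions(function_code)]
return_opcode = opcode.opmap["RETURN_VALUE"]

# Lib/importlib/_bootstrap_external.py: the bytecode magic number,
# re-exported as importlib.util.MAGIC_NUMBER.
magic = importlib.util.MAGIC_NUMBER

print(type(tree).__name__, function_table.get_name(), function_code.co_name)
print(instruction_names)
print(return_opcode, magic)
```

The exact instruction names and the magic number differ between CPython versions, which is why no expected output is shown.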


Objects
=======

* [Objects/locations.md](https://github.com/python/cpython/blob/main/Objects/locations.md): Describes the location table
* [Objects/frame_layout.md](https://github.com/python/cpython/blob/main/Objects/frame_layout.md): Describes the frame stack
* [Objects/object_layout.md](https://github.com/python/cpython/blob/main/Objects/object_layout.md): Describes object layout for 3.11 and later
* [Exception Handling](exception_handling.md): Describes the exception table


Specializing Adaptive Interpreter
=================================

Adding a specializing, adaptive interpreter to CPython will bring significant
performance improvements. These documents provide more information:

* [PEP 659: Specializing Adaptive Interpreter](https://peps.python.org/pep-0659/)
* [Adding or extending a family of adaptive instructions](adaptive.md)
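
To see specialization from Python, one option is to disassemble a function after it has run many times. This is a sketch under stated assumptions: `dis.get_instructions()` accepts `adaptive=True` only on CPython 3.11 and later, the function `add` and the warm-up count of 1000 are arbitrary choices for illustration, and whether any instruction is actually specialized depends on the interpreter version and build.

```python
import dis
import sys


def add(a, b):
    return a + b


# Warm up the function so the interpreter has a chance to specialize it.
# The iteration count is an arbitrary illustrative value.
for _ in range(1000):
    add(1, 2)

if sys.version_info >= (3, 11):
    # adaptive=True asks dis to show the instructions as currently
    # rewritten in place by the interpreter, if any were specialized.
    names = [ins.opname for ins in dis.get_instructions(add, adaptive=True)]
else:
    # Older versions have no adaptive view; fall back to the plain listing.
    names = [ins.opname for ins in dis.get_instructions(add)]

print(names)
```

Comparing this listing with the one produced without `adaptive=True` shows which instructions, if any, were replaced by specialized forms.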


References
==========

[^1]: Daniel C. Wang, Andrew W. Appel, Jeff L. Korn, and Chris
    S. Serra. [The Zephyr Abstract Syntax Description Language](https://www.cs.princeton.edu/research/techreps/TR-554-97).
    In Proceedings of the Conference on Domain-Specific Languages,
    pp. 213-227, 1997.