mirror of https://github.com/python/cpython

gh-119786: [doc] more consistent syntax in InternalDocs (#125815)

This commit is contained in: parent 4848b0b92c, commit d0bfff47fb
@@ -31,8 +31,7 @@ although these are not fundamental and may change:

## Example family

The `LOAD_GLOBAL` instruction (in [Python/bytecodes.c](../Python/bytecodes.c))
already has an adaptive family that serves as a relatively simple example.

The `LOAD_GLOBAL` instruction performs adaptive specialization,
@@ -7,17 +7,16 @@ Abstract

In CPython, the compilation from source code to bytecode involves several steps:

1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
   and [Parser/tokenizer/](../Parser/tokenizer/).
2. Parse the stream of tokens into an Abstract Syntax Tree
   [Parser/parser.c](../Parser/parser.c).
3. Transform AST into an instruction sequence
   [Python/compile.c](../Python/compile.c).
4. Construct a Control Flow Graph and apply optimizations to it
   [Python/flowgraph.c](../Python/flowgraph.c).
5. Emit bytecode based on the Control Flow Graph
   [Python/assemble.c](../Python/assemble.c).

This document outlines how these steps of the process work.
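The same pipeline can be driven from Python itself, which makes the steps easy to
see. The following is an illustrative sketch (not part of the files above) using the
public `tokenize`, `ast`, and `dis` modules; the source string is an arbitrary example:

```python
import ast
import dis
import io
import tokenize

src = "x = 1 + 2\n"

# Step 1: tokenization (the Python-level mirror of Parser/lexer and Parser/tokenizer).
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    print(tok.type, repr(tok.string))

# Step 2: parse the token stream into an AST.
tree = ast.parse(src)
print(ast.dump(tree))

# Steps 3-5 happen inside compile(): instruction sequence, CFG, then bytecode.
code = compile(tree, "<example>", "exec")
dis.dis(code)
```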
@@ -36,12 +35,10 @@ of tokens rather than a stream of characters which is more common with PEG
parsers.

The grammar file for Python can be found in
[Grammar/python.gram](../Grammar/python.gram).
The definitions for literal tokens (such as `:`, numbers, etc.) can be found in
[Grammar/Tokens](../Grammar/Tokens). Various C files, including
[Parser/parser.c](../Parser/parser.c) are generated from these.

See Also:
@@ -63,7 +60,7 @@ specification of the AST nodes is specified using the Zephyr Abstract
Syntax Definition Language (ASDL) [^1], [^2].

The definition of the AST nodes for Python is found in the file
[Parser/Python.asdl](../Parser/Python.asdl).

Each AST node (representing statements, expressions, and several
specialized types, like list comprehensions and exception handlers) is
@@ -87,14 +84,14 @@ approach and syntax:

The preceding example describes two different kinds of statements and an
expression: function definitions, return statements, and yield expressions.
All three kinds are considered of type `stmt` as shown by `|` separating
the various kinds. They all take arguments of various kinds and amounts.

Modifiers on the argument type specify the number of values needed; `?`
means it is optional, `*` means 0 or more, while no modifier means only one
value for the argument and it is required. `FunctionDef`, for instance,
takes an `identifier` for the *name*, `arguments` for *args*, zero or more
`stmt` arguments for *body*, and zero or more `expr` arguments for
*decorators*.
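These modifiers are visible from Python through the `ast` module, where `*` fields
become lists and `?` fields can be `None`. A small sketch (illustrative, not part of
the document being changed):

```python
import ast

tree = ast.parse("def f(a):\n    return a\n")
fn = tree.body[0]                             # an ast.FunctionDef node

print(fn.name)                                # 'f': the required identifier
print(type(fn.args).__name__)                 # 'arguments'
print([type(s).__name__ for s in fn.body])    # ['Return']: stmt* body
print(fn.decorator_list)                      # []: zero or more expr decorators
print(fn.body[0].value is not None)           # True: Return's expr? value is present
```
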
Do notice that something like 'arguments', which is a node type, is
@@ -132,9 +129,9 @@ The statement definitions above generate the following C structure type:

Also generated are a series of constructor functions that allocate (in
this case) a `stmt_ty` struct with the appropriate initialization. The
`kind` field specifies which component of the union is initialized. The
`FunctionDef()` constructor function sets 'kind' to `FunctionDef_kind` and
initializes the *name*, *args*, *body*, and *attributes* fields.

See also
@@ -156,13 +153,13 @@ In general, unless you are working on the critical core of the compiler, memory
management can be completely ignored. But if you are working at either the
very beginning of the compiler or the end, you need to care about how the arena
works. All code relating to the arena is in either
[Include/internal/pycore_pyarena.h](../Include/internal/pycore_pyarena.h)
or [Python/pyarena.c](../Python/pyarena.c).

`PyArena_New()` will create a new arena. The returned `PyArena` structure
will store pointers to all memory given to it. This does the bookkeeping of
what memory needs to be freed when the compiler is finished with the memory it
used. That freeing is done with `PyArena_Free()`. This only needs to be
called in strategic areas where the compiler exits.

As stated above, in general you should not have to worry about memory
@@ -173,25 +170,25 @@ The only exception comes about when managing a PyObject. Since the rest
of Python uses reference counting, there is extra support added
to the arena to cleanup each PyObject that was allocated. These cases
are very rare. However, if you've allocated a PyObject, you must tell
the arena about it by calling `PyArena_AddPyObject()`.


Source code to AST
==================

The AST is generated from source code using the functions
`_PyParser_ASTFromString()` or `_PyParser_ASTFromFile()` in
[Parser/peg_api.c](../Parser/peg_api.c).

After some checks, a helper function in
[Parser/parser.c](../Parser/parser.c)
begins applying production rules on the source code it receives; converting source
code to tokens and matching these tokens recursively to their corresponding rule. The
production rule's corresponding rule function is called on every match. These rule
functions follow the format `xx_rule`, where *xx* is the grammar rule
that the function handles and is automatically derived from
[Grammar/python.gram](../Grammar/python.gram) by
[Tools/peg_generator/pegen/c_generator.py](../Tools/peg_generator/pegen/c_generator.py).

Each rule function in turn creates an AST node as it goes along. It does this
by allocating all the new nodes it needs, calling the proper AST node creation
@@ -202,18 +199,15 @@ there are no more rules, an error is set and the parsing ends.

The AST node creation helper functions have the name `_PyAST_{xx}`
where *xx* is the AST node that the function creates. These are defined by the
ASDL grammar and contained in [Python/Python-ast.c](../Python/Python-ast.c)
(which is generated by [Parser/asdl_c.py](../Parser/asdl_c.py)
from [Parser/Python.asdl](../Parser/Python.asdl)).
This all leads to a sequence of AST nodes stored in `asdl_seq` structs.

To demonstrate everything explained so far, here's the
rule function responsible for a simple named import statement such as
`import sys`. Note that error-checking and debugging code has been
omitted. Removed parts are represented by `...`.
Furthermore, some comments have been added for explanation. These comments
may not be present in the actual code.
@@ -255,55 +249,52 @@ may not be present in the actual code.

To improve backtracking performance, some rules (chosen by applying a
`(memo)` flag in the grammar file) are memoized. Each rule function checks if
a memoized version exists and returns that if so, else it continues in the
manner stated in the previous paragraphs.

There are macros for creating and using `asdl_xx_seq *` types, where *xx* is
a type of the ASDL sequence. Three main types are defined
manually -- `generic`, `identifier` and `int`. These types are found in
[Python/asdl.c](../Python/asdl.c) and its corresponding header file
[Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
Functions and macros for creating `asdl_xx_seq *` types are as follows:

`_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
   Allocate memory for an `asdl_generic_seq` of the specified length
`_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
   Allocate memory for an `asdl_identifier_seq` of the specified length
`_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
   Allocate memory for an `asdl_int_seq` of the specified length

In addition to the three types mentioned above, some ASDL sequence types are
automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in
[Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h).
Macros for using both manually defined and automatically generated ASDL
sequence types are as follows:

`asdl_seq_GET(asdl_xx_seq *, int)`
   Get item held at a specific position in an `asdl_xx_seq`
`asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
   Set a specific index in an `asdl_xx_seq` to the specified value

Untyped counterparts exist for some of the typed macros. These are useful
when a function needs to manipulate a generic ASDL sequence:

`asdl_seq_GET_UNTYPED(asdl_seq *, int)`
   Get item held at a specific position in an `asdl_seq`
`asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
   Set a specific index in an `asdl_seq` to the specified value
`asdl_seq_LEN(asdl_seq *)`
   Return the length of an `asdl_seq` or `asdl_xx_seq`

Note that typed macros and functions are recommended over their untyped
counterparts. Typed macros carry out checks in debug mode and aid
debugging errors caused by incorrectly casting from `void *`.

If you are working with statements, you must also worry about keeping
track of what line number generated the statement. Currently the line
number is passed as the last parameter to each `stmt_ty` function.

See also [PEP 617: New PEG parser for CPython](https://peps.python.org/pep-0617/).
@@ -333,19 +324,19 @@ else:

```
end()
```

The `x < 10` guard is represented by its own basic block that
compares `x` with `10` and then ends in a conditional jump based on
the result of the comparison. This conditional jump allows the block
to point to both the body of the `if` and the body of the `else`. The
`if` basic block contains the `f1()` and `f2()` calls and points to
the `end()` basic block. The `else` basic block contains the `g()`
call and similarly points to the `end()` block.

Note that more complex code in the guard, the `if` body, or the `else`
body may be represented by multiple basic blocks. For instance,
short-circuiting boolean logic in a guard like `if x or y:`
will produce one basic block that tests the truth value of `x`
and then points both (1) to the start of the `if` body and (2) to
a different basic block that tests the truth value of `y`.

CFGs are useful as an intermediate representation of the code because
@@ -354,27 +345,24 @@ they are a convenient data structure for optimizations.

AST to CFG to bytecode
======================

The conversion of an `AST` to bytecode is initiated by a call to the function
`_PyAST_Compile()` in [Python/compile.c](../Python/compile.c).

The first step is to construct the symbol table. This is implemented by
`_PySymtable_Build()` in [Python/symtable.c](../Python/symtable.c).
This function begins by entering the starting code block for the AST (passed-in)
and then calling the proper `symtable_visit_{xx}` function (with *xx* being the
AST node type). Next, the AST tree is walked with the various code blocks that
delineate the reach of a local variable as blocks are entered and exited using
`symtable_enter_block()` and `symtable_exit_block()`, respectively.
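The symbol table that this pass builds is exposed to Python through the `symtable`
module, which can serve as a quick illustration (an informal sketch, not part of the
files above):

```python
import symtable

table = symtable.symtable("def f(a):\n    b = a + 1\n    return b\n",
                          "<example>", "exec")
fn = table.get_children()[0]        # the block entered for `f`
print(fn.get_name())                # 'f'
for sym in fn.get_symbols():
    # `a` is a parameter (and local); `b` is a plain local.
    print(sym.get_name(), sym.is_parameter(), sym.is_local())
```
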
Once the symbol table is created, the `AST` is transformed by `compiler_codegen()`
in [Python/compile.c](../Python/compile.c) into a sequence of pseudo instructions.
These are similar to bytecode, but in some cases they are more abstract, and are
resolved later into actual bytecode. The construction of this instruction sequence
is handled by several functions that break the task down by various AST node types.
The functions are all named `compiler_visit_{xx}` where *xx* is the name of the node
type (such as `stmt`, `expr`, etc.). Each function receives a `struct compiler *`
and `{xx}_ty` where *xx* is the AST node type. Typically these functions
consist of a large 'switch' statement, branching based on the kind of
node type passed to it. Simple things are handled inline in the
@@ -382,242 +370,224 @@ node type passed to it. Simple things are handled inline in the
functions named `compiler_{xx}` with *xx* being a descriptive name of what is
being handled.

When transforming an arbitrary AST node, use the `VISIT()` macro.
The appropriate `compiler_visit_{xx}` function is called, based on the value
passed in for <node type> (so `VISIT({c}, expr, {node})` calls
`compiler_visit_expr({c}, {node})`). The `VISIT_SEQ()` macro is very similar,
but is called on AST node sequences (those values that were created as
arguments to a node that used the '*' modifier).

Emission of bytecode is handled by the following macros:

* `ADDOP(struct compiler *, location, int)`
  add a specified opcode
* `ADDOP_IN_SCOPE(struct compiler *, location, int)`
  like `ADDOP`, but also exits current scope; used for adding return value
  opcodes in lambdas and closures
* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
  add an opcode that takes an integer argument
* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
  add an opcode with the proper argument based on the position of the
  specified PyObject in PyObject sequence object, but with no handling of
  mangled names; used when you need to do named lookups of objects such as
  globals, consts, or parameters where name mangling is not possible and the
  scope of the name is known; *TYPE* is the name of PyObject sequence
  (`names` or `varnames`)
* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
  just like `ADDOP_O`, but steals a reference to PyObject
* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
  just like `ADDOP_O`, but name mangling is also handled; used for
  attribute loading or importing based on name
* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
  add the `LOAD_CONST` opcode with the proper argument based on the
  position of the specified PyObject in the consts table.
* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
  just like `ADDOP_LOAD_CONST`, but steals a reference to PyObject
* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
  create a jump to a basic block

The `location` argument is a struct with the source location to be
associated with this instruction. It is typically extracted from an
`AST` node with the `LOC` macro. The `NO_LOCATION` value can be used
for *synthetic* instructions, which we do not associate with a line
number at this stage. For example, the implicit `return None`
which is added at the end of a function is not associated with any
line in the source code.
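The effect of that implicit `return None` is easy to observe with the `dis` module;
a small illustration (exact opcode names vary between CPython versions):

```python
import dis

def f(x):
    x + 1          # no explicit return statement

dis.dis(f)
# The listing ends with the compiler's synthetic `return None`
# (RETURN_CONST None on 3.12, LOAD_CONST None + RETURN_VALUE on older versions).
```
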
There are several helper functions that will emit pseudo-instructions
and are named `compiler_{xx}()` where *xx* is what the function helps
with (`list`, `boolop`, etc.). A rather useful one is `compiler_nameop()`.
This function looks up the scope of a variable and, based on the
expression context, emits the proper opcode to load, store, or delete
the variable.

Once the instruction sequence is created, it is transformed into a CFG
by `_PyCfg_FromInstructionSequence()`. Then `_PyCfg_OptimizeCodeUnit()`
applies various peephole optimizations, and
`_PyCfg_OptimizedCfgToInstructionSequence()` converts the optimized `CFG`
back into an instruction sequence. These conversions and optimizations are
implemented in [Python/flowgraph.c](../Python/flowgraph.c).

Finally, the sequence of pseudo-instructions is converted into actual
bytecode. This includes transforming pseudo instructions into actual instructions,
converting jump targets from logical labels to relative offsets, and
construction of the [exception table](exception_handling.md) and
[locations table](locations.md).
The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
metadata, including the `consts` and `names` arrays, information about the function,
and a reference to the source code (filename, etc). All of this is implemented by
`_PyAssemble_MakeCodeObject()` in [Python/assemble.c](../Python/assemble.c).

Code objects
============

The result of `PyAST_CompileObject()` is a `PyCodeObject` which is defined in
[Include/cpython/code.h](../Include/cpython/code.h).
And with that you now have executable Python bytecode!

The code objects (byte code) are executed in [Python/ceval.c](../Python/ceval.c).
This file will also need a new case statement for the new opcode in the big switch
statement in `_PyEval_EvalFrameDefault()`.
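From Python, a code object and the metadata mentioned above can be inspected
directly; for example:

```python
def f(a, b):
    return a + b

code = f.__code__                        # a PyCodeObject at the C level
print(code.co_name, code.co_argcount)    # f 2
print(code.co_consts)                    # the consts array
print(code.co_names)                     # the names array
print(code.co_filename)                  # the reference back to the source file
```
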
Important files
===============

* [Parser/](../Parser/)

  * [Parser/Python.asdl](../Parser/Python.asdl):
    ASDL syntax file.

  * [Parser/asdl.py](../Parser/asdl.py):
    Parser for ASDL definition files.
    Reads in an ASDL description and parses it into an AST that describes it.

  * [Parser/asdl_c.py](../Parser/asdl_c.py):
    Generate C code from an ASDL description. Generates
    [Python/Python-ast.c](../Python/Python-ast.c) and
    [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h).

  * [Parser/parser.c](../Parser/parser.c):
    The new PEG parser introduced in Python 3.9. Generated by
    [Tools/peg_generator/pegen/c_generator.py](../Tools/peg_generator/pegen/c_generator.py)
    from the grammar [Grammar/python.gram](../Grammar/python.gram).
    Creates the AST from source code. Rule functions for their corresponding production
    rules are found here.

  * [Parser/peg_api.c](../Parser/peg_api.c):
    Contains high-level functions which are used by the interpreter to create
    an AST from source code.

  * [Parser/pegen.c](../Parser/pegen.c):
    Contains helper functions which are used by functions in
    [Parser/parser.c](../Parser/parser.c) to construct the AST. Also contains
    helper functions which help raise better error messages when parsing source code.

  * [Parser/pegen.h](../Parser/pegen.h):
    Header file for the corresponding [Parser/pegen.c](../Parser/pegen.c).
    Also contains definitions of the `Parser` and `Token` structs.
* [Python/](../Python)

  * [Python/Python-ast.c](../Python/Python-ast.c):
    Creates C structs corresponding to the ASDL types. Also contains code for
    marshalling AST nodes (core ASDL types have marshalling code in
    [Python/asdl.c](../Python/asdl.c)).
    File automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py).
    This file must be committed separately after every grammar change
    is committed since the `__version__` value is set to the latest
    grammar change revision number.

  * [Python/asdl.c](../Python/asdl.c):
    Contains code to handle the ASDL sequence type.
    Also has code to handle marshalling the core ASDL types, such as number
    and identifier. Used by [Python/Python-ast.c](../Python/Python-ast.c)
    for marshalling AST nodes.

  * [Python/ast.c](../Python/ast.c):
    Used for validating the AST.

  * [Python/ast_opt.c](../Python/ast_opt.c):
    Optimizes the AST.

  * [Python/ast_unparse.c](../Python/ast_unparse.c):
    Converts the AST expression node back into a string (for string annotations).

  * [Python/ceval.c](../Python/ceval.c):
    Executes byte code (aka, eval loop).

  * [Python/symtable.c](../Python/symtable.c):
    Generates a symbol table from AST.

  * [Python/pyarena.c](../Python/pyarena.c):
    Implementation of the arena memory manager.

  * [Python/compile.c](../Python/compile.c):
    Emits pseudo bytecode based on the AST.

  * [Python/flowgraph.c](../Python/flowgraph.c):
    Implements peephole optimizations.

  * [Python/assemble.c](../Python/assemble.c):
    Constructs a code object from a sequence of pseudo instructions.

  * [Python/instruction_sequence.c](../Python/instruction_sequence.c):
    A data structure representing a sequence of bytecode-like pseudo-instructions.
* [Include/](../Include/)

  * [Include/cpython/code.h](../Include/cpython/code.h)
    : Header file for [Objects/codeobject.c](../Objects/codeobject.c);
    contains definition of `PyCodeObject`.

  * [Include/opcode.h](../Include/opcode.h)
    : One of the files that must be modified whenever
    [Lib/opcode.py](../Lib/opcode.py) is.

  * [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h)
    : Contains the actual definitions of the C structs as generated by
    [Python/Python-ast.c](../Python/Python-ast.c).
    Automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py).

  * [Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h)
    : Header for the corresponding [Python/asdl.c](../Python/asdl.c).

  * [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h)
    : Declares `_PyAST_Validate()` external (from [Python/ast.c](../Python/ast.c)).

  * [Include/internal/pycore_symtable.h](../Include/internal/pycore_symtable.h)
    : Header for [Python/symtable.c](../Python/symtable.c).
    `struct symtable` and `PySTEntryObject` are defined here.

  * [Include/internal/pycore_parser.h](../Include/internal/pycore_parser.h)
    : Header for the corresponding [Parser/peg_api.c](../Parser/peg_api.c).

  * [Include/internal/pycore_pyarena.h](../Include/internal/pycore_pyarena.h)
    : Header file for the corresponding [Python/pyarena.c](../Python/pyarena.c).

  * [Include/opcode_ids.h](../Include/opcode_ids.h)
    : List of opcodes. Generated from [Python/bytecodes.c](../Python/bytecodes.c) by
    [Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).
* [Objects/](../Objects/)

  * [Objects/codeobject.c](../Objects/codeobject.c)
    : Contains PyCodeObject-related code.

  * [Objects/frameobject.c](../Objects/frameobject.c)
    : Contains the `frame_setlineno()` function which should determine whether it is allowed
    to make a jump between two points in a bytecode.

* [Lib/](../Lib/)

  * [Lib/opcode.py](../Lib/opcode.py)
    : opcode utilities exposed to Python.

  * [Include/internal/pycore_magic_number.h](../Include/internal/pycore_magic_number.h)
    : Home of the magic number (named `MAGIC_NUMBER`) for bytecode versioning.


Objects
@@ -625,7 +595,7 @@ Objects

* [Locations](locations.md): Describes the location table
* [Frames](frames.md): Describes frames and the frame stack
* [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
* [Exception Handling](exception_handling.md): Describes the exception table
@@ -68,18 +68,16 @@ Handling Exceptions
-------------------

At runtime, when an exception occurs, the interpreter calls
`get_exception_handler()` in [Python/ceval.c](../Python/ceval.c)
to look up the offset of the current instruction in the exception
table. If it finds a handler, control flow transfers to it. Otherwise, the
exception bubbles up to the caller, and the caller's frame is
checked for a handler covering the `CALL` instruction. This
repeats until a handler is found or the topmost frame is reached.
If no handler is found, then the interpreter function
(`_PyEval_EvalFrameDefault()`) returns NULL. During unwinding,
the traceback is constructed as each frame is added to it by
`PyTraceBack_Here()`, which is in [Python/traceback.c](../Python/traceback.c).

Along with the location of an exception handler, each entry of the
exception table also contains the stack depth of the `try` instruction
@@ -174,22 +172,20 @@ which is then encoded as:

for a total of five bytes.

The code to construct the exception table is in `assemble_exception_table()`
in [Python/assemble.c](../Python/assemble.c).

The interpreter's function to look up the table by instruction offset is
`get_exception_handler()` in [Python/ceval.c](../Python/ceval.c).
The Python function `_parse_exception_table()` in [Lib/dis.py](../Lib/dis.py)
returns the exception table content as a list of namedtuple instances.
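For instance, the table for a small function can be dumped with the `dis` module
(CPython 3.11+; note that `_parse_exception_table()` is a private helper):

```python
import dis

def f():
    try:
        return 1 / 0
    except ZeroDivisionError:
        return None

dis.dis(f)   # prints an "ExceptionTable:" section after the instructions

for entry in dis._parse_exception_table(f.__code__):
    print(entry)   # namedtuples with start, end, target, depth and lasti fields
```
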
Exception Chaining Implementation
---------------------------------

[Exception chaining](https://docs.python.org/dev/tutorial/errors.html#exception-chaining)
refers to setting the `__context__` and `__cause__` fields of an exception as it is
being raised. The `__context__` field is set by `_PyErr_SetObject()` in
[Python/errors.c](../Python/errors.c) (which is ultimately called by all
`PyErr_Set*()` functions). The `__cause__` field (explicit chaining) is set by
the `RAISE_VARARGS` bytecode.
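Both fields can be observed from pure Python; a short illustration:

```python
try:
    try:
        1 / 0
    except ZeroDivisionError as exc:
        raise ValueError("bad input") from exc    # explicit: sets __cause__
except ValueError as exc:
    print(type(exc.__cause__).__name__)       # ZeroDivisionError
    print(type(exc.__context__).__name__)     # ZeroDivisionError (implicit)
```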
|
@ -10,20 +10,19 @@ of three conceptual sections:
|
|||
globals dict, code object, instruction pointer, stack depth, the
|
||||
previous frame, etc.
|
||||
|
||||
The definition of the ``_PyInterpreterFrame`` struct is in
|
||||
[Include/internal/pycore_frame.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_frame.h).
|
||||
The definition of the `_PyInterpreterFrame` struct is in
|
||||
[Include/internal/pycore_frame.h](../Include/internal/pycore_frame.h).
|
||||
|
||||
# Allocation
|
||||
|
||||
Python semantics allows frames to outlive the activation, so they need to
|
||||
be allocated outside the C call stack. To reduce overhead and improve locality
|
||||
of reference, most frames are allocated contiguously in a per-thread stack
|
||||
(see ``_PyThreadState_PushFrame`` in
|
||||
[Python/pystate.c](https://github.com/python/cpython/blob/main/Python/pystate.c)).
|
||||
(see `_PyThreadState_PushFrame` in [Python/pystate.c](../Python/pystate.c)).
|
||||
|
||||
Frames of generators and coroutines are embedded in the generator and coroutine
|
||||
objects, so are not allocated in the per-thread stack. See ``PyGenObject`` in
|
||||
[Include/internal/pycore_genobject.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_genobject.h).
|
||||
objects, so are not allocated in the per-thread stack. See `PyGenObject` in
|
||||
[Include/internal/pycore_genobject.h](../Include/internal/pycore_genobject.h).
|
||||
|
||||
## Layout
|
||||
|
||||
|
@@ -82,16 +81,15 @@ frames for each activation, but with low runtime overhead.

### Generators and Coroutines

Generators (objects of type `PyGen_Type`, `PyCoro_Type` or
`PyAsyncGen_Type`) have a `_PyInterpreterFrame` embedded in them, so
that they can be created with a single memory allocation.
When such an embedded frame is iterated or awaited, it can be linked with
frames on the per-thread stack via the linkage fields.

If a frame object associated with a generator outlives the generator, then
the embedded `_PyInterpreterFrame` is copied into the frame object (see
`take_ownership()` in [Python/frame.c](../Python/frame.c)).

### Field names
@@ -12,7 +12,7 @@ a local variable in some C function. When an object’s reference count becomes
the object is deallocated. If it contains references to other objects, their
reference counts are decremented. Those other objects may be deallocated in turn, if
this decrement makes their reference count become zero, and so on. The reference
count field can be examined using the `sys.getrefcount()` function (notice that the
value returned by this function is always 1 more, as the function also has a reference
to the object when called):
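A minimal stand-in for that example (the original snippet is not shown in this hunk):

```python
import sys

x = object()
print(sys.getrefcount(x))   # 2: the variable `x` plus the temporary reference
                            # held by getrefcount's own argument
```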
@@ -39,7 +39,7 @@ cycles. For instance, consider this code:

```pycon
>>> del container
```

In this example, `container` holds a reference to itself, so even when we remove
our reference to it (the variable "container") the reference count never falls to 0
because it still has its own internal reference. Therefore it would never be
cleaned just by simple reference counting. For this reason some additional machinery
@@ -127,7 +127,7 @@ GC for the free-threaded build
------------------------------

In the free-threaded build, Python objects contain a 1-byte field
`ob_gc_bits` that is used to track garbage collection related state. The
field exists in all objects, including ones that do not support cyclic
garbage collection. The field is used to identify objects that are tracked
by the collector, ensure that finalizers are called only once per object,
@@ -146,14 +146,14 @@ and, during garbage collection, differentiate reachable vs. unreachable objects.

```
| ... |
```

Note that not all fields are to scale. `pad` is two bytes, `ob_mutex` and
`ob_gc_bits` are each one byte, and `ob_ref_local` is four bytes. The
other fields, `ob_tid`, `ob_ref_shared`, and `ob_type`, are all
pointer-sized (that is, eight bytes on a 64-bit platform).

The garbage collector also temporarily repurposes the `ob_tid` (thread ID)
and `ob_ref_local` (local reference count) fields for other purposes during
collections.
@@ -165,17 +165,17 @@ objects with GC support. These APIs can be found in the
[Garbage Collector C API documentation](https://docs.python.org/3/c-api/gcsupport.html).

Apart from this object structure, the type object for objects supporting garbage
collection must include the `Py_TPFLAGS_HAVE_GC` in its `tp_flags` slot and
provide an implementation of the `tp_traverse` handler. Unless it can be proven
that the objects cannot form reference cycles with only objects of its type or unless
the type is immutable, a `tp_clear` implementation must also be provided.


Identifying reference cycles
============================

The algorithm that CPython uses to detect those reference cycles is
implemented in the `gc` module. The garbage collector **only focuses**
on cleaning container objects (that is, objects that can contain a reference
to one or more objects). These can be arrays, dictionaries, lists, custom
class instances, classes in extension modules, etc. One could think that
@@ -195,7 +195,7 @@ the interpreter create cycles everywhere. Some notable examples:

To correctly dispose of these objects once they become unreachable, they need
to be identified first. To understand how the algorithm works, let’s take
the case of a circular linked list which has one link referenced by a
variable `A`, and one self-referencing object which is completely
unreachable:
@@ -234,7 +234,7 @@ objects have a refcount larger than the number of incoming references from
within the candidate set.

Every object that supports garbage collection will have an extra reference
count field initialized to the reference count (`gc_ref` in the figures)
of that object when the algorithm starts. This is because the algorithm needs
to modify the reference count to do the computations and in this way the
interpreter will not modify the real reference count field.
@ -243,43 +243,43 @@ interpreter will not modify the real reference count field.
|
|||
|
||||
The GC then iterates over all containers in the first list and decrements by one the
|
||||
`gc_ref` field of any other object that container is referencing. Doing
|
||||
this makes use of the ``tp_traverse`` slot in the container class (implemented
|
||||
this makes use of the `tp_traverse` slot in the container class (implemented
|
||||
using the C API or inherited by a superclass) to know what objects are referenced by
|
||||
each container. After all the objects have been scanned, only the objects that have
|
||||
references from outside the “objects to scan” list will have ``gc_ref > 0``.
|
||||
references from outside the “objects to scan” list will have `gc_ref > 0`.
|
||||
|
||||
![gc-image2](images/python-cyclic-gc-2-new-page.png)
|
||||
|
||||
Notice that having ``gc_ref == 0`` does not imply that the object is unreachable.
|
||||
This is because another object that is reachable from the outside (``gc_ref > 0``)
|
||||
can still have references to it. For instance, the ``link_2`` object in our example
|
||||
ended having ``gc_ref == 0`` but is referenced still by the ``link_1`` object that
|
||||
Notice that having `gc_ref == 0` does not imply that the object is unreachable.
|
||||
This is because another object that is reachable from the outside (`gc_ref > 0`)
|
||||
can still have references to it. For instance, the `link_2` object in our example
|
||||
ended having `gc_ref == 0` but is referenced still by the `link_1` object that
|
||||
is reachable from the outside. To obtain the set of objects that are really
|
||||
unreachable, the garbage collector re-scans the container objects using the
|
||||
``tp_traverse`` slot; this time with a different traverse function that marks objects with
|
||||
``gc_ref == 0`` as "tentatively unreachable" and then moves them to the
|
||||
`tp_traverse` slot; this time with a different traverse function that marks objects with
|
||||
`gc_ref == 0` as "tentatively unreachable" and then moves them to the
|
||||
tentatively unreachable list. The following image depicts the state of the lists in a
|
||||
moment when the GC processed the ``link_3`` and ``link_4`` objects but has not
|
||||
processed ``link_1`` and ``link_2`` yet.
|
||||
moment when the GC processed the `link_3` and `link_4` objects but has not
|
||||
processed `link_1` and `link_2` yet.
|
||||
|
||||
![gc-image3](images/python-cyclic-gc-3-new-page.png)
|
||||
|
||||
Then the GC scans the next ``link_1`` object. Because it has ``gc_ref == 1``,
|
||||
Then the GC scans the next `link_1` object. Because it has `gc_ref == 1`,
|
||||
the gc does not do anything special because it knows it has to be reachable (and is
|
||||
already in what will become the reachable list):
|
||||
|
||||
![gc-image4](images/python-cyclic-gc-4-new-page.png)
|
||||
|
||||
When the GC encounters an object which is reachable (`gc_ref > 0`), it traverses
its references using the `tp_traverse` slot to find all the objects that are
reachable from it, moving them to the end of the list of reachable objects (where
they started originally) and setting its `gc_ref` field to 1. This is what happens
to `link_2` and `link_3` below, as they are reachable from `link_1`. From the
state in the previous image, and after examining the objects referred to by `link_1`,
the GC knows that `link_3` is reachable after all, so it is moved back to the
original list and its `gc_ref` field is set to 1 so that if the GC visits it again,
it will know that it's reachable. To avoid visiting an object twice, the GC marks all
objects that have already been visited once (by unsetting the `PREV_MASK_COLLECTING`
flag) so that if an object that has already been processed is referenced by some other
object, the GC does not process it twice.
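
The two scans can be condensed into a short toy model. The sketch below is illustrative pure Python, not CPython's actual C implementation; the `Node` class, its fields, and the object graph are invented for the example:

```python
# Toy model of the cycle-detection scans described above. `refcount` stands
# in for ob_refcnt and `refs` for the pointers tp_traverse would visit.
class Node:
    def __init__(self, name, refcount):
        self.name = name
        self.refcount = refcount     # external + internal references
        self.refs = []               # objects this node references

def find_unreachable(objects):
    # First scan: copy refcounts into gc_ref, then subtract one for every
    # reference coming from another object inside the collected set.
    gc_ref = {o.name: o.refcount for o in objects}
    for o in objects:
        for target in o.refs:
            gc_ref[target.name] -= 1
    # Second scan: anything reachable from an object with gc_ref > 0 is kept.
    reachable = set()
    stack = [o for o in objects if gc_ref[o.name] > 0]
    while stack:
        o = stack.pop()
        if o.name not in reachable:
            reachable.add(o.name)
            stack.extend(o.refs)
    return [o.name for o in objects if o.name not in reachable]

# link_1 <-> link_2 form a cycle kept alive from outside; link_3 <-> link_4
# form an isolated cycle. Refcounts are chosen to match that picture.
link_1, link_2 = Node("link_1", 2), Node("link_2", 1)
link_3, link_4 = Node("link_3", 1), Node("link_4", 1)
link_1.refs, link_2.refs = [link_2], [link_1]
link_3.refs, link_4.refs = [link_4], [link_3]
print(find_unreachable([link_1, link_2, link_3, link_4]))  # ['link_3', 'link_4']
```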

@@ -295,7 +295,7 @@ list are really unreachable and can thus be garbage collected.

Pragmatically, it's important to note that no recursion is required by any of this,
and neither does it in any other way require additional memory proportional to the
number of objects, number of pointers, or the lengths of pointer chains. Apart from
`O(1)` storage for internal C needs, the objects themselves contain all the storage
the GC algorithms require.

Why moving unreachable objects is better

@@ -331,7 +331,7 @@ with the objective of completely destroying these objects. Roughly, the process

follows these steps in order:

1. Handle and clear weak references (if any). Weak references to unreachable objects
   are set to `None`. If the weak reference has an associated callback, the callback
   is enqueued to be called once the clearing of weak references is finished. We only
   invoke callbacks for weak references that are themselves reachable. If both the weak
   reference and the pointed-to object are unreachable, we do not execute the callback.

@@ -339,15 +339,15 @@ follows these steps in order:

   object and support for weak references predates support for object resurrection.
   Ignoring the weak reference's callback is fine because both the object and the weakref
   are going away, so it's legitimate to say the weak reference is going away first.
2. If an object has legacy finalizers (`tp_del` slot), move it to the
   `gc.garbage` list.
3. Call the finalizers (`tp_finalize` slot) and mark the objects as already
   finalized to avoid calling finalizers twice if the objects are resurrected or
   if other finalizers have removed the object first.
4. Deal with resurrected objects. If some objects have been resurrected, the GC
   finds the new subset of objects that are still unreachable by running the cycle
   detection algorithm again and continues with them.
5. Call the `tp_clear` slot of every object so all internal links are broken and
   the reference counts fall to 0, triggering the destruction of all unreachable
   objects.
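
The effect of step 1 can be observed from pure Python. In this minimal sketch (the `Node` class is made up for the example), the weak reference itself stays reachable, so its callback runs when the unreachable cycle is collected:

```python
# Step 1 in action: the collector clears weak references to an unreachable
# cycle, and invokes the callback of a weakref that is itself still reachable.
import gc
import weakref

class Node:
    pass

a, b = Node(), Node()
a.partner, b.partner = b, a          # create a reference cycle
ref = weakref.ref(a, lambda r: print("callback: referent was collected"))

del a, b                             # the cycle is now unreachable
gc.collect()                         # prints the callback message
print(ref())                         # the weakref has been cleared -> None
```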

@@ -376,9 +376,9 @@ generations. Every collection operates on the entire heap.

In order to decide when to run, the collector keeps track of the number of object
allocations and deallocations since the last collection. When the number of
allocations minus the number of deallocations exceeds `threshold_0`,
collection starts. Initially only generation 0 is examined. If generation 0 has
been examined more than `threshold_1` times since generation 1 has been
examined, then generation 1 is examined as well. With generation 2,
things are a bit more complicated; see
[Collecting the oldest generation](#Collecting-the-oldest-generation) for

@@ -393,8 +393,8 @@ function:

The content of these generations can be examined using the
`gc.get_objects(generation=NUM)` function, and collections can be triggered
specifically in a generation by calling `gc.collect(generation=NUM)`.

```pycon
>>> import gc
```

@@ -433,7 +433,7 @@ Collecting the oldest generation
--------------------------------

In addition to the various configurable thresholds, the GC only triggers a full
collection of the oldest generation if the ratio `long_lived_pending / long_lived_total`
is above a given value (hardwired to 25%). The reason is that, while "non-full"
collections (that is, collections of the young and middle generations) will always
examine roughly the same number of objects (determined by the aforementioned

@@ -463,12 +463,12 @@ used for tags or to keep other information – most often as a bit field (each

bit a separate tag) – as long as code that uses the pointer masks out these
bits before accessing memory. For example, on a 32-bit architecture (for both
addresses and word size), a word is 32 bits = 4 bytes, so word-aligned
addresses are always a multiple of 4, hence end in `00`, leaving the last 2 bits
available; while on a 64-bit architecture, a word is 64 bits = 8 bytes, so
word-aligned addresses end in `000`, leaving the last 3 bits available.
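
The masking trick can be sketched with plain integers standing in for pointers; the flag names and the address below are made up for the example:

```python
# Tagging a word-aligned "pointer": 8-byte alignment leaves the low 3 bits
# free, so flags can live there as long as they are masked off before use.
FLAG_COLLECTING = 0b001
FLAG_FINALIZED = 0b010
TAG_MASK = 0b111

def address(tagged):
    return tagged & ~TAG_MASK        # strip the tag bits before dereferencing

ptr = 0x7F3ABC40                     # made-up address, a multiple of 8
tagged = ptr | FLAG_COLLECTING       # store a flag in the unused low bits
assert tagged & FLAG_COLLECTING      # the flag is readable...
assert address(tagged) == ptr        # ...and the address is intact
```
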
The CPython GC makes use of two fat pointers that correspond to the extra fields
of `PyGC_Head` discussed in the [Memory layout and object structure](#memory-layout-and-object-structure) section:

> [!WARNING]
> Because of the presence of extra information, "tagged" or "fat" pointers cannot be

@@ -478,23 +478,23 @@ of ``PyGC_Head`` discussed in the `Memory layout and object structure`_ section:

> normally assume the pointers inside the lists are in a consistent state.

- The `_gc_prev` field is normally used as the "previous" pointer to maintain the
  doubly linked list but its lowest two bits are used to keep the flags
  `PREV_MASK_COLLECTING` and `_PyGC_PREV_MASK_FINALIZED`. Between collections,
  the only flag that can be present is `_PyGC_PREV_MASK_FINALIZED`, which indicates
  whether an object has already been finalized. During collections `_gc_prev` is
  temporarily used for storing a copy of the reference count (`gc_ref`), in
  addition to two flags, and the GC linked list becomes a singly linked list until
  `_gc_prev` is restored.

- The `_gc_next` field is used as the "next" pointer to maintain the doubly linked
  list but during collection its lowest bit is used to keep the
  `NEXT_MASK_UNREACHABLE` flag that indicates if an object is tentatively
  unreachable during the cycle detection algorithm. This is a drawback to using only
  doubly linked lists to implement partitions: while most needed operations are
  constant-time, there is no efficient way to determine which partition an object is
  currently in. Instead, when that's needed, ad hoc tricks (like the
  `NEXT_MASK_UNREACHABLE` flag) are employed.

Optimization: delay tracking containers
=======================================

@@ -531,7 +531,7 @@ benefit from delayed tracking:

full garbage collection (all generations), the collector will untrack any dictionaries
whose contents are not tracked.

The garbage collector module provides the Python function `is_tracked(obj)`, which returns
the current tracking status of the object. Subsequent garbage collections may change the
tracking status of the object.
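
For example, in a typical CPython session (exact behavior may vary between versions):

```pycon
>>> import gc
>>> gc.is_tracked(0)
False
>>> gc.is_tracked([])
True
>>> gc.is_tracked({})
False
>>> gc.is_tracked({"a": []})
True
```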

@@ -556,20 +556,20 @@ Differences between GC implementations

This section summarizes the differences between the GC implementation in the
default build and the implementation in the free-threaded build.

The default build implementation makes extensive use of the `PyGC_Head` data
structure, while the free-threaded build implementation does not use that
data structure.

- The default build implementation stores all tracked objects in a doubly
  linked list using `PyGC_Head`. The free-threaded build implementation
  instead relies on the embedded mimalloc memory allocator to scan the heap
  for tracked objects.
- The default build implementation uses `PyGC_Head` for the unreachable
  object list. The free-threaded build implementation repurposes the
  `ob_tid` field to store a linked list of unreachable objects.
- The default build implementation stores flags in the `_gc_prev` field of
  `PyGC_Head`. The free-threaded build implementation stores these flags
  in `ob_gc_bits`.

The default build implementation relies on the

@@ -9,12 +9,12 @@ Python's Parser is currently a

[`PEG` (Parser Expression Grammar)](https://en.wikipedia.org/wiki/Parsing_expression_grammar)
parser. It was introduced in
[PEP 617: New PEG parser for CPython](https://peps.python.org/pep-0617/) to replace
the original [`LL(1)`](https://en.wikipedia.org/wiki/LL_parser) parser.

The code implementing the parser is generated from a grammar definition by a
[parser generator](https://en.wikipedia.org/wiki/Compiler-compiler).
Therefore, changes to the Python language are made by modifying the
[grammar file](../Grammar/python.gram).
Developers rarely need to modify the generator itself.

See the devguide's [Changing CPython's grammar](https://devguide.python.org/developer-workflow/grammar/#grammar)

@@ -33,9 +33,9 @@ is ordered. This means that when writing:

```
rule: A | B | C
```

a parser that implements a context-free grammar (such as an `LL(1)` parser) will
generate constructions that, given an input string, *deduce* which alternative
(`A`, `B` or `C`) must be expanded. On the other hand, a PEG parser will
check each alternative, in the order in which they are specified, and select
the first one that succeeds.

@@ -67,21 +67,21 @@ time complexity with a technique called

which not only loads the entire program in memory before parsing it but also
allows the parser to backtrack arbitrarily. This is made efficient by memoizing
the rules already matched for each position. The cost of the memoization cache
is that the parser will naturally use more memory than a simple `LL(1)` parser,
which is normally table-based.

Key ideas
---------

- Alternatives are ordered (`A | B` is not the same as `B | A`).
- If a rule returns a failure, it doesn't mean that the parsing has failed,
  it just means "try something else".
- By default PEG parsers run in exponential time, which can be optimized to linear by
  using memoization.
- If parsing fails completely (no rule succeeds in parsing all the input text), the
  PEG parser doesn't have a concept of "where the
  [`SyntaxError`](https://docs.python.org/3/library/exceptions.html#SyntaxError) is".

> [!IMPORTANT]

@@ -111,16 +111,16 @@ the following two rules (in these examples, a token is an individual character):

```
second_rule: ('aa' | 'a' ) 'a'
```

In a regular EBNF grammar, both rules specify the language `{aa, aaa}` but
in PEG, one of these two rules accepts the string `aaa` but not the string
`aa`. The other does the opposite -- it accepts the string `aa`
but not the string `aaa`. The rule `('a'|'aa')'a'` does
not accept `aaa` because `'a'|'aa'` consumes the first `a`, letting the
final `a` in the rule consume the second, and leaving out the third `a`.
As the rule has succeeded, no attempt is ever made to go back and let
`'a'|'aa'` try the second alternative. The expression `('aa'|'a')'a'` does
not accept `aa` because `'aa'|'a'` accepts all of `aa`, leaving nothing
for the final `a`. Again, the second alternative of `'aa'|'a'` is not
tried.
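
This behavior can be reproduced with a miniature ordered-choice matcher, sketched below in plain Python (illustrative only; this is not Pegen):

```python
# Each parser returns the new position on success or None on failure. Once
# an alternative of a choice succeeds, it is never revisited: that is the
# PEG behavior described above.
def literal(s):
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def choice(*parsers):                        # ordered choice
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:           # first success wins, no retry
                return result
        return None
    return parse

def sequence(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def accepts(parser, text):
    return parser(text, 0) == len(text)

first_rule = sequence(choice(literal("a"), literal("aa")), literal("a"))
second_rule = sequence(choice(literal("aa"), literal("a")), literal("a"))
print(accepts(first_rule, "aa"), accepts(first_rule, "aaa"))    # True False
print(accepts(second_rule, "aaa"), accepts(second_rule, "aa"))  # True False
```
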
> [!CAUTION]

@@ -137,7 +137,7 @@ one is in almost all cases a mistake, for example:

In this example, the second alternative will never be tried because the first one will
succeed first (even if the input string has an `'else' block` that follows). To correctly
write this rule you can simply alter the order:

@@ -146,7 +146,7 @@ write this rule you can simply alter the order:

```
| 'if' expression 'then' block
```

In this case, if the input string doesn't have an `'else' block`, the first alternative
will fail and the second will be attempted.

Grammar Syntax

@@ -166,8 +166,8 @@ the rule:

```
rule_name[return_type]: expression
```

If the return type is omitted, then a `void *` is returned in C and an
`Any` in Python.

Grammar expressions
-------------------

@@ -214,7 +214,7 @@ Variables in the grammar
------------------------

A sub-expression can be named by preceding it with an identifier and an
`=` sign. The name can then be used in the action (see below), like this:

```
rule_name[return_type]: '(' a=some_other_rule ')' { a }
```

@@ -387,9 +387,9 @@ returns a valid C-based Python AST:

```
| NUMBER
```

Here `EXTRA` is a macro that expands to `start_lineno, start_col_offset,
end_lineno, end_col_offset, p->arena`, those being variables automatically
injected by the parser; `p` points to an object that holds on to all state
for the parser.

A similar grammar written to target Python AST objects:

@@ -422,50 +422,47 @@ Pegen

Pegen is the parser generator used in CPython to produce the final PEG parser
used by the interpreter. It is the program that can be used to read the Python
grammar located in [`Grammar/python.gram`](../Grammar/python.gram) and produce
the final C parser. It contains the following pieces:

- A parser generator that can read a grammar file and produce a PEG parser
  written in Python or C that can parse said grammar. The generator is located at
  [`Tools/peg_generator/pegen`](../Tools/peg_generator/pegen).
- A PEG meta-grammar that automatically generates a Python parser which is used
  for the parser generator itself (this means that there are no manually-written
  parsers). The meta-grammar is located at
  [`Tools/peg_generator/pegen/metagrammar.gram`](../Tools/peg_generator/pegen/metagrammar.gram).
- A generated parser (using the parser generator) that can directly produce C and Python AST objects.

The source code for Pegen lives at [`Tools/peg_generator/pegen`](../Tools/peg_generator/pegen)
but all the typical commands to interact with the parser generator are normally executed from
the main makefile.

How to regenerate the parser
----------------------------

Once you have made the changes to the grammar files, to regenerate the `C`
parser (the one used by the interpreter) just execute:

```
make regen-pegen
```

using the `Makefile` in the main directory. If you are on Windows you can
use the Visual Studio project files to regenerate the parser or to execute:

```
./PCbuild/build.bat --regen
```

The generated parser file is located at [`Parser/parser.c`](../Parser/parser.c).

How to regenerate the meta-parser
---------------------------------

The meta-grammar (the grammar that describes the grammar for the grammar files
themselves) is located at
[`Tools/peg_generator/pegen/metagrammar.gram`](../Tools/peg_generator/pegen/metagrammar.gram).
Although it is very unlikely that you will ever need to modify it, if you make
any modifications to this file (in order to implement new Pegen features) you will
need to regenerate the meta-parser (the parser that parses the grammar files).

@@ -488,11 +485,11 @@ Grammatical elements and rules

Pegen has some special grammatical elements and rules:

- Strings with single quotes (') (for example, `'class'`) denote KEYWORDS.
- Strings with double quotes (") (for example, `"match"`) denote SOFT KEYWORDS.
- Uppercase names (for example, `NAME`) denote tokens in the
  [`Grammar/Tokens`](../Grammar/Tokens) file.
- Rule names starting with `invalid_` are used for specialized syntax errors.

  - These rules are NOT used in the first pass of the parser.
  - Only if the first pass fails to parse, a second pass including the invalid

@@ -509,14 +506,13 @@ Tokenization

It is common among PEG parser frameworks that the parser does both the parsing
and the tokenization, but this does not happen in Pegen. The reason is that the
Python language needs a custom tokenizer to handle things like indentation
boundaries, some special keywords like `ASYNC` and `AWAIT` (for
compatibility purposes), backtracking errors (such as unclosed parentheses),
dealing with encoding, interactive mode and much more. Some of these reasons
are also there for historical purposes, and some others are useful even today.

The list of tokens (all uppercase names in the grammar) that you can use can
be found in the [`Grammar/Tokens`](../Grammar/Tokens)
file. If you change this file to add new tokens, make sure to regenerate the
files by executing:

@@ -532,9 +528,7 @@ the tokens or to execute:

How tokens are generated and the rules governing this are completely up to the tokenizer
([`Parser/lexer`](../Parser/lexer) and [`Parser/tokenizer`](../Parser/tokenizer));
the parser just receives tokens from it.

Memoization

@@ -548,7 +542,7 @@ both in memory and time. Although the memory cost is obvious (the parser needs

memory for storing previous results in the cache) the execution time cost comes
from continuously checking if the given rule has a cache hit or not. In many
situations, just parsing it again can be faster. Pegen **disables memoization
by default** except for rules with the special marker `memo` after the rule
name (and type, if present):

@@ -567,8 +561,7 @@ To determine whether a new rule needs memoization or not, benchmarking is requir

(comparing execution times and memory usage of some considerably large files with
and without memoization). There is a very simple instrumentation API available
in the generated C parser code that allows measuring how much each rule uses
memoization (check the [`Parser/pegen.c`](../Parser/pegen.c)
file for more information) but it needs to be manually activated.
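
The core of the technique can be sketched in a few lines of Python (a hypothetical helper, not Pegen's actual machinery): results are cached per input and position, so a rule body runs at most once for any given position.

```python
# Packrat-style memoization: cache results keyed by (input, position).
from functools import wraps

def memoize(rule):
    cache = {}
    @wraps(rule)
    def wrapper(text, pos):
        key = (text, pos)
        if key not in cache:          # first visit: run the rule body
            cache[key] = rule(text, pos)
        return cache[key]             # later visits are cache hits
    return wrapper

@memoize
def number(text, pos):
    """Match a run of digits; return the new position or None."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return end if end > pos else None

print(number("123+456", 0))   # 3
print(number("123+456", 0))   # 3 again, this time served from the cache
```
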
Automatic variables

@@ -578,9 +571,9 @@ To make writing actions easier, Pegen injects some automatic variables in the

namespace available when writing actions. In the C parser, some of these
automatic variable names are:

- `p`: The parser structure.
- `EXTRA`: This is a macro that expands to
  `(_start_lineno, _start_col_offset, _end_lineno, _end_col_offset, p->arena)`,
  which is normally used to create AST nodes as almost all constructors need these
  attributes to be provided. All of the location variables are taken from the
  location information of the current token.

@@ -590,13 +583,13 @@ Hard and soft keywords

> [!NOTE]
> In the grammar files, keywords are defined using **single quotes** (for example,
> `'class'`) while soft keywords are defined using **double quotes** (for example,
> `"match"`).

There are two kinds of keywords allowed in pegen grammars: *hard* and *soft*
keywords. The difference between hard and soft keywords is that hard keywords
are always reserved words, even in positions where they make no sense
(for example, `x = class + 1`), while soft keywords only get a special
meaning in context. Trying to use a hard keyword as a variable will always
fail:

@@ -621,7 +614,7 @@ one where they are defined as keywords:

```
>>> foo(match="Yeah!")
```

The `match` and `case` keywords are soft keywords, so that they are
recognized as keywords at the beginning of a match statement or case block
respectively, but are allowed to be used in other places as variable or
argument names.

@@ -662,7 +655,7 @@ is, and it will unwind the stack and report the exception. This means that if a

[rule action](#grammar-actions) raises an exception, all parsing will
stop at that exact point. This is done to correctly propagate any
exception set by calling Python's C API functions. This also includes
[`SyntaxError`](https://docs.python.org/3/library/exceptions.html#SyntaxError)
exceptions, and it is the main mechanism the parser uses to report custom syntax
error messages.

@@ -684,10 +677,10 @@ grammar.

To report generic syntax errors, pegen uses a common heuristic in PEG parsers:
the location of *generic* syntax errors is reported to be the furthest token that
was attempted to be matched but failed. This is only done if parsing has failed
(the parser returns `NULL` in C or `None` in Python) but no exception has
been raised.

As the Python grammar was originally written as an `LL(1)` grammar, this heuristic
has an extremely high success rate, but some PEG features, such as lookaheads,
can impact this.

@@ -699,19 +692,19 @@ can impact this.

To generate more precise syntax errors, custom rules are used. This is a common
practice also in context-free grammars: the parser will try to accept some
construct that is known to be incorrect just to report a specific syntax error
for that construct. In pegen grammars, these rules start with the `invalid_`
prefix. This is because trying to match these rules normally has a performance
impact on parsing (and can also affect the 'correct' grammar itself in some
tricky cases, depending on the ordering of the rules) so the generated parser
acts in two phases:

1. The first phase will try to parse the input stream without taking into
   account rules that start with the `invalid_` prefix. If the parsing
   succeeds it will return the generated AST and the second phase will be
   skipped.

2. If the first phase failed, a second parsing attempt is done including the
   rules that start with an `invalid_` prefix. By design this attempt
   **cannot succeed** and is only executed to give the invalid rules a
   chance to detect specific situations where custom, more precise, syntax
   errors can be raised. This also allows trading a bit of performance for

@@ -723,15 +716,15 @@ acts in two phases:

> When defining invalid rules:
>
> - Make sure all custom invalid rules raise
>   [`SyntaxError`](https://docs.python.org/3/library/exceptions.html#SyntaxError)
>   exceptions (or a subclass of it).
> - Make sure **all** invalid rules start with the `invalid_` prefix to not
>   impact performance of parsing correct Python code.
> - Make sure the parser doesn't behave differently for regular rules when you introduce invalid rules
>   (see the [how PEG parsers work](#how-peg-parsers-work) section for more information).

You can find a collection of macros to raise specialized syntax errors in the
[`Parser/pegen.h`](../Parser/pegen.h)
header file. These macros also allow reporting ranges for
the custom errors, which will be highlighted in the tracebacks that will be
displayed when the error is reported.

@@ -746,35 +739,33 @@ displayed when the error is reported.

```
<valid python code> $ 42
```

should trigger the syntax error at the `$` character. If your rule is not correctly defined this
won't happen. As another example, suppose that you try to define a rule to match Python 2 style
`print` statements in order to create a better error message and you define it as:

```
invalid_print: "print" expression
```

This will **seem** to work because the parser will correctly parse `print(something)`, because it is valid
code, and the second phase will never execute. But if you try to parse `print(something) $ 3`, the first pass
of the parser will fail (because of the `$`) and in the second phase, the rule will match
`print(something)` as `print` followed by the variable `something` between parentheses, and the error
will be reported there instead of at the `$` character.
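
Incidentally, CPython ships correctly-scoped invalid rules for this very case, which is why a modern interpreter reports something like the following (the exact caret rendering varies between versions):

```pycon
>>> print "hello"
  File "<stdin>", line 1
    print "hello"
    ^^^^^^^^^^^^^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?
```
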
Generating AST objects
----------------------

The output of the C parser used by CPython, which is generated from the
[grammar file](../Grammar/python.gram), is a Python AST object (using C
structures). This means that the actions in the grammar file generate AST
objects when they succeed. Constructing these objects can be quite cumbersome
(see the [AST compiler section](compiler.md#abstract-syntax-trees-ast)
for more information on how these objects are constructed and how they are used
by the compiler), so special helper functions are used. These functions are
declared in the [`Parser/pegen.h`](../Parser/pegen.h) header file and defined
in the [`Parser/action_helpers.c`](../Parser/action_helpers.c) file. The
helpers include functions that join AST sequences, get specific elements
from them, or perform extra processing on the generated tree.

@@ -788,11 +779,9 @@ from them or to perform extra processing on the generated tree.

As a general rule, if an action spans multiple lines or requires something more
complicated than a single expression of C code, it is normally better to create a
custom helper in [`Parser/action_helpers.c`](../Parser/action_helpers.c)
and expose it in the [`Parser/pegen.h`](../Parser/pegen.h) header file so that
it can be used from the grammar.

When parsing succeeds, the parser **must** return a **valid** AST object.

@@ -801,16 +790,15 @@ Testing

There are three files that contain tests for the grammar and the parser:

- [test_grammar.py](../Lib/test/test_grammar.py)
- [test_syntax.py](../Lib/test/test_syntax.py)
- [test_exceptions.py](../Lib/test/test_exceptions.py)

Check the contents of these files to know which is the best place for new
tests, depending on the nature of the new feature you are adding.

Tests for the parser generator itself can be found in the
[test_peg_generator](../Lib/test_peg_generator) directory.

Debugging generated parsers

@@ -825,33 +813,32 @@ correctly compile and execute Python anymore. This makes it a bit challenging

to debug when something goes wrong, especially when experimenting.

For this reason it is a good idea to experiment first by generating a Python
parser. To do this, you can go to the [Tools/peg_generator](../Tools/peg_generator)
directory in the CPython repository and manually call the parser generator by executing:

```
$ python -m pegen python <PATH TO YOUR GRAMMAR FILE>
```

This will generate a file called `parse.py` in the same directory that you
can use to parse some input:

```
$ python parse.py file_with_source_code_to_test.py
```

As the generated `parse.py` file is just Python code, you can modify it
and add breakpoints to debug or better understand some complex situations.

Verbose mode
------------

When Python is compiled in debug mode (by adding `--with-pydebug` when
running the configure step on Linux or by adding `-d` when calling
[PCbuild/build.bat](../PCbuild/build.bat)), it is possible to activate a
**very** verbose mode in the generated parser. This is very useful to
debug the generated parser and to understand how it works, but it
can be a bit hard to understand at first.

> [!NOTE]

@@ -859,13 +846,13 @@ can be a bit hard to understand at first.

> interactive mode as it can be much harder to understand, because interactive
> mode involves some special steps compared to regular parsing.

To activate verbose mode you can add the `-d` flag when executing Python:

```
$ python -d file_to_test.py
```

This will print **a lot** of output to `stderr` so it is probably better to dump
it to a file for further analysis. The output consists of trace lines with the
following structure:

@@ -873,17 +860,17 @@ following structure::

```
<indentation> ('>'|'-'|'+'|'!') <rule_name>[<token_location>]: <alternative> ...
```

Every line is indented by a different amount (`<indentation>`) depending on how
deep the call stack is. The next character marks the type of the trace:

- `>` indicates that a rule is going to be attempted to be parsed.
- `-` indicates that a rule has failed to be parsed.
- `+` indicates that a rule has been parsed correctly.
- `!` indicates that an exception or an error has been detected and the parser is unwinding.

The `<token_location>` part indicates the current index in the token array,
the `<rule_name>` part indicates what rule is being parsed and
the `<alternative>` part indicates what alternative within that rule
is being attempted.
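
A throwaway script can then summarize such a dump. The sketch below is hypothetical: it assumes the trace was redirected to a file named `trace.log` and that every line matches the structure shown above.

```python
# Count how often each rule was attempted ('>') and how often it was parsed
# correctly ('+') in a saved verbose-mode trace.
import re
from collections import Counter

TRACE = re.compile(r"^\s*([>+!-])\s*(\w+)\[(\d+)\]")

attempts, successes = Counter(), Counter()
with open("trace.log") as trace:               # hypothetical dump of stderr
    for line in trace:
        match = TRACE.match(line)
        if match:
            marker, rule = match.group(1), match.group(2)
            if marker == ">":
                attempts[rule] += 1
            elif marker == "+":
                successes[rule] += 1

for rule, count in attempts.most_common(10):
    print(f"{rule:30} {count:6} attempts, {successes[rule]:6} successes")
```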

@@ -891,4 +878,5 @@ is being attempted.

> **Document history**
>
> Pablo Galindo Salgado - Original author
>
> Irit Katriel and Jacob Coffee - Convert to Markdown