mirror of https://github.com/python/cpython
gh-119786: [doc] more consistent syntax in InternalDocs (#125815)
parent 4848b0b92c, commit d0bfff47fb
## Example family

The `LOAD_GLOBAL` instruction (in [Python/bytecodes.c](../Python/bytecodes.c))
already has an adaptive family that serves as a relatively simple example.

The `LOAD_GLOBAL` instruction performs adaptive specialization,
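The family's base instruction can be observed from Python with the `dis` module. This is a sketch under the assumption (true in current CPython) that reading a module-level name compiles to `LOAD_GLOBAL`:

```python
import dis

counter = 0

def bump():
    # Reading a module-level name compiles to the LOAD_GLOBAL instruction,
    # the base of the adaptive family described above.
    return counter + 1

opnames = [instr.opname for instr in dis.get_instructions(bump)]
print(opnames)
```

On Python 3.11 and later, `dis.dis(bump, adaptive=True)` shows the specialized variants once the function has warmed up.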
In CPython, the compilation from source code to bytecode involves several steps:

1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
   and [Parser/tokenizer/](../Parser/tokenizer/).
2. Parse the stream of tokens into an Abstract Syntax Tree
   [Parser/parser.c](../Parser/parser.c).
3. Transform AST into an instruction sequence
   [Python/compile.c](../Python/compile.c).
4. Construct a Control Flow Graph and apply optimizations to it
   [Python/flowgraph.c](../Python/flowgraph.c).
5. Emit bytecode based on the Control Flow Graph
   [Python/assemble.c](../Python/assemble.c).

This document outlines how these steps of the process work.
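The first two steps, and the combined effect of the last three, can be observed from pure Python with the `tokenize`, `ast`, and `dis` modules (a rough sketch; the source string and filename are illustrative):

```python
import ast
import dis
import io
import tokenize

src = "x = 1 + 2\n"

# Step 1: tokenize the source code
tokens = [tok.string for tok in tokenize.generate_tokens(io.StringIO(src).readline)]

# Step 2: parse the token stream into an AST
tree = ast.parse(src)

# Steps 3-5 happen inside compile(), which emits the final bytecode
code = compile(tree, "<example>", "exec")
dis.dis(code)
```

Note that `1 + 2` is already folded to the constant `3` in the emitted code object, a byproduct of the CFG optimization step.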
parsers.

The grammar file for Python can be found in
[Grammar/python.gram](../Grammar/python.gram).
The definitions for literal tokens (such as `:`, numbers, etc.) can be found in
[Grammar/Tokens](../Grammar/Tokens). Various C files, including
[Parser/parser.c](../Parser/parser.c), are generated from these.

See Also:
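The token names defined in `Grammar/Tokens` surface at the Python level through the `token` module; for instance, `:` maps to `COLON`. A sketch using `tokenize` (which wraps the C tokenizer) and the `exact_type` attribute:

```python
import io
import token
import tokenize

src = "d = {1: 'a'}\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))

# tok_name maps the numeric token types from Grammar/Tokens back to names
exact_names = [token.tok_name[t.exact_type] for t in toks]
print(exact_names)
```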
Syntax Definition Language (ASDL) [^1], [^2].

The definition of the AST nodes for Python is found in the file
[Parser/Python.asdl](../Parser/Python.asdl).

Each AST node (representing statements, expressions, and several
specialized types, like list comprehensions and exception handlers) is
The preceding example describes two different kinds of statements and an
expression: function definitions, return statements, and yield expressions.
All three kinds are considered of type `stmt` as shown by `|` separating
the various kinds. They all take arguments of various kinds and amounts.

Modifiers on the argument type specify the number of values needed; `?`
means it is optional, `*` means 0 or more, while no modifier means only one
value for the argument and it is required. `FunctionDef`, for instance,
takes an `identifier` for the *name*, `arguments` for *args*, zero or more
`stmt` arguments for *body*, and zero or more `expr` arguments for
*decorators*.

Do notice that something like 'arguments', which is a node type, is
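These modifiers are visible on the nodes the `ast` module produces: a minimal sketch showing that `name` holds one value while `body` and `decorator_list` are sequences:

```python
import ast

tree = ast.parse("@deco\ndef f(a):\n    return a\n")
fn = tree.body[0]  # an ast.FunctionDef node

# name: one identifier; args: one arguments node;
# body: zero or more stmt; decorator_list: zero or more expr
print(fn.name, type(fn.args).__name__, len(fn.body), len(fn.decorator_list))
```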
```

Also generated are a series of constructor functions that allocate (in
this case) a `stmt_ty` struct with the appropriate initialization. The
`kind` field specifies which component of the union is initialized. The
`FunctionDef()` constructor function sets `kind` to `FunctionDef_kind` and
initializes the *name*, *args*, *body*, and *attributes* fields.

See also
management can be completely ignored. But if you are working at either the
very beginning of the compiler or the end, you need to care about how the arena
works. All code relating to the arena is in either
[Include/internal/pycore_pyarena.h](../Include/internal/pycore_pyarena.h)
or [Python/pyarena.c](../Python/pyarena.c).

`PyArena_New()` will create a new arena. The returned `PyArena` structure
will store pointers to all memory given to it. This does the bookkeeping of
what memory needs to be freed when the compiler is finished with the memory it
used. That freeing is done with `PyArena_Free()`. This only needs to be
called in strategic areas where the compiler exits.

As stated above, in general you should not have to worry about memory
of Python uses reference counting, there is extra support added
to the arena to clean up each PyObject that was allocated. These cases
are very rare. However, if you've allocated a PyObject, you must tell
the arena about it by calling `PyArena_AddPyObject()`.


Source code to AST
==================

The AST is generated from source code using the function
`_PyParser_ASTFromString()` or `_PyParser_ASTFromFile()` in
[Parser/peg_api.c](../Parser/peg_api.c).

After some checks, a helper function in
[Parser/parser.c](../Parser/parser.c)
begins applying production rules on the source code it receives, converting source
code to tokens and matching these tokens recursively to their corresponding rule. The
production rule's corresponding rule function is called on every match. These rule
functions follow the format `xx_rule`, where *xx* is the grammar rule
that the function handles and is automatically derived from
[Grammar/python.gram](../Grammar/python.gram) by
[Tools/peg_generator/pegen/c_generator.py](../Tools/peg_generator/pegen/c_generator.py).

Each rule function in turn creates an AST node as it goes along. It does this
by allocating all the new nodes it needs, calling the proper AST node creation
|
||||||
|
|
||||||
The AST node creation helper functions have the name `_PyAST_{xx}`
|
The AST node creation helper functions have the name `_PyAST_{xx}`
|
||||||
where *xx* is the AST node that the function creates. These are defined by the
|
where *xx* is the AST node that the function creates. These are defined by the
|
||||||
ASDL grammar and contained in
|
ASDL grammar and contained in [Python/Python-ast.c](../Python/Python-ast.c)
|
||||||
[Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c)
|
(which is generated by [Parser/asdl_c.py](../Parser/asdl_c.py)
|
||||||
(which is generated by
|
from [Parser/Python.asdl](../Parser/Python.asdl)).
|
||||||
[Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py)
|
This all leads to a sequence of AST nodes stored in `asdl_seq` structs.
|
||||||
from
|
|
||||||
[Parser/Python.asdl](https://github.com/python/cpython/blob/main/Parser/Python.asdl)).
|
|
||||||
This all leads to a sequence of AST nodes stored in ``asdl_seq`` structs.
|
|
||||||
|
|
||||||
To demonstrate everything explained so far, here's the
|
To demonstrate everything explained so far, here's the
|
||||||
rule function responsible for a simple named import statement such as
|
rule function responsible for a simple named import statement such as
|
||||||
``import sys``. Note that error-checking and debugging code has been
|
`import sys`. Note that error-checking and debugging code has been
|
||||||
omitted. Removed parts are represented by ``...``.
|
omitted. Removed parts are represented by `...`.
|
||||||
Furthermore, some comments have been added for explanation. These comments
|
Furthermore, some comments have been added for explanation. These comments
|
||||||
may not be present in the actual code.
|
may not be present in the actual code.
|
||||||
|
|
||||||
|
To improve backtracking performance, some rules (chosen by applying a
`(memo)` flag in the grammar file) are memoized. Each rule function checks if
a memoized version exists and returns that if so, else it continues in the
manner stated in the previous paragraphs.

There are macros for creating and using `asdl_xx_seq *` types, where *xx* is
a type of the ASDL sequence. Three main types are defined
manually -- `generic`, `identifier` and `int`. These types are found in
[Python/asdl.c](../Python/asdl.c) and its corresponding header file
[Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
Functions and macros for creating `asdl_xx_seq *` types are as follows:

`_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
   Allocate memory for an `asdl_generic_seq` of the specified length
`_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
   Allocate memory for an `asdl_identifier_seq` of the specified length
`_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
   Allocate memory for an `asdl_int_seq` of the specified length

In addition to the three types mentioned above, some ASDL sequence types are
automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in
[Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h).
Macros for using both manually defined and automatically generated ASDL
sequence types are as follows:

`asdl_seq_GET(asdl_xx_seq *, int)`
   Get item held at a specific position in an `asdl_xx_seq`
`asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
   Set a specific index in an `asdl_xx_seq` to the specified value

Untyped counterparts exist for some of the typed macros. These are useful
when a function needs to manipulate a generic ASDL sequence:

`asdl_seq_GET_UNTYPED(asdl_seq *, int)`
   Get item held at a specific position in an `asdl_seq`
`asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
   Set a specific index in an `asdl_seq` to the specified value
`asdl_seq_LEN(asdl_seq *)`
   Return the length of an `asdl_seq` or `asdl_xx_seq`

Note that typed macros and functions are recommended over their untyped
counterparts. Typed macros carry out checks in debug mode and aid
debugging errors caused by incorrectly casting from `void *`.

If you are working with statements, you must also worry about keeping
track of what line number generated the statement. Currently the line
number is passed as the last parameter to each `stmt_ty` function.

See also [PEP 617: New PEG parser for CPython](https://peps.python.org/pep-0617/).
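The end product of the `import sys` rule function discussed above is easy to reproduce from Python: the parser yields an `Import` node whose `names` sequence holds a single `alias`. A sketch using the public `ast` API rather than the internal C structs:

```python
import ast

tree = ast.parse("import sys")
node = tree.body[0]  # an ast.Import node, built in C by an _PyAST_{xx} helper

print(type(node).__name__, node.names[0].name)
```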
end()
```

The `x < 10` guard is represented by its own basic block that
compares `x` with `10` and then ends in a conditional jump based on
the result of the comparison. This conditional jump allows the block
to point to both the body of the `if` and the body of the `else`. The
`if` basic block contains the `f1()` and `f2()` calls and points to
the `end()` basic block. The `else` basic block contains the `g()`
call and similarly points to the `end()` block.

Note that more complex code in the guard, the `if` body, or the `else`
body may be represented by multiple basic blocks. For instance,
short-circuiting boolean logic in a guard like `if x or y:`
will produce one basic block that tests the truth value of `x`
and then points both (1) to the start of the `if` body and (2) to
a different basic block that tests the truth value of `y`.

CFGs are useful as an intermediate representation of the code because
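The guard's conditional jump shows up directly in the emitted bytecode. This sketch disassembles a comparable function; exact opcode names vary between CPython versions, so only the presence of a compare and a jump is assumed:

```python
import dis

def branch(x):
    if x < 10:
        result = "small"
    else:
        result = "big"
    return result

# The guard block ends in a conditional jump after the comparison
opnames = [instr.opname for instr in dis.get_instructions(branch)]
print(opnames)
```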
AST to CFG to bytecode
======================

The conversion of an `AST` to bytecode is initiated by a call to the function
`_PyAST_Compile()` in [Python/compile.c](../Python/compile.c).

The first step is to construct the symbol table. This is implemented by
`_PySymtable_Build()` in [Python/symtable.c](../Python/symtable.c).
This function begins by entering the starting code block for the AST (passed-in)
and then calling the proper `symtable_visit_{xx}` function (with *xx* being the
AST node type). Next, the AST tree is walked with the various code blocks that
delineate the reach of a local variable as blocks are entered and exited using
`symtable_enter_block()` and `symtable_exit_block()`, respectively.

Once the symbol table is created, the `AST` is transformed by `compiler_codegen()`
in [Python/compile.c](../Python/compile.c) into a sequence of pseudo instructions.
These are similar to bytecode, but in some cases they are more abstract, and are
resolved later into actual bytecode. The construction of this instruction sequence
is handled by several functions that break the task down by various AST node types.
The functions are all named `compiler_visit_{xx}` where *xx* is the name of the node
type (such as `stmt`, `expr`, etc.). Each function receives a `struct compiler *`
and `{xx}_ty` where *xx* is the AST node type. Typically these functions
consist of a large 'switch' statement, branching based on the kind of
node type passed to it. Simple things are handled inline in the
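The symbol table that `_PySymtable_Build()` constructs is exposed to Python through the `symtable` module, which makes the block structure easy to inspect (a sketch; the module wraps the same internal tables):

```python
import symtable

src = "def f(a):\n    b = a + 1\n    return b\n"
top = symtable.symtable(src, "<example>", "exec")

# Each function gets its own nested block, delineating the reach of its locals
f_block = top.get_children()[0]
print(f_block.get_name(), sorted(f_block.get_identifiers()))
```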
@ -382,242 +370,224 @@ node type passed to it. Simple things are handled inline in the
|
||||||
functions named `compiler_{xx}` with *xx* being a descriptive name of what is
|
functions named `compiler_{xx}` with *xx* being a descriptive name of what is
|
||||||
being handled.
|
being handled.
|
||||||
|
|
||||||
When transforming an arbitrary AST node, use the ``VISIT()`` macro.
|
When transforming an arbitrary AST node, use the `VISIT()` macro.
|
||||||
The appropriate `compiler_visit_{xx}` function is called, based on the value
|
The appropriate `compiler_visit_{xx}` function is called, based on the value
|
||||||
passed in for <node type> (so `VISIT({c}, expr, {node})` calls
|
passed in for <node type> (so `VISIT({c}, expr, {node})` calls
|
||||||
`compiler_visit_expr({c}, {node})`). The ``VISIT_SEQ()`` macro is very similar,
|
`compiler_visit_expr({c}, {node})`). The `VISIT_SEQ()` macro is very similar,
|
||||||
but is called on AST node sequences (those values that were created as
|
but is called on AST node sequences (those values that were created as
|
||||||
arguments to a node that used the '*' modifier).
|
arguments to a node that used the '*' modifier).
|
||||||
|
|
||||||
Emission of bytecode is handled by the following macros:
|
Emission of bytecode is handled by the following macros:
|
||||||
|
|
||||||
* ``ADDOP(struct compiler *, location, int)``
|
* `ADDOP(struct compiler *, location, int)`
|
||||||
add a specified opcode
|
add a specified opcode
|
||||||
* ``ADDOP_IN_SCOPE(struct compiler *, location, int)``
|
* `ADDOP_IN_SCOPE(struct compiler *, location, int)`
|
||||||
like ``ADDOP``, but also exits current scope; used for adding return value
|
like `ADDOP`, but also exits current scope; used for adding return value
|
||||||
opcodes in lambdas and closures
|
opcodes in lambdas and closures
|
||||||
* ``ADDOP_I(struct compiler *, location, int, Py_ssize_t)``
|
* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
|
||||||
add an opcode that takes an integer argument
|
add an opcode that takes an integer argument
|
||||||
* ``ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)``
|
* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
|
||||||
add an opcode with the proper argument based on the position of the
|
add an opcode with the proper argument based on the position of the
|
||||||
specified PyObject in PyObject sequence object, but with no handling of
|
specified PyObject in PyObject sequence object, but with no handling of
|
||||||
mangled names; used for when you
|
mangled names; used for when you
|
||||||
need to do named lookups of objects such as globals, consts, or
|
need to do named lookups of objects such as globals, consts, or
|
||||||
parameters where name mangling is not possible and the scope of the
|
parameters where name mangling is not possible and the scope of the
|
||||||
name is known; *TYPE* is the name of PyObject sequence
|
name is known; *TYPE* is the name of PyObject sequence
|
||||||
(``names`` or ``varnames``)
|
(`names` or `varnames`)
|
||||||
* ``ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)``
|
* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
|
||||||
just like ``ADDOP_O``, but steals a reference to PyObject
|
just like `ADDOP_O`, but steals a reference to PyObject
|
||||||
* ``ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)``
|
* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
|
||||||
just like ``ADDOP_O``, but name mangling is also handled; used for
|
just like `ADDOP_O`, but name mangling is also handled; used for
|
||||||
attribute loading or importing based on name
|
attribute loading or importing based on name
|
||||||
* ``ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)``
|
* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
|
||||||
add the ``LOAD_CONST`` opcode with the proper argument based on the
|
add the `LOAD_CONST` opcode with the proper argument based on the
|
||||||
position of the specified PyObject in the consts table.
|
position of the specified PyObject in the consts table.
|
||||||
* ``ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)``
|
* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
|
||||||
just like ``ADDOP_LOAD_CONST_NEW``, but steals a reference to PyObject
|
just like `ADDOP_LOAD_CONST_NEW`, but steals a reference to PyObject
|
||||||
* ``ADDOP_JUMP(struct compiler *, location, int, basicblock *)``
|
* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
|
||||||
create a jump to a basic block
|
create a jump to a basic block
|
||||||
|
|
||||||
The ``location`` argument is a struct with the source location to be
|
The `location` argument is a struct with the source location to be
|
||||||
associated with this instruction. It is typically extracted from an
|
associated with this instruction. It is typically extracted from an
|
||||||
``AST`` node with the ``LOC`` macro. The ``NO_LOCATION`` can be used
|
`AST` node with the `LOC` macro. The `NO_LOCATION` can be used
|
||||||
for *synthetic* instructions, which we do not associate with a line
|
for *synthetic* instructions, which we do not associate with a line
|
||||||
number at this stage. For example, the implicit ``return None``
|
number at this stage. For example, the implicit `return None`
|
||||||
which is added at the end of a function is not associated with any
|
which is added at the end of a function is not associated with any
|
||||||
line in the source code.
|
line in the source code.
|
||||||
|
|
||||||
There are several helper functions that will emit pseudo-instructions
|
There are several helper functions that will emit pseudo-instructions
|
||||||
and are named `compiler_{xx}()` where *xx* is what the function helps
|
and are named `compiler_{xx}()` where *xx* is what the function helps
|
||||||
with (``list``, ``boolop``, etc.). A rather useful one is ``compiler_nameop()``.
|
with (`list`, `boolop`, etc.). A rather useful one is `compiler_nameop()`.
|
||||||
This function looks up the scope of a variable and, based on the
|
This function looks up the scope of a variable and, based on the
|
||||||
expression context, emits the proper opcode to load, store, or delete
|
expression context, emits the proper opcode to load, store, or delete
|
||||||
the variable.
|
the variable.
|
||||||
|
|
||||||
Once the instruction sequence is created, it is transformed into a CFG
by `_PyCfg_FromInstructionSequence()`. Then `_PyCfg_OptimizeCodeUnit()`
applies various peephole optimizations, and
`_PyCfg_OptimizedCfgToInstructionSequence()` converts the optimized `CFG`
back into an instruction sequence. These conversions and optimizations are
implemented in [Python/flowgraph.c](../Python/flowgraph.c).
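One optimization that is easy to observe from Python is constant folding: a constant expression is evaluated during compilation (partly in the AST optimizer, partly in these CFG passes), so only the folded result reaches the code object. A sketch:

```python
# "2 * 3" is folded at compile time, so the code object stores 6 as a
# constant instead of emitting a runtime multiplication.
code = compile("x = 2 * 3", "<example>", "exec")
```

Inspecting `code.co_consts` shows the folded value `6`.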
Finally, the sequence of pseudo-instructions is converted into actual
bytecode. This includes transforming pseudo instructions into actual instructions,
converting jump targets from logical labels to relative offsets, and
construction of the [exception table](exception_handling.md) and
[locations table](locations.md).
The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
metadata, including the `consts` and `names` arrays, information about function
arguments and closure variables, and a reference to the source code (filename, etc).
All of this is implemented by `_PyAssemble_MakeCodeObject()` in
[Python/assemble.c](../Python/assemble.c).
Code objects
============

The result of `PyAST_CompileObject()` is a `PyCodeObject` which is defined in
[Include/cpython/code.h](../Include/cpython/code.h).
And with that you now have executable Python bytecode!

The code objects (byte code) are executed in [Python/ceval.c](../Python/ceval.c).
This file will also need a new case statement for the new opcode in the big switch
statement in `_PyEval_EvalFrameDefault()`.
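The resulting `PyCodeObject` is exposed to Python as a code object, so the metadata mentioned above can be inspected directly. For example (the source string and filename are arbitrary):

```python
# compile() runs the whole pipeline described above and returns a code object.
code = compile("y = x + 1", "<example>", "exec")

names = set(code.co_names)       # names resolved at module scope: x and y
consts = code.co_consts          # constants used by the bytecode, including 1
filename = code.co_filename      # the source reference, "<example>" here
```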
Important files
===============

* [Parser/](../Parser/)

  * [Parser/Python.asdl](../Parser/Python.asdl):
    ASDL syntax file.

  * [Parser/asdl.py](../Parser/asdl.py):
    Parser for ASDL definition files.
    Reads in an ASDL description and parses it into an AST that describes it.

  * [Parser/asdl_c.py](../Parser/asdl_c.py):
    Generate C code from an ASDL description. Generates
    [Python/Python-ast.c](../Python/Python-ast.c) and
    [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h).

  * [Parser/parser.c](../Parser/parser.c):
    The new PEG parser introduced in Python 3.9. Generated by
    [Tools/peg_generator/pegen/c_generator.py](../Tools/peg_generator/pegen/c_generator.py)
    from the grammar [Grammar/python.gram](../Grammar/python.gram).
    Creates the AST from source code. Rule functions for their corresponding production
    rules are found here.

  * [Parser/peg_api.c](../Parser/peg_api.c):
    Contains high-level functions which are used by the interpreter to create
    an AST from source code.

  * [Parser/pegen.c](../Parser/pegen.c):
    Contains helper functions which are used by functions in
    [Parser/parser.c](../Parser/parser.c) to construct the AST. Also contains
    helper functions which help raise better error messages when parsing source code.

  * [Parser/pegen.h](../Parser/pegen.h):
    Header file for the corresponding [Parser/pegen.c](../Parser/pegen.c).
    Also contains definitions of the `Parser` and `Token` structs.
* [Python/](../Python)

  * [Python/Python-ast.c](../Python/Python-ast.c):
    Creates C structs corresponding to the ASDL types. Also contains code for
    marshalling AST nodes (core ASDL types have marshalling code in
    [Python/asdl.c](../Python/asdl.c)).
    File automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py).
    This file must be committed separately after every grammar change
    is committed since the `__version__` value is set to the latest
    grammar change revision number.

  * [Python/asdl.c](../Python/asdl.c):
    Contains code to handle the ASDL sequence type.
    Also has code to handle marshalling the core ASDL types, such as number
    and identifier. Used by [Python/Python-ast.c](../Python/Python-ast.c)
    for marshalling AST nodes.

  * [Python/ast.c](../Python/ast.c):
    Used for validating the AST.

  * [Python/ast_opt.c](../Python/ast_opt.c):
    Optimizes the AST.

  * [Python/ast_unparse.c](../Python/ast_unparse.c):
    Converts the AST expression node back into a string (for string annotations).

  * [Python/ceval.c](../Python/ceval.c):
    Executes byte code (aka, eval loop).

  * [Python/symtable.c](../Python/symtable.c):
    Generates a symbol table from AST.

  * [Python/pyarena.c](../Python/pyarena.c):
    Implementation of the arena memory manager.

  * [Python/compile.c](../Python/compile.c):
    Emits pseudo bytecode based on the AST.

  * [Python/flowgraph.c](../Python/flowgraph.c):
    Implements peephole optimizations.

  * [Python/assemble.c](../Python/assemble.c):
    Constructs a code object from a sequence of pseudo instructions.

  * [Python/instruction_sequence.c](../Python/instruction_sequence.c):
    A data structure representing a sequence of bytecode-like pseudo-instructions.
* [Include/](../Include/)

  * [Include/cpython/code.h](../Include/cpython/code.h)
    : Header file for [Objects/codeobject.c](../Objects/codeobject.c);
    contains definition of `PyCodeObject`.

  * [Include/opcode.h](../Include/opcode.h)
    : One of the files that must be modified whenever
    [Lib/opcode.py](../Lib/opcode.py) is.

  * [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h)
    : Contains the actual definitions of the C structs as generated by
    [Python/Python-ast.c](../Python/Python-ast.c).
    Automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py).

  * [Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h)
    : Header for the corresponding [Python/ast.c](../Python/ast.c).

  * [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h)
    : Declares `_PyAST_Validate()` external (from [Python/ast.c](../Python/ast.c)).

  * [Include/internal/pycore_symtable.h](../Include/internal/pycore_symtable.h)
    : Header for [Python/symtable.c](../Python/symtable.c).
    `struct symtable` and `PySTEntryObject` are defined here.

  * [Include/internal/pycore_parser.h](../Include/internal/pycore_parser.h)
    : Header for the corresponding [Parser/peg_api.c](../Parser/peg_api.c).

  * [Include/internal/pycore_pyarena.h](../Include/internal/pycore_pyarena.h)
    : Header file for the corresponding [Python/pyarena.c](../Python/pyarena.c).

  * [Include/opcode_ids.h](../Include/opcode_ids.h)
    : List of opcodes. Generated from [Python/bytecodes.c](../Python/bytecodes.c)
    by
    [Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).

* [Objects/](../Objects/)

  * [Objects/codeobject.c](../Objects/codeobject.c)
    : Contains PyCodeObject-related code.

  * [Objects/frameobject.c](../Objects/frameobject.c)
    : Contains the `frame_setlineno()` function which should determine whether it is allowed
    to make a jump between two points in a bytecode.

* [Lib/](../Lib/)

  * [Lib/opcode.py](../Lib/opcode.py)
    : opcode utilities exposed to Python.

* [Include/internal/pycore_magic_number.h](../Include/internal/pycore_magic_number.h)
  : Home of the magic number (named `MAGIC_NUMBER`) for bytecode versioning.
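The magic number is also exposed to Python via `importlib.util`, which is handy when checking `.pyc` compatibility. The exact value changes between versions, but the layout is a 2-byte version number followed by `b'\r\n'`:

```python
import importlib.util

# MAGIC_NUMBER is the 4-byte value written at the start of every .pyc file.
magic = importlib.util.MAGIC_NUMBER
version_word, terminator = magic[:2], magic[2:]  # terminator is always b'\r\n'
```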
Objects

@@ -625,7 +595,7 @@ Objects

* [Locations](locations.md): Describes the location table
* [Frames](frames.md): Describes frames and the frame stack
* [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
* [Exception Handling](exception_handling.md): Describes the exception table
@@ -68,18 +68,16 @@ Handling Exceptions
-------------------

At runtime, when an exception occurs, the interpreter calls
`get_exception_handler()` in [Python/ceval.c](../Python/ceval.c)
to look up the offset of the current instruction in the exception
table. If it finds a handler, control flow transfers to it. Otherwise, the
exception bubbles up to the caller, and the caller's frame is
checked for a handler covering the `CALL` instruction. This
repeats until a handler is found or the topmost frame is reached.
If no handler is found, then the interpreter function
(`_PyEval_EvalFrameDefault()`) returns NULL. During unwinding,
the traceback is constructed as each frame is added to it by
`PyTraceBack_Here()`, which is in [Python/traceback.c](../Python/traceback.c).

Along with the location of an exception handler, each entry of the
exception table also contains the stack depth of the `try` instruction
@@ -174,22 +172,20 @@ which is then encoded as:

for a total of five bytes.

The code to construct the exception table is in `assemble_exception_table()`
in [Python/assemble.c](../Python/assemble.c).

The interpreter's function to look up the table by instruction offset is
`get_exception_handler()` in [Python/ceval.c](../Python/ceval.c).
The Python function `_parse_exception_table()` in [Lib/dis.py](../Lib/dis.py)
returns the exception table content as a list of namedtuple instances.
Exception Chaining Implementation
---------------------------------

[Exception chaining](https://docs.python.org/dev/tutorial/errors.html#exception-chaining)
refers to setting the `__context__` and `__cause__` fields of an exception as it is
being raised. The `__context__` field is set by `_PyErr_SetObject()` in
[Python/errors.c](../Python/errors.c) (which is ultimately called by all
`PyErr_Set*()` functions). The `__cause__` field (explicit chaining) is set by
the `RAISE_VARARGS` bytecode.
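Both chaining fields can be observed from Python: an exception raised inside an `except` block gets `__context__` implicitly, while `raise ... from ...` also sets `__cause__`:

```python
try:
    try:
        1 / 0
    except ZeroDivisionError as exc:
        raise ValueError("bad input") from exc  # explicit chaining
except ValueError as err:
    caught = err  # keep a reference; "err" is cleared when the block exits

# __cause__ was set by "from exc"; __context__ was set implicitly
# because the ValueError was raised while handling the ZeroDivisionError.
```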
@@ -10,20 +10,19 @@ of three conceptual sections:

globals dict, code object, instruction pointer, stack depth, the
previous frame, etc.

The definition of the `_PyInterpreterFrame` struct is in
[Include/internal/pycore_frame.h](../Include/internal/pycore_frame.h).

# Allocation

Python semantics allows frames to outlive the activation, so they need to
be allocated outside the C call stack. To reduce overhead and improve locality
of reference, most frames are allocated contiguously in a per-thread stack
(see `_PyThreadState_PushFrame` in [Python/pystate.c](../Python/pystate.c)).

Frames of generators and coroutines are embedded in the generator and coroutine
objects, so are not allocated in the per-thread stack. See `PyGenObject` in
[Include/internal/pycore_genobject.h](../Include/internal/pycore_genobject.h).

## Layout
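The outliving behavior is observable from Python: a frame object returned from a call stays valid after the activation ends, and `f_back` still links to the caller's frame. The helper functions below are purely illustrative:

```python
import sys

def inner():
    return sys._getframe()  # materialize this activation's frame object

def outer():
    return inner()

frame = outer()  # both calls have returned, yet the frame object survives
# The frame still knows its code and its caller after the activation died.
```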
@@ -82,16 +81,15 @@ frames for each activation, but with low runtime overhead.

### Generators and Coroutines

Generators (objects of type `PyGen_Type`, `PyCoro_Type` or
`PyAsyncGen_Type`) have a `_PyInterpreterFrame` embedded in them, so
that they can be created with a single memory allocation.
When such an embedded frame is iterated or awaited, it can be linked with
frames on the per-thread stack via the linkage fields.

If a frame object associated with a generator outlives the generator, then
the embedded `_PyInterpreterFrame` is copied into the frame object (see
`take_ownership()` in [Python/frame.c](../Python/frame.c)).

### Field names
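The embedded frame of a generator is exposed through the `gi_frame` attribute: it is available while the generator is live and becomes `None` once the generator is exhausted:

```python
def gen():
    yield 1

g = gen()
live_frame = g.gi_frame   # the embedded frame, present even before iteration
for _ in g:
    pass                  # exhaust the generator
# After exhaustion the generator no longer has a frame.
```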
@@ -12,7 +12,7 @@ a local variable in some C function. When an object’s reference count becomes

the object is deallocated. If it contains references to other objects, their
reference counts are decremented. Those other objects may be deallocated in turn, if
this decrement makes their reference count become zero, and so on. The reference
count field can be examined using the `sys.getrefcount()` function (notice that the
value returned by this function is always 1 more because the function also has a
reference to the object when called):

@@ -39,7 +39,7 @@ cycles. For instance, consider this code:

>>> del container
```

In this example, `container` holds a reference to itself, so even when we remove
our reference to it (the variable "container") the reference count never falls to 0
because it still has its own internal reference. Therefore it would never be
cleaned just by simple reference counting. For this reason some additional machinery
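A minimal sketch of the problem and its remedy from Python (the `Node` class is just an illustration): the cycle keeps the object alive under pure reference counting, and the cycle collector reclaims it:

```python
import gc

class Node:
    pass

n = Node()
n.self_ref = n        # reference cycle: the object references itself
del n                 # refcount never reaches zero on its own

# The collector finds and frees the unreachable cycle; collect() returns
# the number of unreachable objects it found.
collected = gc.collect()
```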
@@ -127,7 +127,7 @@ GC for the free-threaded build
------------------------------

In the free-threaded build, Python objects contain a 1-byte field
`ob_gc_bits` that is used to track garbage collection related state. The
field exists in all objects, including ones that do not support cyclic
garbage collection. The field is used to identify objects that are tracked
by the collector, ensure that finalizers are called only once per object,
@@ -146,14 +146,14 @@ and, during garbage collection, differentiate reachable vs. unreachable objects.

| ... |
```

Note that not all fields are to scale. `pad` is two bytes, `ob_mutex` and
`ob_gc_bits` are each one byte, and `ob_ref_local` is four bytes. The
other fields, `ob_tid`, `ob_ref_shared`, and `ob_type`, are all
pointer-sized (that is, eight bytes on a 64-bit platform).

The garbage collector also temporarily repurposes the `ob_tid` (thread ID)
and `ob_ref_local` (local reference count) fields during collections.
@@ -165,17 +165,17 @@ objects with GC support. These APIs can be found in the
[Garbage Collector C API documentation](https://docs.python.org/3/c-api/gcsupport.html).

Apart from this object structure, the type object for objects supporting garbage
collection must include the `Py_TPFLAGS_HAVE_GC` flag in its `tp_flags` slot and
provide an implementation of the `tp_traverse` handler. Unless it can be proven
that the objects cannot form reference cycles with only objects of its type or unless
the type is immutable, a `tp_clear` implementation must also be provided.

Identifying reference cycles
============================

The algorithm that CPython uses to detect those reference cycles is
implemented in the `gc` module. The garbage collector **only focuses**
on cleaning container objects (that is, objects that can contain a reference
to one or more objects). These can be arrays, dictionaries, lists, custom
class instances, classes in extension modules, etc. One could think that
@ -195,7 +195,7 @@ the interpreter create cycles everywhere. Some notable examples:
To correctly dispose of these objects once they become unreachable, they need
to be identified first. To understand how the algorithm works, let’s take
the case of a circular linked list which has one link referenced by a
variable `A`, and one self-referencing object which is completely
unreachable:

```pycon
@ -234,7 +234,7 @@ objects have a refcount larger than the number of incoming references from
within the candidate set.

Every object that supports garbage collection will have an extra reference
count field initialized to the reference count (`gc_ref` in the figures)
of that object when the algorithm starts. This is because the algorithm needs
to modify the reference count to do the computations and in this way the
interpreter will not modify the real reference count field.
@ -243,43 +243,43 @@ interpreter will not modify the real reference count field.

The GC then iterates over all containers in the first list and decrements by one the
`gc_ref` field of any other object that container is referencing. Doing
this makes use of the `tp_traverse` slot in the container class (implemented
using the C API or inherited by a superclass) to know what objects are referenced by
each container. After all the objects have been scanned, only the objects that have
references from outside the “objects to scan” list will have `gc_ref > 0`.

![gc-image2](images/python-cyclic-gc-2-new-page.png)

Notice that having `gc_ref == 0` does not imply that the object is unreachable.
This is because another object that is reachable from the outside (`gc_ref > 0`)
can still have references to it. For instance, the `link_2` object in our example
ended up having `gc_ref == 0` but is still referenced by the `link_1` object that
is reachable from the outside. To obtain the set of objects that are really
unreachable, the garbage collector re-scans the container objects using the
`tp_traverse` slot; this time with a different traverse function that marks objects with
`gc_ref == 0` as "tentatively unreachable" and then moves them to the
tentatively unreachable list. The following image depicts the state of the lists at a
moment when the GC has processed the `link_3` and `link_4` objects but has not
processed `link_1` and `link_2` yet.

![gc-image3](images/python-cyclic-gc-3-new-page.png)

Then the GC scans the next object, `link_1`. Because it has `gc_ref == 1`,
the GC does not do anything special because it knows it has to be reachable (and is
already in what will become the reachable list):

![gc-image4](images/python-cyclic-gc-4-new-page.png)

When the GC encounters an object which is reachable (`gc_ref > 0`), it traverses
its references using the `tp_traverse` slot to find all the objects that are
reachable from it, moving them to the end of the list of reachable objects (where
they started originally) and setting its `gc_ref` field to 1. This is what happens
to `link_2` and `link_3` below as they are reachable from `link_1`. From the
state in the previous image and after examining the objects referred to by `link_1`,
the GC knows that `link_3` is reachable after all, so it is moved back to the
original list and its `gc_ref` field is set to 1 so that if the GC visits it again,
it will know that it's reachable. To avoid visiting an object twice, the GC marks all
objects that have already been visited once (by unsetting the `PREV_MASK_COLLECTING`
flag) so that if an object that has already been processed is referenced by some other
object, the GC does not process it twice.
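The two passes described above can be sketched in pure Python. Everything here is invented for illustration — the toy graph, the `find_unreachable` helper, and the `refs`/`rc` dictionaries — since the real implementation operates on C structs in the GC's linked lists:

```python
# A toy model of the cycle-detection pass described above. Names like
# `gc_ref` mirror the text; this is not the real implementation.

def find_unreachable(objects, refs, refcount):
    """objects: ids; refs[o]: ids referenced by o (what tp_traverse reports);
    refcount[o]: the object's real reference count."""
    # Pass 1: copy refcounts into gc_ref and subtract internal references.
    gc_ref = dict(refcount)
    for o in objects:
        for target in refs[o]:
            gc_ref[target] -= 1
    # Pass 2: objects with gc_ref > 0 are reachable from outside the set;
    # everything transitively reachable from them is reachable too.
    stack = [o for o in objects if gc_ref[o] > 0]
    seen = set(stack)
    while stack:
        o = stack.pop()
        for target in refs[o]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return [o for o in objects if o not in seen]

# Circular linked list link_1 <-> link_2 referenced by a variable A,
# plus a self-referencing object link_3 with no external references.
refs = {"link_1": ["link_2"], "link_2": ["link_1"], "link_3": ["link_3"]}
rc = {"link_1": 2, "link_2": 1, "link_3": 1}  # link_1 is also held by A
print(find_unreachable(["link_1", "link_2", "link_3"], refs, rc))
# -> ['link_3']
```

After the subtraction pass, `link_1` keeps `gc_ref == 1` (the reference from `A` comes from outside the set), so it and everything reachable from it survive; only the self-referencing `link_3` is reported unreachable.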
@ -295,7 +295,7 @@ list are really unreachable and can thus be garbage collected.
Pragmatically, it's important to note that no recursion is required by any of this,
and neither does it in any other way require additional memory proportional to the
number of objects, number of pointers, or the lengths of pointer chains. Apart from
`O(1)` storage for internal C needs, the objects themselves contain all the storage
the GC algorithms require.

Why moving unreachable objects is better
@ -331,7 +331,7 @@ with the objective of completely destroying these objects. Roughly, the process
follows these steps in order:

1. Handle and clear weak references (if any). Weak references to unreachable objects
   are set to `None`. If the weak reference has an associated callback, the callback
   is enqueued to be called once the clearing of weak references is finished. We only
   invoke callbacks for weak references that are themselves reachable. If both the weak
   reference and the pointed-to object are unreachable we do not execute the callback.
@ -339,15 +339,15 @@ follows these steps in order:
   object and support for weak references predates support for object resurrection.
   Ignoring the weak reference's callback is fine because both the object and the weakref
   are going away, so it's legitimate to say the weak reference is going away first.
2. If an object has legacy finalizers (`tp_del` slot) move it to the
   `gc.garbage` list.
3. Call the finalizers (`tp_finalize` slot) and mark the objects as already
   finalized to avoid calling finalizers twice if the objects are resurrected or
   if other finalizers have removed the object first.
4. Deal with resurrected objects. If some objects have been resurrected, the GC
   finds the new subset of objects that are still unreachable by running the cycle
   detection algorithm again and continues with them.
5. Call the `tp_clear` slot of every object so all internal links are broken and
   the reference counts fall to 0, triggering the destruction of all unreachable
   objects.

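The weak-reference handling in step 1 can be observed from pure Python: when an object that is only kept alive by a cycle is collected, a weak reference to it is cleared to `None` and its callback runs, because the weakref itself (held in a live variable) is still reachable:

```python
import gc
import weakref

class Node:
    pass

calls = []

node = Node()
node.self = node                        # self-referencing cycle
ref = weakref.ref(node, lambda r: calls.append("cleared"))

del node                                # object survives: the cycle holds it
gc.collect()                            # cycle collector clears the weakref

print(ref(), calls)                     # -> None ['cleared']
```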
@ -376,9 +376,9 @@ generations. Every collection operates on the entire heap.

In order to decide when to run, the collector keeps track of the number of object
allocations and deallocations since the last collection. When the number of
allocations minus the number of deallocations exceeds `threshold_0`,
collection starts. Initially only generation 0 is examined. If generation 0 has
been examined more than `threshold_1` times since generation 1 has been
examined, then generation 1 is examined as well. With generation 2,
things are a bit more complicated; see
[Collecting the oldest generation](#Collecting-the-oldest-generation) for
@ -393,8 +393,8 @@ function:
```

The content of these generations can be examined using the
`gc.get_objects(generation=NUM)` function and collections can be triggered
specifically in a generation by calling `gc.collect(generation=NUM)`.

```pycon
>>> import gc
@ -433,7 +433,7 @@ Collecting the oldest generation
--------------------------------

In addition to the various configurable thresholds, the GC only triggers a full
collection of the oldest generation if the ratio `long_lived_pending / long_lived_total`
is above a given value (hardwired to 25%). The reason is that, while "non-full"
collections (that is, collections of the young and middle generations) will always
examine roughly the same number of objects (determined by the aforementioned
@ -463,12 +463,12 @@ used for tags or to keep other information – most often as a bit field (each
bit a separate tag) – as long as code that uses the pointer masks out these
bits before accessing memory. For example, on a 32-bit architecture (for both
addresses and word size), a word is 32 bits = 4 bytes, so word-aligned
addresses are always a multiple of 4, hence end in `00`, leaving the last 2 bits
available; while on a 64-bit architecture, a word is 64 bits = 8 bytes, so
word-aligned addresses end in `000`, leaving the last 3 bits available.

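The masking just described takes only a few lines; the address value below is made up for illustration (any multiple of 8 would do):

```python
TAG_MASK = 0b111          # 3 low bits are free on an 8-byte-aligned heap

addr = 0x7F00_0000_1238   # hypothetical word-aligned address (multiple of 8)
assert addr & TAG_MASK == 0

tagged = addr | 0b101     # stash flag bits in the unused low bits
flags = tagged & TAG_MASK         # recover the tags...
pointer = tagged & ~TAG_MASK      # ...and mask them out before "dereferencing"

print(hex(pointer), bin(flags))   # -> 0x7f0000001238 0b101
```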
The CPython GC makes use of two fat pointers that correspond to the extra fields
of `PyGC_Head` discussed in the `Memory layout and object structure`_ section:

> [!WARNING]
> Because of the presence of extra information, "tagged" or "fat" pointers cannot be
@ -478,23 +478,23 @@ of `PyGC_Head` discussed in the `Memory layout and object structure`_ section:
> normally assume the pointers inside the lists are in a consistent state.


- The `_gc_prev` field is normally used as the "previous" pointer to maintain the
  doubly linked list but its lowest two bits are used to keep the flags
  `PREV_MASK_COLLECTING` and `_PyGC_PREV_MASK_FINALIZED`. Between collections,
  the only flag that can be present is `_PyGC_PREV_MASK_FINALIZED`, which indicates
  whether an object has already been finalized. During collections `_gc_prev` is
  temporarily used for storing a copy of the reference count (`gc_ref`), in
  addition to two flags, and the GC linked list becomes a singly linked list until
  `_gc_prev` is restored.

- The `_gc_next` field is used as the "next" pointer to maintain the doubly linked
  list but during collection its lowest bit is used to keep the
  `NEXT_MASK_UNREACHABLE` flag that indicates if an object is tentatively
  unreachable during the cycle detection algorithm. This is a drawback to using only
  doubly linked lists to implement partitions: while most needed operations are
  constant-time, there is no efficient way to determine which partition an object is
  currently in. Instead, when that's needed, ad hoc tricks (like the
  `NEXT_MASK_UNREACHABLE` flag) are employed.

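The flag packing in `_gc_prev` can be shown in miniature. The numeric values below mirror the flag names in the text but are illustrative only, not the real constants (those live in CPython's internal headers):

```python
# Illustrative only: pack two GC flags into the low bits of a "prev" pointer,
# the way `_gc_prev` does.
PREV_MASK_FINALIZED = 0b01
PREV_MASK_COLLECTING = 0b10
FLAG_MASK = 0b11

prev = 0x5000  # hypothetical aligned address of the previous list node

prev |= PREV_MASK_COLLECTING            # mark: taking part in a collection
assert prev & PREV_MASK_COLLECTING
assert not prev & PREV_MASK_FINALIZED

prev &= ~PREV_MASK_COLLECTING           # unmark when the pass is done
print(hex(prev & ~FLAG_MASK))           # -> 0x5000  (clean pointer again)
```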
Optimization: delay tracking containers
=======================================
@ -531,7 +531,7 @@ benefit from delayed tracking:
full garbage collection (all generations), the collector will untrack any dictionaries
whose contents are not tracked.

The garbage collector module provides the Python function `is_tracked(obj)`, which returns
the current tracking status of the object. Subsequent garbage collections may change the
tracking status of the object.

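The delayed tracking of dictionaries can be observed directly with `gc.is_tracked`:

```python
import gc

print(gc.is_tracked(42))     # atomic objects are never tracked -> False
print(gc.is_tracked([]))     # lists are tracked as soon as they exist -> True
print(gc.is_tracked({}))     # empty dict: tracking is delayed -> False

d = {"key": []}              # storing a trackable object makes the dict tracked
print(gc.is_tracked(d))      # -> True
```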
@ -556,20 +556,20 @@ Differences between GC implementations
This section summarizes the differences between the GC implementation in the
default build and the implementation in the free-threaded build.

The default build implementation makes extensive use of the `PyGC_Head` data
structure, while the free-threaded build implementation does not use that
data structure.

- The default build implementation stores all tracked objects in a doubly
  linked list using `PyGC_Head`. The free-threaded build implementation
  instead relies on the embedded mimalloc memory allocator to scan the heap
  for tracked objects.
- The default build implementation uses `PyGC_Head` for the unreachable
  object list. The free-threaded build implementation repurposes the
  `ob_tid` field to store a linked list of unreachable objects.
- The default build implementation stores flags in the `_gc_prev` field of
  `PyGC_Head`. The free-threaded build implementation stores these flags
  in `ob_gc_bits`.


The default build implementation relies on the
@ -9,12 +9,12 @@ Python's Parser is currently a
[`PEG` (Parser Expression Grammar)](https://en.wikipedia.org/wiki/Parsing_expression_grammar)
parser. It was introduced in
[PEP 617: New PEG parser for CPython](https://peps.python.org/pep-0617/) to replace
the original [`LL(1)`](https://en.wikipedia.org/wiki/LL_parser) parser.

The code implementing the parser is generated from a grammar definition by a
[parser generator](https://en.wikipedia.org/wiki/Compiler-compiler).
Therefore, changes to the Python language are made by modifying the
[grammar file](../Grammar/python.gram).
Developers rarely need to modify the generator itself.

See the devguide's [Changing CPython's grammar](https://devguide.python.org/developer-workflow/grammar/#grammar)
@ -33,9 +33,9 @@ is ordered. This means that when writing:
rule: A | B | C
```

a parser that implements a context-free grammar (such as an `LL(1)` parser) will
generate constructions that, given an input string, *deduce* which alternative
(`A`, `B` or `C`) must be expanded. On the other hand, a PEG parser will
check each alternative, in the order in which they are specified, and select
the first one that succeeds.

@ -67,21 +67,21 @@ time complexity with a technique called
which not only loads the entire program in memory before parsing it but also
allows the parser to backtrack arbitrarily. This is made efficient by memoizing
the rules already matched for each position. The cost of the memoization cache
is that the parser will naturally use more memory than a simple `LL(1)` parser,
which is normally table-based.


Key ideas
---------

- Alternatives are ordered ( `A | B` is not the same as `B | A` ).
- If a rule returns a failure, it doesn't mean that the parsing has failed,
  it just means "try something else".
- By default PEG parsers run in exponential time, which can be optimized to linear by
  using memoization.
- If parsing fails completely (no rule succeeds in parsing all the input text), the
  PEG parser doesn't have a concept of "where the
  [`SyntaxError`](https://docs.python.org/3/library/exceptions.html#SyntaxError) is".

> [!IMPORTANT]
@ -111,16 +111,16 @@ the following two rules (in these examples, a token is an individual character):
second_rule: ('aa' | 'a' ) 'a'
```

In a regular EBNF grammar, both rules specify the language `{aa, aaa}` but
in PEG, one of these two rules accepts the string `aaa` but not the string
`aa`. The other does the opposite -- it accepts the string `aa`
but not the string `aaa`. The rule `('a'|'aa')'a'` does
not accept `aaa` because `'a'|'aa'` consumes the first `a`, letting the
final `a` in the rule consume the second, and leaving out the third `a`.
As the rule has succeeded, no attempt is ever made to go back and let
`'a'|'aa'` try the second alternative. The expression `('aa'|'a')'a'` does
not accept `aa` because `'aa'|'a'` accepts all of `aa`, leaving nothing
for the final `a`. Again, the second alternative of `'aa'|'a'` is not
tried.

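The asymmetry between the two rules can be checked with a few throwaway PEG-style combinators; `lit`, `seq`, and `choice` are invented for this sketch and deliberately never backtrack into a choice that has already succeeded, exactly as described above:

```python
# Tiny PEG-style matchers; each returns the position after a match, or None.

def lit(s):
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def choice(*alts):
    def match(text, pos):
        for alt in alts:                  # ordered: the first success wins
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return match

def seq(*parts):
    def match(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return match

def accepts(rule, text):
    return rule(text, 0) == len(text)

first_rule = seq(choice(lit("a"), lit("aa")), lit("a"))    # ('a'|'aa') 'a'
second_rule = seq(choice(lit("aa"), lit("a")), lit("a"))   # ('aa'|'a') 'a'

print(accepts(first_rule, "aa"), accepts(first_rule, "aaa"))    # True False
print(accepts(second_rule, "aa"), accepts(second_rule, "aaa"))  # False True
```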
> [!CAUTION]

@ -137,7 +137,7 @@ one is in almost all cases a mistake, for example:
```

In this example, the second alternative will never be tried because the first one will
succeed first (even if the input string has an `'else' block` that follows). To correctly
write this rule you can simply alter the order:

```
@ -146,7 +146,7 @@ write this rule you can simply alter the order:
| 'if' expression 'then' block
```

In this case, if the input string doesn't have an `'else' block`, the first alternative
will fail and the second will be attempted.

Grammar Syntax
@ -166,8 +166,8 @@ the rule:
rule_name[return_type]: expression
```

If the return type is omitted, then a `void *` is returned in C and an
`Any` in Python.

Grammar expressions
-------------------
@ -214,7 +214,7 @@ Variables in the grammar
------------------------

A sub-expression can be named by preceding it with an identifier and an
`=` sign. The name can then be used in the action (see below), like this:

```
rule_name[return_type]: '(' a=some_other_rule ')' { a }
@ -387,9 +387,9 @@ returns a valid C-based Python AST:
| NUMBER
```

Here `EXTRA` is a macro that expands to `start_lineno, start_col_offset,
end_lineno, end_col_offset, p->arena`, those being variables automatically
injected by the parser; `p` points to an object that holds on to all state
for the parser.

A similar grammar written to target Python AST objects:
@@ -422,50 +422,47 @@ Pegen
 
 Pegen is the parser generator used in CPython to produce the final PEG parser
 used by the interpreter. It is the program that can be used to read the python
-grammar located in
-[`Grammar/python.gram`](https://github.com/python/cpython/blob/main/Grammar/python.gram)
-and produce the final C parser. It contains the following pieces:
+grammar located in [`Grammar/python.gram`](../Grammar/python.gram) and produce
+the final C parser. It contains the following pieces:
 
 - A parser generator that can read a grammar file and produce a PEG parser
 written in Python or C that can parse said grammar. The generator is located at
-[`Tools/peg_generator/pegen`](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen).
+[`Tools/peg_generator/pegen`](../Tools/peg_generator/pegen).
 - A PEG meta-grammar that automatically generates a Python parser which is used
 for the parser generator itself (this means that there are no manually-written
 parsers). The meta-grammar is located at
-[`Tools/peg_generator/pegen/metagrammar.gram`](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen/metagrammar.gram).
+[`Tools/peg_generator/pegen/metagrammar.gram`](../Tools/peg_generator/pegen/metagrammar.gram).
 - A generated parser (using the parser generator) that can directly produce C and Python AST objects.
 
-The source code for Pegen lives at
-[`Tools/peg_generator/pegen`](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen)
+The source code for Pegen lives at [`Tools/peg_generator/pegen`](../Tools/peg_generator/pegen)
 but normally all typical commands to interact with the parser generator are executed from
 the main makefile.
 
 How to regenerate the parser
 ----------------------------
 
-Once you have made the changes to the grammar files, to regenerate the ``C``
+Once you have made the changes to the grammar files, to regenerate the `C`
 parser (the one used by the interpreter) just execute:
 
 ```
 make regen-pegen
 ```
 
-using the ``Makefile`` in the main directory. If you are on Windows you can
+using the `Makefile` in the main directory. If you are on Windows you can
 use the Visual Studio project files to regenerate the parser or to execute:
 
 ```
 ./PCbuild/build.bat --regen
 ```
 
-The generated parser file is located at
-[`Parser/parser.c`](https://github.com/python/cpython/blob/main/Parser/parser.c).
+The generated parser file is located at [`Parser/parser.c`](../Parser/parser.c).
 
 How to regenerate the meta-parser
 ---------------------------------
 
 The meta-grammar (the grammar that describes the grammar for the grammar files
 themselves) is located at
-[`Tools/peg_generator/pegen/metagrammar.gram`](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen/metagrammar.gram).
+[`Tools/peg_generator/pegen/metagrammar.gram`](../Tools/peg_generator/pegen/metagrammar.gram).
 Although it is very unlikely that you will ever need to modify it, if you make
 any modifications to this file (in order to implement new Pegen features) you will
 need to regenerate the meta-parser (the parser that parses the grammar files).
@@ -488,11 +485,11 @@ Grammatical elements and rules
 
 Pegen has some special grammatical elements and rules:
 
-- Strings with single quotes (') (for example, ``'class'``) denote KEYWORDS.
-- Strings with double quotes (") (for example, ``"match"``) denote SOFT KEYWORDS.
-- Uppercase names (for example, ``NAME``) denote tokens in the
-[`Grammar/Tokens`](https://github.com/python/cpython/blob/main/Grammar/Tokens) file.
-- Rule names starting with ``invalid_`` are used for specialized syntax errors.
+- Strings with single quotes (') (for example, `'class'`) denote KEYWORDS.
+- Strings with double quotes (") (for example, `"match"`) denote SOFT KEYWORDS.
+- Uppercase names (for example, `NAME`) denote tokens in the
+[`Grammar/Tokens`](../Grammar/Tokens) file.
+- Rule names starting with `invalid_` are used for specialized syntax errors.
 
 - These rules are NOT used in the first pass of the parser.
 - Only if the first pass fails to parse, a second pass including the invalid
@@ -509,14 +506,13 @@ Tokenization
 It is common among PEG parser frameworks that the parser does both the parsing
 and the tokenization, but this does not happen in Pegen. The reason is that the
 Python language needs a custom tokenizer to handle things like indentation
-boundaries, some special keywords like ``ASYNC`` and ``AWAIT`` (for
+boundaries, some special keywords like `ASYNC` and `AWAIT` (for
 compatibility purposes), backtracking errors (such as unclosed parenthesis),
 dealing with encoding, interactive mode and much more. Some of these reasons
 are also there for historical purposes, and some others are useful even today.
 
 The list of tokens (all uppercase names in the grammar) that you can use can
-be found in the
-[`Grammar/Tokens`](https://github.com/python/cpython/blob/main/Grammar/Tokens)
+be found in the [`Grammar/Tokens`](../Grammar/Tokens)
 file. If you change this file to add new tokens, make sure to regenerate the
 files by executing:
 
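The tokenizer/parser split described in this hunk can be observed from Python itself with the stdlib `tokenize` module: the parser only ever sees a token stream in which things like indentation boundaries have already been materialized as explicit tokens. This is an illustrative stdlib sketch, not the C tokenizer in `Parser/lexer`:

```python
import io
import tokenize

# Tokenize a small snippet the way the parser receives it: as a stream of
# (token type, token string) pairs. Note that INDENT/DEDENT arrive as
# explicit tokens produced by the tokenizer, not inferred by the parser.
source = "if x:\n    y = 1\n"
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Indentation boundaries are tokens in their own right.
assert ("INDENT", "    ") in tokens
assert ("DEDENT", "") in tokens
assert ("NAME", "if") in tokens
```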
@@ -532,9 +528,7 @@ the tokens or to execute:
 ```
 
 How tokens are generated and the rules governing this are completely up to the tokenizer
-([`Parser/lexer`](https://github.com/python/cpython/blob/main/Parser/lexer)
-and
-[`Parser/tokenizer`](https://github.com/python/cpython/blob/main/Parser/tokenizer));
+([`Parser/lexer`](../Parser/lexer) and [`Parser/tokenizer`](../Parser/tokenizer));
 the parser just receives tokens from it.
 
 Memoization
@@ -548,7 +542,7 @@ both in memory and time. Although the memory cost is obvious (the parser needs
 memory for storing previous results in the cache) the execution time cost comes
 for continuously checking if the given rule has a cache hit or not. In many
 situations, just parsing it again can be faster. Pegen **disables memoization
-by default** except for rules with the special marker ``memo`` after the rule
+by default** except for rules with the special marker `memo` after the rule
 name (and type, if present):
 
 ```
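The effect of the `memo` marker discussed in this hunk can be sketched in pure Python as a packrat-style cache keyed on the rule and its start position. All names here are illustrative and not pegen's actual implementation:

```python
from functools import wraps

calls = {"count": 0}  # counts how often the rule body actually runs

def memoize(rule):
    """Cache a rule's result per (token stream, position), like the `memo` marker."""
    cache = {}
    @wraps(rule)
    def wrapper(tokens, pos):
        key = (tuple(tokens), pos)
        if key not in cache:
            cache[key] = rule(tokens, pos)
        return cache[key]
    return wrapper

@memoize
def number(tokens, pos):
    """Toy rule: return (value, new_pos) on success, None on failure."""
    calls["count"] += 1
    if pos < len(tokens) and tokens[pos].isdigit():
        return tokens[pos], pos + 1
    return None

tokens = ["1", "+", "2"]
number(tokens, 0)  # parsed once and cached
number(tokens, 0)  # cache hit: the rule body does not run again
assert calls["count"] == 1
```

The trade-off the hunk describes is visible here: the cache dictionary costs memory, and every call pays a lookup even on a miss, which is why pegen makes memoization opt-in per rule.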
@@ -567,8 +561,7 @@ To determine whether a new rule needs memoization or not, benchmarking is requir
 (comparing execution times and memory usage of some considerably large files with
 and without memoization). There is a very simple instrumentation API available
 in the generated C parse code that allows to measure how much each rule uses
-memoization (check the
-[`Parser/pegen.c`](https://github.com/python/cpython/blob/main/Parser/pegen.c)
+memoization (check the [`Parser/pegen.c`](../Parser/pegen.c)
 file for more information) but it needs to be manually activated.
 
 Automatic variables
@@ -578,9 +571,9 @@ To make writing actions easier, Pegen injects some automatic variables in the
 namespace available when writing actions. In the C parser, some of these
 automatic variable names are:
 
-- ``p``: The parser structure.
-- ``EXTRA``: This is a macro that expands to
-``(_start_lineno, _start_col_offset, _end_lineno, _end_col_offset, p->arena)``,
+- `p`: The parser structure.
+- `EXTRA`: This is a macro that expands to
+`(_start_lineno, _start_col_offset, _end_lineno, _end_col_offset, p->arena)`,
 which is normally used to create AST nodes as almost all constructors need these
 attributes to be provided. All of the location variables are taken from the
 location information of the current token.
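The location bookkeeping that `EXTRA` automates for C actions is visible on the Python side through the stdlib `ast` module: every node the parser builds carries the same four offsets. A small stdlib sketch, not pegen itself:

```python
import ast

# Parse a one-line module and look at the `1 + 2` expression node.
tree = ast.parse("x = 1 + 2")
node = tree.body[0].value  # the BinOp for `1 + 2`

# These are exactly the attributes EXTRA supplies to AST constructors
# in the C actions: start/end line and column, taken from the tokens.
assert (node.lineno, node.col_offset) == (1, 4)
assert (node.end_lineno, node.end_col_offset) == (1, 9)
```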
@@ -590,13 +583,13 @@ Hard and soft keywords
 
 > [!NOTE]
 > In the grammar files, keywords are defined using **single quotes** (for example,
-> ``'class'``) while soft keywords are defined using **double quotes** (for example,
-> ``"match"``).
+> `'class'`) while soft keywords are defined using **double quotes** (for example,
+> `"match"`).
 
 There are two kinds of keywords allowed in pegen grammars: *hard* and *soft*
 keywords. The difference between hard and soft keywords is that hard keywords
 are always reserved words, even in positions where they make no sense
-(for example, ``x = class + 1``), while soft keywords only get a special
+(for example, `x = class + 1`), while soft keywords only get a special
 meaning in context. Trying to use a hard keyword as a variable will always
 fail:
 
@@ -621,7 +614,7 @@ one where they are defined as keywords:
 >>> foo(match="Yeah!")
 ```
 
-The ``match`` and ``case`` keywords are soft keywords, so that they are
+The `match` and `case` keywords are soft keywords, so that they are
 recognized as keywords at the beginning of a match statement or case block
 respectively, but are allowed to be used in other places as variable or
 argument names.
@@ -662,7 +655,7 @@ is, and it will unwind the stack and report the exception. This means that if a
 [rule action](#grammar-actions) raises an exception, all parsing will
 stop at that exact point. This is done to allow to correctly propagate any
 exception set by calling Python's C API functions. This also includes
-[``SyntaxError``](https://docs.python.org/3/library/exceptions.html#SyntaxError)
+[`SyntaxError`](https://docs.python.org/3/library/exceptions.html#SyntaxError)
 exceptions and it is the main mechanism the parser uses to report custom syntax
 error messages.
 
@@ -684,10 +677,10 @@ grammar.
 To report generic syntax errors, pegen uses a common heuristic in PEG parsers:
 the location of *generic* syntax errors is reported to be the furthest token that
 was attempted to be matched but failed. This is only done if parsing has failed
-(the parser returns ``NULL`` in C or ``None`` in Python) but no exception has
+(the parser returns `NULL` in C or `None` in Python) but no exception has
 been raised.
 
-As the Python grammar was primordially written as an ``LL(1)`` grammar, this heuristic
+As the Python grammar was primordially written as an `LL(1)` grammar, this heuristic
 has an extremely high success rate, but some PEG features, such as lookaheads,
 can impact this.
 
@@ -699,19 +692,19 @@ can impact this.
 To generate more precise syntax errors, custom rules are used. This is a common
 practice also in context free grammars: the parser will try to accept some
 construct that is known to be incorrect just to report a specific syntax error
-for that construct. In pegen grammars, these rules start with the ``invalid_``
+for that construct. In pegen grammars, these rules start with the `invalid_`
 prefix. This is because trying to match these rules normally has a performance
 impact on parsing (and can also affect the 'correct' grammar itself in some
 tricky cases, depending on the ordering of the rules) so the generated parser
 acts in two phases:
 
 1. The first phase will try to parse the input stream without taking into
-account rules that start with the ``invalid_`` prefix. If the parsing
+account rules that start with the `invalid_` prefix. If the parsing
 succeeds it will return the generated AST and the second phase will be
 skipped.
 
 2. If the first phase failed, a second parsing attempt is done including the
-rules that start with an ``invalid_`` prefix. By design this attempt
+rules that start with an `invalid_` prefix. By design this attempt
 **cannot succeed** and is only executed to give to the invalid rules a
 chance to detect specific situations where custom, more precise, syntax
 errors can be raised. This also allows to trade a bit of performance for
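The two-phase scheme described in this hunk can be sketched as a toy driver: a fast first pass that ignores the diagnostic rules, and a second, error-only pass that runs solely when the first one fails. Everything here (rule names, the fake "grammar") is illustrative, not pegen's real machinery:

```python
def parse_valid(text):
    """First pass: only the 'correct' grammar (toy: simple assignments)."""
    if "=" in text and "$" not in text:
        return {"kind": "assignment"}  # stand-in for a real AST
    return None                        # failure: return None, raise nothing

def parse_with_invalid_rules(text):
    """Second pass: invalid_ rules exist only to raise better errors."""
    if text.startswith("print "):      # toy invalid_print rule
        raise SyntaxError("Missing parentheses in call to 'print'")
    raise SyntaxError("invalid syntax")  # generic fallback

def parse(text):
    tree = parse_valid(text)
    if tree is not None:
        return tree                        # phase 2 is skipped entirely
    return parse_with_invalid_rules(text)  # by design, this always raises

assert parse("x = 1") == {"kind": "assignment"}
try:
    parse("print 'hello'")
except SyntaxError as exc:
    msg = str(exc)
assert "Missing parentheses" in msg
```

This mirrors the performance argument in the text: correct programs never pay for the `invalid_` rules, because those rules are only consulted once parsing has already failed.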
@@ -723,15 +716,15 @@ acts in two phases:
 > When defining invalid rules:
 >
 > - Make sure all custom invalid rules raise
-> [``SyntaxError``](https://docs.python.org/3/library/exceptions.html#SyntaxError)
+> [`SyntaxError`](https://docs.python.org/3/library/exceptions.html#SyntaxError)
 > exceptions (or a subclass of it).
-> - Make sure **all** invalid rules start with the ``invalid_`` prefix to not
+> - Make sure **all** invalid rules start with the `invalid_` prefix to not
 > impact performance of parsing correct Python code.
 > - Make sure the parser doesn't behave differently for regular rules when you introduce invalid rules
 > (see the [how PEG parsers work](#how-peg-parsers-work) section for more information).
 
 You can find a collection of macros to raise specialized syntax errors in the
-[`Parser/pegen.h`](https://github.com/python/cpython/blob/main/Parser/pegen.h)
+[`Parser/pegen.h`](../Parser/pegen.h)
 header file. These macros allow also to report ranges for
 the custom errors, which will be highlighted in the tracebacks that will be
 displayed when the error is reported.
@@ -746,35 +739,33 @@ displayed when the error is reported.
 <valid python code> $ 42
 ```
 
-should trigger the syntax error in the ``$`` character. If your rule is not correctly defined this
+should trigger the syntax error in the `$` character. If your rule is not correctly defined this
 won't happen. As another example, suppose that you try to define a rule to match Python 2 style
-``print`` statements in order to create a better error message and you define it as:
+`print` statements in order to create a better error message and you define it as:
 
 ```
 invalid_print: "print" expression
 ```
 
-This will **seem** to work because the parser will correctly parse ``print(something)`` because it is valid
-code and the second phase will never execute but if you try to parse ``print(something) $ 3`` the first pass
-of the parser will fail (because of the ``$``) and in the second phase, the rule will match the
-``print(something)`` as ``print`` followed by the variable ``something`` between parentheses and the error
-will be reported there instead of the ``$`` character.
+This will **seem** to work because the parser will correctly parse `print(something)` because it is valid
+code and the second phase will never execute but if you try to parse `print(something) $ 3` the first pass
+of the parser will fail (because of the `$`) and in the second phase, the rule will match the
+`print(something)` as `print` followed by the variable `something` between parentheses and the error
+will be reported there instead of the `$` character.
 
 Generating AST objects
 ----------------------
 
 The output of the C parser used by CPython, which is generated from the
-[grammar file](https://github.com/python/cpython/blob/main/Grammar/python.gram),
-is a Python AST object (using C structures). This means that the actions in the
-grammar file generate AST objects when they succeed. Constructing these objects
-can be quite cumbersome (see the [AST compiler section](compiler.md#abstract-syntax-trees-ast)
+[grammar file](../Grammar/python.gram), is a Python AST object (using C
+structures). This means that the actions in the grammar file generate AST
+objects when they succeed. Constructing these objects can be quite cumbersome
+(see the [AST compiler section](compiler.md#abstract-syntax-trees-ast)
 for more information on how these objects are constructed and how they are used
 by the compiler), so special helper functions are used. These functions are
-declared in the
-[`Parser/pegen.h`](https://github.com/python/cpython/blob/main/Parser/pegen.h)
-header file and defined in the
-[`Parser/action_helpers.c`](https://github.com/python/cpython/blob/main/Parser/action_helpers.c)
-file. The helpers include functions that join AST sequences, get specific elements
+declared in the [`Parser/pegen.h`](../Parser/pegen.h) header file and defined
+in the [`Parser/action_helpers.c`](../Parser/action_helpers.c) file. The
+helpers include functions that join AST sequences, get specific elements
 from them or to perform extra processing on the generated tree.
 
 
@@ -788,11 +779,9 @@ from them or to perform extra processing on the generated tree.
 
 As a general rule, if an action spawns multiple lines or requires something more
 complicated than a single expression of C code, is normally better to create a
-custom helper in
-[`Parser/action_helpers.c`](https://github.com/python/cpython/blob/main/Parser/action_helpers.c)
-and expose it in the
-[`Parser/pegen.h`](https://github.com/python/cpython/blob/main/Parser/pegen.h)
-header file so that it can be used from the grammar.
+custom helper in [`Parser/action_helpers.c`](../Parser/action_helpers.c)
+and expose it in the [`Parser/pegen.h`](../Parser/pegen.h) header file so that
+it can be used from the grammar.
 
 When parsing succeeds, the parser **must** return a **valid** AST object.
 
@@ -801,16 +790,15 @@ Testing
 
 There are three files that contain tests for the grammar and the parser:
 
-- [test_grammar.py](https://github.com/python/cpython/blob/main/Lib/test/test_grammar.py)
-- [test_syntax.py](https://github.com/python/cpython/blob/main/Lib/test/test_syntax.py)
-- [test_exceptions.py](https://github.com/python/cpython/blob/main/Lib/test/test_exceptions.py)
+- [test_grammar.py](../Lib/test/test_grammar.py)
+- [test_syntax.py](../Lib/test/test_syntax.py)
+- [test_exceptions.py](../Lib/test/test_exceptions.py)
 
-Check the contents of these files to know which is the best place for new tests, depending
-on the nature of the new feature you are adding.
+Check the contents of these files to know which is the best place for new
+tests, depending on the nature of the new feature you are adding.
 
 Tests for the parser generator itself can be found in the
-[test_peg_generator](https://github.com/python/cpython/blob/main/Lib/test_peg_generator)
-directory.
+[test_peg_generator](../Lib/test_peg_generator) directory.
 
 
 Debugging generated parsers
@@ -825,33 +813,32 @@ correctly compile and execute Python anymore. This makes it a bit challenging
 to debug when something goes wrong, especially when experimenting.
 
 For this reason it is a good idea to experiment first by generating a Python
-parser. To do this, you can go to the
-[Tools/peg_generator](https://github.com/python/cpython/blob/main/Tools/peg_generator)
+parser. To do this, you can go to the [Tools/peg_generator](../Tools/peg_generator)
 directory on the CPython repository and manually call the parser generator by executing:
 
 ```
 $ python -m pegen python <PATH TO YOUR GRAMMAR FILE>
 ```
 
-This will generate a file called ``parse.py`` in the same directory that you
+This will generate a file called `parse.py` in the same directory that you
 can use to parse some input:
 
 ```
 $ python parse.py file_with_source_code_to_test.py
 ```
 
-As the generated ``parse.py`` file is just Python code, you can modify it
+As the generated `parse.py` file is just Python code, you can modify it
 and add breakpoints to debug or better understand some complex situations.
 
 
 Verbose mode
 ------------
 
-When Python is compiled in debug mode (by adding ``--with-pydebug`` when
-running the configure step in Linux or by adding ``-d`` when calling the
-[PCbuild/build.bat](https://github.com/python/cpython/blob/main/PCbuild/build.bat)),
-it is possible to activate a **very** verbose mode in the generated parser. This
-is very useful to debug the generated parser and to understand how it works, but it
+When Python is compiled in debug mode (by adding `--with-pydebug` when
+running the configure step in Linux or by adding `-d` when calling the
+[PCbuild/build.bat](../PCbuild/build.bat)), it is possible to activate a
+**very** verbose mode in the generated parser. This is very useful to
+debug the generated parser and to understand how it works, but it
 can be a bit hard to understand at first.
 
 > [!NOTE]
@@ -859,13 +846,13 @@ can be a bit hard to understand at first.
 > interactive mode as it can be much harder to understand, because interactive
 > mode involves some special steps compared to regular parsing.
 
-To activate verbose mode you can add the ``-d`` flag when executing Python:
+To activate verbose mode you can add the `-d` flag when executing Python:
 
 ```
 $ python -d file_to_test.py
 ```
 
-This will print **a lot** of output to ``stderr`` so it is probably better to dump
+This will print **a lot** of output to `stderr` so it is probably better to dump
 it to a file for further analysis. The output consists of trace lines with the
 following structure::
 
@@ -873,17 +860,17 @@ following structure::
 <indentation> ('>'|'-'|'+'|'!') <rule_name>[<token_location>]: <alternative> ...
 ```
 
-Every line is indented by a different amount (``<indentation>``) depending on how
+Every line is indented by a different amount (`<indentation>`) depending on how
 deep the call stack is. The next character marks the type of the trace:
 
-- ``>`` indicates that a rule is going to be attempted to be parsed.
-- ``-`` indicates that a rule has failed to be parsed.
-- ``+`` indicates that a rule has been parsed correctly.
-- ``!`` indicates that an exception or an error has been detected and the parser is unwinding.
+- `>` indicates that a rule is going to be attempted to be parsed.
+- `-` indicates that a rule has failed to be parsed.
+- `+` indicates that a rule has been parsed correctly.
+- `!` indicates that an exception or an error has been detected and the parser is unwinding.
 
-The ``<token_location>`` part indicates the current index in the token array,
-the ``<rule_name>`` part indicates what rule is being parsed and
-the ``<alternative>`` part indicates what alternative within that rule
+The `<token_location>` part indicates the current index in the token array,
+the `<rule_name>` part indicates what rule is being parsed and
+the `<alternative>` part indicates what alternative within that rule
 is being attempted.
 
 
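A trace line of the shape documented in this hunk can be split mechanically, which is handy when post-processing a dumped `-d` log. The regex and the sample line below are illustrative (the sample is made up, not real pegen output):

```python
import re

# <indentation> ('>'|'-'|'+'|'!') <rule_name>[<token_location>]: <alternative>
TRACE = re.compile(
    r"^(?P<indent>\s*)(?P<mark>[>+!-])\s*"
    r"(?P<rule>\w+)\[(?P<pos>\d+)\]: (?P<alt>.*)$"
)

line = "    > expression[12]: disjunction 'if' disjunction 'else' expression"
m = TRACE.match(line)
assert m is not None
assert m.group("mark") == ">"     # this rule is about to be attempted
assert m.group("rule") == "expression"
assert int(m.group("pos")) == 12  # current index in the token array
assert len(m.group("indent")) == 4  # call-stack depth shows as indentation
```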
@@ -891,4 +878,5 @@ is being attempted.
 > **Document history**
 >
 > Pablo Galindo Salgado - Original author
+>
 > Irit Katriel and Jacob Coffee - Convert to Markdown