
# Tooling to generate interpreters

Documentation for the instruction definitions in `Python/bytecodes.c`
("the DSL") is [here](interpreter_definition.md).

What's currently here:

- `lexer.py`: lexer for C, originally written by Mark Shannon
- `plexer.py`: OO interface on top of `lexer.py`; main class: `PLexer`
- `parsing.py`: parser for the instruction definition DSL; main class: `Parser`
- `generate_cases.py`: driver script to read `Python/bytecodes.c` and
  write `Python/generated_cases.c.h` (and several other files)
- `analysis.py`: `Analyzer` class used to read the input files
- `flags.py`: abstractions related to metadata flags for instructions
- `formatting.py`: `Formatter` class used to write the output files
- `instructions.py`: classes to analyze and write instructions
- `stacking.py`: code to handle generalized stack effects

Note that there is some dummy C code at the top and bottom of
`Python/bytecodes.c` to fool text editors like VS Code into believing
this is valid C code, so the instruction bodies can be edited without
custom editor support.

## A bit about the parser

The parser class uses a pretty standard recursive descent scheme,
but with unlimited backtracking.
The `PLexer` class tokenizes the entire input before parsing starts.
We do not run the C preprocessor.
Each parsing method returns either an AST node (a `Node` instance)
or `None`, or raises `SyntaxError` (showing the error in the C source).
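A minimal sketch of the two properties described above, under stated assumptions: the whole input is tokenized into a list before any parsing happens, and a parsing method either returns a node, returns `None` (no match, so the caller may try something else), or raises `SyntaxError` with a source line number. `Token`, `InstHeader`, `ToyParser`, `tokenize` and the toy grammar are all hypothetical; this is not the code in `plexer.py` or `parsing.py`, and the real DSL syntax is much richer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Token:
    kind: str  # "IDENTIFIER", "NUMBER" or "PUNCT" in this toy lexer
    text: str
    line: int


@dataclass
class InstHeader:
    """Hypothetical AST node standing in for a real `Node` subclass."""
    name: str


# Toy token grammar (an assumption); the real lexer.py understands full C.
TOKEN_RE = re.compile(
    r"(?P<IDENTIFIER>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<PUNCT>[(){};,+=-])"
)


def tokenize(src: str) -> list[Token]:
    """Tokenize the entire input eagerly, before parsing starts."""
    tokens: list[Token] = []
    pos = 0
    while True:
        while pos < len(src) and src[pos].isspace():
            pos += 1  # skip whitespace between tokens
        if pos == len(src):
            return tokens
        line = src.count("\n", 0, pos) + 1
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"line {line}: unexpected character {src[pos]!r}")
        tokens.append(Token(m.lastgroup, m.group(), line))
        pos = m.end()


class ToyParser:
    def __init__(self, src: str) -> None:
        self.tokens = tokenize(src)  # the whole token list exists up front
        self.pos = 0                 # parsing only moves an index through it

    def peek(self) -> Token | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, text: str) -> Token | None:
        """Consume and return the next token if its text matches, else None."""
        tok = self.peek()
        if tok is not None and tok.text == text:
            self.pos += 1
            return tok
        return None

    def error(self, msg: str) -> SyntaxError:
        """Build a SyntaxError pointing at the current (or last) token's line."""
        tok = self.peek() or (self.tokens[-1] if self.tokens else None)
        return SyntaxError(f"line {tok.line if tok else 1}: {msg}")

    def inst_header(self) -> InstHeader | None:
        """Parse the toy form `inst ( NAME )`.

        Returns None if the input does not start with `inst`, and raises
        SyntaxError once the keyword was seen but the rest is malformed.
        """
        if self.expect("inst") is None:
            return None
        if self.expect("(") is None:
            raise self.error("expected '(' after 'inst'")
        name = self.peek()
        if name is None or name.kind != "IDENTIFIER":
            raise self.error("expected an instruction name")
        self.pos += 1
        if self.expect(")") is None:
            raise self.error("expected ')'")
        return InstHeader(name.text)


if __name__ == "__main__":
    print(ToyParser("inst(NOP) { }").inst_header())  # InstHeader(name='NOP')
```

Because the token list is materialized first, "rewinding" the parser is just resetting an integer index, which is what makes cheap, unlimited backtracking practical.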

Most parsing methods are decorated with `@contextual`, which automatically
resets the tokenizer input position when `None` is returned.
Returning `None` therefore means "no match here": after backtracking,
a different parsing method may still return a valid AST.
Raising `SyntaxError`, by contrast, is irrecoverable and aborts parsing.
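The backtracking behaviour of a `@contextual`-style decorator can be illustrated with a toy parser over a pre-tokenized list. This is a sketch under assumptions: the `getpos`/`setpos` method names, the `Node` fields and the two toy alternatives are illustrative, and the sketch omits `SyntaxError` reporting and any other bookkeeping the real decorator in `parsing.py` may perform.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical AST node for this sketch."""
    kind: str
    name: str


def contextual(func):
    """Rewind the token position whenever the wrapped method returns None."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        begin = self.getpos()
        result = func(self, *args, **kwargs)
        if result is None:
            self.setpos(begin)  # undo tokens consumed by the failed attempt
        return result
    return wrapper


class ToyParser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens  # input is already fully tokenized
        self.pos = 0

    def getpos(self) -> int:
        return self.pos

    def setpos(self, pos: int) -> None:
        self.pos = pos

    def next(self) -> str | None:
        """Consume and return the next token, or None at end of input."""
        if self.pos < len(self.tokens):
            self.pos += 1
            return self.tokens[self.pos - 1]
        return None

    def name(self) -> str | None:
        """Consume one token; return it only if it is an identifier."""
        tok = self.next()
        return tok if tok is not None and tok.isidentifier() else None

    @contextual
    def alias_def(self) -> Node | None:
        # Toy rule: NAME = NAME ;
        name = self.name()
        if name is None or self.next() != "=":
            return None
        if self.name() is None or self.next() != ";":
            return None
        return Node("alias", name)

    @contextual
    def call_def(self) -> Node | None:
        # Toy rule: NAME ( NAME ) ;
        name = self.name()
        if name is None or self.next() != "(":
            return None
        if self.name() is None or self.next() != ")" or self.next() != ";":
            return None
        return Node("call", name)

    def definition(self) -> Node | None:
        # Both alternatives start with NAME; because each is @contextual,
        # a failed alias_def leaves the position untouched for call_def.
        return self.alias_def() or self.call_def()


if __name__ == "__main__":
    parser = ToyParser(["FOO", "(", "BAR", ")", ";"])
    print(parser.definition())  # Node(kind='call', name='FOO')
```

Putting the rewind in a decorator keeps each parsing method free to consume tokens eagerly; only the outcome (`None` or a node) decides whether those tokens stay consumed.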

Neither the lexer nor the parser is complete or fully correct.
Most known issues are tersely indicated by `# TODO:` comments.
We plan to fix issues as they become relevant.