# Tooling to generate interpreters

Documentation for the instruction definitions in `Python/bytecodes.c`
("the DSL") is [here](interpreter_definition.md).

What's currently here:

- `analyzer.py`: code for converting the AST generated by `Parser`
  into a higher-level structure that is easier to work with
- `lexer.py`: lexer for C, originally written by Mark Shannon
- `plexer.py`: OO interface on top of lexer.py; main class: `PLexer`
- `parsing.py`: Parser for instruction definition DSL; main class: `Parser`
- `parser.py`: helper for interactions with `parsing.py`
- `tierN_generator.py`: a couple of driver scripts to read `Python/bytecodes.c` and
  write `Python/generated_cases.c.h` (and several other files)
- `optimizer_generator.py`: reads `Python/bytecodes.c` and
  `Python/optimizer_bytecodes.c` and writes
  `Python/optimizer_cases.c.h`
- `stack.py`: code to handle generalized stack effects
- `cwriter.py`: code which understands tokens and how to format C code;
  main class: `CWriter`
- `generators_common.py`: helpers for generators
- `opcode_id_generator.py`: generates a list of opcodes and writes them to
  `Include/opcode_ids.h`
- `opcode_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_opcode_metadata.h`
- `py_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Lib/_opcode_metadata.py`
- `target_generator.py`: generates targets for computed goto dispatch and
  writes them to `Python/opcode_targets.h`
- `uop_id_generator.py`: generates a list of uop IDs and writes them to
  `Include/internal/pycore_uop_ids.h`
- `uop_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_uop_metadata.h`

Note that there is some dummy C code at the top and bottom of
`Python/bytecodes.c`
to fool text editors like VS Code into believing this is valid C code.

## A bit about the parser

The parser class uses a pretty standard recursive descent scheme,
but with unlimited backtracking.
The `PLexer` class tokenizes the entire input before parsing starts.
We do not run the C preprocessor.
Each parsing method returns either an AST node (a `Node` instance)
or `None`, or raises `SyntaxError` (showing the error in the C source).

Most parsing methods are decorated with `@contextual`, which automatically
resets the tokenizer input position when `None` is returned.
Parsing methods may also raise `SyntaxError`, which is unrecoverable.
When a parsing method returns `None`, a different parsing method may still
return a valid AST after backtracking.
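
The backtracking scheme described above can be sketched roughly as follows.
This is a simplified illustration, not the code in `parsing.py`: the
`ToyParser` class, its `expect()` helper, the `pair`/`unit`/`item` methods and
the tuple-shaped nodes are all hypothetical, and the `contextual` decorator
shown here is a minimal stand-in that only assumes the parser keeps its
position in a `pos` attribute.

```python
import functools


def contextual(func):
    """Rewind the token position when the wrapped method returns None.

    Simplified stand-in for the decorator described above (assumption:
    the parser stores its position in ``self.pos``).
    """
    @functools.wraps(func)
    def wrapper(self):
        begin = self.pos
        result = func(self)
        if result is None:
            self.pos = begin  # backtrack: undo any tokens consumed
        return result
    return wrapper


class ToyParser:
    """Hypothetical parser over a list of tokens produced up front."""

    def __init__(self, tokens):
        self.tokens = tokens  # the entire input, already tokenized
        self.pos = 0          # index of the next token to consume

    def expect(self, text):
        # Consume the next token only if it matches exactly.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    @contextual
    def pair(self):
        # pair: "(" "a" ")"
        if self.expect("(") and self.expect("a") and self.expect(")"):
            return ("pair", "a")
        return None  # the decorator rewinds self.pos

    @contextual
    def unit(self):
        # unit: "(" ")"
        if self.expect("(") and self.expect(")"):
            return ("unit",)
        return None

    def item(self):
        # Try each alternative in turn; a None result means "try the next".
        node = self.pair() or self.unit()
        if node is None:
            # No alternative matched: report an unrecoverable error.
            raise SyntaxError(f"unexpected input at token {self.pos}")
        return node


parser = ToyParser(["(", ")"])
print(parser.item())  # ('unit',)
```

Here `pair()` consumes the `(` before failing on the second token, so without
the rewind `unit()` would start from the wrong position.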

Neither the lexer nor the parsers are complete or fully correct.
Most known issues are tersely indicated by `# TODO:` comments.
We plan to fix issues as they become relevant.