
# Tooling to generate interpreters

What's currently here:

- `lexer.py`: lexer for C, originally written by Mark Shannon
- `plexer.py`: OO interface on top of `lexer.py`; main class: `PLexer`
- `parser.py`: parser for the instruction definition DSL; main class: `Parser`
- `generate_cases.py`: driver script to read `Python/bytecodes.c` and
  write `Python/generated_cases.c.h`

The DSL for the instruction definitions in `Python/bytecodes.c` is described
[here](https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md).
Note that `Python/bytecodes.c` has some dummy C code at the top and bottom
to fool text editors like VS Code into believing it is valid C code.

## A bit about the parser

The parser class uses a pretty standard recursive descent scheme,
but with unlimited backtracking.
The `PLexer` class tokenizes the entire input before parsing starts.
We do not run the C preprocessor.
Each parsing method returns either an AST node (a `Node` instance)
or `None`, or raises `SyntaxError` (showing the error in the C source).
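
The three possible outcomes of a parsing method can be illustrated with a minimal sketch. It is not the actual `PLexer` or `Parser` code: `ToyLexer`, `ToyParser`, the toy `Node` dataclass, and the tiny grammar are all hypothetical names invented for this example, and the regular expression is a stand-in for the real C lexer.

```python
import re
from dataclasses import dataclass

# Hypothetical token pattern: identifiers, integers, or any single symbol.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|\S)")


@dataclass
class Node:
    """Toy AST node; the real Node class carries more information."""
    kind: str
    text: str


class ToyLexer:
    """Stand-in for a lexer that tokenizes the entire input up front."""

    def __init__(self, src):
        # All tokens are produced before parsing starts, so the parser
        # only ever moves an index into this list.
        self.tokens = TOKEN_RE.findall(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        if tok is not None:
            self.pos += 1
        return tok


class ToyParser(ToyLexer):
    def number(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.next()
            return Node("number", tok)
        return None  # no match; the caller may try something else

    def paren_number(self):
        # Grammar: '(' NUMBER ')'
        if self.peek() != "(":
            return None  # not this construct at all
        self.next()
        node = self.number()
        if node is None or self.next() != ")":
            # Committed to this construct, so a mismatch is a hard error.
            raise SyntaxError("expected '(' NUMBER ')'")
        return node
```

Here `ToyParser("( 42 )").paren_number()` returns a `Node`, `ToyParser("x").paren_number()` returns `None`, and `ToyParser("( x )").paren_number()` raises `SyntaxError`.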

Most parsing methods are decorated with `@contextual`, which automatically
resets the tokenizer input position when `None` is returned.
Parsing methods may also raise `SyntaxError`, which is irrecoverable.
When a parsing method returns `None`, the caller can backtrack and try
a different parsing method, which may return a valid AST.
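
The backtracking behavior described above can be sketched as follows. This is a simplified illustration, not the decorator from `parser.py`: the `contextual` function below only rewinds the position, and `BacktrackingDemo` with its `pos` attribute and toy grammar is a hypothetical class made up for the example.

```python
import functools


def contextual(func):
    """Rewind the token position when the wrapped method returns None."""
    @functools.wraps(func)
    def wrapper(self):
        begin = self.pos
        result = func(self)
        if result is None:
            self.pos = begin  # undo any tokens consumed by the failed attempt
        return result
    return wrapper


class BacktrackingDemo:
    """Hypothetical parser over a pre-tokenized list of strings."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, text):
        # Consume the next token only if it matches exactly.
        if self.peek() == text:
            self.pos += 1
            return text
        return None

    @contextual
    def name(self):
        tok = self.peek()
        if tok is not None and tok.isidentifier():
            self.pos += 1
            return ("name", tok)
        return None

    @contextual
    def call(self):
        # Grammar: NAME '(' ')'
        name = self.name()
        if name is not None and self.expect("(") and self.expect(")"):
            return ("call", name[1])
        return None  # the wrapper rewinds past the NAME already consumed

    def primary(self):
        # Try the longer alternative first, then fall back.
        return self.call() or self.name()
```

With the tokens `["f", "+"]`, `call()` consumes `f`, fails on `(`, and returns `None`; the decorator resets the position so that `name()` can then succeed from the same starting point.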

Neither the lexer nor the parser is complete or fully correct.
Most known issues are tersely indicated by `# TODO:` comments.
We plan to fix issues as they become relevant.
