cpython/Tools/cases_generator
Guido van Rossum 9cdd2fa63b
GH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (#100205)
The presence of this macro indicates that a particular instruction
may be considered for conversion to a register-based format
(see https://github.com/faster-cpython/ideas/issues/485).

An invariant (currently unchecked) is that `DEOPT_IF()` may only
occur *before* `DECREF_INPUTS()`, and `ERROR_IF()` may only occur
*after* it. One reason not to check this is that there are a few
places where we insert *two* `DECREF_INPUTS()` calls, in different
branches of the code. An invariant check would need some control-flow
analysis to understand this.
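
To illustrate why a simple check falls short, here is a minimal sketch of a purely linear scan for the ordering invariant. The helper `check_macro_order` and the sample instruction bodies are hypothetical and are not part of `generate_cases.py`; the C snippets are illustrative strings, not copied from `Python/bytecodes.c`.

```python
import re

# Hypothetical helper for illustration only; not part of generate_cases.py.
MACRO_RE = re.compile(r"\b(DEOPT_IF|DECREF_INPUTS|ERROR_IF)\s*\(")


def check_macro_order(body: str) -> list[str]:
    """Report ordering violations using a linear scan (no control-flow analysis)."""
    problems = []
    seen_decref = False
    for lineno, line in enumerate(body.splitlines(), start=1):
        for match in MACRO_RE.finditer(line):
            name = match.group(1)
            if name == "DECREF_INPUTS":
                seen_decref = True
            elif name == "DEOPT_IF" and seen_decref:
                problems.append(f"line {lineno}: DEOPT_IF() after DECREF_INPUTS()")
            elif name == "ERROR_IF" and not seen_decref:
                problems.append(f"line {lineno}: ERROR_IF() before DECREF_INPUTS()")
    return problems


# Straight-line body: the linear scan is sufficient here.
straight = """\
DEOPT_IF(!PyLong_CheckExact(left), BINARY_OP);
res = compute(left, right);
DECREF_INPUTS();
ERROR_IF(res == NULL, error);
"""

# Two DECREF_INPUTS() calls in different branches: every path respects the
# invariant, but the linear scan still flags the DEOPT_IF() in the else branch.
branched = """\
if (fast_path) {
    DECREF_INPUTS();
    ERROR_IF(true, error);
}
else {
    DEOPT_IF(!ok, BINARY_OP);
    DECREF_INPUTS();
}
"""

print(check_macro_order(straight))
print(check_macro_order(branched))
```

The second body is valid on each individual path, yet the scan reports a violation, which is the kind of false positive that branch-aware analysis would be needed to avoid.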

Note that many instructions, especially specialized ones,
can't be converted to use this macro straightforwardly.
This is because the generator currently only generates plain
`Py_DECREF(variable)` statements, and cannot generate
things like `_Py_DECREF_SPECIALIZED()` let alone deal with
`_PyList_AppendTakeRef()`.
2022-12-16 20:45:55 -08:00
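
The expansion described in the commit message above can be sketched roughly as follows. This is an assumption about the general shape of the transformation, not the actual code in `generate_cases.py`; the function name `expand_decref_inputs` and its parameters are hypothetical.

```python
import re

# Hypothetical sketch; the real generator may differ in detail.
DECREF_INPUTS_RE = re.compile(r"^(\s*)DECREF_INPUTS\(\);\s*$")


def expand_decref_inputs(body_lines, input_names):
    """Replace each `DECREF_INPUTS();` line with one Py_DECREF() per stack input."""
    out = []
    for line in body_lines:
        match = DECREF_INPUTS_RE.match(line)
        if match is None:
            out.append(line)  # Ordinary line: copy through unchanged.
            continue
        indent = match.group(1)  # Keep the indentation of the macro call.
        for name in input_names:
            out.append(f"{indent}Py_DECREF({name});")
    return out


body = [
    "    res = PyNumber_Add(left, right);",
    "    DECREF_INPUTS();",
    "    ERROR_IF(res == NULL, error);",
]
expanded = expand_decref_inputs(body, ["left", "right"])
print("\n".join(expanded))
```

Because the sketch only ever emits plain `Py_DECREF(name)` lines, it also shows why instructions that need `_Py_DECREF_SPECIALIZED()` or `_PyList_AppendTakeRef()` cannot use the macro directly.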
README.md GH-98831: Add `macro` and `op` and their implementation to DSL (#99495) 2022-11-22 16:04:57 -08:00
generate_cases.py GH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (#100205) 2022-12-16 20:45:55 -08:00
lexer.py GH-98831: Support cache effects in super- and macro instructions (#99601) 2022-12-02 19:57:30 -08:00
parser.py GH-98831: Typed stack effects, and more instructions converted (#99764) 2022-12-08 13:31:27 -08:00
plexer.py GH-98831: Refactor and fix cases generator (#99526) 2022-11-17 17:06:07 -08:00

README.md

Tooling to generate interpreters

What's currently here:

  • lexer.py: lexer for C, originally written by Mark Shannon
  • plexer.py: OO interface on top of lexer.py; main class: PLexer
  • parser.py: parser for the instruction definition DSL; main class: Parser
  • generate_cases.py: driver script to read Python/bytecodes.c and write Python/generated_cases.c.h

The DSL for the instruction definitions in Python/bytecodes.c is described here. Note that there is some dummy C code at the top and bottom of the file to fool text editors like VS Code into believing this is valid C code.

A bit about the parser

The Parser class uses a fairly standard recursive-descent scheme with unlimited backtracking. The PLexer class tokenizes the entire input before parsing starts. We do not run the C preprocessor. Each parsing method either returns an AST node (a Node instance), returns None, or raises SyntaxError (showing the error in the C source).

Most parsing methods are decorated with @contextual, which automatically resets the tokenizer input position when None is returned. A None result therefore means "no match here": after backtracking, a different parsing method may still return a valid AST. A SyntaxError, by contrast, is not recoverable.
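
The backtracking scheme can be illustrated with a small, self-contained sketch. `MiniLexer` and `MiniParser` are toy stand-ins invented for this example, and this `contextual` is a simplified reimplementation of the idea; the real classes in plexer.py and parser.py do more, and their method names may differ.

```python
import functools
import re


class MiniLexer:
    """Toy stand-in for PLexer: tokenizes the whole input up front."""

    def __init__(self, src):
        self.tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", src)
        self.pos = 0

    def getpos(self):
        return self.pos

    def setpos(self, pos):
        self.pos = pos

    def next(self):
        """Consume and return the next token, or None at end of input."""
        if self.pos < len(self.tokens):
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None

    def expect(self, text):
        """Consume the next token only if it equals `text`."""
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None


def contextual(func):
    """Reset the token position whenever the wrapped method returns None."""

    @functools.wraps(func)
    def wrapper(self):
        begin = self.getpos()
        result = func(self)
        if result is None:
            self.setpos(begin)  # Backtrack: pretend nothing was consumed.
        return result

    return wrapper


class MiniParser(MiniLexer):
    @contextual
    def name(self):
        tok = self.next()
        if tok is not None and tok.isidentifier():
            return tok
        return None

    @contextual
    def call(self):
        # call: NAME '(' ')'
        name = self.name()
        if name and self.expect("(") and self.expect(")"):
            return ("call", name)
        return None

    def primary(self):
        # Try alternatives in order; a failed one leaves the position untouched.
        if (node := self.call()) is not None:
            return node
        if (name := self.name()) is not None:
            return ("name", name)
        raise SyntaxError(f"unexpected token at position {self.getpos()}")


print(MiniParser("foo()").primary())
print(MiniParser("foo + 1").primary())
```

In the second call, `call()` consumes `foo`, fails on `+`, and the decorator rewinds the position so that `name()` can match `foo` again from the start.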

Neither the lexer nor the parsers are complete or fully correct. Most known issues are tersely indicated by # TODO: comments. We plan to fix issues as they become relevant.