# Tooling to generate interpreters

Documentation for the instruction definitions in `Python/bytecodes.c`
("the DSL") is [here](interpreter_definition.md).

What's currently here:

- `analyzer.py`: code for converting the AST generated by `Parser`
  into a higher-level structure that is easier to work with
- `lexer.py`: lexer for C, originally written by Mark Shannon
- `plexer.py`: OO interface on top of `lexer.py`; main class: `PLexer`
- `parsing.py`: parser for the instruction definition DSL; main class: `Parser`
- `parser.py`: helper for interactions with `parsing.py`
- `tierN_generator.py`: a couple of driver scripts that read `Python/bytecodes.c` and
  write `Python/generated_cases.c.h` (and several other files)
- `optimizer_generator.py`: reads `Python/bytecodes.c` and
  `Python/optimizer_bytecodes.c` and writes
  `Python/optimizer_cases.c.h`
- `stack.py`: code to handle generalized stack effects
- `cwriter.py`: code which understands tokens and how to format C code;
  main class: `CWriter`
- `generators_common.py`: helpers for generators
- `opcode_id_generator.py`: generates a list of opcodes and writes them to
  `Include/opcode_ids.h`
- `opcode_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_opcode_metadata.h`
- `py_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Lib/_opcode_metadata.py`
- `target_generator.py`: generates targets for computed goto dispatch and
  writes them to `Python/opcode_targets.h`
- `uop_id_generator.py`: generates a list of uop IDs and writes them to
  `Include/internal/pycore_uop_ids.h`
- `uop_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_uop_metadata.h`

Note that there is some dummy C code at the top and bottom of
`Python/bytecodes.c`
to fool text editors like VS Code into believing this is valid C code.

## A bit about the parser

The parser class uses a pretty standard recursive descent scheme,
but with unlimited backtracking.
The `PLexer` class tokenizes the entire input before parsing starts.
We do not run the C preprocessor.
Each parsing method returns either an AST node (a `Node` instance)
or `None`, or raises `SyntaxError` (showing the error in the C source).

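To make the three possible outcomes of a parsing method concrete, here is a
minimal sketch. It is not the code in `plexer.py` or `parsing.py`: the token
pattern, the `Token` and `Node` shapes, `ToyParser`, and the tiny `name(arg, ...)`
grammar are all assumptions invented for illustration. It only mirrors the ideas
described above: the whole input is tokenized up front, and a method either
returns a node, returns `None` (no match, nothing consumed), or raises
`SyntaxError` once it has committed to a rule.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical token pattern for this sketch; the real lexer handles full C.
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[(),;=]))")


@dataclass
class Token:
    kind: str
    text: str


@dataclass
class Node:
    name: str
    args: list[str]


def tokenize(src: str) -> list[Token]:
    """Tokenize the entire input eagerly, before any parsing happens."""
    src = src.rstrip()
    tokens: list[Token] = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append(Token(kind, m.group(kind)))
        pos = m.end()
    return tokens


class ToyParser:
    def __init__(self, src: str) -> None:
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self, offset: int = 0) -> Token | None:
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def next(self) -> Token | None:
        tok = self.peek()
        if tok is not None:
            self.pos += 1
        return tok

    def call(self) -> Node | None:
        """Parse IDENT '(' comma-separated IDENTs ')' (trailing comma tolerated)."""
        name, paren = self.peek(0), self.peek(1)
        if name is None or name.kind != "ident":
            return None  # no match; nothing was consumed
        if paren is None or paren.text != "(":
            return None  # not a call; the caller may try another rule
        self.pos += 2
        args: list[str] = []
        while (tok := self.peek()) is not None and tok.kind == "ident":
            args.append(self.next().text)
            sep = self.peek()
            if sep is None or sep.text != ",":
                break
            self.next()
        closing = self.next()
        if closing is None or closing.text != ")":
            # We are committed to this rule, so this is a hard error.
            raise SyntaxError(f"expected ')' at token index {self.pos}")
        return Node(name.text, args)
```
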
Most parsing methods are decorated with `@contextual`, which automatically
resets the tokenizer input position when `None` is returned.
Parsing methods may also raise `SyntaxError`, which is irrecoverable.
When a parsing method returns `None`, it is possible that after backtracking
a different parsing method returns a valid AST.

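The backtracking behavior described above can be sketched as follows. This is
an illustrative toy, not the decorator from `parsing.py`: the `contextual`
function here only rewinds the position (the real one may do more), and
`ToyParser`, its grammar, and the string-based tokens are assumptions made for
the example. The point is that a rule may consume tokens freely, and returning
`None` is enough to undo that so the next alternative starts from the same place.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass


@dataclass
class Node:
    kind: str
    value: str


def contextual(func):
    """Rewind the parser position if the wrapped rule returns None."""

    @functools.wraps(func)
    def wrapper(self):
        begin = self.pos
        result = func(self)
        if result is None:
            self.pos = begin  # backtrack: undo any tokens consumed
        return result

    return wrapper


class ToyParser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens  # already fully tokenized
        self.pos = 0

    def next(self) -> str | None:
        if self.pos < len(self.tokens):
            self.pos += 1
            return self.tokens[self.pos - 1]
        return None

    @contextual
    def assignment(self) -> Node | None:
        # NAME '=' NAME
        target = self.next()
        if target is None or not target.isidentifier():
            return None
        if self.next() != "=":
            return None  # tokens were consumed; the decorator rewinds
        value = self.next()
        if value is None or not value.isidentifier():
            # Past the '=' we are committed, so this is not recoverable.
            raise SyntaxError("expected a name after '='")
        return Node("assign", f"{target}={value}")

    @contextual
    def name(self) -> Node | None:
        tok = self.next()
        if tok is None or not tok.isidentifier():
            return None
        return Node("name", tok)

    @contextual
    def statement(self) -> Node | None:
        # Ordered choice: try each alternative from the same position.
        for rule in (self.assignment, self.name):
            node = rule()
            if node is not None:
                return node
        return None
```
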
Neither the lexer nor the parser is complete or fully correct.
Most known issues are tersely indicated by `# TODO:` comments.
We plan to fix issues as they become relevant.