cpython/Tools/cases_generator
Ken Jin 22b0de2755
gh-117139: Convert the evaluation stack to stack refs (#118450)
This PR sets up tagged pointers for CPython.

The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.

Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pan out.

This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.

The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.

Please read Include/internal/pycore_stackref.h for more information!

---------

Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
2024-06-27 03:10:43 +08:00
..
README.md Rename tier 2 redundancy eliminator to optimizer (#115888) 2024-02-26 08:42:53 -08:00
_typing_backports.py gh-104504: cases generator: Add `--warn-unreachable` to the mypy config (#108112) 2023-08-21 00:40:41 +01:00
analyzer.py gh-117139: Convert the evaluation stack to stack refs (#118450) 2024-06-27 03:10:43 +08:00
cwriter.py GH-111485: Generate instruction and uop metadata (GH-113287) 2023-12-20 14:27:25 +00:00
generators_common.py gh-117139: Convert the evaluation stack to stack refs (#118450) 2024-06-27 03:10:43 +08:00
interpreter_definition.md gh-119689: generate stack effect metadata for pseudo instructions (#119691) 2024-05-29 09:47:56 +00:00
lexer.py gh-115778: Add `tierN` annotation for instruction definitions (#115815) 2024-02-23 17:31:57 +00:00
mypy.ini GH-111485: Separate out parsing, analysis and code-gen phases of tier 1 code generator (GH-112299) 2023-12-07 12:49:40 +00:00
opcode_id_generator.py gh-120417: Remove unused imports in cases_generator (#120622) 2024-06-17 21:58:56 +02:00
opcode_metadata_generator.py gh-120417: Remove unused imports in cases_generator (#120622) 2024-06-17 21:58:56 +02:00
optimizer_generator.py gh-117139: Convert the evaluation stack to stack refs (#118450) 2024-06-27 03:10:43 +08:00
parser.py gh-120417: Remove unused imports in cases_generator (#120622) 2024-06-17 21:58:56 +02:00
parsing.py gh-117139: Convert the evaluation stack to stack refs (#118450) 2024-06-27 03:10:43 +08:00
plexer.py gh-106812: Refactor cases_generator to allow uops with array stack effects (#107564) 2023-08-04 09:35:56 -07:00
py_metadata_generator.py gh-120417: Remove unused imports in cases_generator (#120622) 2024-06-17 21:58:56 +02:00
stack.py gh-117139: Convert the evaluation stack to stack refs (#118450) 2024-06-27 03:10:43 +08:00
target_generator.py gh-120417: Remove unused imports in cases_generator (#120622) 2024-06-17 21:58:56 +02:00
tier1_generator.py gh-117139: Convert the evaluation stack to stack refs (#118450) 2024-06-27 03:10:43 +08:00
tier2_generator.py gh-117139: Convert the evaluation stack to stack refs (#118450) 2024-06-27 03:10:43 +08:00
uop_id_generator.py gh-120417: Remove unused imports in cases_generator (#120622) 2024-06-17 21:58:56 +02:00
uop_metadata_generator.py GH-116422: Tier2 hot/cold splitting (GH-116813) 2024-03-26 09:35:11 +00:00

README.md

Tooling to generate interpreters

Documentation for the instruction definitions in Python/bytecodes.c ("the DSL") is in interpreter_definition.md.

What's currently here:

  • analyzer.py: code for converting the AST generated by Parser into a higher-level structure that is easier to work with
  • lexer.py: lexer for C, originally written by Mark Shannon
  • plexer.py: OO interface on top of lexer.py; main class: PLexer
  • parsing.py: Parser for instruction definition DSL; main class: Parser
  • parser.py: helper for interactions with parsing.py
  • tierN_generator.py: a couple of driver scripts to read Python/bytecodes.c and write Python/generated_cases.c.h (and several other files)
  • optimizer_generator.py: reads Python/bytecodes.c and Python/optimizer_bytecodes.c and writes Python/optimizer_cases.c.h
  • stack.py: code to handle generalized stack effects
  • cwriter.py: code which understands tokens and how to format C code; main class: CWriter
  • generators_common.py: helpers for generators
  • opcode_id_generator.py: generates a list of opcodes and writes it to Include/opcode_ids.h
  • opcode_metadata_generator.py: reads the instruction definitions and writes the metadata to Include/internal/pycore_opcode_metadata.h
  • py_metadata_generator.py: reads the instruction definitions and writes the metadata to Lib/_opcode_metadata.py
  • target_generator.py: generates targets for computed goto dispatch and writes them to Python/opcode_targets.h
  • uop_id_generator.py: generates a list of uop IDs and writes it to Include/internal/pycore_uop_ids.h
  • uop_metadata_generator.py: reads the instruction definitions and writes the metadata to Include/internal/pycore_uop_metadata.h

Note that there is some dummy C code at the top and bottom of Python/bytecodes.c to fool text editors like VS Code into believing this is valid C code.

A bit about the parser

The parser class uses a pretty standard recursive descent scheme, but with unlimited backtracking. The PLexer class tokenizes the entire input before parsing starts. We do not run the C preprocessor. Each parsing method returns either an AST node (a Node instance) or None, or raises SyntaxError (showing the error in the C source).

Most parsing methods are decorated with @contextual, which automatically resets the tokenizer input position when None is returned. This is what makes backtracking work: after one parsing method returns None, a different parsing method can be tried from the same position and may return a valid AST. A SyntaxError raised by a parsing method, by contrast, is not recoverable.
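The backtracking scheme described above can be illustrated with a small, self-contained sketch. This is not the code in parsing.py or plexer.py: `ToyParser`, its `next()` method, the `pos` attribute, the two-rule grammar, and this simplified `contextual` are all hypothetical names chosen for illustration. The sketch only shows the idea of tokenizing everything up front, restoring the token position when a parsing method returns None, and raising SyntaxError when no alternative matches.

```python
import re
from dataclasses import dataclass
from functools import wraps
from typing import Callable, List, Optional


@dataclass
class Node:
    """Minimal stand-in for an AST node (hypothetical, for illustration)."""
    kind: str
    text: str


def contextual(
    method: Callable[["ToyParser"], Optional[Node]],
) -> Callable[["ToyParser"], Optional[Node]]:
    """Restore the token position if the wrapped parsing method returns None."""
    @wraps(method)
    def wrapper(self: "ToyParser") -> Optional[Node]:
        start = self.pos
        result = method(self)
        if result is None:
            self.pos = start  # backtrack so another alternative can be tried
        return result
    return wrapper


class ToyParser:
    """Toy recursive descent parser; an assumption, not the real Parser API."""

    def __init__(self, src: str) -> None:
        # Tokenize the entire input before any parsing starts.
        self.tokens: List[str] = re.findall(r"[A-Za-z_]\w*|\d+|\S", src)
        self.pos = 0

    def next(self) -> Optional[str]:
        """Consume and return the next token, or None at end of input."""
        if self.pos < len(self.tokens):
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None

    @contextual
    def call(self) -> Optional[Node]:
        # call: NAME '(' ')'
        name = self.next()
        if name is not None and name.isidentifier():
            if self.next() == "(" and self.next() == ")":
                return Node("call", name)
        return None

    @contextual
    def name(self) -> Optional[Node]:
        # name: NAME
        tok = self.next()
        if tok is not None and tok.isidentifier():
            return Node("name", tok)
        return None

    def expr(self) -> Node:
        # Try each alternative in order; a failed one has already reset pos.
        node = self.call() or self.name()
        if node is None:
            raise SyntaxError(f"unexpected token at position {self.pos}")
        return node
```

With the input `foo + 1`, `call()` consumes `foo` and `+` before failing; the decorator then rewinds to position 0, so `name()` can match `foo` from the start. With the input `42`, both alternatives return None and `expr()` raises SyntaxError.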

Neither the lexer nor the parser is complete or fully correct. Most known issues are tersely indicated by # TODO: comments. We plan to fix issues as they become relevant.