mirror of https://github.com/python/cpython
GH-98831: "Generate" the interpreter (#98830)
The switch cases (really TARGET(opcode) macros) have been moved from ceval.c to generated_cases.c.h. That file is generated from instruction definitions in bytecodes.c (which impersonates a C file so the C code it contains can be edited without custom support in e.g. VS Code). The code generator lives in Tools/cases_generator (it has a README.md explaining how it works). The DSL used to describe the instructions is a work in progress, described in https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md.

This is surely a work in progress. An easy next step could be auto-generating super-instructions.

**IMPORTANT: Merge Conflicts**

If you get a merge conflict for instruction implementations in ceval.c, your best bet is to port your changes to bytecodes.c. That file looks almost the same as the original cases, except instead of `TARGET(NAME)` it uses `inst(NAME)`, and the trailing `DISPATCH()` call is omitted (the code generator adds it automatically).
This commit is contained in: parent 2cfcaf5af6 · commit 41bc101dd6
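As an illustration of the porting described above (a schematic sketch, not lines taken from this commit), a case that looks roughly like this in ceval.c:

```c
        TARGET(UNARY_NEGATIVE) {
            PyObject *value = TOP();
            PyObject *res = PyNumber_Negative(value);
            Py_DECREF(value);
            SET_TOP(res);
            if (res == NULL)
                goto error;
            DISPATCH();
        }
```

would be written like this in bytecodes.c, with `inst` replacing `TARGET` and the trailing `DISPATCH()` dropped:

```c
        inst(UNARY_NEGATIVE) {
            PyObject *value = TOP();
            PyObject *res = PyNumber_Negative(value);
            Py_DECREF(value);
            SET_TOP(res);
            if (res == NULL)
                goto error;
        }
```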
.gitattributes
@@ -82,6 +82,7 @@ Parser/parser.c generated
 Parser/token.c generated
 Programs/test_frozenmain.h generated
 Python/Python-ast.c generated
+Python/generated_cases.c.h generated
 Python/opcode_targets.h generated
 Python/stdlib_module_names.h generated
 Tools/peg_generator/pegen/grammar_parser.py generated
Makefile.pre.in
@@ -1445,7 +1445,19 @@ regen-opcode-targets:
 		$(srcdir)/Python/opcode_targets.h.new
 	$(UPDATE_FILE) $(srcdir)/Python/opcode_targets.h $(srcdir)/Python/opcode_targets.h.new

-Python/ceval.o: $(srcdir)/Python/opcode_targets.h $(srcdir)/Python/condvar.h
+.PHONY: regen-cases
+regen-cases:
+	# Regenerate Python/generated_cases.c.h from Python/bytecodes.c
+	# using Tools/cases_generator/generate_cases.py
+	PYTHONPATH=$(srcdir)/Tools/cases_generator \
+	$(PYTHON_FOR_REGEN) \
+	$(srcdir)/Tools/cases_generator/generate_cases.py \
+		-i $(srcdir)/Python/bytecodes.c \
+		-o $(srcdir)/Python/generated_cases.c.h.new
+	$(UPDATE_FILE) $(srcdir)/Python/generated_cases.c.h $(srcdir)/Python/generated_cases.c.h.new
+
+Python/ceval.o: $(srcdir)/Python/opcode_targets.h $(srcdir)/Python/condvar.h $(srcdir)/Python/generated_cases.c.h

 Python/frozen.o: $(FROZEN_FILES_OUT)
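With this target in place, `make regen-cases` regenerates Python/generated_cases.c.h from Python/bytecodes.c, and since `Python/ceval.o` now depends on the generated file, a rebuild picks the change up automatically.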
Misc/NEWS.d entry
@@ -0,0 +1 @@
+We have new tooling, in ``Tools/cases_generator``, to generate the interpreter switch from a list of opcode definitions.
File diff suppressed because it is too large.
Python/ceval.c (3851 changed lines): file diff suppressed because it is too large.
File diff suppressed because it is too large.
Tools/cases_generator/README.md
@@ -0,0 +1,39 @@

# Tooling to generate interpreters

What's currently here:

- `lexer.py`: lexer for C, originally written by Mark Shannon
- `plexer.py`: OO interface on top of lexer.py; main class: `PLexer`
- `parser.py`: parser for the instruction definition DSL; main class: `Parser`
- `generate_cases.py`: driver script to read `Python/bytecodes.c` and
  write `Python/generated_cases.c.h`

**Temporarily also:**

- `extract_cases.py`: script to extract cases from
  `Python/ceval.c` and write them to `Python/bytecodes.c`
- `bytecodes_template.h`: template used by `extract_cases.py`

The DSL for the instruction definitions in `Python/bytecodes.c` is described
[here](https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md).
Note that there is some dummy C code at the top and bottom of the file
to fool text editors like VS Code into believing this is valid C code.

## A bit about the parser

The parser class uses a pretty standard recursive descent scheme,
but with unlimited backtracking.
The `PLexer` class tokenizes the entire input before parsing starts.
We do not run the C preprocessor.
Each parsing method returns either an AST node (a `Node` instance)
or `None`, or raises `SyntaxError` (showing the error in the C source).

Most parsing methods are decorated with `@contextual`, which automatically
resets the tokenizer input position when `None` is returned.
Parsing methods may also raise `SyntaxError`, which is irrecoverable.
When a parsing method returns `None`, it is possible that after backtracking
a different parsing method returns a valid AST.

Neither the lexer nor the parsers are complete or fully correct.
Most known issues are tersely indicated by `# TODO:` comments.
We plan to fix issues as they become relevant.
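For orientation, a sketch of what one definition looks like in `Python/bytecodes.c` as written by `extract_cases.py`. The stack effect is still emitted as a comment in this commit; the `inst(NAME, (inputs -- outputs))` header form that `parser.py` already accepts is where the DSL is heading:

```c
        // stack effect: (__0 -- )
        inst(POP_TOP) {
            PyObject *value = POP();
            Py_DECREF(value);
        }
```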
Tools/cases_generator/bytecodes_template.c (the `--template` default in extract_cases.py)
@@ -0,0 +1,85 @@

#include "Python.h"
#include "pycore_abstract.h"      // _PyIndex_Check()
#include "pycore_call.h"          // _PyObject_FastCallDictTstate()
#include "pycore_ceval.h"         // _PyEval_SignalAsyncExc()
#include "pycore_code.h"
#include "pycore_function.h"
#include "pycore_long.h"          // _PyLong_GetZero()
#include "pycore_object.h"        // _PyObject_GC_TRACK()
#include "pycore_moduleobject.h"  // PyModuleObject
#include "pycore_opcode.h"        // EXTRA_CASES
#include "pycore_pyerrors.h"      // _PyErr_Fetch()
#include "pycore_pymem.h"         // _PyMem_IsPtrFreed()
#include "pycore_pystate.h"       // _PyInterpreterState_GET()
#include "pycore_range.h"         // _PyRangeIterObject
#include "pycore_sliceobject.h"   // _PyBuildSlice_ConsumeRefs
#include "pycore_sysmodule.h"     // _PySys_Audit()
#include "pycore_tuple.h"         // _PyTuple_ITEMS()
#include "pycore_emscripten_signal.h"  // _Py_CHECK_EMSCRIPTEN_SIGNALS

#include "pycore_dict.h"
#include "dictobject.h"
#include "pycore_frame.h"
#include "opcode.h"
#include "pydtrace.h"
#include "setobject.h"
#include "structmember.h"         // struct PyMemberDef, T_OFFSET_EX

void _PyFloat_ExactDealloc(PyObject *);
void _PyUnicode_ExactDealloc(PyObject *);

#define SET_TOP(v) (stack_pointer[-1] = (v))
#define PEEK(n)    (stack_pointer[-(n)])

#define GETLOCAL(i) (frame->localsplus[i])

#define inst(name) case name:
#define family(name) static int family_##name

#define NAME_ERROR_MSG \
    "name '%.200s' is not defined"

typedef struct {
    PyObject *kwnames;
} CallShape;

static void
dummy_func(
    PyThreadState *tstate,
    _PyInterpreterFrame *frame,
    unsigned char opcode,
    unsigned int oparg,
    _Py_atomic_int * const eval_breaker,
    _PyCFrame cframe,
    PyObject *names,
    PyObject *consts,
    _Py_CODEUNIT *next_instr,
    PyObject **stack_pointer,
    CallShape call_shape,
    _Py_CODEUNIT *first_instr,
    int throwflag,
    binaryfunc binary_ops[]
)
{
    switch (opcode) {

        /* BEWARE!
           It is essential that any operation that fails must goto error
           and that all operations that succeed call DISPATCH() ! */

// BEGIN BYTECODES //
// INSERT CASES HERE //
// END BYTECODES //

    }
 error:;
 exception_unwind:;
 handle_eval_breaker:;
 resume_frame:;
 resume_with_error:;
 start_frame:;
 unbound_local_error:;
}

// Families go below this point //
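The `// INSERT CASES HERE //` marker is the seam that `extract_cases.py` (below) splits the template on; everything above it becomes the prolog of the generated bytecodes.c and everything below it the epilog. A sketch of that consumption, using the script's own `--template` default path:

```python
# Mirrors main() in extract_cases.py: split the template at the marker.
with open("Tools/cases_generator/bytecodes_template.c") as f:
    prolog, epilog = f.read().split("// INSERT CASES HERE //", 1)
```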
Tools/cases_generator/extract_cases.py
@@ -0,0 +1,247 @@

"""Extract the main interpreter switch cases."""

# Reads cases from ceval.c, writes to bytecodes.c.
# (That file is not meant to be compiled, but it has a .c extension
# so tooling like VS Code can be fooled into thinking it is C code.
# This helps editing and browsing the code.)
#
# The script generate_cases.py regenerates the cases.

import argparse
import difflib
import dis
import re
import sys

parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input", type=str, default="Python/ceval.c")
parser.add_argument("-o", "--output", type=str, default="Python/bytecodes.c")
parser.add_argument("-t", "--template", type=str, default="Tools/cases_generator/bytecodes_template.c")
parser.add_argument("-c", "--compare", action="store_true")
parser.add_argument("-q", "--quiet", action="store_true")


inverse_specializations = {
    specname: familyname
    for familyname, specnames in dis._specializations.items()
    for specname in specnames
}


def eopen(filename, mode="r"):
    if filename == "-":
        if "r" in mode:
            return sys.stdin
        else:
            return sys.stdout
    return open(filename, mode)


def leading_whitespace(line):
    return len(line) - len(line.lstrip())


def extract_opcode_name(line):
    m = re.match(r"\A\s*TARGET\((\w+)\)\s*{\s*\Z", line)
    if m:
        opcode_name = m.group(1)
        if opcode_name not in dis._all_opmap:
            raise ValueError(f"error: unknown opcode {opcode_name}")
        return opcode_name
    raise ValueError(f"error: no opcode in {line.strip()}")


def figure_stack_effect(opcode_name):
    # Return (se, diff) where se is the stack effect for oparg=0
    # and diff is the increment for oparg=1.
    # If it is irregular or unknown, raise ValueError.
    if m := re.match(r"^(\w+)__(\w+)$", opcode_name):
        # Super-instruction adds the effects of both parts
        first, second = m.groups()
        se1, incr1 = figure_stack_effect(first)
        se2, incr2 = figure_stack_effect(second)
        if incr1 or incr2:
            raise ValueError(f"irregular stack effect for {opcode_name}")
        return se1 + se2, 0
    if opcode_name in inverse_specializations:
        # Specialized instruction maps to unspecialized instruction
        opcode_name = inverse_specializations[opcode_name]
    opcode = dis._all_opmap[opcode_name]
    if opcode in dis.hasarg:
        try:
            se = dis.stack_effect(opcode, 0)
        except ValueError as err:
            raise ValueError(f"{err} for {opcode_name}")
        if dis.stack_effect(opcode, 0, jump=True) != se:
            raise ValueError(f"{opcode_name} stack effect depends on jump flag")
        if dis.stack_effect(opcode, 0, jump=False) != se:
            raise ValueError(f"{opcode_name} stack effect depends on jump flag")
        for i in range(1, 257):
            if dis.stack_effect(opcode, i) != se:
                return figure_variable_stack_effect(opcode_name, opcode, se)
    else:
        try:
            se = dis.stack_effect(opcode)
        except ValueError as err:
            raise ValueError(f"{err} for {opcode_name}")
        if dis.stack_effect(opcode, jump=True) != se:
            raise ValueError(f"{opcode_name} stack effect depends on jump flag")
        if dis.stack_effect(opcode, jump=False) != se:
            raise ValueError(f"{opcode_name} stack effect depends on jump flag")
    return se, 0


def figure_variable_stack_effect(opcode_name, opcode, se0):
    # Is it a linear progression?
    se1 = dis.stack_effect(opcode, 1)
    diff = se1 - se0
    for i in range(2, 257):
        sei = dis.stack_effect(opcode, i)
        if sei - se0 != diff * i:
            raise ValueError(f"{opcode_name} has irregular stack effect")
    # Assume it's okay for larger oparg values too
    return se0, diff


START_MARKER = "/* Start instructions */"  # The '{' is on the preceding line.
END_MARKER = "/* End regular instructions */"


def read_cases(f):
    cases = []
    case = None
    started = False
    # TODO: Count line numbers
    for line in f:
        stripped = line.strip()
        if not started:
            if stripped == START_MARKER:
                started = True
            continue
        if stripped == END_MARKER:
            break
        if stripped.startswith("TARGET("):
            if case:
                cases.append(case)
            indent = " " * leading_whitespace(line)
            case = ""
            opcode_name = extract_opcode_name(line)
            try:
                se, diff = figure_stack_effect(opcode_name)
            except ValueError as err:
                case += f"{indent}// error: {err}\n"
                case += f"{indent}inst({opcode_name}) {{\n"
            else:
                inputs = []
                outputs = []
                if se > 0:
                    for i in range(se):
                        outputs.append(f"__{i}")
                elif se < 0:
                    for i in range(-se):
                        inputs.append(f"__{i}")
                if diff > 0:
                    if diff == 1:
                        outputs.append("__array[oparg]")
                    else:
                        outputs.append(f"__array[oparg*{diff}]")
                elif diff < 0:
                    if diff == -1:
                        inputs.append("__array[oparg]")
                    else:
                        inputs.append(f"__array[oparg*{-diff}]")
                input = ", ".join(inputs)
                output = ", ".join(outputs)
                case += f"{indent}// stack effect: ({input} -- {output})\n"
                case += f"{indent}inst({opcode_name}) {{\n"
        else:
            if case:
                case += line
    if case:
        cases.append(case)
    return cases


def write_cases(f, cases):
    for case in cases:
        caselines = case.splitlines()
        while caselines[-1].strip() == "":
            caselines.pop()
        if caselines[-1].strip() == "}":
            caselines.pop()
        else:
            raise ValueError("case does not end with '}'")
        if caselines[-1].strip() == "DISPATCH();":
            caselines.pop()
        caselines.append("        }")
        case = "\n".join(caselines)
        print(case + "\n", file=f)


def write_families(f):
    for opcode, specializations in dis._specializations.items():
        all = [opcode] + specializations
        if len(all) <= 3:
            members = ', '.join(all)
            print(f"family({opcode.lower()}) = {{ {members} }};", file=f)
        else:
            print(f"family({opcode.lower()}) = {{", file=f)
            for i in range(0, len(all), 3):
                members = ', '.join(all[i:i+3])
                if i+3 < len(all):
                    print(f"    {members},", file=f)
                else:
                    print(f"    {members} }};", file=f)


def compare(oldfile, newfile, quiet=False):
    with open(oldfile) as f:
        oldlines = f.readlines()
    for top, line in enumerate(oldlines):
        if line.strip() == START_MARKER:
            break
    else:
        print(f"No start marker found in {oldfile}", file=sys.stderr)
        return
    del oldlines[:top]
    for bottom, line in enumerate(oldlines):
        if line.strip() == END_MARKER:
            break
    else:
        print(f"No end marker found in {oldfile}", file=sys.stderr)
        return
    del oldlines[bottom:]
    if not quiet:
        print(
            f"// {oldfile} has {len(oldlines)} lines after stripping top/bottom",
            file=sys.stderr,
        )
    with open(newfile) as f:
        newlines = f.readlines()
    if not quiet:
        print(f"// {newfile} has {len(newlines)} lines", file=sys.stderr)
    for line in difflib.unified_diff(oldlines, newlines, fromfile=oldfile, tofile=newfile):
        sys.stdout.write(line)


def main():
    args = parser.parse_args()
    with eopen(args.input) as f:
        cases = read_cases(f)
    with open(args.template) as f:
        prolog, epilog = f.read().split("// INSERT CASES HERE //", 1)
    if not args.quiet:
        print(f"// Read {len(cases)} cases from {args.input}", file=sys.stderr)
    with eopen(args.output, "w") as f:
        f.write(prolog)
        write_cases(f, cases)
        f.write(epilog)
        write_families(f)
    if not args.quiet:
        print(f"// Wrote {len(cases)} cases to {args.output}", file=sys.stderr)
    if args.compare:
        compare(args.input, args.output, args.quiet)


if __name__ == "__main__":
    main()
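The stack-effect probing above leans entirely on the `dis` module. A quick interactive check of the kind of answer `figure_stack_effect()` computes (values assume a current 3.12 development build):

```python
import dis

opcode = dis.opmap["BINARY_OP"]
print(dis.stack_effect(opcode, 0))   # -1: pops two operands, pushes one result
# BINARY_OP's effect does not depend on the jump flag, so the script accepts it:
print(dis.stack_effect(opcode, 0, jump=True) == dis.stack_effect(opcode, 0, jump=False))
```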
Tools/cases_generator/generate_cases.py
@@ -0,0 +1,125 @@

"""Generate the main interpreter switch."""

# Write the cases to generated_cases.c.h, which is #included in ceval.c.

# TODO: Reuse C generation framework from deepfreeze.py?

import argparse
import io
import sys

import parser
from parser import InstDef

arg_parser = argparse.ArgumentParser()
arg_parser.add_argument("-i", "--input", type=str, default="Python/bytecodes.c")
arg_parser.add_argument("-o", "--output", type=str, default="Python/generated_cases.c.h")
arg_parser.add_argument("-c", "--compare", action="store_true")
arg_parser.add_argument("-q", "--quiet", action="store_true")


def eopen(filename: str, mode: str = "r"):
    if filename == "-":
        if "r" in mode:
            return sys.stdin
        else:
            return sys.stdout
    return open(filename, mode)


def parse_cases(src: str, filename: str|None = None) -> tuple[list[InstDef], list[parser.Family]]:
    psr = parser.Parser(src, filename=filename)
    instrs: list[InstDef] = []
    families: list[parser.Family] = []
    while not psr.eof():
        if inst := psr.inst_def():
            assert inst.block
            instrs.append(InstDef(inst.name, inst.inputs, inst.outputs, inst.block))
        elif fam := psr.family_def():
            families.append(fam)
        else:
            raise psr.make_syntax_error("Unexpected token")
    return instrs, families


def always_exits(block: parser.Block) -> bool:
    text = block.text
    lines = text.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()
    if not lines or lines[-1].strip() != "}":
        return False
    lines.pop()
    if not lines:
        return False
    line = lines.pop().rstrip()
    # Indent must match exactly (TODO: Do something better)
    if line[:12] != " "*12:
        return False
    line = line[12:]
    return line.startswith(("goto ", "return ", "DISPATCH", "GO_TO_", "Py_UNREACHABLE()"))


def write_cases(f: io.TextIOBase, instrs: list[InstDef]):
    indent = "        "
    f.write("// This file is generated by Tools/cases_generator/generate_cases.py\n")
    f.write("// Do not edit!\n")
    for instr in instrs:
        assert isinstance(instr, InstDef)
        f.write(f"\n{indent}TARGET({instr.name}) {{\n")
        # input = ", ".join(instr.inputs)
        # output = ", ".join(instr.outputs)
        # f.write(f"{indent}    // {input} -- {output}\n")
        assert instr.block
        blocklines = instr.block.text.splitlines(True)
        # Remove blank lines from both ends
        while blocklines and not blocklines[0].strip():
            blocklines.pop(0)
        while blocklines and not blocklines[-1].strip():
            blocklines.pop()
        # Remove leading '{' and trailing '}'
        assert blocklines and blocklines[0].strip() == "{"
        assert blocklines and blocklines[-1].strip() == "}"
        blocklines.pop()
        blocklines.pop(0)
        # Remove trailing blank lines
        while blocklines and not blocklines[-1].strip():
            blocklines.pop()
        # Write the body
        for line in blocklines:
            f.write(line)
        assert instr.block
        if not always_exits(instr.block):
            f.write(f"{indent}    DISPATCH();\n")
        # Write trailing '}'
        f.write(f"{indent}}}\n")


def main():
    args = arg_parser.parse_args()
    with eopen(args.input) as f:
        srclines = f.read().splitlines()
    begin = srclines.index("// BEGIN BYTECODES //")
    end = srclines.index("// END BYTECODES //")
    src = "\n".join(srclines[begin+1 : end])
    instrs, families = parse_cases(src, filename=args.input)
    ninstrs = nfamilies = 0
    if not args.quiet:
        ninstrs = len(instrs)
        nfamilies = len(families)
        print(
            f"Read {ninstrs} instructions "
            f"and {nfamilies} families from {args.input}",
            file=sys.stderr,
        )
    with eopen(args.output, "w") as f:
        write_cases(f, instrs)
    if not args.quiet:
        print(
            f"Wrote {ninstrs} instructions to {args.output}",
            file=sys.stderr,
        )


if __name__ == "__main__":
    main()
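A minimal end-to-end run of the pieces above, bypassing the command line (run from `Tools/cases_generator` so `import parser` resolves; the one-instruction input is a hypothetical sketch):

```python
import sys
from generate_cases import parse_cases, write_cases

# One instruction, in the same shape generate_cases expects between the
# BEGIN/END BYTECODES markers of Python/bytecodes.c.
instrs, families = parse_cases("inst(NOP) {\n}\n")
write_cases(sys.stdout, instrs)
# Emits a TARGET(NOP) case; since the empty body never exits on its own,
# the generator appends DISPATCH() automatically.
```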
Tools/cases_generator/lexer.py
@@ -0,0 +1,257 @@

# Lexer for C code
# Originally by Mark Shannon (mark@hotpy.org)
# https://gist.github.com/markshannon/db7ab649440b5af765451bb77c7dba34

import re
import sys
import collections
from dataclasses import dataclass

def choice(*opts):
    return "|".join("(%s)" % opt for opt in opts)

# Regexes

# Longer operators must go before shorter ones.

PLUSPLUS = r'\+\+'
MINUSMINUS = r'--'

# ->
ARROW = r'->'
ELLIPSIS = r'\.\.\.'

# Assignment operators
TIMESEQUAL = r'\*='
DIVEQUAL = r'/='
MODEQUAL = r'%='
PLUSEQUAL = r'\+='
MINUSEQUAL = r'-='
LSHIFTEQUAL = r'<<='
RSHIFTEQUAL = r'>>='
ANDEQUAL = r'&='
OREQUAL = r'\|='
XOREQUAL = r'\^='

# Operators
PLUS = r'\+'
MINUS = r'-'
TIMES = r'\*'
DIVIDE = r'/'
MOD = r'%'
NOT = r'~'
XOR = r'\^'
LOR = r'\|\|'
LAND = r'&&'
LSHIFT = r'<<'
RSHIFT = r'>>'
LE = r'<='
GE = r'>='
EQ = r'=='
NE = r'!='
LT = r'<'
GT = r'>'
LNOT = r'!'
OR = r'\|'
AND = r'&'
EQUALS = r'='

# ?
CONDOP = r'\?'

# Delimiters
LPAREN = r'\('
RPAREN = r'\)'
LBRACKET = r'\['
RBRACKET = r'\]'
LBRACE = r'\{'
RBRACE = r'\}'
COMMA = r','
PERIOD = r'\.'
SEMI = r';'
COLON = r':'
BACKSLASH = r'\\'

operators = { op: pattern for op, pattern in globals().items() if op == op.upper() }
for op in operators:
    globals()[op] = op
opmap = { pattern.replace("\\", "") or '\\' : op for op, pattern in operators.items() }

# Macros
macro = r'# *(ifdef|ifndef|undef|define|error|endif|if|else|include|#)'
MACRO = 'MACRO'

id_re = r'[a-zA-Z_][0-9a-zA-Z_]*'
IDENTIFIER = 'IDENTIFIER'

suffix = r'([uU]?[lL]?[lL]?)'
octal = r'0[0-7]+' + suffix
hex = r'0[xX][0-9a-fA-F]+'
decimal_digits = r'(0|[1-9][0-9]*)'
decimal = decimal_digits + suffix


exponent = r"""([eE][-+]?[0-9]+)"""
fraction = r"""([0-9]*\.[0-9]+)|([0-9]+\.)"""
float = '(((('+fraction+')'+exponent+'?)|([0-9]+'+exponent+'))[FfLl]?)'

number_re = choice(octal, hex, float, decimal)
NUMBER = 'NUMBER'

simple_escape = r"""([a-zA-Z._~!=&\^\-\\?'"])"""
decimal_escape = r"""(\d+)"""
hex_escape = r"""(x[0-9a-fA-F]+)"""
escape_sequence = r"""(\\("""+simple_escape+'|'+decimal_escape+'|'+hex_escape+'))'
string_char = r"""([^"\\\n]|"""+escape_sequence+')'
str_re = '"'+string_char+'*"'
STRING = 'STRING'
char = r'\'.\''  # TODO: escape sequence
CHARACTER = 'CHARACTER'

comment_re = r'//.*|/\*([^*]|\*[^/])*\*/'
COMMENT = 'COMMENT'

newline = r"\n"
matcher = re.compile(choice(id_re, number_re, str_re, char, newline, macro, comment_re, *operators.values()))
letter = re.compile(r'[a-zA-Z_]')

keywords = (
    'AUTO', 'BREAK', 'CASE', 'CHAR', 'CONST',
    'CONTINUE', 'DEFAULT', 'DO', 'DOUBLE', 'ELSE', 'ENUM', 'EXTERN',
    'FLOAT', 'FOR', 'GOTO', 'IF', 'INLINE', 'INT', 'LONG',
    'REGISTER', 'OFFSETOF',
    'RESTRICT', 'RETURN', 'SHORT', 'SIGNED', 'SIZEOF', 'STATIC', 'STRUCT',
    'SWITCH', 'TYPEDEF', 'UNION', 'UNSIGNED', 'VOID',
    'VOLATILE', 'WHILE'
)
for name in keywords:
    globals()[name] = name
keywords = { name.lower() : name for name in keywords }


def make_syntax_error(
    message: str, filename: str, line: int, column: int, line_text: str,
) -> SyntaxError:
    return SyntaxError(message, (filename, line, column, line_text))


@dataclass(slots=True)
class Token:
    kind: str
    text: str
    begin: tuple[int, int]
    end: tuple[int, int]

    @property
    def line(self):
        return self.begin[0]

    @property
    def column(self):
        return self.begin[1]

    @property
    def end_line(self):
        return self.end[0]

    @property
    def end_column(self):
        return self.end[1]

    @property
    def width(self):
        return self.end[1] - self.begin[1]

    def replaceText(self, txt):
        assert isinstance(txt, str)
        return Token(self.kind, txt, self.begin, self.end)

    def __repr__(self):
        b0, b1 = self.begin
        e0, e1 = self.end
        if b0 == e0:
            return f"{self.kind}({self.text!r}, {b0}:{b1}:{e1})"
        else:
            return f"{self.kind}({self.text!r}, {b0}:{b1}, {e0}:{e1})"


def tokenize(src, line=1, filename=None):
    linestart = -1
    # TODO: finditer() skips over unrecognized characters, e.g. '@'
    for m in matcher.finditer(src):
        start, end = m.span()
        text = m.group(0)
        if text in keywords:
            kind = keywords[text]
        elif letter.match(text):
            kind = IDENTIFIER
        elif text == '...':
            kind = ELLIPSIS
        elif text == '.':
            kind = PERIOD
        elif text[0] in '0123456789.':
            kind = NUMBER
        elif text[0] == '"':
            kind = STRING
        elif text in opmap:
            kind = opmap[text]
        elif text == '\n':
            linestart = start
            line += 1
            kind = '\n'
        elif text[0] == "'":
            kind = CHARACTER
        elif text[0] == '#':
            kind = MACRO
        elif text[0] == '/' and text[1] in '/*':
            kind = COMMENT
        else:
            lineend = src.find("\n", start)
            if lineend == -1:
                lineend = len(src)
            raise make_syntax_error(f"Bad token: {text}",
                filename, line, start-linestart+1, src[linestart:lineend])
        if kind == COMMENT:
            begin = line, start-linestart
            newlines = text.count('\n')
            if newlines:
                linestart = start + text.rfind('\n')
                line += newlines
        else:
            begin = line, start-linestart
        if kind != "\n":
            yield Token(kind, text, begin, (line, start-linestart+len(text)))


__all__ = []
__all__.extend([kind for kind in globals() if kind.upper() == kind])


def to_text(tkns: list[Token], dedent: int = 0) -> str:
    res: list[str] = []
    line, col = -1, 1+dedent
    for tkn in tkns:
        if line == -1:
            line, _ = tkn.begin
        l, c = tkn.begin
        #assert(l >= line), (line, txt, start, end)
        while l > line:
            line += 1
            res.append('\n')
            col = 1+dedent
        res.append(' '*(c-col))
        res.append(tkn.text)
        line, col = tkn.end
    return ''.join(res)


if __name__ == "__main__":
    import sys
    filename = sys.argv[1]
    if filename == "-c":
        src = sys.argv[2]
    else:
        src = open(filename).read()
    # print(to_text(tokenize(src)))
    for tkn in tokenize(src, filename=filename):
        print(tkn)
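A quick demonstration of the lexer on a small C fragment (run from `Tools/cases_generator`):

```python
import lexer

# Comment tokens are kept; newline tokens are suppressed.
for tok in lexer.tokenize("x = y + 1; // note"):
    print(tok)
# IDENTIFIER('x', 1:1:2), EQUALS('=', 1:3:4), IDENTIFIER('y', 1:5:6), ...
```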
Tools/cases_generator/parser.py
@@ -0,0 +1,222 @@

"""Parser for Python/bytecodes.c."""

from dataclasses import dataclass, field
from typing import NamedTuple, Callable, TypeVar

import lexer as lx
from plexer import PLexer


P = TypeVar("P", bound="Parser")
N = TypeVar("N", bound="Node")

def contextual(func: Callable[[P], N|None]) -> Callable[[P], N|None]:
    # Decorator to wrap grammar methods.
    # Resets position if `func` returns None.
    def contextual_wrapper(self: P) -> N|None:
        begin = self.getpos()
        res = func(self)
        if res is None:
            self.setpos(begin)
            return
        end = self.getpos()
        res.context = Context(begin, end, self)
        return res
    return contextual_wrapper


class Context(NamedTuple):
    begin: int
    end: int
    owner: PLexer

    def __repr__(self):
        return f"<{self.begin}-{self.end}>"


@dataclass
class Node:
    context: Context|None = field(init=False, default=None)

    @property
    def text(self) -> str:
        context = self.context
        if not context:
            return ""
        tokens = context.owner.tokens
        begin = context.begin
        end = context.end
        return lx.to_text(tokens[begin:end])


@dataclass
class Block(Node):
    tokens: list[lx.Token]


@dataclass
class InstDef(Node):
    name: str
    inputs: list[str] | None
    outputs: list[str] | None
    block: Block | None


@dataclass
class Family(Node):
    name: str
    members: list[str]


class Parser(PLexer):

    @contextual
    def inst_def(self) -> InstDef | None:
        if header := self.inst_header():
            if block := self.block():
                header.block = block
                return header
            raise self.make_syntax_error("Expected block")
        return None

    @contextual
    def inst_header(self):
        # inst(NAME) | inst(NAME, (inputs -- outputs))
        # TODO: Error out when there is something unexpected.
        # TODO: Make INST a keyword in the lexer.
        if (tkn := self.expect(lx.IDENTIFIER)) and tkn.text == "inst":
            if (self.expect(lx.LPAREN)
                    and (tkn := self.expect(lx.IDENTIFIER))):
                name = tkn.text
                if self.expect(lx.COMMA):
                    inp, outp = self.stack_effect()
                    if (self.expect(lx.RPAREN)
                            and self.peek().kind == lx.LBRACE):
                        return InstDef(name, inp, outp, [])
                elif self.expect(lx.RPAREN):
                    return InstDef(name, None, None, [])
        return None

    def stack_effect(self):
        # '(' [inputs] '--' [outputs] ')'
        if self.expect(lx.LPAREN):
            inp = self.inputs() or []
            if self.expect(lx.MINUSMINUS):
                outp = self.outputs() or []
                if self.expect(lx.RPAREN):
                    return inp, outp
        raise self.make_syntax_error("Expected stack effect")

    def inputs(self):
        # input (, input)*
        here = self.getpos()
        if inp := self.input():
            near = self.getpos()
            if self.expect(lx.COMMA):
                if rest := self.inputs():
                    return [inp] + rest
            self.setpos(near)
            return [inp]
        self.setpos(here)
        return None

    def input(self):
        # IDENTIFIER
        if (tkn := self.expect(lx.IDENTIFIER)):
            if self.expect(lx.LBRACKET):
                if arg := self.expect(lx.IDENTIFIER):
                    if self.expect(lx.RBRACKET):
                        return f"{tkn.text}[{arg.text}]"
                    if self.expect(lx.TIMES):
                        if num := self.expect(lx.NUMBER):
                            if self.expect(lx.RBRACKET):
                                return f"{tkn.text}[{arg.text}*{num.text}]"
                raise self.make_syntax_error("Expected argument in brackets", tkn)

            return tkn.text
        if self.expect(lx.CONDOP):
            while self.expect(lx.CONDOP):
                pass
            return "??"
        return None

    def outputs(self):
        # output (, output)*
        here = self.getpos()
        if outp := self.output():
            near = self.getpos()
            if self.expect(lx.COMMA):
                if rest := self.outputs():
                    return [outp] + rest
            self.setpos(near)
            return [outp]
        self.setpos(here)
        return None

    def output(self):
        return self.input()  # TODO: They're not quite the same.

    @contextual
    def family_def(self) -> Family | None:
        here = self.getpos()
        if (tkn := self.expect(lx.IDENTIFIER)) and tkn.text == "family":
            if self.expect(lx.LPAREN):
                if (tkn := self.expect(lx.IDENTIFIER)):
                    name = tkn.text
                    if self.expect(lx.RPAREN):
                        if self.expect(lx.EQUALS):
                            if members := self.members():
                                if self.expect(lx.SEMI):
                                    return Family(name, members)
        return None

    def members(self):
        here = self.getpos()
        if tkn := self.expect(lx.IDENTIFIER):
            near = self.getpos()
            if self.expect(lx.COMMA):
                if rest := self.members():
                    return [tkn.text] + rest
            self.setpos(near)
            return [tkn.text]
        self.setpos(here)
        return None

    @contextual
    def block(self) -> Block:
        tokens = self.c_blob()
        return Block(tokens)

    def c_blob(self):
        tokens = []
        level = 0
        while tkn := self.next(raw=True):
            if tkn.kind in (lx.LBRACE, lx.LPAREN, lx.LBRACKET):
                level += 1
            elif tkn.kind in (lx.RBRACE, lx.RPAREN, lx.RBRACKET):
                level -= 1
                if level <= 0:
                    break
            tokens.append(tkn)
        return tokens


if __name__ == "__main__":
    import sys
    if sys.argv[1:]:
        filename = sys.argv[1]
        if filename == "-c" and sys.argv[2:]:
            src = sys.argv[2]
            filename = None
        else:
            with open(filename) as f:
                src = f.read()
            srclines = src.splitlines()
            begin = srclines.index("// BEGIN BYTECODES //")
            end = srclines.index("// END BYTECODES //")
            src = "\n".join(srclines[begin+1 : end])
    else:
        filename = None
        src = "if (x) { x.foo; // comment\n}"
    parser = Parser(src, filename)
    x = parser.inst_def()
    print(x)
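Driving the parser by hand shows the pieces fitting together (run from `Tools/cases_generator`; the instruction header is a hypothetical example of the `(inputs -- outputs)` form):

```python
import parser

p = parser.Parser("inst(POP_TOP, (value --)) {\n}\n")
inst = p.inst_def()
print(inst.name, inst.inputs, inst.outputs)
# POP_TOP ['value'] []
```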
Tools/cases_generator/plexer.py
@@ -0,0 +1,104 @@

import lexer as lx
Token = lx.Token


class PLexer:
    def __init__(self, src: str, filename: str|None = None):
        self.src = src
        self.filename = filename
        self.tokens = list(lx.tokenize(self.src, filename=filename))
        self.pos = 0

    def getpos(self) -> int:
        # Current position
        return self.pos

    def eof(self) -> bool:
        # Are we at EOF?
        return self.pos >= len(self.tokens)

    def setpos(self, pos: int) -> None:
        # Reset position
        assert 0 <= pos <= len(self.tokens), (pos, len(self.tokens))
        self.pos = pos

    def backup(self) -> None:
        # Back up position by 1
        assert self.pos > 0
        self.pos -= 1

    def next(self, raw: bool = False) -> Token | None:
        # Return next token and advance position; None if at EOF
        # TODO: Return synthetic EOF token instead of None?
        while self.pos < len(self.tokens):
            tok = self.tokens[self.pos]
            self.pos += 1
            if raw or tok.kind != "COMMENT":
                return tok
        return None

    def peek(self, raw: bool = False) -> Token | None:
        # Return next token without advancing position
        tok = self.next(raw=raw)
        self.backup()
        return tok

    def maybe(self, kind: str, raw: bool = False) -> Token | None:
        # Return next token without advancing position if kind matches
        tok = self.peek(raw=raw)
        if tok and tok.kind == kind:
            return tok
        return None

    def expect(self, kind: str) -> Token | None:
        # Return next token and advance position if kind matches
        tkn = self.next()
        if tkn is not None:
            if tkn.kind == kind:
                return tkn
            self.backup()
        return None

    def require(self, kind: str) -> Token:
        # Return next token and advance position, requiring kind to match
        tkn = self.next()
        if tkn is not None and tkn.kind == kind:
            return tkn
        raise self.make_syntax_error(f"Expected {kind!r} but got {tkn and tkn.text!r}", tkn)

    def extract_line(self, lineno: int) -> str:
        # Return source line `lineno` (1-based)
        lines = self.src.splitlines()
        if lineno > len(lines):
            return ""
        return lines[lineno - 1]

    def make_syntax_error(self, message: str, tkn: Token|None = None) -> SyntaxError:
        # Construct a SyntaxError instance from message and token
        if tkn is None:
            tkn = self.peek()
        if tkn is None:
            tkn = self.tokens[-1]
        return lx.make_syntax_error(message,
            self.filename, tkn.line, tkn.column, self.extract_line(tkn.line))


if __name__ == "__main__":
    import sys
    if sys.argv[1:]:
        filename = sys.argv[1]
        if filename == "-c" and sys.argv[2:]:
            src = sys.argv[2]
            filename = None
        else:
            with open(filename) as f:
                src = f.read()
    else:
        filename = None
        src = "if (x) { x.foo; // comment\n}"
    p = PLexer(src, filename)
    while not p.eof():
        tok = p.next(raw=True)
        left = repr(tok)
        right = lx.to_text([tok]).rstrip()
        print(f"{left:40.40} {right}")