mirror of https://github.com/python/cpython
GH-98831: "Generate" the interpreter (#98830)
The switch cases (really TARGET(opcode) macros) have been moved from ceval.c to generated_cases.c.h. That file is generated from instruction definitions in bytecodes.c (which impersonates a C file so the C code it contains can be edited without custom support in e.g. VS Code). The code generator lives in Tools/cases_generator (it has a README.md explaining how it works). The DSL used to describe the instructions is a work in progress, described in https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md.

This is surely a work in progress. An easy next step could be auto-generating super-instructions.

**IMPORTANT: Merge Conflicts**

If you get a merge conflict for instruction implementations in ceval.c, your best bet is to port your changes to bytecodes.c. That file looks almost the same as the original cases, except instead of `TARGET(NAME)` it uses `inst(NAME)`, and the trailing `DISPATCH()` call is omitted (the code generator adds it automatically).
This commit is contained in: parent 2cfcaf5af6 · commit 41bc101dd6
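As an illustration of the porting described above (a schematic sketch, not lines taken from this commit), a case that looks roughly like this in ceval.c:

```c
        TARGET(UNARY_NEGATIVE) {
            PyObject *value = TOP();
            PyObject *res = PyNumber_Negative(value);
            Py_DECREF(value);
            SET_TOP(res);
            if (res == NULL)
                goto error;
            DISPATCH();
        }
```

would be written like this in bytecodes.c, with `inst` replacing `TARGET` and the trailing `DISPATCH()` dropped:

```c
        inst(UNARY_NEGATIVE) {
            PyObject *value = TOP();
            PyObject *res = PyNumber_Negative(value);
            Py_DECREF(value);
            SET_TOP(res);
            if (res == NULL)
                goto error;
        }
```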
.gitattributes
@@ -82,6 +82,7 @@ Parser/parser.c generated
 Parser/token.c generated
 Programs/test_frozenmain.h generated
 Python/Python-ast.c generated
+Python/generated_cases.c.h generated
 Python/opcode_targets.h generated
 Python/stdlib_module_names.h generated
 Tools/peg_generator/pegen/grammar_parser.py generated
Makefile.pre.in
@@ -1445,7 +1445,19 @@ regen-opcode-targets:
 		$(srcdir)/Python/opcode_targets.h.new
 	$(UPDATE_FILE) $(srcdir)/Python/opcode_targets.h $(srcdir)/Python/opcode_targets.h.new

-Python/ceval.o: $(srcdir)/Python/opcode_targets.h $(srcdir)/Python/condvar.h
+.PHONY: regen-cases
+regen-cases:
+	# Regenerate Python/generated_cases.c.h from Python/bytecodes.c
+	# using Tools/cases_generator/generate_cases.py
+	PYTHONPATH=$(srcdir)/Tools/cases_generator \
+	$(PYTHON_FOR_REGEN) \
+	$(srcdir)/Tools/cases_generator/generate_cases.py \
+		-i $(srcdir)/Python/bytecodes.c \
+		-o $(srcdir)/Python/generated_cases.c.h.new
+	$(UPDATE_FILE) $(srcdir)/Python/generated_cases.c.h $(srcdir)/Python/generated_cases.c.h.new
+
+Python/ceval.o: $(srcdir)/Python/opcode_targets.h $(srcdir)/Python/condvar.h $(srcdir)/Python/generated_cases.c.h

 Python/frozen.o: $(FROZEN_FILES_OUT)
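With this target in place, `make regen-cases` regenerates Python/generated_cases.c.h from Python/bytecodes.c, and since `Python/ceval.o` now depends on the generated file, a rebuild picks the change up automatically.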
Misc/NEWS.d entry
@@ -0,0 +1 @@
+We have new tooling, in ``Tools/cases_generator``, to generate the interpreter switch from a list of opcode definitions.
File diff suppressed because it is too large.
Python/ceval.c (3851 changed lines): file diff suppressed because it is too large.
File diff suppressed because it is too large.
Tools/cases_generator/README.md
@@ -0,0 +1,39 @@

# Tooling to generate interpreters

What's currently here:

- `lexer.py`: lexer for C, originally written by Mark Shannon
- `plexer.py`: OO interface on top of lexer.py; main class: `PLexer`
- `parser.py`: parser for the instruction definition DSL; main class: `Parser`
- `generate_cases.py`: driver script to read `Python/bytecodes.c` and
  write `Python/generated_cases.c.h`

**Temporarily also:**

- `extract_cases.py`: script to extract cases from
  `Python/ceval.c` and write them to `Python/bytecodes.c`
- `bytecodes_template.h`: template used by `extract_cases.py`

The DSL for the instruction definitions in `Python/bytecodes.c` is described
[here](https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md).
Note that there is some dummy C code at the top and bottom of the file
to fool text editors like VS Code into believing this is valid C code.

## A bit about the parser

The parser class uses a pretty standard recursive descent scheme,
but with unlimited backtracking.
The `PLexer` class tokenizes the entire input before parsing starts.
We do not run the C preprocessor.
Each parsing method returns either an AST node (a `Node` instance)
or `None`, or raises `SyntaxError` (showing the error in the C source).

Most parsing methods are decorated with `@contextual`, which automatically
resets the tokenizer input position when `None` is returned.
Parsing methods may also raise `SyntaxError`, which is irrecoverable.
When a parsing method returns `None`, it is possible that after backtracking
a different parsing method returns a valid AST.

Neither the lexer nor the parsers are complete or fully correct.
Most known issues are tersely indicated by `# TODO:` comments.
We plan to fix issues as they become relevant.
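For orientation, a sketch of what one definition looks like in `Python/bytecodes.c` as written by `extract_cases.py`. The stack effect is still emitted as a comment in this commit; the `inst(NAME, (inputs -- outputs))` header form that `parser.py` already accepts is where the DSL is heading:

```c
        // stack effect: (__0 -- )
        inst(POP_TOP) {
            PyObject *value = POP();
            Py_DECREF(value);
        }
```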
Tools/cases_generator/bytecodes_template.c (the `--template` default in extract_cases.py)
@@ -0,0 +1,85 @@

#include "Python.h"
#include "pycore_abstract.h"      // _PyIndex_Check()
#include "pycore_call.h"          // _PyObject_FastCallDictTstate()
#include "pycore_ceval.h"         // _PyEval_SignalAsyncExc()
#include "pycore_code.h"
#include "pycore_function.h"
#include "pycore_long.h"          // _PyLong_GetZero()
#include "pycore_object.h"        // _PyObject_GC_TRACK()
#include "pycore_moduleobject.h"  // PyModuleObject
#include "pycore_opcode.h"        // EXTRA_CASES
#include "pycore_pyerrors.h"      // _PyErr_Fetch()
#include "pycore_pymem.h"         // _PyMem_IsPtrFreed()
#include "pycore_pystate.h"       // _PyInterpreterState_GET()
#include "pycore_range.h"         // _PyRangeIterObject
#include "pycore_sliceobject.h"   // _PyBuildSlice_ConsumeRefs
#include "pycore_sysmodule.h"     // _PySys_Audit()
#include "pycore_tuple.h"         // _PyTuple_ITEMS()
#include "pycore_emscripten_signal.h"  // _Py_CHECK_EMSCRIPTEN_SIGNALS

#include "pycore_dict.h"
#include "dictobject.h"
#include "pycore_frame.h"
#include "opcode.h"
#include "pydtrace.h"
#include "setobject.h"
#include "structmember.h"         // struct PyMemberDef, T_OFFSET_EX

void _PyFloat_ExactDealloc(PyObject *);
void _PyUnicode_ExactDealloc(PyObject *);

#define SET_TOP(v) (stack_pointer[-1] = (v))
#define PEEK(n)    (stack_pointer[-(n)])

#define GETLOCAL(i) (frame->localsplus[i])

#define inst(name) case name:
#define family(name) static int family_##name

#define NAME_ERROR_MSG \
    "name '%.200s' is not defined"

typedef struct {
    PyObject *kwnames;
} CallShape;

static void
dummy_func(
    PyThreadState *tstate,
    _PyInterpreterFrame *frame,
    unsigned char opcode,
    unsigned int oparg,
    _Py_atomic_int * const eval_breaker,
    _PyCFrame cframe,
    PyObject *names,
    PyObject *consts,
    _Py_CODEUNIT *next_instr,
    PyObject **stack_pointer,
    CallShape call_shape,
    _Py_CODEUNIT *first_instr,
    int throwflag,
    binaryfunc binary_ops[]
)
{
    switch (opcode) {

        /* BEWARE!
           It is essential that any operation that fails must goto error
           and that all operations that succeed call DISPATCH() ! */

// BEGIN BYTECODES //
// INSERT CASES HERE //
// END BYTECODES //

    }
 error:;
 exception_unwind:;
 handle_eval_breaker:;
 resume_frame:;
 resume_with_error:;
 start_frame:;
 unbound_local_error:;
}

// Families go below this point //
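The `// INSERT CASES HERE //` marker is the seam that `extract_cases.py` (below) splits the template on; everything above it becomes the prolog of the generated bytecodes.c and everything below it the epilog. A sketch of that consumption, using the script's own `--template` default path:

```python
# Mirrors main() in extract_cases.py: split the template at the marker.
with open("Tools/cases_generator/bytecodes_template.c") as f:
    prolog, epilog = f.read().split("// INSERT CASES HERE //", 1)
```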
Tools/cases_generator/extract_cases.py
@@ -0,0 +1,247 @@

"""Extract the main interpreter switch cases."""

# Reads cases from ceval.c, writes to bytecodes.c.
# (That file is not meant to be compiled, but it has a .c extension
# so tooling like VS Code can be fooled into thinking it is C code.
# This helps editing and browsing the code.)
#
# The script generate_cases.py regenerates the cases.

import argparse
import difflib
import dis
import re
import sys

parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input", type=str, default="Python/ceval.c")
parser.add_argument("-o", "--output", type=str, default="Python/bytecodes.c")
parser.add_argument("-t", "--template", type=str, default="Tools/cases_generator/bytecodes_template.c")
parser.add_argument("-c", "--compare", action="store_true")
parser.add_argument("-q", "--quiet", action="store_true")


inverse_specializations = {
    specname: familyname
    for familyname, specnames in dis._specializations.items()
    for specname in specnames
}


def eopen(filename, mode="r"):
    if filename == "-":
        if "r" in mode:
            return sys.stdin
        else:
            return sys.stdout
    return open(filename, mode)


def leading_whitespace(line):
    return len(line) - len(line.lstrip())


def extract_opcode_name(line):
    m = re.match(r"\A\s*TARGET\((\w+)\)\s*{\s*\Z", line)
    if m:
        opcode_name = m.group(1)
        if opcode_name not in dis._all_opmap:
            raise ValueError(f"error: unknown opcode {opcode_name}")
        return opcode_name
    raise ValueError(f"error: no opcode in {line.strip()}")


def figure_stack_effect(opcode_name):
    # Return (se, diff) where se is the stack effect for oparg=0
    # and diff is the increment for oparg=1.
    # If it is irregular or unknown, raise ValueError.
    if m := re.match(r"^(\w+)__(\w+)$", opcode_name):
        # Super-instruction adds the effects of both parts
        first, second = m.groups()
        se1, incr1 = figure_stack_effect(first)
        se2, incr2 = figure_stack_effect(second)
        if incr1 or incr2:
            raise ValueError(f"irregular stack effect for {opcode_name}")
        return se1 + se2, 0
    if opcode_name in inverse_specializations:
        # Specialized instruction maps to unspecialized instruction
        opcode_name = inverse_specializations[opcode_name]
    opcode = dis._all_opmap[opcode_name]
    if opcode in dis.hasarg:
        try:
            se = dis.stack_effect(opcode, 0)
        except ValueError as err:
            raise ValueError(f"{err} for {opcode_name}")
        if dis.stack_effect(opcode, 0, jump=True) != se:
            raise ValueError(f"{opcode_name} stack effect depends on jump flag")
        if dis.stack_effect(opcode, 0, jump=False) != se:
            raise ValueError(f"{opcode_name} stack effect depends on jump flag")
        for i in range(1, 257):
            if dis.stack_effect(opcode, i) != se:
                return figure_variable_stack_effect(opcode_name, opcode, se)
    else:
        try:
            se = dis.stack_effect(opcode)
        except ValueError as err:
            raise ValueError(f"{err} for {opcode_name}")
        if dis.stack_effect(opcode, jump=True) != se:
            raise ValueError(f"{opcode_name} stack effect depends on jump flag")
        if dis.stack_effect(opcode, jump=False) != se:
            raise ValueError(f"{opcode_name} stack effect depends on jump flag")
    return se, 0


def figure_variable_stack_effect(opcode_name, opcode, se0):
    # Is it a linear progression?
    se1 = dis.stack_effect(opcode, 1)
    diff = se1 - se0
    for i in range(2, 257):
        sei = dis.stack_effect(opcode, i)
        if sei - se0 != diff * i:
            raise ValueError(f"{opcode_name} has irregular stack effect")
    # Assume it's okay for larger oparg values too
    return se0, diff


START_MARKER = "/* Start instructions */"  # The '{' is on the preceding line.
END_MARKER = "/* End regular instructions */"


def read_cases(f):
    cases = []
    case = None
    started = False
    # TODO: Count line numbers
    for line in f:
        stripped = line.strip()
        if not started:
            if stripped == START_MARKER:
                started = True
            continue
        if stripped == END_MARKER:
            break
        if stripped.startswith("TARGET("):
            if case:
                cases.append(case)
            indent = " " * leading_whitespace(line)
            case = ""
            opcode_name = extract_opcode_name(line)
            try:
                se, diff = figure_stack_effect(opcode_name)
            except ValueError as err:
                case += f"{indent}// error: {err}\n"
                case += f"{indent}inst({opcode_name}) {{\n"
            else:
                inputs = []
                outputs = []
                if se > 0:
                    for i in range(se):
                        outputs.append(f"__{i}")
                elif se < 0:
                    for i in range(-se):
                        inputs.append(f"__{i}")
                if diff > 0:
                    if diff == 1:
                        outputs.append("__array[oparg]")
                    else:
                        outputs.append(f"__array[oparg*{diff}]")
                elif diff < 0:
                    if diff == -1:
                        inputs.append("__array[oparg]")
                    else:
                        inputs.append(f"__array[oparg*{-diff}]")
                input = ", ".join(inputs)
                output = ", ".join(outputs)
                case += f"{indent}// stack effect: ({input} -- {output})\n"
                case += f"{indent}inst({opcode_name}) {{\n"
        else:
            if case:
                case += line
    if case:
        cases.append(case)
    return cases


def write_cases(f, cases):
    for case in cases:
        caselines = case.splitlines()
        while caselines[-1].strip() == "":
            caselines.pop()
        if caselines[-1].strip() == "}":
            caselines.pop()
        else:
            raise ValueError("case does not end with '}'")
        if caselines[-1].strip() == "DISPATCH();":
            caselines.pop()
        caselines.append("        }")
        case = "\n".join(caselines)
        print(case + "\n", file=f)


def write_families(f):
    for opcode, specializations in dis._specializations.items():
        all = [opcode] + specializations
        if len(all) <= 3:
            members = ', '.join(all)
            print(f"family({opcode.lower()}) = {{ {members} }};", file=f)
        else:
            print(f"family({opcode.lower()}) = {{", file=f)
            for i in range(0, len(all), 3):
                members = ', '.join(all[i:i+3])
                if i+3 < len(all):
                    print(f"    {members},", file=f)
                else:
                    print(f"    {members} }};", file=f)


def compare(oldfile, newfile, quiet=False):
    with open(oldfile) as f:
        oldlines = f.readlines()
    for top, line in enumerate(oldlines):
        if line.strip() == START_MARKER:
            break
    else:
        print(f"No start marker found in {oldfile}", file=sys.stderr)
        return
    del oldlines[:top]
    for bottom, line in enumerate(oldlines):
        if line.strip() == END_MARKER:
            break
    else:
        print(f"No end marker found in {oldfile}", file=sys.stderr)
        return
    del oldlines[bottom:]
    if not quiet:
        print(
            f"// {oldfile} has {len(oldlines)} lines after stripping top/bottom",
            file=sys.stderr,
        )
    with open(newfile) as f:
        newlines = f.readlines()
    if not quiet:
        print(f"// {newfile} has {len(newlines)} lines", file=sys.stderr)
    for line in difflib.unified_diff(oldlines, newlines, fromfile=oldfile, tofile=newfile):
        sys.stdout.write(line)


def main():
    args = parser.parse_args()
    with eopen(args.input) as f:
        cases = read_cases(f)
    with open(args.template) as f:
        prolog, epilog = f.read().split("// INSERT CASES HERE //", 1)
    if not args.quiet:
        print(f"// Read {len(cases)} cases from {args.input}", file=sys.stderr)
    with eopen(args.output, "w") as f:
        f.write(prolog)
        write_cases(f, cases)
        f.write(epilog)
        write_families(f)
    if not args.quiet:
        print(f"// Wrote {len(cases)} cases to {args.output}", file=sys.stderr)
    if args.compare:
        compare(args.input, args.output, args.quiet)


if __name__ == "__main__":
    main()
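The stack-effect probing above leans entirely on the `dis` module. A quick interactive check of the kind of answer `figure_stack_effect()` computes (values assume a current 3.12 development build):

```python
import dis

opcode = dis.opmap["BINARY_OP"]
print(dis.stack_effect(opcode, 0))   # -1: pops two operands, pushes one result
# BINARY_OP's effect does not depend on the jump flag, so the script accepts it:
print(dis.stack_effect(opcode, 0, jump=True) == dis.stack_effect(opcode, 0, jump=False))
```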
Tools/cases_generator/generate_cases.py
@@ -0,0 +1,125 @@

"""Generate the main interpreter switch."""

# Write the cases to generated_cases.c.h, which is #included in ceval.c.

# TODO: Reuse C generation framework from deepfreeze.py?

import argparse
import io
import sys

import parser
from parser import InstDef

arg_parser = argparse.ArgumentParser()
arg_parser.add_argument("-i", "--input", type=str, default="Python/bytecodes.c")
arg_parser.add_argument("-o", "--output", type=str, default="Python/generated_cases.c.h")
arg_parser.add_argument("-c", "--compare", action="store_true")
arg_parser.add_argument("-q", "--quiet", action="store_true")


def eopen(filename: str, mode: str = "r"):
    if filename == "-":
        if "r" in mode:
            return sys.stdin
        else:
            return sys.stdout
    return open(filename, mode)


def parse_cases(src: str, filename: str|None = None) -> tuple[list[InstDef], list[parser.Family]]:
    psr = parser.Parser(src, filename=filename)
    instrs: list[InstDef] = []
    families: list[parser.Family] = []
    while not psr.eof():
        if inst := psr.inst_def():
            assert inst.block
            instrs.append(InstDef(inst.name, inst.inputs, inst.outputs, inst.block))
        elif fam := psr.family_def():
            families.append(fam)
        else:
            raise psr.make_syntax_error("Unexpected token")
    return instrs, families


def always_exits(block: parser.Block) -> bool:
    text = block.text
    lines = text.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()
    if not lines or lines[-1].strip() != "}":
        return False
    lines.pop()
    if not lines:
        return False
    line = lines.pop().rstrip()
    # Indent must match exactly (TODO: Do something better)
    if line[:12] != " "*12:
        return False
    line = line[12:]
    return line.startswith(("goto ", "return ", "DISPATCH", "GO_TO_", "Py_UNREACHABLE()"))


def write_cases(f: io.TextIOBase, instrs: list[InstDef]):
    indent = "        "
    f.write("// This file is generated by Tools/cases_generator/generate_cases.py\n")
    f.write("// Do not edit!\n")
    for instr in instrs:
        assert isinstance(instr, InstDef)
        f.write(f"\n{indent}TARGET({instr.name}) {{\n")
        # input = ", ".join(instr.inputs)
        # output = ", ".join(instr.outputs)
        # f.write(f"{indent}    // {input} -- {output}\n")
        assert instr.block
        blocklines = instr.block.text.splitlines(True)
        # Remove blank lines from both ends
        while blocklines and not blocklines[0].strip():
            blocklines.pop(0)
        while blocklines and not blocklines[-1].strip():
            blocklines.pop()
        # Remove leading '{' and trailing '}'
        assert blocklines and blocklines[0].strip() == "{"
        assert blocklines and blocklines[-1].strip() == "}"
        blocklines.pop()
        blocklines.pop(0)
        # Remove trailing blank lines
        while blocklines and not blocklines[-1].strip():
            blocklines.pop()
        # Write the body
        for line in blocklines:
            f.write(line)
        assert instr.block
        if not always_exits(instr.block):
            f.write(f"{indent}    DISPATCH();\n")
        # Write trailing '}'
        f.write(f"{indent}}}\n")


def main():
    args = arg_parser.parse_args()
    with eopen(args.input) as f:
        srclines = f.read().splitlines()
    begin = srclines.index("// BEGIN BYTECODES //")
    end = srclines.index("// END BYTECODES //")
    src = "\n".join(srclines[begin+1 : end])
    instrs, families = parse_cases(src, filename=args.input)
    ninstrs = nfamilies = 0
    if not args.quiet:
        ninstrs = len(instrs)
        nfamilies = len(families)
        print(
            f"Read {ninstrs} instructions "
            f"and {nfamilies} families from {args.input}",
            file=sys.stderr,
        )
    with eopen(args.output, "w") as f:
        write_cases(f, instrs)
    if not args.quiet:
        print(
            f"Wrote {ninstrs} instructions to {args.output}",
            file=sys.stderr,
        )


if __name__ == "__main__":
    main()
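A minimal end-to-end run of the pieces above, bypassing the command line (run from `Tools/cases_generator` so `import parser` resolves; the one-instruction input is a hypothetical sketch):

```python
import sys
from generate_cases import parse_cases, write_cases

# One instruction, in the same shape generate_cases expects between the
# BEGIN/END BYTECODES markers of Python/bytecodes.c.
instrs, families = parse_cases("inst(NOP) {\n}\n")
write_cases(sys.stdout, instrs)
# Emits a TARGET(NOP) case; since the empty body never exits on its own,
# the generator appends DISPATCH() automatically.
```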
Tools/cases_generator/lexer.py
@@ -0,0 +1,257 @@

# Lexer for C code
# Originally by Mark Shannon (mark@hotpy.org)
# https://gist.github.com/markshannon/db7ab649440b5af765451bb77c7dba34

import re
import sys
import collections
from dataclasses import dataclass

def choice(*opts):
    return "|".join("(%s)" % opt for opt in opts)

# Regexes

# Longer operators must go before shorter ones.

PLUSPLUS = r'\+\+'
MINUSMINUS = r'--'

# ->
ARROW = r'->'
ELLIPSIS = r'\.\.\.'

# Assignment operators
TIMESEQUAL = r'\*='
DIVEQUAL = r'/='
MODEQUAL = r'%='
PLUSEQUAL = r'\+='
MINUSEQUAL = r'-='
LSHIFTEQUAL = r'<<='
RSHIFTEQUAL = r'>>='
ANDEQUAL = r'&='
OREQUAL = r'\|='
XOREQUAL = r'\^='

# Operators
PLUS = r'\+'
MINUS = r'-'
TIMES = r'\*'
DIVIDE = r'/'
MOD = r'%'
NOT = r'~'
XOR = r'\^'
LOR = r'\|\|'
LAND = r'&&'
LSHIFT = r'<<'
RSHIFT = r'>>'
LE = r'<='
GE = r'>='
EQ = r'=='
NE = r'!='
LT = r'<'
GT = r'>'
LNOT = r'!'
OR = r'\|'
AND = r'&'
EQUALS = r'='

# ?
CONDOP = r'\?'

# Delimiters
LPAREN = r'\('
RPAREN = r'\)'
LBRACKET = r'\['
RBRACKET = r'\]'
LBRACE = r'\{'
RBRACE = r'\}'
COMMA = r','
PERIOD = r'\.'
SEMI = r';'
COLON = r':'
BACKSLASH = r'\\'

operators = { op: pattern for op, pattern in globals().items() if op == op.upper() }
for op in operators:
    globals()[op] = op
opmap = { pattern.replace("\\", "") or '\\' : op for op, pattern in operators.items() }

# Macros
macro = r'# *(ifdef|ifndef|undef|define|error|endif|if|else|include|#)'
MACRO = 'MACRO'

id_re = r'[a-zA-Z_][0-9a-zA-Z_]*'
IDENTIFIER = 'IDENTIFIER'

suffix = r'([uU]?[lL]?[lL]?)'
octal = r'0[0-7]+' + suffix
hex = r'0[xX][0-9a-fA-F]+'
decimal_digits = r'(0|[1-9][0-9]*)'
decimal = decimal_digits + suffix


exponent = r"""([eE][-+]?[0-9]+)"""
fraction = r"""([0-9]*\.[0-9]+)|([0-9]+\.)"""
float = '(((('+fraction+')'+exponent+'?)|([0-9]+'+exponent+'))[FfLl]?)'

number_re = choice(octal, hex, float, decimal)
NUMBER = 'NUMBER'

simple_escape = r"""([a-zA-Z._~!=&\^\-\\?'"])"""
decimal_escape = r"""(\d+)"""
hex_escape = r"""(x[0-9a-fA-F]+)"""
escape_sequence = r"""(\\("""+simple_escape+'|'+decimal_escape+'|'+hex_escape+'))'
string_char = r"""([^"\\\n]|"""+escape_sequence+')'
str_re = '"'+string_char+'*"'
STRING = 'STRING'
char = r'\'.\''  # TODO: escape sequence
CHARACTER = 'CHARACTER'

comment_re = r'//.*|/\*([^*]|\*[^/])*\*/'
COMMENT = 'COMMENT'

newline = r"\n"
matcher = re.compile(choice(id_re, number_re, str_re, char, newline, macro, comment_re, *operators.values()))
letter = re.compile(r'[a-zA-Z_]')

keywords = (
    'AUTO', 'BREAK', 'CASE', 'CHAR', 'CONST',
    'CONTINUE', 'DEFAULT', 'DO', 'DOUBLE', 'ELSE', 'ENUM', 'EXTERN',
    'FLOAT', 'FOR', 'GOTO', 'IF', 'INLINE', 'INT', 'LONG',
    'REGISTER', 'OFFSETOF',
    'RESTRICT', 'RETURN', 'SHORT', 'SIGNED', 'SIZEOF', 'STATIC', 'STRUCT',
    'SWITCH', 'TYPEDEF', 'UNION', 'UNSIGNED', 'VOID',
    'VOLATILE', 'WHILE'
)
for name in keywords:
    globals()[name] = name
keywords = { name.lower() : name for name in keywords }


def make_syntax_error(
    message: str, filename: str, line: int, column: int, line_text: str,
) -> SyntaxError:
    return SyntaxError(message, (filename, line, column, line_text))


@dataclass(slots=True)
class Token:
    kind: str
    text: str
    begin: tuple[int, int]
    end: tuple[int, int]

    @property
    def line(self):
        return self.begin[0]

    @property
    def column(self):
        return self.begin[1]

    @property
    def end_line(self):
        return self.end[0]

    @property
    def end_column(self):
        return self.end[1]

    @property
    def width(self):
        return self.end[1] - self.begin[1]

    def replaceText(self, txt):
        assert isinstance(txt, str)
        return Token(self.kind, txt, self.begin, self.end)

    def __repr__(self):
        b0, b1 = self.begin
        e0, e1 = self.end
        if b0 == e0:
            return f"{self.kind}({self.text!r}, {b0}:{b1}:{e1})"
        else:
            return f"{self.kind}({self.text!r}, {b0}:{b1}, {e0}:{e1})"


def tokenize(src, line=1, filename=None):
    linestart = -1
    # TODO: finditer() skips over unrecognized characters, e.g. '@'
    for m in matcher.finditer(src):
        start, end = m.span()
        text = m.group(0)
        if text in keywords:
            kind = keywords[text]
        elif letter.match(text):
            kind = IDENTIFIER
        elif text == '...':
            kind = ELLIPSIS
        elif text == '.':
            kind = PERIOD
        elif text[0] in '0123456789.':
            kind = NUMBER
        elif text[0] == '"':
            kind = STRING
        elif text in opmap:
            kind = opmap[text]
        elif text == '\n':
            linestart = start
            line += 1
            kind = '\n'
        elif text[0] == "'":
            kind = CHARACTER
        elif text[0] == '#':
            kind = MACRO
        elif text[0] == '/' and text[1] in '/*':
            kind = COMMENT
        else:
            lineend = src.find("\n", start)
            if lineend == -1:
                lineend = len(src)
            raise make_syntax_error(f"Bad token: {text}",
                filename, line, start-linestart+1, src[linestart:lineend])
        if kind == COMMENT:
            begin = line, start-linestart
            newlines = text.count('\n')
            if newlines:
                linestart = start + text.rfind('\n')
                line += newlines
        else:
            begin = line, start-linestart
        if kind != "\n":
            yield Token(kind, text, begin, (line, start-linestart+len(text)))


__all__ = []
__all__.extend([kind for kind in globals() if kind.upper() == kind])


def to_text(tkns: list[Token], dedent: int = 0) -> str:
    res: list[str] = []
    line, col = -1, 1+dedent
    for tkn in tkns:
        if line == -1:
            line, _ = tkn.begin
        l, c = tkn.begin
        #assert(l >= line), (line, txt, start, end)
        while l > line:
            line += 1
            res.append('\n')
            col = 1+dedent
        res.append(' '*(c-col))
        res.append(tkn.text)
        line, col = tkn.end
    return ''.join(res)


if __name__ == "__main__":
    import sys
    filename = sys.argv[1]
    if filename == "-c":
        src = sys.argv[2]
    else:
        src = open(filename).read()
    # print(to_text(tokenize(src)))
    for tkn in tokenize(src, filename=filename):
        print(tkn)
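A quick demonstration of the lexer on a small C fragment (run from `Tools/cases_generator`):

```python
import lexer

# Comment tokens are kept; newline tokens are suppressed.
for tok in lexer.tokenize("x = y + 1; // note"):
    print(tok)
# IDENTIFIER('x', 1:1:2), EQUALS('=', 1:3:4), IDENTIFIER('y', 1:5:6), ...
```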
Tools/cases_generator/parser.py
@@ -0,0 +1,222 @@

"""Parser for Python/bytecodes.c."""

from dataclasses import dataclass, field
from typing import NamedTuple, Callable, TypeVar

import lexer as lx
from plexer import PLexer


P = TypeVar("P", bound="Parser")
N = TypeVar("N", bound="Node")

def contextual(func: Callable[[P], N|None]) -> Callable[[P], N|None]:
    # Decorator to wrap grammar methods.
    # Resets position if `func` returns None.
    def contextual_wrapper(self: P) -> N|None:
        begin = self.getpos()
        res = func(self)
        if res is None:
            self.setpos(begin)
            return
        end = self.getpos()
        res.context = Context(begin, end, self)
        return res
    return contextual_wrapper


class Context(NamedTuple):
    begin: int
    end: int
    owner: PLexer

    def __repr__(self):
        return f"<{self.begin}-{self.end}>"


@dataclass
class Node:
    context: Context|None = field(init=False, default=None)

    @property
    def text(self) -> str:
        context = self.context
        if not context:
            return ""
        tokens = context.owner.tokens
        begin = context.begin
        end = context.end
        return lx.to_text(tokens[begin:end])


@dataclass
class Block(Node):
    tokens: list[lx.Token]


@dataclass
class InstDef(Node):
    name: str
    inputs: list[str] | None
    outputs: list[str] | None
    block: Block | None


@dataclass
class Family(Node):
    name: str
    members: list[str]


class Parser(PLexer):

    @contextual
    def inst_def(self) -> InstDef | None:
        if header := self.inst_header():
            if block := self.block():
                header.block = block
                return header
            raise self.make_syntax_error("Expected block")
        return None

    @contextual
    def inst_header(self):
        # inst(NAME) | inst(NAME, (inputs -- outputs))
        # TODO: Error out when there is something unexpected.
        # TODO: Make INST a keyword in the lexer.
        if (tkn := self.expect(lx.IDENTIFIER)) and tkn.text == "inst":
            if (self.expect(lx.LPAREN)
                    and (tkn := self.expect(lx.IDENTIFIER))):
                name = tkn.text
                if self.expect(lx.COMMA):
                    inp, outp = self.stack_effect()
                    if (self.expect(lx.RPAREN)
                            and self.peek().kind == lx.LBRACE):
                        return InstDef(name, inp, outp, [])
                elif self.expect(lx.RPAREN):
                    return InstDef(name, None, None, [])
        return None

    def stack_effect(self):
        # '(' [inputs] '--' [outputs] ')'
        if self.expect(lx.LPAREN):
            inp = self.inputs() or []
            if self.expect(lx.MINUSMINUS):
                outp = self.outputs() or []
                if self.expect(lx.RPAREN):
                    return inp, outp
        raise self.make_syntax_error("Expected stack effect")

    def inputs(self):
        # input (, input)*
        here = self.getpos()
        if inp := self.input():
            near = self.getpos()
            if self.expect(lx.COMMA):
                if rest := self.inputs():
                    return [inp] + rest
            self.setpos(near)
            return [inp]
        self.setpos(here)
        return None

    def input(self):
        # IDENTIFIER
        if (tkn := self.expect(lx.IDENTIFIER)):
            if self.expect(lx.LBRACKET):
                if arg := self.expect(lx.IDENTIFIER):
                    if self.expect(lx.RBRACKET):
                        return f"{tkn.text}[{arg.text}]"
                    if self.expect(lx.TIMES):
                        if num := self.expect(lx.NUMBER):
                            if self.expect(lx.RBRACKET):
                                return f"{tkn.text}[{arg.text}*{num.text}]"
                raise self.make_syntax_error("Expected argument in brackets", tkn)

            return tkn.text
        if self.expect(lx.CONDOP):
            while self.expect(lx.CONDOP):
                pass
            return "??"
        return None

    def outputs(self):
        # output (, output)*
        here = self.getpos()
        if outp := self.output():
            near = self.getpos()
            if self.expect(lx.COMMA):
                if rest := self.outputs():
                    return [outp] + rest
            self.setpos(near)
            return [outp]
        self.setpos(here)
        return None

    def output(self):
        return self.input()  # TODO: They're not quite the same.

    @contextual
    def family_def(self) -> Family | None:
        here = self.getpos()
        if (tkn := self.expect(lx.IDENTIFIER)) and tkn.text == "family":
            if self.expect(lx.LPAREN):
                if (tkn := self.expect(lx.IDENTIFIER)):
                    name = tkn.text
                    if self.expect(lx.RPAREN):
                        if self.expect(lx.EQUALS):
                            if members := self.members():
                                if self.expect(lx.SEMI):
                                    return Family(name, members)
        return None

    def members(self):
        here = self.getpos()
        if tkn := self.expect(lx.IDENTIFIER):
            near = self.getpos()
            if self.expect(lx.COMMA):
                if rest := self.members():
                    return [tkn.text] + rest
            self.setpos(near)
            return [tkn.text]
        self.setpos(here)
        return None

    @contextual
    def block(self) -> Block:
        tokens = self.c_blob()
        return Block(tokens)

    def c_blob(self):
        tokens = []
        level = 0
        while tkn := self.next(raw=True):
            if tkn.kind in (lx.LBRACE, lx.LPAREN, lx.LBRACKET):
                level += 1
            elif tkn.kind in (lx.RBRACE, lx.RPAREN, lx.RBRACKET):
                level -= 1
                if level <= 0:
                    break
            tokens.append(tkn)
        return tokens


if __name__ == "__main__":
    import sys
    if sys.argv[1:]:
        filename = sys.argv[1]
        if filename == "-c" and sys.argv[2:]:
            src = sys.argv[2]
            filename = None
        else:
            with open(filename) as f:
                src = f.read()
            srclines = src.splitlines()
            begin = srclines.index("// BEGIN BYTECODES //")
            end = srclines.index("// END BYTECODES //")
            src = "\n".join(srclines[begin+1 : end])
    else:
        filename = None
        src = "if (x) { x.foo; // comment\n}"
    parser = Parser(src, filename)
    x = parser.inst_def()
    print(x)
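Driving the parser by hand shows the pieces fitting together (run from `Tools/cases_generator`; the instruction header is a hypothetical example of the `(inputs -- outputs)` form):

```python
import parser

p = parser.Parser("inst(POP_TOP, (value --)) {\n}\n")
inst = p.inst_def()
print(inst.name, inst.inputs, inst.outputs)
# POP_TOP ['value'] []
```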
Tools/cases_generator/plexer.py
@@ -0,0 +1,104 @@

import lexer as lx
Token = lx.Token


class PLexer:
    def __init__(self, src: str, filename: str|None = None):
        self.src = src
        self.filename = filename
        self.tokens = list(lx.tokenize(self.src, filename=filename))
        self.pos = 0

    def getpos(self) -> int:
        # Current position
        return self.pos

    def eof(self) -> bool:
        # Are we at EOF?
        return self.pos >= len(self.tokens)

    def setpos(self, pos: int) -> None:
        # Reset position
        assert 0 <= pos <= len(self.tokens), (pos, len(self.tokens))
        self.pos = pos

    def backup(self) -> None:
        # Back up position by 1
        assert self.pos > 0
        self.pos -= 1

    def next(self, raw: bool = False) -> Token | None:
        # Return next token and advance position; None if at EOF
        # TODO: Return synthetic EOF token instead of None?
        while self.pos < len(self.tokens):
            tok = self.tokens[self.pos]
            self.pos += 1
            if raw or tok.kind != "COMMENT":
                return tok
        return None

    def peek(self, raw: bool = False) -> Token | None:
        # Return next token without advancing position
        tok = self.next(raw=raw)
        self.backup()
        return tok

    def maybe(self, kind: str, raw: bool = False) -> Token | None:
        # Return next token without advancing position if kind matches
        tok = self.peek(raw=raw)
        if tok and tok.kind == kind:
            return tok
        return None

    def expect(self, kind: str) -> Token | None:
        # Return next token and advance position if kind matches
        tkn = self.next()
        if tkn is not None:
            if tkn.kind == kind:
                return tkn
            self.backup()
        return None

    def require(self, kind: str) -> Token:
        # Return next token and advance position, requiring kind to match
        tkn = self.next()
        if tkn is not None and tkn.kind == kind:
            return tkn
        raise self.make_syntax_error(f"Expected {kind!r} but got {tkn and tkn.text!r}", tkn)

    def extract_line(self, lineno: int) -> str:
        # Return source line `lineno` (1-based)
        lines = self.src.splitlines()
        if lineno > len(lines):
            return ""
        return lines[lineno - 1]

    def make_syntax_error(self, message: str, tkn: Token|None = None) -> SyntaxError:
        # Construct a SyntaxError instance from message and token
        if tkn is None:
            tkn = self.peek()
        if tkn is None:
            tkn = self.tokens[-1]
        return lx.make_syntax_error(message,
            self.filename, tkn.line, tkn.column, self.extract_line(tkn.line))


if __name__ == "__main__":
    import sys
    if sys.argv[1:]:
        filename = sys.argv[1]
        if filename == "-c" and sys.argv[2:]:
            src = sys.argv[2]
            filename = None
        else:
            with open(filename) as f:
                src = f.read()
    else:
        filename = None
        src = "if (x) { x.foo; // comment\n}"
    p = PLexer(src, filename)
    while not p.eof():
        tok = p.next(raw=True)
        left = repr(tok)
        right = lx.to_text([tok]).rstrip()
        print(f"{left:40.40} {right}")