This moves EXIT_TRACE, SAVE_IP, JUMP_TO_TOP, and
_POP_JUMP_IF_{FALSE,TRUE} from ceval.c to bytecodes.c.
They are no less special than before, but this way
they are discoverable o the copy-and-patch tooling.
During superblock generation, a JUMP_BACKWARD instruction is translated to either a JUMP_TO_TOP micro-op (when the target of the jump is exactly the beginning of the superblock, closing the loop), or a SAVE_IP + EXIT_TRACE pair, when the jump goes elsewhere.
The new JUMP_TO_TOP instruction includes a CHECK_EVAL_BREAKER() call, so a closed loop can still be interrupted.
- Hand-written uops JUMP_IF_{TRUE,FALSE}.
These peek at the top of the stack.
The jump target (in superblock space) is absolute.
- Hand-written translation for POP_JUMP_IF_{TRUE,FALSE},
assuming the jump is unlikely.
Once we implement jump-likelihood profiling,
we can implement the jump-unlikely case (in another PR).
- Tests (including some test cleanup).
- Improvements to len(ex) and ex[i] to expose the whole trace.
This adds several of unspecialized opcodes to superblocks:
TO_BOOL, BINARY_SUBSCR, STORE_SUBSCR,
UNPACK_SEQUENCE, LOAD_GLOBAL, LOAD_ATTR,
COMPARE_OP, BINARY_OP.
While we may not want that eventually, for now this helps finding bugs.
There is a rudimentary test checking for UNPACK_SEQUENCE.
Once we're ready to undo this, that would be simple:
just replace the call to variable_used_unspecialized
with a call to variable_used (as shown in a comment).
Or add individual opcdes to FORBIDDEN_NAMES_IN_UOPS.
Instead of special-casing specific instructions,
we add a few more special values to the 'size' field of expansions,
so in the future we can automatically handle
additional super-instructions in the generator.
- Tweak uops debugging output
- Fix the bug from gh-106290
- Rename `SET_IP` to `SAVE_IP` (per https://github.com/faster-cpython/ideas/issues/558)
- Add a `SAVE_IP` uop at the start of the trace (ditto)
- Allow `unbound_local_error`; this gives us uops for `LOAD_FAST_CHECK`, `LOAD_CLOSURE`, and `DELETE_FAST`
- Longer traces
- Support `STORE_FAST_LOAD_FAST`, `STORE_FAST_STORE_FAST`
- Add deps on pycore_uops.h to Makefile(.pre.in)
This produces longer traces (superblocks?).
Also improved debug output (uop names are now printed instead of numeric opcodes). This would be simpler if the numeric opcode values were generated by generate_cases.py, but that's another project.
Refactored some code in generate_cases.py so the essential algorithm for cache effects is only run once. (Deciding which effects are used and what the total cache size is, regardless of what's used.)
Added a new, experimental, tracing optimizer and interpreter (a.k.a. "tier 2"). This currently pessimizes, so don't use yet -- this is infrastructure so we can experiment with optimizing passes. To enable it, pass ``-Xuops`` or set ``PYTHONUOPS=1``. To get debug output, set ``PYTHONUOPSDEBUG=N`` where ``N`` is a debug level (0-4, where 0 is no debug output and 4 is excessively verbose).
All of this code is likely to change dramatically before the 3.13 feature freeze. But this is a first step.
This behavior is optional, because in some extreme cases it
may just make debugging harder. The tool defaults it to off,
but it is on in Makefile.pre.in.
Also note that this makes diffs to generated_cases.c.h noisier,
since whenever you insert or delete a line in bytecodes.c,
all subsequent #line directives will change.
* Write output and metadata in a single run
This halves the time to run the cases generator
(most of the time goes into parsing the input).
* Declare or define opcode metadata based on NEED_OPCODE_TABLES
* Use generated metadata for stack_effect()
* compile.o depends on opcode_metadata.h
* Return -1 from _PyOpcode_num_popped/pushed for unknown opcode
New generator feature: Generate useful glue for output arrays, so you can just write values to the output array (no bounds checking). Rewrote UNPACK_SEQUENCE_TWO_TUPLE to use this, and also UNPACK_SEQUENCE_{TUPLE,LIST}.
You can now write things like this:
```
inst(BUILD_STRING, (pieces[oparg] -- str)) { ... }
inst(LIST_APPEND, (list, unused[oparg-1], v -- list, unused[oparg-1])) { ... }
```
Note that array output effects are only partially supported (they must be named `unused` or correspond to an input effect).
For these the instr_format field uses IX instead of IB.
Register instructions use IX, IB, IBBX, IBBB, etc.
Also: Include the closing '}' in Block.tokens, for completeness