These are the most popular specializations of `LOAD_ATTR` and `STORE_ATTR`
that weren't already viable uops:
* Split LOAD_ATTR_METHOD_WITH_VALUES
* Split LOAD_ATTR_METHOD_NO_DICT
* Split LOAD_ATTR_SLOT
* Split STORE_ATTR_SLOT
* Split STORE_ATTR_INSTANCE_VALUE
Also:
* Add `-v` flag to code generator which prints a list of non-viable uops
(easter-egg: it can print execution counts -- see source)
* Double _Py_UOP_MAX_TRACE_LENGTH to 128
I had dropped one of the DEOPT_IF() calls! :-(
* Rename SAVE_IP to _SET_IP
* Rename EXIT_TRACE to _EXIT_TRACE
* Rename SAVE_CURRENT_IP to _SAVE_CURRENT_IP
* Rename INSERT to _INSERT (This is for Ken Jin's abstract interpreter)
* Rename IS_NONE to _IS_NONE
* Rename JUMP_TO_TOP to _JUMP_TO_TOP
Instead of using `GO_TO_INSTRUCTION(CALL_PY_EXACT_ARGS)` we just add the macro elements of the latter to the macro for the former. This requires lengthening the uops array in struct opcode_macro_expansion. (It also required changes to stacking.py that were merged already.)
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
Introducing a new file, stacking.py, that takes over several responsibilities related to symbolic evaluation of push/pop operations, with more generality.
There's no need to use a dummy uop to skip unused cache entries. The macro syntax lets you write `unused/1` instead.
Similarly, move `unused/5` from op `_LOAD_ATTR_INSTANCE_VALUE` to macro `LOAD_ATTR_INSTANCE_VALUE`.
By turning `assert(kwnames == NULL)` into a macro that is not in the "forbidden" list, many instructions that formerly were skipped because they contained such an assert (but no other mention of `kwnames`) are now supported in Tier 2. This covers 10 instructions in total (all specializations of `CALL` that invoke some C code):
- `CALL_NO_KW_TYPE_1`
- `CALL_NO_KW_STR_1`
- `CALL_NO_KW_TUPLE_1`
- `CALL_NO_KW_BUILTIN_O`
- `CALL_NO_KW_BUILTIN_FAST`
- `CALL_NO_KW_LEN`
- `CALL_NO_KW_ISINSTANCE`
- `CALL_NO_KW_METHOD_DESCRIPTOR_O`
- `CALL_NO_KW_METHOD_DESCRIPTOR_NOARGS`
- `CALL_NO_KW_METHOD_DESCRIPTOR_FAST`
The Tier 2 opcode _IS_ITER_EXHAUSTED_LIST (and _TUPLE)
didn't set it->it_seq to NULL, causing a subtle bug
that resulted in test_exhausted_iterator in list_tests.py
to fail when running all tests with -Xuops.
The bug was introduced in gh-106696.
Added this as an explicit test.
Also fixed the dependencies for ceval.o -- it depends on executor_cases.c.h.
This moves EXIT_TRACE, SAVE_IP, JUMP_TO_TOP, and
_POP_JUMP_IF_{FALSE,TRUE} from ceval.c to bytecodes.c.
They are no less special than before, but this way
they are discoverable o the copy-and-patch tooling.
* Convert PyObject_DelAttr() and PyObject_DelAttrString() macros to
functions.
* Add PyObject_DelAttr() and PyObject_DelAttrString() functions to
the stable ABI.
* Replace PyObject_SetAttr(obj, name, NULL) with
PyObject_DelAttr(obj, name).
This adds several of unspecialized opcodes to superblocks:
TO_BOOL, BINARY_SUBSCR, STORE_SUBSCR,
UNPACK_SEQUENCE, LOAD_GLOBAL, LOAD_ATTR,
COMPARE_OP, BINARY_OP.
While we may not want that eventually, for now this helps finding bugs.
There is a rudimentary test checking for UNPACK_SEQUENCE.
Once we're ready to undo this, that would be simple:
just replace the call to variable_used_unspecialized
with a call to variable_used (as shown in a comment).
Or add individual opcdes to FORBIDDEN_NAMES_IN_UOPS.
- Tweak uops debugging output
- Fix the bug from gh-106290
- Rename `SET_IP` to `SAVE_IP` (per https://github.com/faster-cpython/ideas/issues/558)
- Add a `SAVE_IP` uop at the start of the trace (ditto)
- Allow `unbound_local_error`; this gives us uops for `LOAD_FAST_CHECK`, `LOAD_CLOSURE`, and `DELETE_FAST`
- Longer traces
- Support `STORE_FAST_LOAD_FAST`, `STORE_FAST_STORE_FAST`
- Add deps on pycore_uops.h to Makefile(.pre.in)
Added a new, experimental, tracing optimizer and interpreter (a.k.a. "tier 2"). This currently pessimizes, so don't use yet -- this is infrastructure so we can experiment with optimizing passes. To enable it, pass ``-Xuops`` or set ``PYTHONUOPS=1``. To get debug output, set ``PYTHONUOPSDEBUG=N`` where ``N`` is a debug level (0-4, where 0 is no debug output and 4 is excessively verbose).
All of this code is likely to change dramatically before the 3.13 feature freeze. But this is a first step.