This uses the new mechanism whereby certain uops
are replaced by others during translation,
using the `_PyUop_Replacements` table.
We further special-case the `_FOR_ITER_TIER_TWO` uop
to update the deoptimization target to point
just past the corresponding `END_FOR` opcode.
Two tiny code cleanups are also part of this PR.
- Double max trace size to 256
- Add a dependency on executor_cases.c.h for ceval.o
- Mark `_SPECIALIZE_UNPACK_SEQUENCE` as `TIER_ONE_ONLY`
- Add debug output back showing the optimized trace
- Bunch of cleanups to Tools/cases_generator/
* Replace jumps with deopts in tier 2
* Fewer special cases of uop names
* Add target field to uop IR
* Remove more redundant SET_IP and _CHECK_VALIDITY micro-ops
* Extend whitelist of non-escaping API functions.
- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`,
you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging
to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
* Rename SAVE_IP to _SET_IP
* Rename EXIT_TRACE to _EXIT_TRACE
* Rename SAVE_CURRENT_IP to _SAVE_CURRENT_IP
* Rename INSERT to _INSERT (This is for Ken Jin's abstract interpreter)
* Rename IS_NONE to _IS_NONE
* Rename JUMP_TO_TOP to _JUMP_TO_TOP
This adds a 16-bit inline cache entry to the conditional branch instructions POP_JUMP_IF_{FALSE,TRUE,NONE,NOT_NONE} and their instrumented variants, which is used to keep track of the branch direction.
Each time we encounter these instructions we shift the cache entry left by one and set the bottom bit to whether we jumped.
Then when it's time to translate such a branch to Tier 2 uops, we use the bit count from the cache entry to decided whether to continue translating the "didn't jump" branch or the "jumped" branch.
The counter is initialized to a pattern of alternating ones and zeros to avoid bias.
The .pyc file magic number is updated. There's a new test, some fixes for existing tests, and a few miscellaneous cleanups.
Also remove NOP instructions.
The "stubs" are not optimized in this fashion (their SAVE_IP should always be preserved since it's where to jump next, and they don't contain NOPs by their nature).
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
- The `dump_stack()` method could call a `__repr__` method implemented in Python,
causing (infinite) recursion.
I rewrote it to only print out the values for some fundamental types (`int`, `str`, etc.);
for everything else it just prints `<type_name @ 0xdeadbeef>`.
- The lltrace-like feature for uops wrote to `stderr`, while the one in `ceval.c` writes to `stdout`;
I changed the uops to write to stdout as well.
These aren't automatically translated because (ironically)
they are macros deferring to POP_JUMP_IF_{TRUE,FALSE},
which are not viable uops (being manually translated).
The hack is that we emit IS_NONE and then set opcode and
jump to the POP_JUMP_IF_{TRUE,FALSE} translation code.
During superblock generation, a JUMP_BACKWARD instruction is translated to either a JUMP_TO_TOP micro-op (when the target of the jump is exactly the beginning of the superblock, closing the loop), or a SAVE_IP + EXIT_TRACE pair, when the jump goes elsewhere.
The new JUMP_TO_TOP instruction includes a CHECK_EVAL_BREAKER() call, so a closed loop can still be interrupted.
- Hand-written uops JUMP_IF_{TRUE,FALSE}.
These peek at the top of the stack.
The jump target (in superblock space) is absolute.
- Hand-written translation for POP_JUMP_IF_{TRUE,FALSE},
assuming the jump is unlikely.
Once we implement jump-likelihood profiling,
we can implement the jump-unlikely case (in another PR).
- Tests (including some test cleanup).
- Improvements to len(ex) and ex[i] to expose the whole trace.