This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.
The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
This uses the new mechanism whereby certain uops
are replaced by others during translation,
using the `_PyUop_Replacements` table.
We further special-case the `_FOR_ITER_TIER_TWO` uop
to update the deoptimization target to point
just past the corresponding `END_FOR` opcode.
Two tiny code cleanups are also part of this PR.
- Ensure that `assert(type_version != 0);` always comes *before* using `type_version`
Also:
- In cases_generator, rename `-v` to from `--verbose` to `--viable`
- Double max trace size to 256
- Add a dependency on executor_cases.c.h for ceval.o
- Mark `_SPECIALIZE_UNPACK_SEQUENCE` as `TIER_ONE_ONLY`
- Add debug output back showing the optimized trace
- Bunch of cleanups to Tools/cases_generator/
- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`,
you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging
to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
In Python/bytecodes.c, you now write
```
DEOPT_IF(condition);
```
The code generator expands this to
```
DEOPT_IF(condition, opcode);
```
where `opcode` is the name of the unspecialized instruction.
This works inside macro expansions too.
**CAVEAT:** The entire `DEOPT_IF(condition)` statement must be on a single line.
If it isn't, the substitution will fail; an error will be printed by the code generator
and the C compiler will report some errors.
These are the most popular specializations of `LOAD_ATTR` and `STORE_ATTR`
that weren't already viable uops:
* Split LOAD_ATTR_METHOD_WITH_VALUES
* Split LOAD_ATTR_METHOD_NO_DICT
* Split LOAD_ATTR_SLOT
* Split STORE_ATTR_SLOT
* Split STORE_ATTR_INSTANCE_VALUE
Also:
* Add `-v` flag to code generator which prints a list of non-viable uops
(easter-egg: it can print execution counts -- see source)
* Double _Py_UOP_MAX_TRACE_LENGTH to 128
I had dropped one of the DEOPT_IF() calls! :-(
* Rename SAVE_IP to _SET_IP
* Rename EXIT_TRACE to _EXIT_TRACE
* Rename SAVE_CURRENT_IP to _SAVE_CURRENT_IP
* Rename INSERT to _INSERT (This is for Ken Jin's abstract interpreter)
* Rename IS_NONE to _IS_NONE
* Rename JUMP_TO_TOP to _JUMP_TO_TOP
This adds a 16-bit inline cache entry to the conditional branch instructions POP_JUMP_IF_{FALSE,TRUE,NONE,NOT_NONE} and their instrumented variants, which is used to keep track of the branch direction.
Each time we encounter these instructions we shift the cache entry left by one and set the bottom bit to whether we jumped.
Then when it's time to translate such a branch to Tier 2 uops, we use the bit count from the cache entry to decided whether to continue translating the "didn't jump" branch or the "jumped" branch.
The counter is initialized to a pattern of alternating ones and zeros to avoid bias.
The .pyc file magic number is updated. There's a new test, some fixes for existing tests, and a few miscellaneous cleanups.
Instead of using `GO_TO_INSTRUCTION(CALL_PY_EXACT_ARGS)` we just add the macro elements of the latter to the macro for the former. This requires lengthening the uops array in struct opcode_macro_expansion. (It also required changes to stacking.py that were merged already.)
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
There's no need to use a dummy uop to skip unused cache entries. The macro syntax lets you write `unused/1` instead.
Similarly, move `unused/5` from op `_LOAD_ATTR_INSTANCE_VALUE` to macro `LOAD_ATTR_INSTANCE_VALUE`.
By turning `assert(kwnames == NULL)` into a macro that is not in the "forbidden" list, many instructions that formerly were skipped because they contained such an assert (but no other mention of `kwnames`) are now supported in Tier 2. This covers 10 instructions in total (all specializations of `CALL` that invoke some C code):
- `CALL_NO_KW_TYPE_1`
- `CALL_NO_KW_STR_1`
- `CALL_NO_KW_TUPLE_1`
- `CALL_NO_KW_BUILTIN_O`
- `CALL_NO_KW_BUILTIN_FAST`
- `CALL_NO_KW_LEN`
- `CALL_NO_KW_ISINSTANCE`
- `CALL_NO_KW_METHOD_DESCRIPTOR_O`
- `CALL_NO_KW_METHOD_DESCRIPTOR_NOARGS`
- `CALL_NO_KW_METHOD_DESCRIPTOR_FAST`
The Tier 2 opcode _IS_ITER_EXHAUSTED_LIST (and _TUPLE)
didn't set it->it_seq to NULL, causing a subtle bug
that resulted in test_exhausted_iterator in list_tests.py
to fail when running all tests with -Xuops.
The bug was introduced in gh-106696.
Added this as an explicit test.
Also fixed the dependencies for ceval.o -- it depends on executor_cases.c.h.
This moves EXIT_TRACE, SAVE_IP, JUMP_TO_TOP, and
_POP_JUMP_IF_{FALSE,TRUE} from ceval.c to bytecodes.c.
They are no less special than before, but this way
they are discoverable o the copy-and-patch tooling.
* Convert PyObject_DelAttr() and PyObject_DelAttrString() macros to
functions.
* Add PyObject_DelAttr() and PyObject_DelAttrString() functions to
the stable ABI.
* Replace PyObject_SetAttr(obj, name, NULL) with
PyObject_DelAttr(obj, name).
When `_PyOptimizer_BackEdge` returns `NULL`, we should restore `next_instr` (and `stack_pointer`). To accomplish this we should jump to `resume_with_error` instead of just `error`.
The problem this causes is subtle -- the only repro I have is in PR gh-106393, at commit d7df54b139bcc47f5ea094bfaa9824f79bc45adc. But the fix is real (as shown later in that PR).
While we're at it, also improve the debug output: the offsets at which traces are identified are now measured in bytes, and always show the start offset. This makes it easier to correlate executor calls with optimizer calls, and either with `dis` output.
<!-- gh-issue-number: gh-104584 -->
* Issue: gh-104584
<!-- /gh-issue-number -->
Added a new, experimental, tracing optimizer and interpreter (a.k.a. "tier 2"). This currently pessimizes, so don't use yet -- this is infrastructure so we can experiment with optimizing passes. To enable it, pass ``-Xuops`` or set ``PYTHONUOPS=1``. To get debug output, set ``PYTHONUOPSDEBUG=N`` where ``N`` is a debug level (0-4, where 0 is no debug output and 4 is excessively verbose).
All of this code is likely to change dramatically before the 3.13 feature freeze. But this is a first step.
* Add table describing possible executable classes for out-of-process debuggers.
* Remove shim code object creation code as it is no longer needed.
* Make lltrace a bit more robust w.r.t. non-standard frames.
This implements PEP 695, Type Parameter Syntax. It adds support for:
- Generic functions (def func[T](): ...)
- Generic classes (class X[T](): ...)
- Type aliases (type X = ...)
- New scoping when the new syntax is used within a class body
- Compiler and interpreter changes to support the new syntax and scoping rules
Co-authored-by: Marc Mueller <30130371+cdce8p@users.noreply.github.com>
Co-authored-by: Eric Traut <eric@traut.com>
Co-authored-by: Larry Hastings <larry@hastings.org>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
When monitoring LINE events, instrument all instructions that can have a predecessor on a different line.
Then check that the a new line has been hit in the instrumentation code.
This brings the behavior closer to that of 3.11, simplifying implementation and porting of tools.
This speeds up `super()` (by around 85%, for a simple one-level
`super().meth()` microbenchmark) by avoiding allocation of a new
single-use `super()` object on each use.
* The majority of the monitoring code is in instrumentation.c
* The new instrumentation bytecodes are in bytecodes.c
* legacy_tracing.c adapts the new API to the old sys.setrace and sys.setprofile APIs