- Ensure that `assert(type_version != 0);` always comes *before* using `type_version`
Also:
- In cases_generator, rename `-v` to from `--verbose` to `--viable`
- Double max trace size to 256
- Add a dependency on executor_cases.c.h for ceval.o
- Mark `_SPECIALIZE_UNPACK_SEQUENCE` as `TIER_ONE_ONLY`
- Add debug output back showing the optimized trace
- Bunch of cleanups to Tools/cases_generator/
- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`,
you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging
to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
In Python/bytecodes.c, you now write
```
DEOPT_IF(condition);
```
The code generator expands this to
```
DEOPT_IF(condition, opcode);
```
where `opcode` is the name of the unspecialized instruction.
This works inside macro expansions too.
**CAVEAT:** The entire `DEOPT_IF(condition)` statement must be on a single line.
If it isn't, the substitution will fail; an error will be printed by the code generator
and the C compiler will report some errors.
These are the most popular specializations of `LOAD_ATTR` and `STORE_ATTR`
that weren't already viable uops:
* Split LOAD_ATTR_METHOD_WITH_VALUES
* Split LOAD_ATTR_METHOD_NO_DICT
* Split LOAD_ATTR_SLOT
* Split STORE_ATTR_SLOT
* Split STORE_ATTR_INSTANCE_VALUE
Also:
* Add `-v` flag to code generator which prints a list of non-viable uops
(easter-egg: it can print execution counts -- see source)
* Double _Py_UOP_MAX_TRACE_LENGTH to 128
I had dropped one of the DEOPT_IF() calls! :-(
I must have overlooked this when refactoring the code generator.
The Tier 1 interpreter contained a few silly things like
```
goto resume_frame;
STACK_SHRINK(1);
```
(and other variations, some where the unconditional `goto` was hidden in a macro).
* Rename SAVE_IP to _SET_IP
* Rename EXIT_TRACE to _EXIT_TRACE
* Rename SAVE_CURRENT_IP to _SAVE_CURRENT_IP
* Rename INSERT to _INSERT (This is for Ken Jin's abstract interpreter)
* Rename IS_NONE to _IS_NONE
* Rename JUMP_TO_TOP to _JUMP_TO_TOP
This adds a 16-bit inline cache entry to the conditional branch instructions POP_JUMP_IF_{FALSE,TRUE,NONE,NOT_NONE} and their instrumented variants, which is used to keep track of the branch direction.
Each time we encounter these instructions we shift the cache entry left by one and set the bottom bit to whether we jumped.
Then when it's time to translate such a branch to Tier 2 uops, we use the bit count from the cache entry to decided whether to continue translating the "didn't jump" branch or the "jumped" branch.
The counter is initialized to a pattern of alternating ones and zeros to avoid bias.
The .pyc file magic number is updated. There's a new test, some fixes for existing tests, and a few miscellaneous cleanups.
Instead of using `GO_TO_INSTRUCTION(CALL_PY_EXACT_ARGS)` we just add the macro elements of the latter to the macro for the former. This requires lengthening the uops array in struct opcode_macro_expansion. (It also required changes to stacking.py that were merged already.)
I was comparing the last preceding poke with the *last* peek,
rather than the *first* peek.
Unfortunately this bug obscured another bug:
When the last preceding poke is UNUSED, the first peek disappears,
leaving the variable unassigned. This is how I fixed it:
- Rename CopyEffect to CopyItem.
- Change CopyItem to contain StackItems instead of StackEffects.
- Update those StackItems when adjusting the manager higher or lower.
- Assert that those StackItems' offsets are equivalent.
- Other clever things.
---------
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
Introducing a new file, stacking.py, that takes over several responsibilities related to symbolic evaluation of push/pop operations, with more generality.
There's no need to use a dummy uop to skip unused cache entries. The macro syntax lets you write `unused/1` instead.
Similarly, move `unused/5` from op `_LOAD_ATTR_INSTANCE_VALUE` to macro `LOAD_ATTR_INSTANCE_VALUE`.
By turning `assert(kwnames == NULL)` into a macro that is not in the "forbidden" list, many instructions that formerly were skipped because they contained such an assert (but no other mention of `kwnames`) are now supported in Tier 2. This covers 10 instructions in total (all specializations of `CALL` that invoke some C code):
- `CALL_NO_KW_TYPE_1`
- `CALL_NO_KW_STR_1`
- `CALL_NO_KW_TUPLE_1`
- `CALL_NO_KW_BUILTIN_O`
- `CALL_NO_KW_BUILTIN_FAST`
- `CALL_NO_KW_LEN`
- `CALL_NO_KW_ISINSTANCE`
- `CALL_NO_KW_METHOD_DESCRIPTOR_O`
- `CALL_NO_KW_METHOD_DESCRIPTOR_NOARGS`
- `CALL_NO_KW_METHOD_DESCRIPTOR_FAST`
* Convert PyObject_DelAttr() and PyObject_DelAttrString() macros to
functions.
* Add PyObject_DelAttr() and PyObject_DelAttrString() functions to
the stable ABI.
* Replace PyObject_SetAttr(obj, name, NULL) with
PyObject_DelAttr(obj, name).
When `_PyOptimizer_BackEdge` returns `NULL`, we should restore `next_instr` (and `stack_pointer`). To accomplish this we should jump to `resume_with_error` instead of just `error`.
The problem this causes is subtle -- the only repro I have is in PR gh-106393, at commit d7df54b139bcc47f5ea094bfaa9824f79bc45adc. But the fix is real (as shown later in that PR).
While we're at it, also improve the debug output: the offsets at which traces are identified are now measured in bytes, and always show the start offset. This makes it easier to correlate executor calls with optimizer calls, and either with `dis` output.
<!-- gh-issue-number: gh-104584 -->
* Issue: gh-104584
<!-- /gh-issue-number -->
Added a new, experimental, tracing optimizer and interpreter (a.k.a. "tier 2"). This currently pessimizes, so don't use yet -- this is infrastructure so we can experiment with optimizing passes. To enable it, pass ``-Xuops`` or set ``PYTHONUOPS=1``. To get debug output, set ``PYTHONUOPSDEBUG=N`` where ``N`` is a debug level (0-4, where 0 is no debug output and 4 is excessively verbose).
All of this code is likely to change dramatically before the 3.13 feature freeze. But this is a first step.
* Add table describing possible executable classes for out-of-process debuggers.
* Remove shim code object creation code as it is no longer needed.
* Make lltrace a bit more robust w.r.t. non-standard frames.
This implements PEP 695, Type Parameter Syntax. It adds support for:
- Generic functions (def func[T](): ...)
- Generic classes (class X[T](): ...)
- Type aliases (type X = ...)
- New scoping when the new syntax is used within a class body
- Compiler and interpreter changes to support the new syntax and scoping rules
Co-authored-by: Marc Mueller <30130371+cdce8p@users.noreply.github.com>
Co-authored-by: Eric Traut <eric@traut.com>
Co-authored-by: Larry Hastings <larry@hastings.org>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
When monitoring LINE events, instrument all instructions that can have a predecessor on a different line.
Then check that the a new line has been hit in the instrumentation code.
This brings the behavior closer to that of 3.11, simplifying implementation and porting of tools.
This speeds up `super()` (by around 85%, for a simple one-level
`super().meth()` microbenchmark) by avoiding allocation of a new
single-use `super()` object on each use.
* The majority of the monitoring code is in instrumentation.c
* The new instrumentation bytecodes are in bytecodes.c
* legacy_tracing.c adapts the new API to the old sys.setrace and sys.setprofile APIs
* Eliminate all remaining uses of Py_SIZE and Py_SET_SIZE on PyLongObject, adding asserts.
* Change layout of size/sign bits in longobject to support future addition of immortal ints and tagged medium ints.
* Add functions to hide some internals of long object, and for setting sign and digit count.
* Replace uses of IS_MEDIUM_VALUE macro with _PyLong_IsCompact().
This behavior is optional, because in some extreme cases it
may just make debugging harder. The tool defaults it to off,
but it is on in Makefile.pre.in.
Also note that this makes diffs to generated_cases.c.h noisier,
since whenever you insert or delete a line in bytecodes.c,
all subsequent #line directives will change.
* Rename local variables, names and consts, from the interpeter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions.
* Remove unused dummy variables.