Each of the `LOAD_GLOBAL` specializations is implemented roughly as:
1. Load keys version.
2. Load cached keys version.
3. Deopt if (1) and (2) don't match.
4. Load keys.
5. Load cached index into keys.
6. Load object from (4) at offset from (5).
This is not thread-safe in free-threaded builds; the keys object may be replaced
in between steps (3) and (4).
This change refactors the specializations to avoid reloading the keys object and
instead pass the keys object from guards to be consumed by downstream uops.
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
* Make slices marshallable
* Emit slices as constants
* Update Python/marshal.c
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
* Refactor codegen_slice into two functions so it
always has the same net effect
* Fix for free-threaded builds
* Simplify marshal loading of slices
* Only return SUCCESS/ERROR from codegen_slice
---------
Co-authored-by: Mark Shannon <mark@hotpy.org>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Stop the world when invalidating function versions
The tier1 interpreter specializes `CALL` instructions based on the values
of certain function attributes (e.g. `__code__`, `__defaults__`). The tier1
interpreter uses function versions to verify that the attributes of a function
during execution of a specialization match those seen during specialization.
A function's version is initialized in `MAKE_FUNCTION` and is invalidated when
any of the critical function attributes are changed. The tier1 interpreter stores
the function version in the inline cache during specialization. A guard is used by
the specialized instruction to verify that the version of the function on the operand
stack matches the cached version (and therefore has all of the expected attributes).
It is assumed that once the guard passes, all attributes will remain unchanged
while executing the rest of the specialized instruction.
Stopping the world when invalidating function versions ensures that all critical
function attributes will remain unchanged after the function version guard passes
in free-threaded builds. It's important to note that this is only true if the remainder
of the specialized instruction does not enter and exit a stop-the-world point.
We will stop the world the first time any of the following function attributes
are mutated:
- defaults
- vectorcall
- kwdefaults
- closure
- code
This should happen rarely and only happens once per function, so the performance
impact on majority of code should be minimal.
Additionally, refactor the API for manipulating function versions to more clearly
match the stated semantics.
* Spill the evaluation around escaping calls in the generated interpreter and JIT.
* The code generator tracks live, cached values so they can be saved to memory when needed.
* Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
Instead of surprise crashes and memory corruption, we now hang threads that attempt to re-enter the Python interpreter after Python runtime finalization has started. These are typically daemon threads (our long standing mis-feature) but could also be threads spawned by extension modules that then try to call into Python. This marks the `PyThread_exit_thread` public C API as deprecated as there is no plausible safe way to accomplish that on any supported platform in the face of things like C++ code with finalizers anywhere on a thread's stack. Doing this was the least bad option.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.
Rename some of the files and functions in preparation for this change.
Instead of be limited just by the size of addressable memory (2**63
bytes), Python integers are now also limited by the number of bits, so
the number of bit now always fit in a 64-bit integer.
Both limits are much larger than what might be available in practice,
so it doesn't affect users.
_PyLong_NumBits() and _PyLong_Frexp() are now always successful.
Use a `_PyStackRef` and defer the reference to `f_funcobj` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
Use a `_PyStackRef` and defer the reference to `f_executable` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
In gh-121602, I applied a fix to a builtin types initialization bug.
That fix made sense in the context of some broader future changes,
but introduced a little bit of extra complexity. That fix has turned
out to be incomplete for some of the builtin types we haven't
been testing. I found that out while improving the tests.
A while back, @markshannon suggested a simpler fix that doesn't
have that problem, which I've already applied to 3.12 and 3.13.
I'm switching to that here. Given the potential long-term
benefits of the more complex (but still incomplete) approach,
I'll circle back to it in the future, particularly after I've improved
the tests so no corner cases slip through the cracks.
(This is effectively a "forward-port" of 716c677 from 3.13.)
Add PyConfig_Get(), PyConfig_GetInt(), PyConfig_Set() and
PyConfig_Names() functions to get and set the current runtime Python
configuration.
Add visibility and "sys spec" to config and preconfig specifications.
_PyConfig_AsDict() now converts PyConfig.xoptions as a dictionary.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Switch more _Py_IsImmortal(...) assertions to _Py_IsImmortalLoose(...)
The remaining calls to _Py_IsImmortal are in free-threaded-only code,
initialization of core objects, tests, and guards that fall back to
code that works with mortal objects.
The `zip_next` function uses a common optimization technique for methods
that generate tuples. The iterator maintains an internal reference to
the returned tuple. When the method is called again, it checks if the
internal tuple's reference count is 1. If so, the tuple can be reused.
However, this approach is not safe under the free-threading build:
after checking the reference count, another thread may perform the same
check and also reuse the tuple. This can result in a double decref on
the items of the replaced tuple and a double incref (memory leak) on
the items of the tuple being set.
This adds a function, `_PyObject_IsUniquelyReferenced` that
encapsulates the stricter logic necessary for the free-threaded build:
the internal tuple must be owned by the current thread, have a local
refcount of one, and a shared refcount of zero.