This interns the strings for `co_filename`, `co_name`, and `co_qualname`
on codeobjects in the free-threaded build. This partially addresses a
reference counting bottleneck when creating closures concurrently. The
closures take the name and qualified name from the code object.
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.
PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.
A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
The module itself is a thin wrapper around calls to functions in
`Python/codecs.c`, so that's where the meaningful changes happened:
- Move codecs-related state that lives on `PyInterpreterState` to a
struct declared in `pycore_codecs.h`.
- In free-threaded builds, add a mutex to `codecs_state` to synchronize
operations on `search_path`. Because `search_path_mutex` is used as a
normal mutex and not a critical section, we must be extremely careful
with operations called while holding it.
- The codec registry is explicitly initialized as part of
`_PyUnicode_InitEncodings` to simplify thread-safety.
The code for Tier 2 is now only compiled when configured
with `--enable-experimental-jit[=yes|interpreter]`.
We drop support for `PYTHON_UOPS` and -`Xuops`,
but you can disable the interpreter or JIT
at runtime by setting `PYTHON_JIT=0`.
You can also build it without enabling it by default
using `--enable-experimental-jit=yes-off`;
enable with `PYTHON_JIT=1`.
On Windows, the `build.bat` script supports
`--experimental-jit`, `--experimental-jit-off`,
`--experimental-interpreter`.
In the C code, `_Py_JIT` is defined as before
when the JIT is enabled; the new variable
`_Py_TIER2` is defined when the JIT *or* the
interpreter is enabled. It is actually a bitmask:
1: JIT; 2: default-off; 4: interpreter.
Deferred reference counting is not fully implemented yet. As a temporary
measure, we immortalize objects that would use deferred reference
counting to avoid multi-threaded scaling bottlenecks.
This is only performed in the free-threaded build once the first
non-main thread is started. Additionally, some tests, including refleak
tests, suppress this behavior.
It's not safe to raise an exception in `PyObject_ClearWeakRefs()` if one
is not already set, since it may be called by `_Py_Dealloc()`, which
requires that the active exception does not change.
Additionally, make sure we clear the weakrefs even when tuple allocation
fails.
Fix data races in the method cache in free-threaded builds
These are technically data races, but I think they're benign (to
the extent that that is actually possible). We update cache entries
non-atomically but read them atomically from another thread, and there's
nothing that establishes a happens-before relationship between the
reads and writes that I can see.
Fix mimalloc allocator for huge memory allocation (around
8,589,934,592 GiB) on s390x.
Abort allocation early in mimalloc if the number of slices doesn't
fit into uint32_t, to prevent a integer overflow (cast 64-bit
size_t to uint32_t).
We want code objects to use deferred reference counting in the
free-threaded build. This requires them to be tracked by the GC, so we
set `Py_TPFLAGS_HAVE_GC` in the free-threaded build, but not the default
build.
Guido pointed out to me that some details about the per-interpreter state for the builtin types aren't especially clear. I'm addressing that by:
* adding a comment explaining that state
* adding some asserts to point out the relationship between each index and the interp/global runtime state