For a while now, pending calls only run in the main thread (in the main interpreter). This PR changes things to allow any thread run a pending call, unless the pending call was explicitly added for the main thread to run.
This function no longer makes sense, since its runtime parameter is
no longer used. Use directly _PyThreadState_GET() and
_PyInterpreterState_GET() instead.
This is strictly about moving the "obmalloc" runtime state from
`_PyRuntimeState` to `PyInterpreterState`. Doing so improves isolation
between interpreters, specifically most of the memory (incl. objects)
allocated for each interpreter's use. This is important for a
per-interpreter GIL, but such isolation is valuable even without it.
FWIW, a per-interpreter obmalloc is the proverbial
canary-in-the-coalmine when it comes to the isolation of objects between
interpreters. Any object that leaks (unintentionally) to another
interpreter is highly likely to cause a crash (on debug builds at
least). That's a useful thing to know, relative to interpreter
isolation.
We replace _PyRuntime.tstate_current with a thread-local variable. As part of this change, we add a _Py_thread_local macro in pyport.h (only for the core runtime) to smooth out the compiler differences. The main motivation here is in support of a per-interpreter GIL, but this change also provides some performance improvement opportunities.
Note that we do not provide a fallback to the thread-local, either falling back to the old tstate_current or to thread-specific storage (PyThread_tss_*()). If that proves problematic then we can circle back. I consider it unlikely, but will run the buildbots to double-check.
Also note that this does not change any of the code related to the GILState API, where it uses a thread state stored in thread-specific storage. I suspect we can combine that with _Py_tss_tstate (from here). However, that can be addressed separately and is not urgent (nor critical).
(While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)
* The majority of the monitoring code is in instrumentation.c
* The new instrumentation bytecodes are in bytecodes.c
* legacy_tracing.c adapts the new API to the old sys.setrace and sys.setprofile APIs
Sharing mutable (or non-immortal) objects between interpreters is generally not safe. We can work around that but not easily.
There are two restrictions that are critical for objects that break interpreter isolation.
The first is that the object's state be guarded by a global lock. For now the GIL meets this requirement, but a granular global lock is needed once we have a per-interpreter GIL.
The second restriction is that the object (and, for a container, its items) be deallocated/resized only when the interpreter in which it was allocated is the current one. This is because every interpreter has (or will have, see gh-101660) its own object allocator. Deallocating an object with a different allocator can cause crashes.
The dict for the cache of module defs is completely internal, which simplifies what we have to do to meet those requirements. To do so, we do the following:
* add a mechanism for re-using a temporary thread state tied to the main interpreter in an arbitrary thread
* add _PyRuntime.imports.extensions.main_tstate`
* add _PyThreadState_InitDetached() and _PyThreadState_ClearDetached() (pystate.c)
* add _PyThreadState_BindDetached() and _PyThreadState_UnbindDetached() (pystate.c)
* make sure the cache dict (_PyRuntime.imports.extensions.dict) and its items are all owned by the main interpreter)
* add a placeholder using for a granular global lock
Note that the cache is only used for legacy extension modules and not for multi-phase init modules.
https://github.com/python/cpython/issues/100227
This reverts commit 87be8d9.
This approach to keeping the interned strings safe is turning out to be too complex for my taste (due to obmalloc isolation). For now I'm going with the simpler solution, making the dict per-interpreter. We can revisit that later if we want a sharing solution.
This is effectively two changes. The first (the bulk of the change) is where we add _Py_AddToGlobalDict() (and _PyRuntime.cached_objects.main_tstate, etc.). The second (much smaller) change is where we update PyUnicode_InternInPlace() to use _Py_AddToGlobalDict() instead of calling PyDict_SetDefault() directly.
Basically, _Py_AddToGlobalDict() is a wrapper around PyDict_SetDefault() that should be used whenever we need to add a value to a runtime-global dict object (in the few cases where we are leaving the container global rather than moving it to PyInterpreterState, e.g. the interned strings dict). _Py_AddToGlobalDict() does all the necessary work to make sure the target global dict is shared safely between isolated interpreters. This is especially important as we move the obmalloc state to each interpreter (gh-101660), as well as, potentially, the GIL (PEP 684).
https://github.com/python/cpython/issues/100227
We're adding the function back, only for the stable ABI symbol and not as any form of API. I had removed it yesterday.
This undocumented "private" function was added with the implementation for PEP 3121 (3.0, 2007) for internal use and later moved out of the limited API (3.6, 2016) and then into the internal API (3.9, 2019). I removed it completely yesterday, including from the stable ABI manifest (where it was added because the symbol happened to be exported). It's unlikely that anyone is using _PyState_AddModule(), especially any stable ABI extensions built against 3.2-3.5, but we're playing it safe.
https://github.com/python/cpython/issues/101758
This change is almost entirely moving code around and hiding import state behind internal API. We introduce no changes to behavior, nor to non-internal API. (Since there was already going to be a lot of churn, I took this as an opportunity to re-organize import.c into topically-grouped sections of code.) The motivation is to simplify a number of upcoming changes.
Specific changes:
* move existing import-related code to import.c, wherever possible
* add internal API for interacting with import state (both global and per-interpreter)
* use only API outside of import.c (to limit churn there when changing the location, etc.)
* consolidate the import-related state of PyInterpreterState into a single struct field (this changes layout slightly)
* add macros for import state in import.c (to simplify changing the location)
* group code in import.c into sections
*remove _PyState_AddModule()
https://github.com/python/cpython/issues/101758
A PyThreadState can be in one of many states in its lifecycle, represented by some status value. Those statuses haven't been particularly clear, so we're addressing that here. Specifically:
* made the distinct lifecycle statuses clear on PyThreadState
* identified expectations of how various lifecycle-related functions relate to status
* noted the various places where those expectations don't match the actual behavior
At some point we'll need to address the mismatches.
(This change also includes some cleanup.)
https://github.com/python/cpython/issues/59956
The objective of this change is to help make the GILState-related code easier to understand. This mostly involves moving code around and some semantically equivalent refactors. However, there are a also a small number of slight changes in structure and behavior:
* tstate_current is moved out of _PyRuntimeState.gilstate
* autoTSSkey is moved out of _PyRuntimeState.gilstate
* autoTSSkey is initialized earlier
* autoTSSkey is re-initialized (after fork) earlier
https://github.com/python/cpython/issues/59956
This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing:
* the configure option
* the macro
* the code enabled by the macro
* Revert "bpo-46850: Move _PyInterpreterState_SetEvalFrameFunc() to internal C API (GH-32054)"
This reverts commit f877b40e3f.
* Revert "bpo-46850: Move _PyEval_EvalFrameDefault() to internal C API (GH-32052)"
This reverts commit b9a5522dd9.
Move the private _PyFrameEvalFunction type, and private
_PyInterpreterState_GetEvalFrameFunc() and
_PyInterpreterState_SetEvalFrameFunc() functions to the internal C
API. The _PyFrameEvalFunction callback function type now uses the
_PyInterpreterFrame type which is part of the internal C API.
Update the _PyFrameEvalFunction documentation.
PyInterpreterState_Main() is a plain function exposed in the public C-API. For internal usage we can take the more efficient approach in this PR.
https://bugs.python.org/issue46008
This simplifies new_threadstate(). We also rename _PyThreadState_Init() to _PyThreadState_SetCurrent() to reflect what it actually does.
https://bugs.python.org/issue46008
Add PyThreadState_EnterTracing() and PyThreadState_LeaveTracing()
functions to the limited C API to suspend and resume tracing and
profiling.
Add an unit test on the PyThreadState C API to _testcapi.
Add also internal _PyThreadState_DisableTracing() and
_PyThreadState_ResetTracing().
Redefining the PyThreadState_GET() macro in pycore_pystate.h is
useless since it doesn't affect files not including it. Either use
_PyThreadState_GET() directly, or don't use pycore_pystate.h internal
C API. For example, the _testcapi extension don't use the internal C
API, but use the public PyThreadState_Get() function instead.
Replace PyThreadState_Get() with _PyThreadState_GET(). The
_PyThreadState_GET() macro is more efficient than PyThreadState_Get()
and PyThreadState_GET() function calls which call fail with a fatal
Python error.
posixmodule.c and _ctypes extension now include <windows.h> before
pycore header files (like pycore_call.h).
_PyTraceback_Add() now uses _PyErr_Fetch()/_PyErr_Restore() instead
of PyErr_Fetch()/PyErr_Restore().
The _decimal and _xxsubinterpreters extensions are now built with the
Py_BUILD_CORE_MODULE macro defined to get access to the internal C
API.
This accomplishes 2 things:
* consolidates some common code between getpath.c and getpathp.c
* makes the helpers available to code in other files
FWIW, the signature of the join_relfile() function (in fileutils.c) intentionally mirrors that of Windows' PathCchCombineEx().
Note that this change is mostly moving code around. No behavior is meant to change.
https://bugs.python.org/issue45211
* Convert "specials" array to InterpreterFrame struct, adding f_lasti, f_state and other non-debug FrameObject fields to it.
* Refactor, calls pushing the call to the interpreter upward toward _PyEval_Vector.
* Compute f_back when on thread stack, only filling in value when frame object outlives stack invocation.
* Move ownership of InterpreterFrame in generator from frame object to generator object.
* Do not create frame objects for Python calls.
* Do not create frame objects for generators.
* Remove 'zombie' frames. We won't need them once we are allocating fixed-size frames.
* Add co_nlocalplus field to code object to avoid recomputing size of locals + frees + cells.
* Move locals, cells and freevars out of frame object into separate memory buffer.
* Use per-threadstate allocated memory chunks for local variables.
* Move globals and builtins from frame object to per-thread stack.
* Move (slow) locals frame object to per-thread stack.
* Move internal frame functions to internal header.
my_fgets() now calls _PyOS_InterruptOccurred(tstate) to check for
pending signals, rather calling PyOS_InterruptOccurred().
my_fgets() is called with the GIL released, whereas
PyOS_InterruptOccurred() must be called with the GIL held.
test_repl: use text=True and avoid SuppressCrashReport in
test_multiline_string_parsing().
Fix my_fgets() on Windows: fgets(fp) does crash if fileno(fp) is closed.
PyOS_AfterFork_Child() helper functions now return a PyStatus:
PyOS_AfterFork_Child() is now responsible to handle errors.
* Move _PySignal_AfterFork() to the internal C API
* Add #ifdef HAVE_FORK on _PyGILState_Reinit(), _PySignal_AfterFork()
and _PyInterpreterState_DeleteExceptMain().
In the experimental isolated subinterpreters build mode,
_PyThreadState_GET() gets the autoTSSkey variable and
_PyThreadState_Swap() sets the autoTSSkey variable.
* Add _PyThreadState_GetTSS()
* _PyRuntimeState_GetThreadState() and _PyThreadState_GET()
return _PyThreadState_GetTSS()
* PyEval_SaveThread() sets the autoTSSkey variable to current Python
thread state rather than NULL.
* eval_frame_handle_pending() doesn't check that
_PyThreadState_Swap() result is NULL.
* _PyThreadState_Swap() gets the current Python thread state with
_PyThreadState_GetTSS() rather than
_PyRuntimeGILState_GetThreadState().
* PyGILState_Ensure() no longer checks _PyEval_ThreadsInitialized()
since it cannot access the current interpreter.
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).
Add also "assert(tstate != NULL);" to the function.
Don't access PyInterpreterState.config member directly anymore, but
use new functions:
* _PyInterpreterState_GetConfig()
* _PyInterpreterState_SetConfig()
* _Py_GetConfig()
Fix the signal handler: it now always uses the main interpreter,
rather than trying to get the current Python thread state.
The following function now accepts an interpreter, instead of a
Python thread state:
* _PyEval_SignalReceived()
* _Py_ThreadCanHandleSignals()
* _PyEval_AddPendingCall()
* COMPUTE_EVAL_BREAKER()
* SET_GIL_DROP_REQUEST(), RESET_GIL_DROP_REQUEST()
* SIGNAL_PENDING_CALLS(), UNSIGNAL_PENDING_CALLS()
* SIGNAL_PENDING_SIGNALS(), UNSIGNAL_PENDING_SIGNALS()
* SIGNAL_ASYNC_EXC(), UNSIGNAL_ASYNC_EXC()
Py_AddPendingCall() now uses the main interpreter if it fails to the
current Python thread state.
Convert _PyThreadState_GET() and PyInterpreterState_GET_UNSAFE()
macros to static inline functions.
Remove _PyRuntime.getframe hook and remove _PyThreadState_GetFrame
macro which was an alias to _PyRuntime.getframe. They were only
exposed by the internal C API. Remove also PyThreadFrameGetter type.
If a thread different than the main thread schedules a pending call
(Py_AddPendingCall()), the bytecode evaluation loop is no longer
interrupted at each bytecode instruction to check for pending calls
which cannot be executed. Only the main thread can execute pending
calls.
Previously, the bytecode evaluation loop was interrupted at each
instruction until the main thread executes pending calls.
* Add _Py_ThreadCanHandlePendingCalls() function.
* SIGNAL_PENDING_CALLS() now only sets eval_breaker to 1 if the
current thread can execute pending calls. Only the main thread can
execute pending calls.
COMPUTE_EVAL_BREAKER() now also checks if the Python thread state
belongs to the main interpreter. Don't break the evaluation loop if
there are pending signals but the Python thread state it belongs to a
subinterpeter.
* Add _Py_IsMainThread() function.
* Add _Py_ThreadCanHandleSignals() function.
If Py_AddPendingCall() is called in a subinterpreter, the function is
now scheduled to be called from the subinterpreter, rather than being
called from the main interpreter.
Each subinterpreter now has its own list of scheduled calls.
* Move pending and eval_breaker fields from _PyRuntimeState.ceval
to PyInterpreterState.ceval.
* new_interpreter() now calls _PyEval_InitThreads() to create
pending calls lock.
* Fix Py_AddPendingCall() for subinterpreters. It now calls
_PyThreadState_GET() which works in a subinterpreter if the
caller holds the GIL, and only falls back on
PyGILState_GetThisThreadState() if _PyThreadState_GET()
returns NULL.
PyInterpreterState.eval_frame function now requires a tstate (Python
thread state) parameter.
Add private functions to the C API to get and set the frame
evaluation function:
* Add tstate parameter to _PyFrameEvalFunction function type.
* Add _PyInterpreterState_GetEvalFrameFunc() and
_PyInterpreterState_SetEvalFrameFunc() functions.
* Add tstate parameter to _PyEval_EvalFrameDefault().
Convert _PyRuntimeState.finalizing field to an atomic variable:
* Rename it to _finalizing
* Change its type to _Py_atomic_address
* Add _PyRuntimeState_GetFinalizing() and _PyRuntimeState_SetFinalizing()
functions
* Remove _Py_CURRENTLY_FINALIZING() function: replace it with testing
directly _PyRuntimeState_GetFinalizing() value
Convert _PyRuntimeState_GetThreadState() to static inline function.
Add a fast-path for UTF-8 encoding in PyUnicode_EncodeFSDefault()
and PyUnicode_DecodeFSDefaultAndSize().
Add _PyUnicode_FiniEncodings() helper function for _PyUnicode_Fini().
Each Python subinterpreter now has its own "small integer
singletons": numbers in [-5; 257] range.
It is no longer possible to change the number of small integers at
build time by overriding NSMALLNEGINTS and NSMALLPOSINTS macros:
macros should now be modified manually in pycore_pystate.h header
file.
For now, continue to share _PyLong_Zero and _PyLong_One singletons
between all subinterpreters.