Commit Graph

117 Commits

Author SHA1 Message Date
mpage 2e95c5ba3b
gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
2024-11-04 11:13:32 -08:00
Sam Gross 3c4a7fa617
gh-124218: Avoid refcount contention on builtins module (GH-125847)
This replaces `_PyEval_BuiltinsFromGlobals` with
`_PyDict_LoadBuiltinsFromGlobals`, which returns a new reference
instead of a borrowed reference. Internally, the new function uses
per-thread reference counting when possible to avoid contention on the
refcount fields on the builtins module.
2024-10-24 12:44:38 -04:00
Mark Shannon 06ca33020e
GH-125323: Convert DECREF_INPUTS_AND_REUSE_FLOAT into a function that takes PyStackRefs. (GH-125439) 2024-10-14 14:18:57 +01:00
Mark Shannon da071fa3e8
GH-119866: Spill the stack around escaping calls. (GH-124392)
* Spill the evaluation around escaping calls in the generated interpreter and JIT. 

* The code generator tracks live, cached values so they can be saved to memory when needed.

* Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
2024-10-07 14:56:39 +01:00
Savannah Ostrowski 65f1237098
GH-123516: Improve JIT memory consumption by invalidating cold executors (GH-124443)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-09-27 00:35:42 +00:00
Ken Jin 8810e286fa
gh-121459: Deferred LOAD_GLOBAL (GH-123128)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Sam Gross <655866+colesbury@users.noreply.github.com>
2024-09-14 00:23:51 +08:00
Mark Shannon 7aca84e557
GH-117224: Move the body of a few large-ish micro-ops into helper functions (GH-122601) 2024-08-02 16:31:17 +01:00
Brandt Bucher 15d4cd0967
GH-116090: Fire RAISE events from _FOR_ITER_TIER_TWO (GH-122413) 2024-07-29 12:17:47 -07:00
Brandt Bucher 7b36b67b1e
GH-118093: Add tier two support to several instructions (GH-121884) 2024-07-18 14:24:58 -07:00
Ken Jin 22b0de2755
gh-117139: Convert the evaluation stack to stack refs (#118450)
This PR sets up tagged pointers for CPython.

The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.

Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.

This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.

The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.

Please read Include/internal/pycore_stackref.h for more information!

---------

Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
2024-06-27 03:10:43 +08:00
Mark Shannon 9cefcc0ee7
GH-120507: Lower the `BEFORE_WITH` and `BEFORE_ASYNC_WITH` instructions. (#120640)
* Remove BEFORE_WITH and BEFORE_ASYNC_WITH instructions.

* Add LOAD_SPECIAL instruction

* Reimplement `with` and `async with` statements using LOAD_SPECIAL
2024-06-18 12:17:46 +01:00
Sam Gross f3b89a63cb
gh-117657: Fix TSAN reported race in `_PyEval_IsGILEnabled`. (#119921)
The GIL may be disabled concurrently with this call so we need to use a
relaxed atomic load.
2024-06-02 10:19:02 -04:00
Brett Simmers be1dfccdf2
gh-118727: Don't drop the GIL in `drop_gil()` unless the current thread holds it (#118745)
`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).

Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.

This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
2024-05-23 16:59:35 -04:00
Brett Simmers 853163d3b5
gh-116322: Enable the GIL while loading C extension modules (#118560)
Add the ability to enable/disable the GIL at runtime, and use that in
the C module loading code.

We can't know before running a module init function if it supports
free-threading, so the GIL is temporarily enabled before doing so. If
the module declares support for running without the GIL, the GIL is
later disabled. Otherwise, the GIL is permanently enabled, and will
never be disabled again for the life of the current interpreter.
2024-05-06 23:07:23 -04:00
Pablo Galindo Salgado 1b22d801b8
gh-118518: Allow perf to work without frame pointers (#112254) 2024-05-05 03:07:29 +02:00
Eric Snow 09c2947581
gh-110693: Pending Calls Machinery Cleanups (gh-118296)
This does some cleanup in preparation for later changes.
2024-04-26 01:05:51 +00:00
Mark Shannon f180b31e76
GH-118095: Handle `RETURN_GENERATOR` in tier 2 (GH-118180) 2024-04-25 11:32:47 +01:00
Mark Shannon 2cf18a4430
GH-116422: Modify a few uops so that they can be supported by tier 2 with hot/cold splitting (GH-116832) 2024-03-15 10:48:00 +00:00
Brandt Bucher f0df35eeca
GH-115802: JIT "small" code for Windows (GH-115964) 2024-02-29 08:11:28 -08:00
Brett Simmers 0749244d13
gh-112175: Add `eval_breaker` to `PyThreadState` (#115194)
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.

The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
2024-02-20 09:57:48 -05:00
Sam Gross a3af3cb4f4
gh-110481: Implement inter-thread queue for biased reference counting (#114824)
Biased reference counting maintains two refcount fields in each object:
`ob_ref_local` and `ob_ref_shared`. The true refcount is the sum of these two
fields. In some cases, when refcounting operations are split across threads,
the ob_ref_shared field can be negative (although the total refcount must be
at least zero). In this case, the thread that decremented the refcount
requests that the owning thread give up ownership and merge the refcount
fields.
2024-02-09 17:08:32 -05:00
Sam Gross 441affc9e7
gh-111964: Implement stop-the-world pauses (gh-112471)
The `--disable-gil` builds occasionally need to pause all but one thread.  Some
examples include:

* Cyclic garbage collection, where this is often called a "stop the world event"
* Before calling `fork()`, to ensure a consistent state for internal data structures
* During interpreter shutdown, to ensure that daemon threads aren't accessing Python objects

This adds the following functions to implement global and per-interpreter pauses:

* `_PyEval_StopTheWorldAll()` and `_PyEval_StartTheWorldAll()` (for the global runtime)
* `_PyEval_StopTheWorld()` and `_PyEval_StartTheWorld()` (per-interpreter)

(The function names may change.)

These functions are no-ops outside of the `--disable-gil` build.
2024-01-23 11:08:23 -07:00
Sam Gross a3c031884d
gh-112723: Call `PyThreadState_Clear()` from the correct interpreter (#112776)
The `PyThreadState_Clear()` function must only be called with the GIL
held and must be called from the same interpreter as the passed in
thread state. Otherwise, any Python objects on the thread state may be
destroyed using the wrong interpreter, leading to memory corruption.

This is also important for `Py_GIL_DISABLED` builds because free lists
will be associated with PyThreadStates and cleared in
`PyThreadState_Clear()`.

This fixes two places that called `PyThreadState_Clear()` from the wrong
interpreter and adds an assertion to `PyThreadState_Clear()`.
2023-12-12 17:20:21 -07:00
Sam Gross cf6110ba13
gh-111924: Use PyMutex for Runtime-global Locks. (gh-112207)
This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize.

This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.
2023-12-07 12:33:40 -07:00
Pablo Galindo Salgado a73aa48e6b
gh-112367: Only free perf trampoline arenas at shutdown (#112368)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
2023-12-01 13:20:51 +00:00
Tian Gao e0afed7e27
gh-103615: Use local events for opcode tracing (GH-109472)
* Use local monitoring for opcode trace

* Remove f_opcode_trace_set

* Add test for setting f_trace_opcodes after settrace
2023-11-03 16:39:50 +00:00
Eric Snow c6fe0869ab
gh-76785: Move the Cross-Interpreter Code to Its Own File (gh-111502)
This is partly to clear this stuff out of pystate.c, but also in preparation for moving some code out of _xxsubinterpretersmodule.c.  This change also moves this stuff to the internal API (new: Include/internal/pycore_crossinterp.h).  @vstinner did this previously and I undid it.  Now I'm re-doing it. :/
2023-10-30 16:53:10 -06:00
Donghee Na 2dcc57008b
gh-109693: Remove pycore_atomic.h (gh-110992) 2023-10-18 00:33:50 +09:00
Eric Snow 7bd560ce8d
gh-76785: Add SendChannel.send_buffer() (#110246)
(This is still a test module.)
2023-10-09 07:39:51 -06:00
Sam Gross 6e97a9647a
gh-109549: Add new states to PyThreadState to support PEP 703 (gh-109915)
This adds a new field 'state' to PyThreadState that can take on one of three values: _Py_THREAD_ATTACHED, _Py_THREAD_DETACHED, or _Py_THREAD_GC.  The "attached" and "detached" states correspond closely to acquiring and releasing the GIL.  The "gc" state is current unused, but will be used to implement stop-the-world GC for --disable-gil builds in the near future.
2023-10-05 09:46:33 -06:00
Mark Shannon bf4bc36069
GH-109369: Merge all eval-breaker flags and monitoring version into one word. (GH-109846) 2023-10-04 16:09:48 +01:00
Eric Snow fd7e08a6f3
gh-76785: Use Pending Calls When Releasing Cross-Interpreter Data (gh-109556)
This fixes some crashes in the _xxinterpchannels module, due to a race between interpreters.
2023-09-19 15:01:34 -06:00
Victor Stinner c494fb333b
gh-106320: Remove private _PyEval function (#108433)
Move private _PyEval functions to the internal C API
(pycore_ceval.h):

* _PyEval_GetBuiltin()
* _PyEval_GetBuiltinId()
* _PyEval_GetSwitchInterval()
* _PyEval_MakePendingCalls()
* _PyEval_SetProfile()
* _PyEval_SetSwitchInterval()
* _PyEval_SetTrace()

No longer export most of these functions.
2023-08-24 20:25:22 +02:00
Victor Stinner b7808820b1
gh-107211: No longer export internal functions (5) (#108423)
No longer export _PyCompile_AstOptimize() internal C API function.

Change comment style to "// comment" and add comment explaining why
other functions have to be exported.
2023-08-24 16:06:53 +00:00
Victor Stinner 0dd3fc2a64
gh-108216: Cleanup #include in internal header files (#108228)
* Add missing includes.
* Remove unused includes.
* Update old include/symbol names to newer names.
* Mention at least one included symbol.
* Sort includes.
* Update Tools/cases_generator/generate_cases.py used to generated
  pycore_opcode_metadata.h.
* Update Parser/asdl_c.py used to generate pycore_ast.h.
* Cleanup also includes in _testcapimodule.c and _testinternalcapi.c.
2023-08-21 18:05:59 +00:00
Guido van Rossum 61c7249759
gh-106581: Project through calls (#108067)
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
2023-08-17 11:29:58 -07:00
Victor Stinner c5b13d6f80
gh-107211: No longer export internal functions (4) (#107217)
No longer export these 2 internal C API functions:

* _PyEval_SignalAsyncExc()
* _PyEval_SignalReceived()

Add also comments explaining why some internal functions have to be
exported, and update existing comments.
2023-07-25 03:16:28 +00:00
Brandt Bucher 8f4de57699
GH-106701: Move _PyUopExecute to Python/executor.c (GH-106924) 2023-07-20 20:37:19 +00:00
Carl Meyer 104d7b760f
gh-105340: include hidden fast-locals in locals() (#105715)
* gh-105340: include hidden fast-locals in locals()
2023-07-05 17:05:02 -06:00
Mark Shannon e5862113dd
GH-104584: Fix ENTER_EXECUTOR (GH-106141)
* Check eval-breaker in ENTER_EXECUTOR.

* Make sure that frame->prev_instr is set before entering executor.
2023-07-03 21:28:27 +01:00
Eric Snow 757b402ea1
gh-104812: Run Pending Calls in any Thread (gh-104813)
For a while now, pending calls only run in the main thread (in the main interpreter).  This PR changes things to allow any thread run a pending call, unless the pending call was explicitly added for the main thread to run.
2023-06-13 15:02:19 -06:00
Eric Snow 3698fda06e
gh-104341: Call _PyEval_ReleaseLock() with NULL When Finalizing the Current Thread (gh-105109)
This avoids the problematic race in drop_gil() by skipping the FORCE_SWITCHING code there for finalizing threads.

(The idea for this approach came out of discussions with @markshannon.)
2023-06-01 16:24:10 -06:00
Eric Snow 5c9ee498c6
gh-99113: A Per-Interpreter GIL! (gh-104210)
This is the culmination of PEP 684 (and of my 8-year long multi-core Python project)!

Each subinterpreter may now be created with its own GIL (via Py_NewInterpreterFromConfig()).  If not so configured then the interpreter will share with the main interpreter--the status quo since subinterpreters were added decades ago.  The main interpreter always has its own GIL and subinterpreters from Py_NewInterpreter() will always share with the main interpreter.
2023-05-08 13:15:09 -06:00
Eric Snow 92d8bfffbf
gh-99113: Make Sure the GIL is Acquired at the Right Places (gh-104208)
This is a pre-requisite for a per-interpreter GIL.  Without it this change isn't strictly necessary.  However, there is no real downside otherwise.
2023-05-06 15:59:30 -06:00
Eric Snow f3e7eb48f8
gh-99113: Add PyInterpreterConfig.own_gil (gh-104204)
We also add PyInterpreterState.ceval.own_gil to record if the interpreter actually has its own GIL.

Note that for now we don't actually respect own_gil; all interpreters still share the one GIL.  However, PyInterpreterState.ceval.own_gil does reflect PyInterpreterConfig.own_gil.  That lie is a temporary one that we will fix when the GIL really becomes per-interpreter.
2023-05-05 15:59:20 -06:00
Eric Snow 55671fe047
gh-99113: Share the GIL via PyInterpreterState.ceval.gil (gh-104203)
In preparation for a per-interpreter GIL, we add PyInterpreterState.ceval.gil, set it to the shared GIL for each interpreter, and use that rather than using _PyRuntime.ceval.gil directly.  Note that _PyRuntime.ceval.gil is still the actual GIL.
2023-05-05 13:23:00 -06:00
Mark Shannon 76449350b3
GH-91079: Decouple C stack overflow checks from Python recursion checks. (GH-96510) 2022-10-05 01:34:03 +01:00
Pablo Galindo Salgado 6d791a9736
gh-96143: Allow Linux perf profiler to see Python calls (GH-96123)
⚠️  ⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️ ⚠️ 

If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**. 

If you have any **refinements or optimizations** please, wait until the first version is merged before starting hacking or proposing those so we can keep this PR productive.
2022-08-30 10:11:18 -07:00
Mark Shannon a4a9f2e879
GH-96177: Move GIL and eval breaker code out of ceval.c into ceval_gil.c. (GH-96204) 2022-08-24 14:21:01 +01:00
Christian Heimes e5e51556e4
gh-90473: Reduce recursion limit on WASI even further (GH-94333)
750 fails sometimes with newer wasmtime versions. 600 is a more
conservative value.
2022-06-27 16:19:47 +02:00