Commit Graph

47 Commits

Author SHA1 Message Date
Sam Gross b482538523
gh-124218: Refactor per-thread reference counting (#124844)
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.

Rename some of the files and functions in preparation for this change.
2024-10-01 17:05:42 +00:00
Sam Gross f4997bb3ac
gh-123923: Defer refcounting for `f_funcobj` in `_PyInterpreterFrame` (#124026)
Use a `_PyStackRef` and defer the reference to `f_funcobj` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
2024-09-24 20:08:18 +00:00
Sam Gross b02301fa5a
gh-124068: Fix reference leak with generators in the free-threaded build (#124069)
If the generator is already cleared, then most fields in the
generator's frame are not valid other than f_executable. The invalid
fields may contain dangling pointers and should not be used.
2024-09-13 22:02:27 -04:00
Sam Gross b2afe2aae4
gh-123923: Defer refcounting for `f_executable` in `_PyInterpreterFrame` (#123924)
Use a `_PyStackRef` and defer the reference to `f_executable` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
2024-09-12 12:37:06 -04:00
Mark Shannon a4fd7aa4a6
GH-115776: Allow any fixed sized object to have inline values (GH-123192) 2024-08-21 15:52:04 +01:00
Sam Gross e001027188
gh-117139: Garbage collector support for deferred refcounting (#122956)
The free-threaded GC now visits interpreter stacks to keep objects
that use deferred reference counting alive.

Interpreter frames are zero initialized in the free-threaded GC so
that the GC doesn't see garbage data. This is a temporary measure
until stack spilling around escaping calls is implemented.

Co-authored-by: Ken Jin <kenjin@python.org>
2024-08-15 16:09:11 +00:00
Sam Gross 2d9d3a9f53
gh-122697: Fix free-threading memory leaks at shutdown (#122703)
We were not properly accounting for interpreter memory leaks at
shutdown and had two sources of leaks:

 * Objects that use deferred reference counting and were reachable via
   static types outlive the final GC. We now disable deferred reference
   counting on all objects if we are calling the GC due to interpreter
   shutdown.

 * `_PyMem_FreeDelayed` did not properly check for interpreter shutdown
   so we had some memory blocks that were enqueued to be freed, but
   never actually freed.

 * `_PyType_FinalizeIdPool` wasn't called at interpreter shutdown.
2024-08-08 12:48:17 -04:00
Sam Gross dc09301067
gh-122417: Implement per-thread heap type refcounts (#122418)
The free-threaded build partially stores heap type reference counts in
distributed manner in per-thread arrays. This avoids reference count
contention when creating or destroying instances.

Co-authored-by: Ken Jin <kenjin@python.org>
2024-08-06 14:36:57 -04:00
Sam Gross 5716cc3529
gh-100240: Use a consistent implementation for freelists (#121934)
This combines and updates our freelist handling to use a consistent
implementation. Objects in the freelist are linked together using the
first word of memory block.

If configured with freelists disabled, these operations are essentially
no-ops.
2024-07-22 12:08:27 -04:00
Sam Gross d23be3947c
gh-121794: Don't set `ob_tid` to zero in fast-path dealloc (#121799)
We should maintain the invariant that a zero `ob_tid` implies the
refcount fields are merged.

* Move the assignment in `_Py_MergeZeroLocalRefcount` to immediately
  before the refcount merge.
* Update `_PyTrash_thread_destroy_chain` to set `ob_ref_shared` to
  `_Py_REF_MERGED` when setting `ob_tid` to zero.

Also check this invariant with assertions in the GC in debug builds.
That uncovered a bug when running out of memory during GC.
2024-07-15 17:50:10 -04:00
Sam Gross e69d068ad0
gh-117657: Fix race involving GC and heap initialization (#119923)
The `_PyThreadState_Bind()` function is called before the first
`PyEval_AcquireThread()` so it's not synchronized with the stop the
world GC. We had a race where `gc_visit_heaps()` might visit a thread's
heap while it's being initialized.

Use a simple atomic int to avoid visiting heaps for threads that are not
yet fully initialized (i.e., before `tstate_mimalloc_bind()` is called).

The race was reproducible by running:
`python Lib/test/test_importlib/partial/pool_in_threads.py`.
2024-06-04 09:42:13 -04:00
Sam Gross 47fb4327b5
gh-117657: Fix race involving immortalizing objects (#119927)
The free-threaded build currently immortalizes objects that use deferred
reference counting (see gh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.

This fixes a race condition involving the tracking of whether the
behavior is suppressed.
2024-06-03 20:58:41 +00:00
Sam Gross 60593b2052
gh-117657: Fix TSAN race in free-threaded GC (#119883)
Only call `gc_restore_tid()` from stop-the-world contexts.
`worklist_pop()` can be called while other threads are running, so use a
relaxed atomic to modify `ob_tid`.
2024-06-01 10:04:05 -04:00
Victor Stinner aa61f8bfcf
gh-110850: Remove _PyTime_TimeUnchecked() function (#118552)
Use the new public Raw functions:

* _PyTime_PerfCounterUnchecked() with PyTime_PerfCounterRaw()
* _PyTime_TimeUnchecked() with PyTime_TimeRaw()
* _PyTime_MonotonicUnchecked() with PyTime_MonotonicRaw()

Remove internal functions:

* _PyTime_PerfCounterUnchecked()
* _PyTime_TimeUnchecked()
* _PyTime_MonotonicUnchecked()
2024-05-05 12:15:19 +02:00
Alex Turner 2ba1aed596
gh-117657: TSAN fix race on `gstate->young.count` (#118313) 2024-04-29 20:26:26 +00:00
Sam Gross 7ccacb220d
gh-117783: Immortalize objects that use deferred reference counting (#118112)
Deferred reference counting is not fully implemented yet. As a temporary
measure, we immortalize objects that would use deferred reference
counting to avoid multi-threaded scaling bottlenecks.

This is only performed in the free-threaded build once the first
non-main thread is started. Additionally, some tests, including refleak
tests, suppress this behavior.
2024-04-29 14:36:02 -04:00
Sam Gross 4ad8f090cc
gh-117376: Partial implementation of deferred reference counting (#117696)
This marks objects as using deferred refrence counting using the
`ob_gc_bits` field in the free-threaded build and collects those objects
during GC.
2024-04-12 17:36:20 +00:00
Sam Gross 1a6594f661
gh-117439: Make refleak checking thread-safe without the GIL (#117469)
This keeps track of the per-thread total reference count operations in
PyThreadState in the free-threaded builds. The count is merged into the
interpreter's total when the thread exits.
2024-04-08 12:11:36 -04:00
Mark Shannon c32dc47aca
GH-115776: Embed the values array into the object, for "normal" Python objects. (GH-116115) 2024-04-02 11:59:21 +01:00
Sam Gross f05fb2e65c
gh-112529: Don't untrack tuples or dicts with zero refcount (#117370)
The free-threaded GC sometimes sees objects with zero refcount. This can
happen due to the delay in merging biased reference counting fields,
and, in the future, due to deferred reference counting. We should not
untrack these objects or they will never be collected.

This fixes the refleaks in the free-threaded build.
2024-03-29 13:33:04 -04:00
Mark Shannon e28477f214
GH-117108: Change the size of the GC increment to about 1% of the total heap size. (GH-117120) 2024-03-22 18:43:25 +00:00
Mark Shannon 15309329b6
GH-108362: Incremental Cycle GC (GH-116206) 2024-03-20 08:54:42 +00:00
Sam Gross 5d72b75388
gh-116604: Check for `gcstate->enabled` in _Py_RunGC in free-threaded build (#116663)
This isn't strictly necessary because the implementation of `gc_should_collect`
already checks `gcstate->enabled` in the free-threaded build, but it seems
like a good idea until the common pieces of gc.c and gc_free_threading.c are
refactored out.
2024-03-12 17:12:02 +00:00
Dino Viehland 7db871e4fa
gh-112075: Support freeing object memory via QSBR (#116344)
Free objects with qsbr if shared
2024-03-08 09:56:36 -08:00
Donghee Na 2d4955fcf2
gh-116397: Move the _PyGC_ClearAllFreeLists to the safe point (gh-116414) 2024-03-07 08:29:39 +09:00
Donghee Na 2e91578a76
gh-115103: Update refleak checker to trigger _PyMem_ProcessDelayed (gh-116238) 2024-03-03 06:44:16 +09:00
Sam Gross df5212df6c
gh-112529: Simplify PyObject_GC_IsTracked and PyObject_GC_IsFinalized (#114732) 2024-02-28 15:37:59 -05:00
Victor Stinner 145bc2d638
gh-110850: Use public PyTime functions (#115746)
Replace private _PyTime functions with public PyTime functions.

random_seed_time_pid() now reports errors to its caller.
2024-02-20 23:31:30 +00:00
Victor Stinner 52d1477566
gh-110850: Rename internal PyTime C API functions (#115734)
Rename functions:

* _PyTime_GetSystemClock() => _PyTime_TimeUnchecked()
* _PyTime_GetPerfCounter() => _PyTime_PerfCounterUnchecked()
* _PyTime_GetMonotonicClock() => _PyTime_MonotonicUnchecked()
* _PyTime_GetSystemClockWithInfo() => _PyTime_TimeWithInfo()
* _PyTime_GetMonotonicClockWithInfo() => _PyTime_MonotonicWithInfo()
* _PyTime_GetMonotonicClockWithInfo() => _PyTime_MonotonicWithInfo()

Changes:

* Remove "typedef PyTime_t PyTime_t;" which was
  "typedef PyTime_t _PyTime_t;" before a previous rename.
* Update comments of "Unchecked" functions.
* Remove invalid PyTime_Time() comment.
2024-02-20 22:16:37 +00:00
Victor Stinner d207c7cd5a
gh-110850: Cleanup pycore_time.h includes (#115724)
<pycore_time.h> include is no longer needed to get the PyTime_t type
in internal header files. This type is now provided by <Python.h>
include. Add <pycore_time.h> includes to C files instead.
2024-02-20 16:50:43 +00:00
Victor Stinner 9af80ec83d
gh-110850: Replace _PyTime_t with PyTime_t (#115719)
Run command:

sed -i -e 's!\<_PyTime_t\>!PyTime_t!g' $(find -name "*.c" -o -name "*.h")
2024-02-20 15:02:27 +00:00
Brett Simmers 0749244d13
gh-112175: Add `eval_breaker` to `PyThreadState` (#115194)
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.

The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
2024-02-20 09:57:48 -05:00
Sam Gross b24c9161a6
gh-112529: Make the GC scheduling thread-safe (#114880)
The GC keeps track of the number of allocations (less deallocations)
since the last GC. This buffers the count in thread-local state and uses
atomic operations to modify the per-interpreter count. The thread-local
buffering avoids contention on shared state.

A consequence is that the GC scheduling is not as precise, so
"test_sneaky_frame_object" is skipped because it requires that the GC be
run exactly after allocating a frame object.
2024-02-16 11:22:27 -05:00
Donghee Na f15795c9a0
gh-111968: Rename freelist related struct names to Eric's suggestion (gh-115329) 2024-02-14 00:32:51 +00:00
Donghee Na d4d5bae147
gh-111968: Refactor _PyXXX_Fini to integrate with _PyObject_ClearFreeLists (gh-114899) 2024-02-10 00:57:04 +00:00
Sam Gross a3af3cb4f4
gh-110481: Implement inter-thread queue for biased reference counting (#114824)
Biased reference counting maintains two refcount fields in each object:
`ob_ref_local` and `ob_ref_shared`. The true refcount is the sum of these two
fields. In some cases, when refcounting operations are split across threads,
the ob_ref_shared field can be negative (although the total refcount must be
at least zero). In this case, the thread that decremented the refcount
requests that the owning thread give up ownership and merge the refcount
fields.
2024-02-09 17:08:32 -05:00
Mark Shannon 8a3c499ffe
GH-108362: Revert "GH-108362: Incremental GC implementation (GH-108038)" (#115132)
Revert "GH-108362: Incremental GC implementation (GH-108038)"

This reverts commit 36518e69d7.
2024-02-07 12:38:34 +00:00
Mark Shannon 36518e69d7
GH-108362: Incremental GC implementation (GH-108038) 2024-02-05 18:28:51 +00:00
Donghee Na 13907968d7
gh-111968: Use per-thread freelists for dict in free-threading (gh-114323) 2024-02-01 20:53:53 +00:00
Sam Gross 587d480203
gh-112529: Remove PyGC_Head from object pre-header in free-threaded build (#114564)
* gh-112529: Remove PyGC_Head from object pre-header in free-threaded build

This avoids allocating space for PyGC_Head in the free-threaded build.
The GC implementation for free-threaded CPython does not use the
PyGC_Head structure.

 * The trashcan mechanism uses the `ob_tid` field instead of `_gc_prev`
   in the free-threaded build.
 * The GDB libpython.py file now determines the offset of the managed
   dict field based on whether the running process is a free-threaded
   build. Those are identified by the `ob_ref_local` field in PyObject.
 * Fixes `_PySys_GetSizeOf()` which incorrectly incorrectly included the
   size of `PyGC_Head` in the size of static `PyTypeObject`.
2024-02-01 12:29:19 -08:00
Sam Gross e6d6d5dcc0
gh-114746: Avoid quadratic behavior in free-threaded GC (GH-114817)
The free-threaded build's GC implementation is non-generational, but was
scheduled as if it were collecting a young generation leading to
quadratic behavior. This increases the minimum threshold and scales it
to the number of live objects as we do for the old generation in the
default build.

Note that the scheduling is still not thread-safe without the GIL. Those
changes will come in later PRs.

A few tests, like "test_sneaky_frame_object" rely on prompt scheduling
of the GC. For now, to keep that test passing, we disable the scaled
threshold after calls like `gc.set_threshold(1, 0, 0)`.
2024-02-01 10:26:23 +01:00
Sam Gross b52fc70d1a
gh-112529: Implement GC for free-threaded builds (#114262)
* gh-112529: Implement GC for free-threaded builds

This implements a mark and sweep GC for the free-threaded builds of
CPython. The implementation relies on mimalloc to find GC tracked
objects (i.e., "containers").
2024-01-25 10:27:36 -08:00
Donghee Na 7fa511ba57
gh-111968: Use per-thread freelists for generator in free-threading (gh-114189) 2024-01-18 18:15:00 +00:00
Donghee Na 867f59f234
gh-111968: Use per-thread freelists for PyContext in free-threading (gh-114122) 2024-01-16 16:14:56 +00:00
Donghee Na 2e7577b622
gh-111968: Use per-thread freelists for tuple in free-threading (gh-113921) 2024-01-12 03:46:28 +09:00
Donghee Na f728f7242c
gh-111968: Use per-thread freelists for float in free-threading (gh-113886) 2024-01-10 15:47:13 +00:00
Donghee Na 57bdc6c30d
gh-111968: Introduce _PyFreeListState and _PyFreeListState_GET API (gh-113584) 2024-01-10 08:04:41 +09:00