Fix mimalloc allocator for huge memory allocation (around
8,589,934,592 GiB) on s390x.
Abort allocation early in mimalloc if the number of slices doesn't
fit into uint32_t, to prevent an integer overflow (cast 64-bit
size_t to uint32_t).
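
The early-abort check can be sketched as follows. This is only an
illustration: the 64 KiB slice size and the `sketch_*` names are
assumptions made for the example, not mimalloc's actual constants or
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed slice size for illustration only (64 KiB). */
#define SKETCH_SLICE_SIZE ((uint64_t)64 * 1024)

/* Compute how many slices `size` bytes need. Returns false instead of
 * truncating when the count does not fit into uint32_t. A 64-bit type is
 * used for `size` so the check is meaningful regardless of platform. */
static bool sketch_slice_count(uint64_t size, uint32_t *out)
{
    assert(out != NULL);
    /* Round up without risking overflow in the addition. */
    uint64_t slices = size / SKETCH_SLICE_SIZE
                    + (size % SKETCH_SLICE_SIZE != 0 ? 1 : 0);
    if (slices > UINT32_MAX) {
        return false;           /* abort the allocation early */
    }
    *out = (uint32_t)slices;    /* the cast is now known to be lossless */
    return true;
}
```

Without the range check, a request near 2^63 bytes would produce a slice
count whose low 32 bits look like a small, valid value after the cast.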
This implements the delayed reuse of mimalloc pages that contain Python
objects in the free-threaded build.
Allocations of the same size class are grouped in data structures called
pages. These are different from operating system pages. For thread-safety, we
want to ensure that memory used to store PyObjects remains valid as long as
there may be concurrent lock-free readers; we want to delay reusing it for
other size classes or in other heaps, and to delay returning it to the
operating system.
When a mimalloc page becomes empty, instead of immediately freeing it, we tag
it with a QSBR goal and insert it into a per-thread state linked list of
pages to be freed. When mimalloc needs a fresh page, we process the queue and
free any still empty pages that are now deemed safe to be freed. Pages
waiting to be freed are still available for allocations of the same size
class, and allocating from a page prevents it from being freed. There is
additional logic to handle abandoned pages when threads exit.
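
A minimal single-threaded sketch of the delayed-free queue described
above. All names are hypothetical, and the QSBR state is reduced to a
single "reached" sequence number. As a simplification, pages that were
reused are dropped from the queue while it is processed; the real code
may unlink them at a different point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct sketch_page {
    struct sketch_page *next;   /* link in the pending-free queue */
    uint64_t qsbr_goal;         /* sequence all readers must reach first */
    size_t used;                /* live blocks; 0 means the page is empty */
    bool freed;                 /* stands in for actually freeing the page */
} sketch_page_t;

typedef struct {
    sketch_page_t *head;        /* per-thread list of pages to be freed */
} sketch_queue_t;

/* Called when a page becomes empty: tag it and queue it, do not free it. */
static void sketch_defer_free(sketch_queue_t *q, sketch_page_t *page,
                              uint64_t goal)
{
    assert(q != NULL && page != NULL);
    page->qsbr_goal = goal;
    page->next = q->head;
    q->head = page;
}

/* Called when a fresh page is needed. Frees pages that are still empty
 * and whose goal has been reached; returns how many were freed. */
static size_t sketch_process_queue(sketch_queue_t *q, uint64_t reached)
{
    size_t freed = 0;
    sketch_page_t **link = &q->head;
    while (*link != NULL) {
        sketch_page_t *page = *link;
        if (page->used != 0) {
            /* Allocated from while waiting: keep it alive, unlink it. */
            *link = page->next;
            page->next = NULL;
        }
        else if (page->qsbr_goal <= reached) {
            /* Still empty and safe: free it. */
            *link = page->next;
            page->next = NULL;
            page->freed = true;
            freed++;
        }
        else {
            /* Readers may still be active: leave it queued. */
            link = &page->next;
        }
    }
    return freed;
}
```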
This sets `MI_DEBUG` to `2` in debug builds to enable `mi_assert_internal()`
calls. Expensive internal assertions are not enabled.
This also disables an assertion in free-threaded builds that would be
triggered by the free-threaded GC because we traverse heaps that are not
owned by the current thread.
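
For illustration, the relationship between the debug level and the
assertion macros can be sketched like this. The `SKETCH_*` names are
stand-ins, and the value `0` for non-debug builds is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a debug build of CPython (assumption for this sketch). */
#define SKETCH_PY_DEBUG 1

#if SKETCH_PY_DEBUG
#  define SKETCH_MI_DEBUG 2     /* internal assertions, no expensive ones */
#else
#  define SKETCH_MI_DEBUG 0     /* assumed value for non-debug builds */
#endif

/* Internal assertions are compiled in at level 2 and above. */
#if SKETCH_MI_DEBUG >= 2
#  define sketch_assert_internal(x) assert(x)
#else
#  define sketch_assert_internal(x) ((void)0)
#endif

/* Expensive assertions need level 3 and stay disabled at level 2. */
#if SKETCH_MI_DEBUG >= 3
#  define sketch_assert_expensive(x) assert(x)
#else
#  define sketch_assert_expensive(x) ((void)0)
#endif

static bool sketch_internal_asserts_enabled(void)
{
    return SKETCH_MI_DEBUG >= 2;
}

static bool sketch_expensive_asserts_enabled(void)
{
    return SKETCH_MI_DEBUG >= 3;
}
```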
This avoids filling the memory occupied by ob_tid, ob_ref_local, and
ob_ref_shared with debug bytes (e.g., 0xDD) in mimalloc in the
free-threaded build.
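
A rough sketch of the idea: fill a freed block with the debug byte but
skip the leading bytes that hold these fields. The header struct below
is a simplified, hypothetical stand-in and not the real free-threaded
`PyObject` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FREED_BYTE 0xDD

/* Simplified stand-in for the start of a free-threaded object header. */
typedef struct {
    uintptr_t ob_tid;
    uint32_t  ob_ref_local;
    intptr_t  ob_ref_shared;
    void     *ob_type;
} sketch_object_header_t;

/* Fill a freed block with debug bytes, leaving ob_tid, ob_ref_local and
 * ob_ref_shared (everything before ob_type) untouched. */
static void sketch_debug_fill_freed(void *block, size_t size)
{
    assert(block != NULL);
    size_t skip = offsetof(sketch_object_header_t, ob_type);
    if (size <= skip) {
        return;     /* nothing beyond the preserved fields */
    }
    memset((unsigned char *)block + skip, SKETCH_FREED_BYTE, size - skip);
}
```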
Fixes a few issues related to refleak tracking in the free-threaded build:
- Count blocks in abandoned segments
- Call `_mi_page_free_collect` earlier during heap traversal in order to get an accurate count of blocks in use.
- Add missing refcount tracking in `_Py_DecRefSharedDebug` and `_Py_ExplicitMergeRefcount`.
- Pause threads in `get_num_global_allocated_blocks` to ensure that traversing the mimalloc heaps is safe.
This adds support for visiting abandoned pages in mimalloc and improves
the performance of the page visiting code. Abandoned pages contain
memory blocks from threads that have exited. They may later be
reclaimed by other threads. We still need to visit those pages in
the free-threaded GC because they contain live objects.
This also reduces the overhead of visiting mimalloc pages:
* Special cases for full pages, empty pages, and pages containing only
a single block.
* Fix free_map to use one bit instead of one byte per block.
* Use fast integer division by a constant algorithm when computing
block offset from block size and index.
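
The last two items can be sketched as follows. The names are
hypothetical, and the division shown is one well-known
multiply-and-shift scheme for 32-bit operands; it is not necessarily the
exact code in this change. The sketch assumes the division is used to
map a byte offset within a page to a block index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Free map with one bit per block instead of one byte per block. */
static void sketch_bitmap_set(uint8_t *map, size_t idx)
{
    map[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

static bool sketch_bitmap_test(const uint8_t *map, size_t idx)
{
    return ((map[idx / 8] >> (idx % 8)) & 1u) != 0;
}

/* Precomputed constants for dividing by a fixed 32-bit divisor. */
typedef struct {
    uint64_t magic;
    unsigned shift;
} sketch_divisor_t;

static sketch_divisor_t sketch_divisor_init(uint32_t d)
{
    assert(d != 0);
    sketch_divisor_t dv;
    unsigned shift = 0;
    /* shift = ceil(log2(d)) */
    while (((uint64_t)1 << shift) < d) {
        shift++;
    }
    dv.shift = shift;
    /* magic = floor(2^32 * (2^shift - d) / d) + 1; fits in 64 bits
     * because (2^shift - d) < d <= 2^32 - 1. */
    dv.magic = ((((uint64_t)1 << 32) * (((uint64_t)1 << shift) - d)) / d) + 1;
    return dv;
}

/* Computes n / d using one multiply, one add, and two shifts. */
static uint32_t sketch_fast_divide(uint32_t n, sketch_divisor_t dv)
{
    return (uint32_t)(((((uint64_t)n * dv.magic) >> 32) + n) >> dv.shift);
}
```

The constants are computed once per page (the block size is fixed for a
page), so each visited block costs a multiplication rather than a
hardware division.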
* gh-112532: Tag mimalloc heaps and pages
Mimalloc pages are data structures that contain contiguous allocations
of the same block size. Note that they are distinct from operating
system pages. Mimalloc pages are contained in segments.
When a thread exits, it abandons any segments and contained pages that
have live allocations. These segments and pages may be later reclaimed
by another thread. To support GC and certain thread-safety guarantees in
free-threaded builds, we want pages to only be reclaimed by the
corresponding heap in the claimant thread. For example, we want pages
containing GC objects to only be claimed by GC heaps.
This allows heaps and pages to be tagged with an integer tag that is
used to ensure that abandoned pages are only claimed by heaps with the
same tag. Heaps can be initialized with a tag (0-15); any page allocated
by that heap copies the corresponding tag.
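
A minimal sketch of the tagging rule, using hypothetical types and
function names rather than mimalloc's own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t tag;                    /* 0-15 */
} sketch_heap_t;

typedef struct {
    uint8_t tag;                    /* copied from the allocating heap */
    bool abandoned;
    const sketch_heap_t *owner;
} sketch_page_t;

/* Initialize a heap with a tag; rejects tags outside 0-15. */
static bool sketch_heap_init(sketch_heap_t *heap, unsigned tag)
{
    assert(heap != NULL);
    if (tag > 15) {
        return false;
    }
    heap->tag = (uint8_t)tag;
    return true;
}

/* A page allocated by a heap copies that heap's tag. */
static void sketch_page_alloc(sketch_page_t *page, const sketch_heap_t *heap)
{
    page->tag = heap->tag;
    page->abandoned = false;
    page->owner = heap;
}

/* The owning thread exits while the page still has live allocations. */
static void sketch_page_abandon(sketch_page_t *page)
{
    page->abandoned = true;
    page->owner = NULL;
}

/* Only a heap with the same tag may claim an abandoned page. */
static bool sketch_page_try_reclaim(sketch_page_t *page,
                                    const sketch_heap_t *heap)
{
    if (!page->abandoned || page->tag != heap->tag) {
        return false;
    }
    page->abandoned = false;
    page->owner = heap;
    return true;
}
```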
* Fix conversion warning
* gh-112532: Isolate abandoned segments by interpreter
Mimalloc segments are data structures that contain memory allocations along
with metadata. Each segment is "owned" by a thread. When a thread exits,
it abandons its segments to a global pool to be later reclaimed by other
threads. This changes the pool to be per-interpreter instead of process-wide.
This will be important when we use mimalloc to find GC objects in the
`--disable-gil` builds. We want heaps to only store Python objects from a
single interpreter. Absent this change, the abandoning and reclaiming process
could break this isolation.
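
The effect of the change can be sketched with a per-interpreter list in
place of a single global one. The types below are hypothetical, and the
sketch is single-threaded, so it omits the synchronization a real shared
pool needs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_segment {
    struct sketch_segment *next;
} sketch_segment_t;

/* Each interpreter owns its own pool of abandoned segments. */
typedef struct {
    sketch_segment_t *abandoned;
} sketch_interp_t;

/* An exiting thread hands its segment to its own interpreter's pool. */
static void sketch_abandon(sketch_interp_t *interp, sketch_segment_t *seg)
{
    assert(interp != NULL && seg != NULL);
    seg->next = interp->abandoned;
    interp->abandoned = seg;
}

/* A thread can only reclaim from the pool of its own interpreter. */
static sketch_segment_t *sketch_reclaim(sketch_interp_t *interp)
{
    assert(interp != NULL);
    sketch_segment_t *seg = interp->abandoned;
    if (seg != NULL) {
        interp->abandoned = seg->next;
        seg->next = NULL;
    }
    return seg;
}
```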
* Add missing '&_mi_abandoned_default' to 'tld_empty'
* gh-112532: Use separate mimalloc heaps for GC objects
In `--disable-gil` builds, we now use four separate heaps in
anticipation of using mimalloc to find GC objects when the GIL is
disabled. To support this, we also make a few changes to mimalloc:
* `mi_heap_t` and `mi_tld_t` initialization is split from allocation.
This allows us to have a `mi_tld_t` per-`PyThreadState`, which is
important to keep interpreter isolation, since the same OS thread may
run in multiple interpreters (using different PyThreadStates).
* Heap abandoning (mi_heap_collect_ex) can now be called from a
different thread than the one that created the heap. This is necessary
because we may clear and delete the containing PyThreadStates from a
different thread during finalization and after fork().
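
One way to picture the four heaps is as an enum-indexed array stored in
each thread state. The enum names, the roles assigned to each heap, and
the selection function are assumptions for illustration and are not
taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heap identifiers; the roles are assumed for the sketch. */
typedef enum {
    SKETCH_HEAP_MEM = 0,    /* raw memory that is not a Python object */
    SKETCH_HEAP_OBJECT = 1, /* objects not tracked by the GC */
    SKETCH_HEAP_GC = 2,     /* GC objects */
    SKETCH_HEAP_GC_PRE = 3, /* GC objects with a pre-header */
    SKETCH_HEAP_COUNT
} sketch_heap_id_t;

typedef struct {
    sketch_heap_id_t id;
    bool initialized;
} sketch_heap_t;

/* Storage lives inside the thread state, so initialization is a
 * separate step from allocation. */
typedef struct {
    sketch_heap_t heaps[SKETCH_HEAP_COUNT];
} sketch_thread_state_t;

static void sketch_heaps_init(sketch_thread_state_t *ts)
{
    assert(ts != NULL);
    for (int i = 0; i < SKETCH_HEAP_COUNT; i++) {
        ts->heaps[i].id = (sketch_heap_id_t)i;
        ts->heaps[i].initialized = true;
    }
}

/* Pick the heap for an allocation based on what is being allocated. */
static sketch_heap_t *sketch_heap_for(sketch_thread_state_t *ts,
                                      bool is_object, bool is_gc,
                                      bool has_preheader)
{
    sketch_heap_id_t id = SKETCH_HEAP_MEM;
    if (is_object) {
        if (!is_gc) {
            id = SKETCH_HEAP_OBJECT;
        }
        else {
            id = has_preheader ? SKETCH_HEAP_GC_PRE : SKETCH_HEAP_GC;
        }
    }
    return &ts->heaps[id];
}
```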
* Use enum instead of defines and guard mimalloc includes.
* The enum typedef will be convenient for future PRs that use the type.
* Guarding the mimalloc includes allows us to unconditionally include
pycore_mimalloc.h from other header files that rely on things like
`struct _mimalloc_thread_state`.
* Only define _mimalloc_thread_state in Py_GIL_DISABLED builds
gh-112027: Don't print mimalloc warning after mmap
This changes the warning to a "verbose"-level message in prim.c. The
address passed to mmap is only a hint -- it's normal for mmap() to
sometimes not respect the hint and return a different address.
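
A small POSIX sketch of why this is not worth a warning. The wrapper and
its `verbose` flag are hypothetical; only `mmap()` itself is a real API.
The mapping is usable whether or not the kernel honored the hint.

```c
/* Request MAP_ANONYMOUS on glibc even under a strict -std= setting. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Map `size` bytes, preferring the address `hint`. Reports through
 * `hint_honored` whether the hint was followed; a mismatch is logged
 * only in verbose mode because it is not an error. */
static void *sketch_map_with_hint(void *hint, size_t size, int verbose,
                                  int *hint_honored)
{
    assert(hint_honored != NULL);
    void *p = mmap(hint, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;    /* this is the real failure case */
    }
    *hint_honored = (hint == NULL || p == hint);
    if (!*hint_honored && verbose) {
        fprintf(stderr, "verbose: mmap ignored hint %p, returned %p\n",
                hint, p);
    }
    return p;
}
```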
* Add mimalloc v2.1.2
Modified src/alloc.c to remove include of alloc-override.c and not
compile new handler.
Did not include the following files:
- include/mimalloc-new-delete.h
- include/mimalloc-override.h
- src/alloc-override-osx.c
- src/alloc-override.c
- src/static.c
- src/region.c
mimalloc is thread-safe and shares a single heap across all runtimes,
therefore finalization and getting global allocated blocks across all
runtimes are different.
* mimalloc: minimal changes for use in Python:
- remove debug spam for freeing large allocations
- use same bytes (0xDD) for freed allocations in CPython and mimalloc
This is important for the test_capi debug memory tests
* Don't export mimalloc symbol in libpython.
* Enable mimalloc as Python allocator option.
* Add mimalloc MIT license.
* Log mimalloc in Lib/test/pythoninfo.py.
* Document new mimalloc support.
* Use macro defs for exports as done in:
https://github.com/python/cpython/pull/31164/
Co-authored-by: Sam Gross <colesbury@gmail.com>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>