Revert "gh-115398: Increment PyExpat_CAPI_MAGIC for SetReparseDeferralEnabled addition (GH-116301)"
This reverts part of commit eda2963378. Why? this comment buried in an earlier code review explains:
I checked again how that value is used in practice, it's here:
0c80da4c14/Modules/_elementtree.c (L4363-L4372)
Based on that code my understanding is that loading bigger structs from the future is considered okay unless `PyExpat_CAPI_MAGIC` differs, which implies that (1) magic needs to stay the same to support loading the future from the past and (2) that `PyExpat_CAPI_MAGIC` should only ever change for changes that do not increase size (but keep it constant).
To summarize, that supports your argument.
I checked branches 3.8, 3.9, 3.10, 3.11, 3.12 now and they all have the same comparison code there so reverting that magic string bump will support seamless backporting.
This implements the delayed reuse of mimalloc pages that contain Python
objects in the free-threaded build.
Allocations of the same size class are grouped in data structures called
pages. These are different from operating system pages. For thread-safety, we
want to ensure that memory used to store PyObjects remains valid as long as
there may be concurrent lock-free readers; we want to delay using it for
other size classes, in other heaps, or returning it to the operating system.
When a mimalloc page becomes empty, instead of immediately freeing it, we tag
it with a QSBR goal and insert it into a per-thread state linked list of
pages to be freed. When mimalloc needs a fresh page, we process the queue and
free any still empty pages that are now deemed safe to be freed. Pages
waiting to be freed are still available for allocations of the same size
class and allocating from a page prevent it from being freed. There is
additional logic to handle abandoned pages when threads exit.
This sets `MI_DEBUG` to `2` in debug builds to enable `mi_assert_internal()`
calls. Expensive internal assertions are not enabled.
This also disables an assertion in free-threaded builds that would be
triggered by the free-threaded GC because we traverse heaps that are not
owned by the current thread.
* Increment PyExpat_CAPI_MAGIC due to SetReparseDeferralEnabled addition.
This is a followup to git commit
6a95676bb5 from Github PR #115623.
* RESTify news API list.
Make `_thread.ThreadHandle` thread-safe in free-threaded builds
We protect the mutable state of `ThreadHandle` using a `_PyOnceFlag`.
Concurrent operations (i.e. `join` or `detach`) on `ThreadHandle` block
until it is their turn to execute or an earlier operation succeeds.
Once an operation has been applied successfully all future operations
complete immediately.
The `join()` method is now idempotent. It may be called multiple times
but the underlying OS thread will only be joined once. After `join()`
succeeds, any future calls to `join()` will succeed immediately.
The internal thread handle `detach()` method has been removed.
Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:
- `xml.etree.ElementTree.XMLParser.flush`
- `xml.etree.ElementTree.XMLPullParser.flush`
- `xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
- `xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
- `xml.sax.expatreader.ExpatParser.flush`
Based on the "flush" idea from https://github.com/python/cpython/pull/115138#issuecomment-1932444270 .
### Notes
- Please treat as a security fix related to CVE-2023-52425.
Includes code suggested-by: Snild Dolkow <snild@sony.com>
and by core dev Serhiy Storchaka.
This changes the `sym_set_...()` functions to return a `bool` which is `false`
when the symbol is `bottom` after the operation.
All calls to such functions now check this result and go to `hit_bottom`,
a special error label that prints a different message and then reports
that it wasn't able to optimize the trace. No executor will be produced
in this case.
This undoes the *temporary* default disabling of the T2 optimizer pass in gh-115860.
- Add a new test that reproduces Brandt's example from gh-115859; it indeed crashes before gh-116028 with PYTHONUOPSOPTIMIZE=1
- Re-enable the optimizer pass in T2, stop checking PYTHONUOPSOPTIMIZE
- Rename the env var to disable T2 entirely to PYTHON_UOPS_OPTIMIZE (must be explicitly set to 0 to disable)
- Fix skipIf conditions on tests in test_opt.py accordingly
- Export sym_is_bottom() (for debugging)
- Fix various things in the `_BINARY_OP_` specializations in the abstract interpreter:
- DECREF(temp)
- out-of-space check after sym_new_const()
- add sym_matches_type() checks, so even if we somehow reach a binary op with symbolic constants of the wrong type on the stack we won't trigger the type assert
- Any `sym_set_...` call that attempts to set conflicting information
cause the symbol to become `bottom` (contradiction).
- All `sym_is...` and similar calls return false or NULL for `bottom`.
- Everything's tested.
- The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
* Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol.
* #define shortened form of _Py_uop_... names for improved readability.
PyTime_t no longer uses an arbitrary unit, it's always a number of
nanoseconds (64-bit signed integer).
* Rename _PyTime_FromNanosecondsObject() to _PyTime_FromLong().
* Rename _PyTime_AsNanosecondsObject() to _PyTime_AsLong().
* Remove pytime_from_nanoseconds().
* Remove pytime_as_nanoseconds().
* Remove _PyTime_FromNanoseconds().
Remove references to the old names _PyTime_MIN
and _PyTime_MAX, now that PyTime_MIN and
PyTime_MAX are public.
Replace also _PyTime_MIN with PyTime_MIN.
This adds `_PyMem_FreeDelayed()` and supporting functions. The
`_PyMem_FreeDelayed()` function frees memory with the same allocator as
`PyMem_Free()`, but after some delay to ensure that concurrent lock-free
readers have finished.
<pycore_time.h> include is no longer needed to get the PyTime_t type
in internal header files. This type is now provided by <Python.h>
include. Add <pycore_time.h> includes to C files instead.
This avoids filling the memory occupied by ob_tid, ob_ref_local, and
ob_ref_shared with debug bytes (e.g., 0xDD) in mimalloc in the
free-threaded build.
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.
The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
Keep the old private _PyCFunctionFastWithKeywords name (Python 3.7)
as an alias to the new public name PyCFunctionFastWithKeywords
(Python 3.13a4).
_PyCFunctionWithKeywords doesn't exist in Python 3.13a3, whereas
_PyCFunctionFastWithKeywords was removed in Python 3.13a4.
This adds a safe memory reclamation scheme based on FreeBSD's "GUS" and
quiescent state based reclamation (QSBR). The API provides a mechanism
for callers to detect when it is safe to free memory that may be
concurrently accessed by readers.
The ID of the owning thread (`rlock_owner`) may be accessed by
multiple threads without holding the underlying lock; relaxed
atomics are used in place of the previous loads/stores.
The number of times that the lock has been acquired (`rlock_count`)
is only ever accessed by the thread that holds the lock; we do not
need to use atomics to access it.
The GC keeps track of the number of allocations (less deallocations)
since the last GC. This buffers the count in thread-local state and uses
atomic operations to modify the per-interpreter count. The thread-local
buffering avoids contention on shared state.
A consequence is that the GC scheduling is not as precise, so
"test_sneaky_frame_object" is skipped because it requires that the GC be
run exactly after allocating a frame object.
Makes _PyType_Lookup thread safe, including:
Thread safety of the underlying cache.
Make mutation of mro and type members thread safe
Also _PyType_GetMRO and _PyType_GetBases are currently returning borrowed references which aren't safe.
This adds `Py_XBEGIN_CRITICAL_SECTION` and
`Py_XEND_CRITICAL_SECTION`, which accept a possibly NULL object as an
argument. If the argument is NULL, then nothing is locked or unlocked.
Otherwise, they behave like `Py_BEGIN/END_CRITICAL_SECTION`.
Use critical sections to make deque methods that operate on mutable
state thread-safe when the GIL is disabled. This is mostly accomplished
by using the @critical_section Argument Clinic directive, though there
are a few places where this was not possible and critical sections had
to be manually acquired/released.
Add PythonFinalizationError exception. This exception derived from
RuntimeError is raised when an operation is blocked during the Python
finalization.
The following functions now raise PythonFinalizationError, instead of
RuntimeError:
* _thread.start_new_thread()
* subprocess.Popen
* os.fork()
* os.fork1()
* os.forkpty()
Morever, _winapi.Overlapped finalizer now logs an unraisable
PythonFinalizationError, instead of an unraisable RuntimeError.
The free-threaded GC uses mimallocs segment thread IDs to restore
the overwritten `ob_tid` thread ids in PyObjects. For that reason, it's
important that PyObjects and mimalloc use the same identifiers.
These are intended to be used in places where atomics are required in
free-threaded builds but not in the default build. We don't want to
introduce the potential performance overhead of an atomic operation in the
default build.
For the most part, these changes make is substantially easier to backport subinterpreter-related code to 3.12, especially the related modules (e.g. _xxsubinterpreters). The main motivation is to support releasing a PyPI package with the 3.13 capabilities compiled for 3.12.
A lot of the changes here involve either hiding details behind macros/functions or splitting up some files.
We add _winapi.BatchedWaitForMultipleObjects to wait for larger numbers of handles.
This is an internal module, hence undocumented, and should be used with caution.
Check the docstring for info before using BatchedWaitForMultipleObjects.
Biased reference counting maintains two refcount fields in each object:
`ob_ref_local` and `ob_ref_shared`. The true refcount is the sum of these two
fields. In some cases, when refcounting operations are split across threads,
the ob_ref_shared field can be negative (although the total refcount must be
at least zero). In this case, the thread that decremented the refcount
requests that the owning thread give up ownership and merge the refcount
fields.
Starts adding thread safety to dict objects.
Use @critical_section for APIs which are exposed via argument clinic and don't directly correlate with a public C API which needs to acquire the lock
Use a _lock_held suffix for keeping changes to complicated functions simple and just wrapping them with a critical section
Acquire and release the lock in an existing function where it won't be overly disruptive to the existing logic
This marks dead ThreadHandles as non-joinable earlier in
`PyOS_AfterFork_Child()` before we execute any Python code. The handles
are stored in a global linked list in `_PyRuntimeState` because `fork()`
affects the entire process.
The `PyDict_SetDefaultRef` function is similar to `PyDict_SetDefault`,
but returns a strong reference through the optional `**result` pointer
instead of a borrowed reference.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Add optional 'filter' parameter to iterdump() that allows a "LIKE"
pattern for filtering database objects to dump.
Co-authored-by: Erlend E. Aasland <erlend@python.org>
The new `PyList_GetItemRef` is similar to `PyList_GetItem`, but returns
a strong reference instead of a borrowed reference. Additionally, if the
passed "list" object is not a list, the function sets a `TypeError`
instead of calling `PyErr_BadInternalCall()`.
* gh-112529: Remove PyGC_Head from object pre-header in free-threaded build
This avoids allocating space for PyGC_Head in the free-threaded build.
The GC implementation for free-threaded CPython does not use the
PyGC_Head structure.
* The trashcan mechanism uses the `ob_tid` field instead of `_gc_prev`
in the free-threaded build.
* The GDB libpython.py file now determines the offset of the managed
dict field based on whether the running process is a free-threaded
build. Those are identified by the `ob_ref_local` field in PyObject.
* Fixes `_PySys_GetSizeOf()` which incorrectly incorrectly included the
size of `PyGC_Head` in the size of static `PyTypeObject`.
Add an option (--enable-experimental-jit for configure-based builds
or --experimental-jit for PCbuild-based ones) to build an
*experimental* just-in-time compiler, based on copy-and-patch (https://fredrikbk.com/publications/copy-and-patch.pdf).
See Tools/jit/README.md for more information on how to install the required build-time tooling.
For interpreters that share state with the main interpreter, this points
to the same static memory structure. For interpreters with their own
obmalloc state, it is heap allocated. Add free_obmalloc_arenas() which
will free the obmalloc arenas and radix tree structures for interpreters
with their own obmalloc state.
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>