Replace the private _PyUnicodeWriter with the public PyUnicodeWriter.
Replace PyObject_Repr() + _PyUnicodeWriter_WriteStr()
with PyUnicodeWriter_WriteRepr().
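For reference, a minimal sketch of the resulting pattern (the helper name
`format_angle_repr` is invented for illustration); `PyUnicodeWriter_WriteRepr()`
folds the old `PyObject_Repr()` + `_PyUnicodeWriter_WriteStr()` pair into one call:

```c
#include <Python.h>

/* Illustration only: build "<repr(obj)>" with the public writer API. */
static PyObject *
format_angle_repr(PyObject *obj)
{
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }
    if (PyUnicodeWriter_WriteChar(writer, '<') < 0
        /* One call instead of PyObject_Repr() + _PyUnicodeWriter_WriteStr(). */
        || PyUnicodeWriter_WriteRepr(writer, obj) < 0
        || PyUnicodeWriter_WriteChar(writer, '>') < 0)
    {
        PyUnicodeWriter_Discard(writer);
        return NULL;
    }
    return PyUnicodeWriter_Finish(writer);
}
```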
Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init. In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it. For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.
This is an un-revert of gh-124646 that also addresses the Py_TRACE_REFS
failures identified in gh-124785.
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
* Make slices marshallable
* Emit slices as constants
* Update Python/marshal.c
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
* Refactor codegen_slice into two functions so it
always has the same net effect
* Fix for free-threaded builds
* Simplify marshal loading of slices
* Only return SUCCESS/ERROR from codegen_slice
---------
Co-authored-by: Mark Shannon <mark@hotpy.org>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
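For illustration only (the helper name `slice_roundtrips` is invented), the
marshal half of this change means a slice object now round-trips through the
C marshal API, which is what allows the compiler to emit slices as code constants:

```c
#include <Python.h>
#include <marshal.h>

/* Illustration: slice objects can now be written and read back by marshal
   (previously marshal rejected them as unmarshallable). */
static int
slice_roundtrips(void)
{
    PyObject *start = PyLong_FromLong(1);
    PyObject *stop = PyLong_FromLong(10);
    if (start == NULL || stop == NULL) {
        Py_XDECREF(start);
        Py_XDECREF(stop);
        return -1;
    }
    PyObject *sl = PySlice_New(start, stop, NULL);
    Py_DECREF(start);
    Py_DECREF(stop);
    if (sl == NULL) {
        return -1;
    }
    PyObject *data = PyMarshal_WriteObjectToString(sl, Py_MARSHAL_VERSION);
    if (data == NULL) {
        Py_DECREF(sl);
        return -1;
    }
    PyObject *copy = PyMarshal_ReadObjectFromString(PyBytes_AS_STRING(data),
                                                    PyBytes_GET_SIZE(data));
    Py_DECREF(data);
    if (copy == NULL) {
        Py_DECREF(sl);
        return -1;
    }
    int equal = PyObject_RichCompareBool(sl, copy, Py_EQ);  /* expect 1 */
    Py_DECREF(sl);
    Py_DECREF(copy);
    return equal;
}
```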
Stop the world when invalidating function versions
The tier1 interpreter specializes `CALL` instructions based on the values
of certain function attributes (e.g. `__code__`, `__defaults__`). It uses
function versions to verify that a function's attributes at the time a
specialized instruction executes match those seen when it was specialized.
A function's version is initialized in `MAKE_FUNCTION` and is invalidated when
any of the critical function attributes are changed. The tier1 interpreter stores
the function version in the inline cache during specialization. A guard is used by
the specialized instruction to verify that the version of the function on the operand
stack matches the cached version (and therefore has all of the expected attributes).
It is assumed that once the guard passes, all attributes will remain unchanged
while executing the rest of the specialized instruction.
Stopping the world when invalidating function versions ensures that all critical
function attributes will remain unchanged after the function version guard passes
in free-threaded builds. It's important to note that this is only true if the remainder
of the specialized instruction does not enter and exit a stop-the-world point.
We will stop the world the first time any of the following function attributes
are mutated:
- defaults
- vectorcall
- kwdefaults
- closure
- code
This should happen rarely and only happens once per function, so the performance
impact on the majority of code should be minimal.
Additionally, refactor the API for manipulating function versions to more clearly
match the stated semantics.
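A conceptual sketch of that guard, not the interpreter's actual code (the
inline-cache handling is elided and the `func_version` field name is taken
from recent CPython headers):

```c
#include <Python.h>

/* Conceptual sketch only. func_version is invalidated whenever __code__,
   __defaults__, __kwdefaults__, __closure__ or the vectorcall slot is
   mutated, so a match implies the cached attribute data is still valid. */
static int
function_version_guard_passes(PyObject *callable, uint32_t cached_version)
{
    if (!PyFunction_Check(callable)) {
        return 0;  /* deoptimize: not a plain Python function */
    }
    PyFunctionObject *func = (PyFunctionObject *)callable;
    return cached_version != 0 && func->func_version == cached_version;
}
```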
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.
Rename some of the files and functions in preparation for this change.
Instead of being limited only by the size of addressable memory (2**63
bytes), Python integers are now also limited by the number of bits, so
the bit count now always fits in a 64-bit integer.
Both limits are much larger than what might be available in practice,
so the change does not affect users.
_PyLong_NumBits() and _PyLong_Frexp() are now always successful.
* Setting the __module__ attribute for a class now removes the
__firstlineno__ item from the type's dict.
* The _collections_abc and _pydecimal modules now completely replace the
collections.abc and decimal modules after importing them. This makes it
possible to get the source of classes and functions defined in these
modules.
* inspect.findsource() now checks whether the first line number for a
class is out of bounds.
Use a `_PyStackRef` and defer the reference to `f_funcobj` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
We need to return immediately if there's an error during dictionary
lookup.
Also avoid the conditional (?:) operator. MSVC versions through v19.27
miscompile compound literals with side effects within a conditional operator.
This caused crashes on the Windows 10 buildbot.
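Purely for illustration, with made-up types and names, this is roughly the
shape of code being avoided and its rewrite:

```c
#include <stdio.h>

typedef struct { int value; } Ref;

static int counter = 0;
static int bump(void) { return ++counter; }

/* Problematic shape: a compound literal whose initializer has a side
   effect, used inside the conditional (?:) operator. */
static Ref
pick(int found)
{
    return found ? (Ref){ .value = bump() } : (Ref){ .value = 0 };
}

/* Rewrite: no compound literal inside the conditional operator. */
static Ref
pick_safe(int found)
{
    if (found) {
        return (Ref){ .value = bump() };
    }
    return (Ref){ .value = 0 };
}

int
main(void)
{
    printf("%d\n", pick(1).value);
    printf("%d\n", pick_safe(1).value);
    return 0;
}
```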
Use a `_PyStackRef` and defer the reference to `f_executable` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
In gh-121602, I applied a fix to a builtin types initialization bug.
That fix made sense in the context of some broader future changes,
but introduced a little bit of extra complexity. That fix has turned
out to be incomplete for some of the builtin types we haven't
been testing. I found that out while improving the tests.
A while back, @markshannon suggested a simpler fix that doesn't
have that problem, which I've already applied to 3.12 and 3.13.
I'm switching to that here. Given the potential long-term
benefits of the more complex (but still incomplete) approach,
I'll circle back to it in the future, particularly after I've improved
the tests so no corner cases slip through the cracks.
(This is effectively a "forward-port" of 716c677 from 3.13.)
Switch more _Py_IsImmortal(...) assertions to _Py_IsImmortalLoose(...)
The remaining calls to _Py_IsImmortal are in free-threaded-only code,
initialization of core objects, tests, and guards that fall back to
code that works with mortal objects.
* Replace PyLong_AS_LONG() with PyLong_AsLong().
* Call PyLong_AsLong() only once in the replacement code.
* Use PyMapping_GetOptionalItem() instead of PyObject_GetItem().
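A sketch of the combined pattern, with invented helper and parameter names:

```c
#include <Python.h>

/* Illustration: fetch an optional integer item from a mapping and convert
   it exactly once. */
static int
get_optional_int(PyObject *mapping, PyObject *key, long *result, long default_value)
{
    PyObject *item;
    /* Unlike PyObject_GetItem(), a missing key is reported as 0 rather
       than by raising KeyError. */
    int rc = PyMapping_GetOptionalItem(mapping, key, &item);
    if (rc < 0) {
        return -1;               /* error, exception set */
    }
    if (rc == 0) {
        *result = default_value; /* key absent */
        return 0;
    }
    /* PyLong_AsLong() replaces PyLong_AS_LONG() and is called only once. */
    long value = PyLong_AsLong(item);
    Py_DECREF(item);
    if (value == -1 && PyErr_Occurred()) {
        return -1;
    }
    *result = value;
    return 1;
}
```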
Check that the current default heap is initialized in
`_mi_os_get_aligned_hint` and `mi_os_claim_huge_pages`.
The mimalloc function `_mi_os_get_aligned_hint` assumes that there is an
initialized default heap. This is true for our main thread, but not for
background threads. The problematic code path is usually called during
initialization (i.e., `Py_Initialize`), but it may also be called if the
program allocates large amounts of memory in total.
The crash only affected the free-threaded build.
There were still a number of gaps in the tests, including not looking
at all the builtin types and not checking wrappers in subinterpreters
that weren't in the main interpreter. This fixes all that.
I considered incorporating the names of the PyTypeObject fields
(a la gh-122866), but figured doing so doesn't add much value.
This replaces `_PyList_FromArraySteal` with `_PyList_FromStackRefSteal`.
It's functionally equivalent, but takes a `_PyStackRef` array instead of
an array of `PyObject` pointers.
Co-authored-by: Ken Jin <kenjin@python.org>
* Parameters after the var-positional parameter are now keyword-only
instead of positional-or-keyword.
* Correctly calculate min_kw_only.
* Raise errors for invalid combinations of the var-positional parameter
with "*", "/" and deprecation markers.
We were not properly accounting for interpreter memory leaks at
shutdown and had two sources of leaks:
* Objects that use deferred reference counting and are reachable via
static types outlive the final GC. We now disable deferred reference
counting on all objects if we are calling the GC due to interpreter
shutdown.
* `_PyMem_FreeDelayed` did not properly check for interpreter shutdown
so we had some memory blocks that were enqueued to be freed, but
never actually freed.
* `_PyType_FinalizeIdPool` wasn't called at interpreter shutdown.
Return -1 and set an exception on error; return 0 if the iterator is
exhausted, and return 1 if the next item was fetched successfully.
Prefer this API to PyIter_Next(), which requires the caller to use
PyErr_Occurred() to differentiate between iterator exhaustion and errors.
Co-authored-by: Irit Katriel <iritkatriel@yahoo.com>
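A usage sketch of the API this message describes, `PyIter_NextItem()` (the
summing loop is only an example):

```c
#include <Python.h>

/* Illustration: consume an iterator without calling PyErr_Occurred() to
   distinguish exhaustion from errors. */
static int
sum_iterable(PyObject *iterable, long *total)
{
    PyObject *it = PyObject_GetIter(iterable);
    if (it == NULL) {
        return -1;
    }
    *total = 0;
    PyObject *item;
    int rc;
    /* 1: item fetched, 0: exhausted, -1: error (exception set). */
    while ((rc = PyIter_NextItem(it, &item)) > 0) {
        long value = PyLong_AsLong(item);
        Py_DECREF(item);
        if (value == -1 && PyErr_Occurred()) {
            Py_DECREF(it);
            return -1;
        }
        *total += value;
    }
    Py_DECREF(it);
    return rc;  /* 0 on success, -1 if the iterator raised */
}
```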
The free-threaded build partially stores heap type reference counts in a
distributed manner in per-thread arrays. This avoids reference count
contention when creating or destroying instances.
Co-authored-by: Ken Jin <kenjin@python.org>
The `PyStructSequence` destructor would crash if it was deallocated after
its type's dictionary was cleared by the GC, because it couldn't compute
the "real size" of the instance. This could occur with relatively
straightforward code in the free-threaded build or with a reference
cycle involving the type in the default build, due to differing orders
in which `tp_clear()` was called.
Account for the non-sequence fields in `tp_basicsize` and use that,
along with `Py_SIZE()`, to compute the "real" size of a
`PyStructSequence` in the dealloc function. This avoids the accesses to
the type's dictionary during dealloc, which were unsafe.
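An illustrative sketch of that computation, not the exact CPython code; it
assumes the conventional tuple-like layout in which `tp_basicsize` previously
covered only the fixed header and now also covers the hidden (unnamed) fields:

```c
#include <Python.h>

/* Sketch only: recover the real number of slots from the type itself,
   without reading its (possibly already cleared) dict. */
static Py_ssize_t
structseq_real_size(PyStructSequence *self)
{
    PyTypeObject *tp = Py_TYPE(self);
    /* Extra slots beyond the visible ones, now encoded in tp_basicsize
       (assuming basicsize = fixed header minus one item, as for tuples). */
    Py_ssize_t hidden =
        (tp->tp_basicsize - ((Py_ssize_t)sizeof(PyStructSequence)
                             - (Py_ssize_t)sizeof(PyObject *)))
        / (Py_ssize_t)sizeof(PyObject *);
    /* Py_SIZE() counts the visible (named sequence) fields. */
    return Py_SIZE(self) + hidden;
}
```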