* Replace PyLong_AS_LONG() with PyLong_AsLong().
* Call PyLong_AsLong() only once in the replacement code (see the sketch below).
* Use PyMapping_GetOptionalItem() instead of PyObject_GetItem().
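A minimal sketch of the two replacement patterns (the wrapper functions and
variable names below are illustrative, not the actual call sites):

```c
/* PyLong_AsLong() is called exactly once and its error return is checked,
   unlike PyLong_AS_LONG(), which performs no error checking. */
static int
get_count(PyObject *obj, long *out)
{
    long value = PyLong_AsLong(obj);
    if (value == -1 && PyErr_Occurred()) {
        return -1;
    }
    *out = value;
    return 0;
}

/* PyMapping_GetOptionalItem() reports a missing key via its return value
   instead of raising KeyError like PyObject_GetItem(). */
static PyObject *
lookup_or_none(PyObject *mapping, PyObject *key)
{
    PyObject *item;
    if (PyMapping_GetOptionalItem(mapping, key, &item) < 0) {
        return NULL;        /* real error, exception set */
    }
    if (item == NULL) {
        Py_RETURN_NONE;     /* key absent, no exception */
    }
    return item;            /* new strong reference */
}
```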
Check that the current default heap is initialized in
`_mi_os_get_aligned_hint` and `mi_os_claim_huge_pages`.
The mimalloc function `_mi_os_get_aligned_hint` assumes that there is an
initialized default heap. This is true for our main thread, but not for
background threads. The problematic code path is usually called during
initialization (i.e., `Py_Initialize`), but it may also be called if the
program allocates large amounts of memory in total.
The crash only affected the free-threaded build.
There were still a number of gaps in the tests, including not looking
at all the builtin types and not checking wrappers in subinterpreters
that weren't in the main interpreter. This fixes all that.
I considered incorporating the names of the PyTypeObject fields
(a la gh-122866), but figured doing so doesn't add much value.
This replaces `_PyList_FromArraySteal` with `_PyList_FromStackRefSteal`.
It's functionally equivalent, but takes a `_PyStackRef` array instead of
an array of `PyObject` pointers.
Co-authored-by: Ken Jin <kenjin@python.org>
* Parameters after the var-positional parameter are now keyword-only
instead of positional-or-keyword.
* Correctly calculate min_kw_only.
* Raise errors for invalid combinations of the var-positional parameter
with "*", "/" and deprecation markers.
We were not properly accounting for interpreter memory leaks at
shutdown and had two sources of leaks:
* Objects that use deferred reference counting and are reachable via
static types outlive the final GC. We now disable deferred reference
counting on all objects if we are calling the GC due to interpreter
shutdown.
* `_PyMem_FreeDelayed` did not properly check for interpreter shutdown
so we had some memory blocks that were enqueued to be freed, but
never actually freed.
* `_PyType_FinalizeIdPool` wasn't called at interpreter shutdown.
Return -1 and set an exception on error; return 0 if the iterator is
exhausted, and return 1 if the next item was fetched successfully.
Prefer this API to PyIter_Next(), which requires the caller to use
PyErr_Occurred() to differentiate between iterator exhaustion and errors.
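A sketch of the intended calling pattern, assuming the new API is
PyIter_NextItem() with the signature
`int PyIter_NextItem(PyObject *iter, PyObject **item)`:

```c
static int
consume_all(PyObject *iter)
{
    PyObject *item;
    int rc;
    while ((rc = PyIter_NextItem(iter, &item)) > 0) {
        /* ... use the strong reference in item ... */
        Py_DECREF(item);
    }
    /* rc is 0 on exhaustion (no exception set) or -1 on error (exception
       already set), so no PyErr_Occurred() call is needed here. */
    return rc;
}
```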
Co-authored-by: Irit Katriel <iritkatriel@yahoo.com>
The free-threaded build partially stores heap type reference counts in
a distributed manner in per-thread arrays. This avoids reference count
contention when creating or destroying instances.
Co-authored-by: Ken Jin <kenjin@python.org>
The `PyStructSequence` destructor would crash if it was deallocated after
its type's dictionary was cleared by the GC, because it couldn't compute
the "real size" of the instance. This could occur with relatively
straightforward code in the free-threaded build or with a reference
cycle involving the type in the default build, due to differing orders
in which `tp_clear()` was called.
Account for the non-sequence fields in `tp_basicsize` and use that,
along with `Py_SIZE()`, to compute the "real" size of a
`PyStructSequence` in the dealloc function. This avoids the accesses to
the type's dictionary during dealloc, which were unsafe.
* gh-120974: Make _asyncio._leave_task atomic in the free-threaded build
Update `_PyDict_DelItemIf` to allow an argument to be passed to the
predicate.
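A sketch of how the updated helper might be used (the exact internal
signature is an assumption; `unset_if_current` and its argument are
illustrative):

```c
/* The predicate now receives a caller-supplied argument alongside the value. */
static int
is_expected_task(PyObject *value, void *arg)
{
    return value == (PyObject *)arg;   /* 1 = delete, 0 = keep */
}

static int
unset_if_current(PyObject *dict, PyObject *key, PyObject *expected)
{
    /* The check-and-delete runs as one locked operation in the
       free-threaded build, which is what makes _leave_task atomic. */
    return _PyDict_DelItemIf(dict, key, is_expected_task, (void *)expected);
}
```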
This refactors asyncio to use the common freelist helper functions and
macros. As a side effect, the freelist for _asyncio.Future is now
re-enabled in the free-threaded build.
This combines and updates our freelist handling to use a consistent
implementation. Objects in the freelist are linked together using the
first word of the memory block.
If configured with freelists disabled, these operations are essentially
no-ops.
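A conceptual sketch of the linking scheme in generic C (the struct and
function names are hypothetical; the real implementation lives in the
internal freelist helpers):

```c
#include <stddef.h>

struct freelist {
    void  *head;     /* most recently freed block, or NULL when empty */
    size_t size;     /* number of cached blocks */
};

static inline void
freelist_push(struct freelist *fl, void *block)
{
    /* Reuse the first word of the freed block as the "next" link. */
    *(void **)block = fl->head;
    fl->head = block;
    fl->size++;
}

static inline void *
freelist_pop(struct freelist *fl)
{
    void *block = fl->head;
    if (block != NULL) {
        fl->head = *(void **)block;   /* unlink via the first word */
        fl->size--;
    }
    return block;                     /* NULL means: allocate normally */
}
```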
compare_unicode_generic(), compare_unicode_unicode() and
compare_generic() are callbacks used by do_lookup(). When assertions
are enabled, these functions cannot be inlined.
* Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs
* Document immortality in some functions that take `const char *`
These are PyUnicode_InternFromString;
PyDict_SetItemString and PyObject_SetAttrString;
PyObject_DelAttrString;
and the PyModule_Add convenience functions.
Always point out a non-immortalizing alternative.
* Don't immortalize user-provided attr names in _ctypes
We should maintain the invariant that a zero `ob_tid` implies the
refcount fields are merged.
* Move the assignment in `_Py_MergeZeroLocalRefcount` to immediately
before the refcount merge.
* Update `_PyTrash_thread_destroy_chain` to set `ob_ref_shared` to
`_Py_REF_MERGED` when setting `ob_tid` to zero.
Also check this invariant with assertions in the GC in debug builds.
That uncovered a bug when running out of memory during GC.
They are alternate constructors which only accept numbers
(including objects with special methods __float__, __complex__
and __index__), but not strings.
Performance improvement to `float.fromhex`: use a lookup table
for computing the hexadecimal value of a character, in place of the
previous switch-case construct. Patch by Bruno Lima.
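A minimal sketch of the table-driven approach (the table below is built at
runtime for brevity; the actual implementation uses a precomputed table):

```c
/* hex_table[c] holds the digit value for '0'-'9', 'a'-'f' and 'A'-'F',
   and -1 for every other byte. */
static signed char hex_table[256];

static void
init_hex_table(void)
{
    for (int i = 0; i < 256; i++) {
        hex_table[i] = -1;
    }
    for (int d = 0; d < 10; d++) {
        hex_table['0' + d] = (signed char)d;
    }
    for (int d = 0; d < 6; d++) {
        hex_table['a' + d] = (signed char)(10 + d);
        hex_table['A' + d] = (signed char)(10 + d);
    }
}

/* A single indexed load replaces a switch over every hex character. */
static inline int
hex_from_char(unsigned char c)
{
    return hex_table[c];
}
```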
* The result has type Py_ssize_t, not intptr_t.
* Type casts between unsigned and signed integer types should be explicit.
* Downcasting should be explicit.
* Fix integer overflow check in sum().
When builtin static types are initialized for a subinterpreter, various "tp" slots have already been inherited (for the main interpreter). This was interfering with the logic in add_operators() (in Objects/typeobject.c), causing a wrapper to get created when it shouldn't. This change fixes that by preserving the original data from the static type struct and checking that.
The `used` field must be written using atomic stores because `set_len`
and iterators may access the field concurrently without holding the
per-object lock.
The `_PySeqLock_EndRead` function needs an acquire fence to ensure that
the load of the sequence happens after any loads within the read side
critical section. The missing fence can trigger bugs on macOS arm64.
Additionally, we need a release fence in `_PySeqLock_LockWrite` to
ensure that the sequence update is visible before any modifications to
the cache entry.
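A generic C11 sketch of the two fences described above (this is not
CPython's `_PySeqLock` implementation, and it assumes writers are already
serialized by an external lock):

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint32_t sequence;   /* odd while a write is in progress */
} seqlock_t;

static uint32_t
seqlock_begin_read(seqlock_t *sl)
{
    uint32_t seq;
    while ((seq = atomic_load_explicit(&sl->sequence,
                                       memory_order_acquire)) & 1) {
        /* writer active: retry */
    }
    return seq;
}

static int
seqlock_end_read(seqlock_t *sl, uint32_t seq)
{
    /* Acquire fence: the loads performed inside the read-side critical
       section must happen before the sequence is re-checked. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&sl->sequence,
                                memory_order_relaxed) == seq;
}

static void
seqlock_lock_write(seqlock_t *sl)
{
    uint32_t seq = atomic_load_explicit(&sl->sequence,
                                        memory_order_relaxed);
    atomic_store_explicit(&sl->sequence, seq + 1, memory_order_relaxed);
    /* Release fence: the sequence update must be visible before any
       modifications to the protected data. */
    atomic_thread_fence(memory_order_release);
}

static void
seqlock_unlock_write(seqlock_t *sl)
{
    uint32_t seq = atomic_load_explicit(&sl->sequence,
                                        memory_order_relaxed);
    atomic_store_explicit(&sl->sequence, seq + 1, memory_order_release);
}
```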
Make error messages for index() methods consistent
Remove the repr of the searched value (which can be arbitrarily large)
from ValueError messages for list.index(), range.index(), deque.index(),
deque.remove() and ShareableList.index(). Make the error messages
consistent with error messages for other index() and remove()
methods.
Refactor the fast Unicode hash check into `_PyObject_HashFast` and use relaxed
atomic loads in the free-threaded build.
After this change, TSAN no longer reports data races for this method.
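A simplified sketch of what such a helper can look like (the internal
`FT_ATOMIC_LOAD_SSIZE_RELAXED` wrapper and the object layout accessors are
assumptions; the real code lives in the internal headers):

```c
static inline Py_hash_t
object_hash_fast(PyObject *op)
{
    if (PyUnicode_CheckExact(op)) {
        /* str caches its hash; load it with a relaxed atomic in the
           free-threaded build to avoid racing with the writer. */
        Py_hash_t hash = FT_ATOMIC_LOAD_SSIZE_RELAXED(
            _PyASCIIObject_CAST(op)->hash);
        if (hash != -1) {
            return hash;
        }
    }
    return PyObject_Hash(op);   /* slow path */
}
```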
In some cases that were previously computed as (nan+nanj), we can now
recover meaningful component values in the result; see e.g. C11,
Annex G.5.2, routine _Cdivd().
Fix warnings when using -Wimplicit-fallthrough compiler flag.
Annotate explicitly "fall through" switch cases with a new
_Py_FALLTHROUGH macro which uses __attribute__((fallthrough)) if
available. Replace "fall through" comments with _Py_FALLTHROUGH.
Add _Py__has_attribute() macro. No longer define __has_attribute()
macro if it's not defined. Also move _Py__has_builtin() to the top
of pyport.h.
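A simplified sketch of the macro and its intended use (the real definition
is in Include/pyport.h; the dispatch() function is purely illustrative):

```c
#if _Py__has_attribute(fallthrough)
#  define _Py_FALLTHROUGH __attribute__((fallthrough))
#else
#  define _Py_FALLTHROUGH do { } while (0)
#endif

static int
dispatch(int kind, int *state)
{
    switch (kind) {
    case 0:
        *state += 1;
        _Py_FALLTHROUGH;   /* deliberate: case 0 also runs case 1 */
    case 1:
        return *state;
    default:
        return -1;
    }
}
```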
Co-Authored-By: Nikita Sobolev <mail@sobolevn.me>
This PR sets up tagged pointers for CPython.
The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.
Only for free threading: we tag the low bit if something is deferred, which means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pan out.
This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.
The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.
Please read Include/internal/pycore_stackref.h for more information!
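A conceptual sketch of the scheme (the real definitions are in
Include/internal/pycore_stackref.h; the tag name below is an assumption):

```c
#include <stdint.h>

/* Wrapping the tagged word in a struct means the compiler rejects
   accidental casts to or from PyObject *. */
typedef struct _PyStackRef {
    uintptr_t bits;
} _PyStackRef;

/* Free-threaded build only: low bit set => the reference is deferred,
   so the dup/close operations skip incref/decref for it. */
#define Py_TAG_DEFERRED ((uintptr_t)1)
```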
---------
Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
Remove the const qualifier from the argument of the following functions:
* _PyLong_IsCompact()
* _PyLong_CompactValue()
Py_TYPE() argument is not const.
Fix the compiler warning:
Include/cpython/longintrepr.h: In function ‘_PyLong_CompactValue’:
Include/pyport.h:19:31: error: cast discards ‘const’ qualifier from
pointer target type [-Werror=cast-qual]
(...)
Include/cpython/longintrepr.h:133:30: note: in expansion of macro
‘Py_TYPE’
assert(PyType_HasFeature(Py_TYPE(op), Py_TPFLAGS_LONG_SUBCLASS));
PyDict_Next no longer locks the dictionary in the free-threaded build. Locking
around individual PyDict_Next calls is not sufficient because the function
returns borrowed references and because it allows concurrent modifications
during the iteration loop.
The internal locking also interferes with correct external synchronization
because it may suspend outer critical sections created by the caller.
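A sketch of the external synchronization callers are now expected to
provide in the free-threaded build (the counting function is illustrative):

```c
static int
count_items(PyObject *dict)
{
    PyObject *key, *value;
    Py_ssize_t pos = 0;
    int count = 0;

    /* Lock the whole loop, not individual PyDict_Next() calls: the
       borrowed key/value references are only safe while the dict cannot
       be mutated concurrently. */
    Py_BEGIN_CRITICAL_SECTION(dict);
    while (PyDict_Next(dict, &pos, &key, &value)) {
        count++;
    }
    Py_END_CRITICAL_SECTION();
    return count;
}
```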
Moves the logic to update the type's dictionary to its own function in order
to make the lock scoping more clear.
Also, ensure that `name` is decref'd on the error path.
PyUnicode_FromFormat() no longer produces a trailing \ufffd character
for a truncated C string when a precision is used with %s and %V.
It now truncates the string before the start of a truncated multibyte
sequence.
This makes the following macros public as part of the non-limited C-API for
locking a single object or two objects at once.
* `Py_BEGIN_CRITICAL_SECTION(op)` / `Py_END_CRITICAL_SECTION()`
* `Py_BEGIN_CRITICAL_SECTION2(a, b)` / `Py_END_CRITICAL_SECTION2()`
The supporting functions and structs used by the macros are also exposed for
cases where C macros are not available.
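A brief usage sketch (assuming `a` and `b` are known to be lists):

```c
static Py_ssize_t
combined_length(PyObject *a, PyObject *b)
{
    Py_ssize_t result;
    /* Both objects are locked (in a consistent order) in the free-threaded
       build; the macros expand to almost nothing in the default build. */
    Py_BEGIN_CRITICAL_SECTION2(a, b);
    result = PyList_GET_SIZE(a) + PyList_GET_SIZE(b);
    Py_END_CRITICAL_SECTION2();
    return result;
}
```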
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those; a usage sketch follows this list.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization; otherwise you may end
up immortalizing a copy that is not the interned string.
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
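A sketch of the intended call pattern for the explicit intern functions
listed above (the signatures are assumptions based on the names; the new
InternalDocs file is the authoritative reference):

```c
static PyObject *
make_interned_name(PyInterpreterState *interp)
{
    PyObject *name = PyUnicode_FromString("example");
    if (name == NULL) {
        return NULL;
    }
    /* Intern first, choosing the lifetime explicitly; the pointer may be
       replaced with the canonical interned object. */
    _PyUnicode_InternMortal(interp, &name);
    /* For strings that must live for the whole runtime, use
       _PyUnicode_InternImmortal(interp, &name) instead. Never call
       _Py_SetImmortal() on a string directly. */
    return name;
}
```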
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
The public PyUnicodeWriter API enables overallocation by default and
so is more efficient.
Benchmark:
python -m pyperf timeit \
    -s 't = int | float | complex | str | bytes | bytearray | memoryview | list | dict' \
    'str(t)'
Result:
1.29 us +- 0.02 us -> 1.00 us +- 0.02 us: 1.29x faster
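For reference, a minimal sketch of the public API in use (the function
name and the written prefix are illustrative):

```c
static PyObject *
repr_with_prefix(PyObject *obj)
{
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }
    if (PyUnicodeWriter_WriteUTF8(writer, "value=", -1) < 0
        || PyUnicodeWriter_WriteRepr(writer, obj) < 0)
    {
        PyUnicodeWriter_Discard(writer);
        return NULL;
    }
    /* The writer overallocates its buffer by default, so repeated writes
       avoid repeated reallocations. */
    return PyUnicodeWriter_Finish(writer);
}
```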
The public PyUnicodeWriter API enables overallocation by default and
so is more efficient.
Benchmark:
python -m pyperf timeit \
    -s 't = list[int, float, complex, str, bytes, bytearray, memoryview, list, dict]' \
    'str(t)'
Result:
1.49 us +- 0.03 us -> 1.10 us +- 0.02 us: 1.35x faster