This PR sets up tagged pointers for CPython.
The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.
Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.
This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.
The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.
Please read Include/internal/pycore_stackref.h for more information!
---------
Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
Remove the const qualifier of the argument of functions:
* _PyLong_IsCompact()
* _PyLong_CompactValue()
Py_TYPE() argument is not const.
Fix the compiler warning:
Include/cpython/longintrepr.h: In function ‘_PyLong_CompactValue’:
Include/pyport.h:19:31: error: cast discards ‘const’ qualifier from
pointer target type [-Werror=cast-qual]
(...)
Include/cpython/longintrepr.h:133:30: note: in expansion of macro
‘Py_TYPE’
assert(PyType_HasFeature(Py_TYPE(op), Py_TPFLAGS_LONG_SUBCLASS));
PyDict_Next no longer locks the dictionary in the free-threaded build. Locking
around individual PyDict_Next calls is not sufficient because the function
returns borrowed references and because it allows concurrent modifications
during the iteraiton loop.
The internal locking also interferes with correct external synchronization
because it may suspend outer critical sections created by the caller.
Moves the logic to update the type's dictionary to its own function in order
to make the lock scoping more clear.
Also, ensure that `name` is decref'd on the error path.
PyUnicode_FromFormat() no longer produces the ending \ufffd
character for truncated C string when use precision with %s and %V.
It now truncates the string before the start of truncated multibyte sequences.
This makes the following macros public as part of the non-limited C-API for
locking a single object or two objects at once.
* `Py_BEGIN_CRITICAL_SECTION(op)` / `Py_END_CRITICAL_SECTION()`
* `Py_BEGIN_CRITICAL_SECTION2(a, b)` / `Py_END_CRITICAL_SECTION2()`
The supporting functions and structs used by the macros are also exposed for
cases where C macros are not available.
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization, otherwise you're possibly
interning a immortalizing copy.
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
The public PyUnicodeWriter API enables overallocation by default and
so is more efficient.
Benchmark:
python -m pyperf timeit \
-s 't = int | float | complex | str | bytes | bytearray' \
' | memoryview | list | dict' \
'str(t)'
Result:
1.29 us +- 0.02 us -> 1.00 us +- 0.02 us: 1.29x faster
The public PyUnicodeWriter API enables overallocation by default and
so is more efficient.
Benchmark:
python -m pyperf timeit \
-s 't = list[int, float, complex, str, bytes, bytearray, ' \
'memoryview, list, dict]' \
'str(t)'
Result:
1.49 us +- 0.03 us -> 1.10 us +- 0.02 us: 1.35x faster
This exposes `PyUnstable_Object_ClearWeakRefsNoCallbacks` as an unstable
C-API function to provide a thread-safe mechanism for clearing weakrefs
without executing callbacks.
Some C-API extensions need to clear weakrefs without calling callbacks,
such as after running finalizers like we do in subtype_dealloc.
Previously they could use `_PyWeakref_ClearRef` on each weakref, but
that's not thread-safe in the free-threaded build.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
In gh-120009 I used an atexit hook to finalize the _datetime module's static types at interpreter shutdown. However, atexit hooks are executed very early in finalization, which is a problem in the few cases where a subclass of one of those static types is still alive until the final GC collection. The static builtin types don't have this probably because they are finalized toward the end, after the final GC collection. To avoid the problem for _datetime, I have applied a similar approach here.
Also, credit goes to @mgorny and @neonene for the new tests.
FYI, I would have liked to take a slightly cleaner approach with managed static types, but wanted to get a smaller fix in first for the sake of backporting. I'll circle back to the cleaner approach with a future change on the main branch.
We need to write to `ob_ref_local` and `ob_tid` before `ob_ref_shared`.
Once we mark `ob_ref_shared` as merged, some other thread may free the
object because the caller also passes in `-1` as `extra` to give up its
only reference.
We make use of the same mechanism that we use for the static builtin types. This required a few tweaks.
The relevant code could use some cleanup but I opted to avoid the significant churn in this change. I'll tackle that separately.
This change is the final piece needed to make _datetime support multiple interpreters. I've updated the module slot accordingly.
The free-threaded build currently immortalizes objects that use deferred
reference counting (see gh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.
This fixes a race condition involving the tracking of whether the
behavior is suppressed.