Starting in Python 3.12, we prevented calling `fork()` and starting new threads
during interpreter finalization (shutdown). This has led to a number of
regressions and flaky tests. We should not prevent starting new threads
(or `fork()`) until all non-daemon threads exit and finalization starts in
earnest.
This changes the checks to use `_PyInterpreterState_GetFinalizing(interp)`,
which is set immediately before terminating non-daemon threads.
* Split long.c tests of _testcapi into two parts: limited C API tests
in _testlimitedcapi and non-limited C API tests in _testcapi.
* Move testcapi_long.h from Modules/_testcapi/ to
Modules/_testlimitedcapi/.
* Add MODULE__TESTLIMITEDCAPI_DEPS to Makefile.pre.in.
Split unicode.c tests of _testcapi into two parts: limited C API
tests in _testlimitedcapi and non-limited C API tests in _testcapi.
Update test_codecs.
Split abstract.c and float.c tests of _testcapi into two parts:
limited C API tests in _testlimitedcapi and non-limited C API tests
in _testcapi.
Update test_bytes and test_class.
This includes adding what should be a relatively temporary
`Modules/_decimal/windows/mpdecimal.h` shim to choose between `mpdecimal32vc.h`
and `mpdecimal64vc.h` based on which of `CONFIG_64` or `CONFIG_32` is defined.
Even though it has no internal references to Python objects, it still
has a reference to its type by virtue of being a heap type. We need
to provide a traverse function that visits the type, but we do not
need to provide a clear function.
There is a race between when `Thread._tstate_lock` is released[^1] in `Thread._wait_for_tstate_lock()`
and when `Thread._stop()` asserts[^2] that it is unlocked. Consider the following execution
involving threads A, B, and C:
1. A starts.
2. B joins A, blocking on its `_tstate_lock`.
3. C joins A, blocking on its `_tstate_lock`.
4. A finishes and releases its `_tstate_lock`.
5. B acquires A's `_tstate_lock` in `_wait_for_tstate_lock()`, releases it, but is swapped
out before calling `_stop()`.
6. C is scheduled, acquires A's `_tstate_lock` in `_wait_for_tstate_lock()` but is swapped
out before releasing it.
7. B is scheduled, calls `_stop()`, which asserts that A's `_tstate_lock` is not held.
However, C holds it, so the assertion fails.
The race can be reproduced[^3] by inserting sleeps at the appropriate points in
the threading code. To do so, run `repro_join_race.py` from the linked repo.
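
The interleaving in steps 4 to 7 can also be replayed by hand on a single thread. The following is a minimal sketch, assuming a plain `threading.Lock` stands in for A's `_tstate_lock`. The helper name `replay_join_race` is hypothetical, and this is not the `repro_join_race.py` script from the linked repo.

```python
import threading


def replay_join_race():
    """Replay steps 4-7 by hand; return True if B's check would fail."""
    # Stand-in for A's _tstate_lock, held while A is running.
    tstate_lock = threading.Lock()
    tstate_lock.acquire()

    # Step 4: A finishes and releases its lock.
    tstate_lock.release()

    # Step 5: B acquires and releases the lock, then is swapped out
    # before it reaches its _stop()-style check.
    tstate_lock.acquire()
    tstate_lock.release()

    # Step 6: C acquires the lock and is swapped out before releasing it.
    tstate_lock.acquire()

    # Step 7: B resumes and checks that the lock is not held.
    # C still holds it, so the check observes a locked lock.
    check_fails = tstate_lock.locked()

    tstate_lock.release()  # C eventually releases the lock.
    return check_fails


print(replay_join_race())
```

The sketch only shows why the check in step 7 can observe a held lock. It does not exercise the real `Thread` internals.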
There are two main parts to this PR:
1. `_tstate_lock` is replaced with an event that is attached to `PyThreadState`.
The event is set by the runtime prior to the thread being cleared (in the same
place that `_tstate_lock` was released). `Thread.join()` blocks waiting for the
event to be set.
2. `_PyInterpreterState_WaitForThreads()` provides the ability to wait for all
non-daemon threads to exit. To do so, an `is_daemon` field was added to
`PyThreadState`. This field is set each time a thread is created. `threading._shutdown()`
now calls into `_PyInterpreterState_WaitForThreads()` instead of waiting on
`_tstate_lock`s.
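
The first part can be illustrated in pure Python. This is a rough sketch, assuming a `threading.Event` behaves like the event attached to `PyThreadState`. The `DoneEvent` class and `run_demo` helper are hypothetical names and not part of this PR.

```python
import threading


class DoneEvent:
    """Hypothetical stand-in for the event attached to PyThreadState."""

    def __init__(self):
        self._event = threading.Event()

    def set_done(self):
        # Called once, at the point where _tstate_lock used to be released.
        self._event.set()

    def join(self, timeout=None):
        # Waiting never takes ownership of anything, so one joiner
        # cannot be observed "holding" state by another joiner.
        return self._event.wait(timeout)


def run_demo():
    done = DoneEvent()
    results = []

    def worker():
        done.set_done()

    def joiner(name):
        results.append((name, done.join(timeout=5)))

    # B and C both join A, as in the race described above.
    joiners = [threading.Thread(target=joiner, args=(n,)) for n in ("B", "C")]
    for t in joiners:
        t.start()
    a = threading.Thread(target=worker)
    a.start()
    for t in joiners + [a]:
        t.join()
    return sorted(results)


print(run_demo())
```

Because joiners only wait on the event and never acquire it, there is no acquire/release window for another joiner to observe.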
[^1]: 441affc9e7/Lib/threading.py (L1201)
[^2]: 441affc9e7/Lib/threading.py (L1115)
[^3]: 8194653279
---------
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Use the NtQueryInformationProcess system call to efficiently retrieve the parent process ID in a single step, rather than using the process snapshots API, which retrieves large amounts of unnecessary information and is more prone to failure (since it makes heap allocations).
Includes a fallback to the original win32_getppid implementation in case the unstable API appears to return strange results.
The fildes converter of Argument Clinic now always calls
PyObject_AsFileDescriptor(), not only for the limited C API.
The _PyLong_FileDescriptor_Converter() converter stays as a fallback
when PyObject_AsFileDescriptor() cannot be used.
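
From Python code, the visible contract of `PyObject_AsFileDescriptor()` is that an argument may be an integer or an object with a `fileno()` method. A minimal sketch, assuming `os.fsync()` is one of the functions whose `fd` parameter goes through the fildes converter. The `HasFileno` class and `fsync_both_ways` helper are hypothetical.

```python
import os
import tempfile


class HasFileno:
    """Hypothetical wrapper that exposes only a fileno() method."""

    def __init__(self, fd):
        self._fd = fd

    def fileno(self):
        return self._fd


def fsync_both_ways():
    with tempfile.TemporaryFile() as f:
        f.write(b"data")
        f.flush()
        os.fsync(f.fileno())             # plain integer descriptor
        os.fsync(HasFileno(f.fileno()))  # object providing fileno()
    return True


print(fsync_both_ways())
```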
Return 0 on success. Set an exception and return -1 on error.
Fix os.timerfd_settime(): properly report exceptions on
_PyTime_FromSecondsDouble() failure.
No longer export _PyTime_FromSecondsDouble().
Move the following files from Modules/_testcapi/ to
Modules/_testlimitedcapi/:
* bytearray.c
* bytes.c
* pyos.c
* sys.c
Changes:
* Replace PyBytes_AS_STRING() with PyBytes_AsString().
* Replace PyBytes_GET_SIZE() with PyBytes_Size().
* Update related test_capi tests.
* Copy Modules/_testcapi/util.h to Modules/_testlimitedcapi/util.h.
Add a new C extension "_testlimitedcapi" which is only built with the
limited C API.
Move heaptype_relative.c and vectorcall_limited.c from
Modules/_testcapi/ to Modules/_testlimitedcapi/.
* configure: add _testlimitedcapi test extension.
* Update generate_stdlib_module_names.py.
* Update make check-c-globals.
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Previously, the `locked` field was set after releasing the lock. This reverses
the order so that the `locked` field is set while the lock is still held.
There is still one thread-safety issue: `locked` is checked prior to
releasing the lock. In practice, however, that will only be an issue when
unlocking the lock is contended, which should be rare.
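
The ordering can be sketched with a pure-Python analogue. This is an illustration under assumptions, not the C code touched by this change: the `FlaggedLock` class is hypothetical, and its `locked` attribute plays the role of the `locked` field.

```python
import threading


class FlaggedLock:
    """Hypothetical Python analogue of a lock with a `locked` field."""

    def __init__(self):
        self._lock = threading.Lock()
        self.locked = False

    def acquire(self):
        self._lock.acquire()
        self.locked = True  # written only by the current holder

    def release(self):
        # Remaining gap noted above: this check runs before the release.
        if not self.locked:
            raise RuntimeError("release unlocked lock")
        # Clear the flag while the lock is still held, then release.
        # Clearing it after the release could overwrite the True written
        # by a thread that acquired the lock in between.
        self.locked = False
        self._lock.release()


lock = FlaggedLock()
lock.acquire()
lock.release()
print(lock.locked)
```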
The problem manifested when the .py module got reloaded and the corresponding extension module didn't. The .py module registers types with the extension and the extension was not allowing that to happen more than once. The solution: let it happen more than once.