PyMutex is a one byte lock with fast, inlineable lock and unlock functions for the common uncontended case. The design is based on WebKit's WTF::Lock.
PyMutex is built using the _PyParkingLot APIs, which provides a cross-platform futex-like API (based on WebKit's WTF::ParkingLot). This internal API will be used for building other synchronization primitives used to implement PEP 703, such as one-time initialization and events.
This also includes tests and a mini benchmark in Tools/lockbench/lockbench.py to compare with the existing PyThread_type_lock.
Uncontended acquisition + release:
* Linux (x86-64): PyMutex: 11 ns, PyThread_type_lock: 44 ns
* macOS (arm64): PyMutex: 13 ns, PyThread_type_lock: 18 ns
* Windows (x86-64): PyMutex: 13 ns, PyThread_type_lock: 38 ns
PR Overview:
The primary purpose of this PR is to implement PyMutex, but there are a number of support pieces (described below).
* PyMutex: A 1-byte lock that doesn't require memory allocation to initialize and is generally faster than the existing PyThread_type_lock. The API is internal only for now.
* _PyParking_Lot: A futex-like API based on the API of the same name in WebKit. Used to implement PyMutex.
* _PyRawMutex: A word sized lock used to implement _PyParking_Lot.
* PyEvent: A one time event. This was used a bunch in the "nogil" fork and is useful for testing the PyMutex implementation, so I've included it as part of the PR.
* pycore_llist.h: Defines common operations on doubly-linked list. Not strictly necessary (could do the list operations manually), but they come up frequently in the "nogil" fork. ( Similar to https://man.freebsd.org/cgi/man.cgi?queue)
---------
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
* bpo-9566: Silence warnings from pyatomic.h macros
Apparently MSVC is too stupid to understand that the alternate branch is
not taken and emits a warning for it.
Warnings added in https://github.com/python/cpython/pull/2383
* bpo-9566: A better fix for the pyatomic.h warning
* bpo-9566: Remove a slash
_Py_atomic_* are currently not implemented as atomic operations
when building with MSVC. This patch attempts to implement parts
of the functionality required.
Issue #26161: Use Py_uintptr_t instead of void* for atomic pointers in
pyatomic.h. Use atomic_uintptr_t when <stdatomic.h> is used.
Using void* causes compilation warnings depending on which implementation of
atomic types is used.
Python.h header to fix a compilation error with OpenMP. PyThreadState_GET()
becomes an alias to PyThreadState_Get() to avoid ABI incompatibilies.
It is important that the _PyThreadState_Current variable is always accessed
with the same implementation of pyatomic.h. Use the PyThreadState_Get()
function so extension modules will all reuse the same implementation.
Disable completly pyatomic.h on C++, because <stdatomic.h> is not compatible with C++.
<pyatomic.h> is only needed by the optimized PyThreadState_GET() macro in
pystate.h. Instead, declare PyThreadState_GET() as an alias to
PyThreadState_Get(), as done for limited API.
http://code.google.com/p/data-race-test/wiki/ThreadSanitizer is a dynamic data
race detector that runs on top of valgrind. With this patch, the binaries at
http://code.google.com/p/data-race-test/wiki/ThreadSanitizer#Binaries pass many
but not all of the Python tests. All of regrtest still passes outside of tsan.
I've implemented part of the C1x atomic types so that we can explicitly mark
variables that are used across threads, and get defined behavior as compilers
advance.
I've added tsan's client header and implementation to the codebase in
dynamic_annotations.{h,c} (docs at
http://code.google.com/p/data-race-test/wiki/DynamicAnnotations).
Unfortunately, I haven't been able to get helgrind and drd to give sensible
error messages, even when I use their client annotations, so I'm not supporting
them.