* Move existing tests for PySys_GetObject() and PySys_SetObject() into
specialized files.
* Add test for PySys_GetXOptions() using _testcapi.
* Add tests for PySys_FormatStdout(), PySys_FormatStderr(),
PySys_WriteStdout() and PySys_WriteStderr() using ctypes.
The docs state that the space, tab, colon, and comma characters are
ignored in Py_BuildValue() format strings.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
PyMutex is a one byte lock with fast, inlineable lock and unlock functions for the common uncontended case. The design is based on WebKit's WTF::Lock.
PyMutex is built using the _PyParkingLot APIs, which provides a cross-platform futex-like API (based on WebKit's WTF::ParkingLot). This internal API will be used for building other synchronization primitives used to implement PEP 703, such as one-time initialization and events.
This also includes tests and a mini benchmark in Tools/lockbench/lockbench.py to compare with the existing PyThread_type_lock.
Uncontended acquisition + release:
* Linux (x86-64): PyMutex: 11 ns, PyThread_type_lock: 44 ns
* macOS (arm64): PyMutex: 13 ns, PyThread_type_lock: 18 ns
* Windows (x86-64): PyMutex: 13 ns, PyThread_type_lock: 38 ns
PR Overview:
The primary purpose of this PR is to implement PyMutex, but there are a number of support pieces (described below).
* PyMutex: A 1-byte lock that doesn't require memory allocation to initialize and is generally faster than the existing PyThread_type_lock. The API is internal only for now.
* _PyParking_Lot: A futex-like API based on the API of the same name in WebKit. Used to implement PyMutex.
* _PyRawMutex: A word sized lock used to implement _PyParking_Lot.
* PyEvent: A one time event. This was used a bunch in the "nogil" fork and is useful for testing the PyMutex implementation, so I've included it as part of the PR.
* pycore_llist.h: Defines common operations on doubly-linked list. Not strictly necessary (could do the list operations manually), but they come up frequently in the "nogil" fork. ( Similar to https://man.freebsd.org/cgi/man.cgi?queue)
---------
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
On a Python built in debug mode, Py_DECREF() now calls
_Py_NegativeRefcount() if the object is a dangling pointer to
deallocated memory: memory filled with 0xDD "dead byte" by the debug
hook on memory allocators. The fix is to check the reference count
*before* checking for _Py_IsImmortal().
Add test_decref_freed_object() to test_capi.test_misc.
* Rename SAVE_IP to _SET_IP
* Rename EXIT_TRACE to _EXIT_TRACE
* Rename SAVE_CURRENT_IP to _SAVE_CURRENT_IP
* Rename INSERT to _INSERT (This is for Ken Jin's abstract interpreter)
* Rename IS_NONE to _IS_NONE
* Rename JUMP_TO_TOP to _JUMP_TO_TOP
This adds a 16-bit inline cache entry to the conditional branch instructions POP_JUMP_IF_{FALSE,TRUE,NONE,NOT_NONE} and their instrumented variants, which is used to keep track of the branch direction.
Each time we encounter these instructions we shift the cache entry left by one and set the bottom bit to whether we jumped.
Then when it's time to translate such a branch to Tier 2 uops, we use the bit count from the cache entry to decided whether to continue translating the "didn't jump" branch or the "jumped" branch.
The counter is initialized to a pattern of alternating ones and zeros to avoid bias.
The .pyc file magic number is updated. There's a new test, some fixes for existing tests, and a few miscellaneous cleanups.
This mis-initialization caused the executor optimization to kick in sooner than intended. It also set the lower 4 bits of the counter to `1` -- those bits are supposed to be reserved (the actual counter is in the upper 12 bits).
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
Cover all the Mapping Protocol, almost all the Sequence Protocol
(except PySequence_Fast) and a part of the Object Protocol.
Move existing tests to Lib/test/test_capi/test_abstract.py and
Modules/_testcapi/abstract.c.
Add also tests for PyDict C API.
These aren't automatically translated because (ironically)
they are macros deferring to POP_JUMP_IF_{TRUE,FALSE},
which are not viable uops (being manually translated).
The hack is that we emit IS_NONE and then set opcode and
jump to the POP_JUMP_IF_{TRUE,FALSE} translation code.
The Tier 2 opcode _IS_ITER_EXHAUSTED_LIST (and _TUPLE)
didn't set it->it_seq to NULL, causing a subtle bug
that resulted in test_exhausted_iterator in list_tests.py
to fail when running all tests with -Xuops.
The bug was introduced in gh-106696.
Added this as an explicit test.
Also fixed the dependencies for ceval.o -- it depends on executor_cases.c.h.
During superblock generation, a JUMP_BACKWARD instruction is translated to either a JUMP_TO_TOP micro-op (when the target of the jump is exactly the beginning of the superblock, closing the loop), or a SAVE_IP + EXIT_TRACE pair, when the jump goes elsewhere.
The new JUMP_TO_TOP instruction includes a CHECK_EVAL_BREAKER() call, so a closed loop can still be interrupted.
- Hand-written uops JUMP_IF_{TRUE,FALSE}.
These peek at the top of the stack.
The jump target (in superblock space) is absolute.
- Hand-written translation for POP_JUMP_IF_{TRUE,FALSE},
assuming the jump is unlikely.
Once we implement jump-likelihood profiling,
we can implement the jump-unlikely case (in another PR).
- Tests (including some test cleanup).
- Improvements to len(ex) and ex[i] to expose the whole trace.
This adds several of unspecialized opcodes to superblocks:
TO_BOOL, BINARY_SUBSCR, STORE_SUBSCR,
UNPACK_SEQUENCE, LOAD_GLOBAL, LOAD_ATTR,
COMPARE_OP, BINARY_OP.
While we may not want that eventually, for now this helps finding bugs.
There is a rudimentary test checking for UNPACK_SEQUENCE.
Once we're ready to undo this, that would be simple:
just replace the call to variable_used_unspecialized
with a call to variable_used (as shown in a comment).
Or add individual opcdes to FORBIDDEN_NAMES_IN_UOPS.
The uops test wasn't testing anything by default,
and was failing when run with -Xuops.
Made the two executor-related context managers global,
so TestUops can use them (notably `with temporary_optimizer(opt)`).
Made clear_executor() a little more thorough.
Fixed a crash upon finalizing a uop optimizer,
by adding a `tp_dealloc` handler.
test_counter_optimizer() and test_long_loop() of test_capi now create
a new function at each call. Otherwise, the optimizer counters are
not the expected values when the test is run more than once.
For a while now, pending calls only run in the main thread (in the main interpreter). This PR changes things to allow any thread run a pending call, unless the pending call was explicitly added for the main thread to run.
When I added the relevant condition to type_ready_set_bases() in gh-103912, I had missed that the function also sets tp_base and ob_type (if necessary). That led to problems for third-party static types.
We fix that here, by making those extra operations distinct and by adjusting the condition to be more specific.
In gh-103912 we added tp_bases and tp_mro to each PyInterpreterState.types.builtins entry. However, doing so ignored the fact that both PyTypeObject fields are public API, and not documented as internal (as opposed to tp_subclasses). We address that here by reverting back to shared objects, making them immortal in the process.