Add an option (--enable-experimental-jit for configure-based builds
or --experimental-jit for PCbuild-based ones) to build an
*experimental* just-in-time compiler, based on copy-and-patch (https://fredrikbk.com/publications/copy-and-patch.pdf).
See Tools/jit/README.md for more information on how to install the required build-time tooling.
For interpreters that share state with the main interpreter, this points
to the same static memory structure. For interpreters with their own
obmalloc state, it is heap allocated. Add free_obmalloc_arenas() which
will free the obmalloc arenas and radix tree structures for interpreters
with their own obmalloc state.
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
* gh-112529: Implement GC for free-threaded builds
This implements a mark and sweep GC for the free-threaded builds of
CPython. The implementation relies on mimalloc to find GC tracked
objects (i.e., "containers").
The `--disable-gil` builds occasionally need to pause all but one thread. Some
examples include:
* Cyclic garbage collection, where this is often called a "stop the world event"
* Before calling `fork()`, to ensure a consistent state for internal data structures
* During interpreter shutdown, to ensure that daemon threads aren't accessing Python objects
This adds the following functions to implement global and per-interpreter pauses:
* `_PyEval_StopTheWorldAll()` and `_PyEval_StartTheWorldAll()` (for the global runtime)
* `_PyEval_StopTheWorld()` and `_PyEval_StartTheWorld()` (per-interpreter)
(The function names may change.)
These functions are no-ops outside of the `--disable-gil` build.
* gh-112529: Use GC heaps for GC allocations in free-threaded builds
The free-threaded build's garbage collector implementation will need to
find GC objects by traversing mimalloc heaps. This hooks up the
allocation calls with the correct heaps by using a thread-local
"current_obj_heap" variable.
* Refactor out setting heap based on type
Fix a few places where the lltrace debug output printed ``(null)`` instead of an opcode name, because it was calling ``_PyUOpName()`` on a Tier-1 opcode.
It can be used to set the location of a .python_history file
---------
Co-authored-by: Levi Sabah <0xl3vi@gmail.com>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
This splits part of Modules/gcmodule.c of into Python/gc.c, which
now contains the core garbage collection implementation. The Python
module remain in the Modules/gcmodule.c file.
* gh-112532: Tag mimalloc heaps and pages
Mimalloc pages are data structures that contain contiguous allocations
of the same block size. Note that they are distinct from operating
system pages. Mimalloc pages are contained in segments.
When a thread exits, it abandons any segments and contained pages that
have live allocations. These segments and pages may be later reclaimed
by another thread. To support GC and certain thread-safety guarantees in
free-threaded builds, we want pages to only be reclaimed by the
corresponding heap in the claimant thread. For example, we want pages
containing GC objects to only be claimed by GC heaps.
This allows heaps and pages to be tagged with an integer tag that is
used to ensure that abandoned pages are only claimed by heaps with the
same tag. Heaps can be initialized with a tag (0-15); any page allocated
by that heap copies the corresponding tag.
* Fix conversion warning
* gh-112532: Isolate abandoned segments by interpreter
Mimalloc segments are data structures that contain memory allocations along
with metadata. Each segment is "owned" by a thread. When a thread exits,
it abandons its segments to a global pool to be later reclaimed by other
threads. This changes the pool to be per-interpreter instead of process-wide.
This will be important for when we use mimalloc to find GC objects in the
`--disable-gil` builds. We want heaps to only store Python objects from a
single interpreter. Absent this change, the abandoning and reclaiming process
could break this isolation.
* Add missing '&_mi_abandoned_default' to 'tld_empty'
* gh-112532: Use separate mimalloc heaps for GC objects
In `--disable-gil` builds, we now use four separate heaps in
anticipation of using mimalloc to find GC objects when the GIL is
disabled. To support this, we also make a few changes to mimalloc:
* `mi_heap_t` and `mi_tld_t` initialization is split from allocation.
This allows us to have a `mi_tld_t` per-`PyThreadState`, which is
important to keep interpreter isolation, since the same OS thread may
run in multiple interpreters (using different PyThreadStates.)
* Heap abandoning (mi_heap_collect_ex) can now be called from a
different thread than the one that created the heap. This is necessary
because we may clear and delete the containing PyThreadStates from a
different thread during finalization and after fork().
* Use enum instead of defines and guard mimalloc includes.
* The enum typedef will be convenient for future PRs that use the type.
* Guarding the mimalloc includes allows us to unconditionally include
pycore_mimalloc.h from other header files that rely on things like
`struct _mimalloc_thread_state`.
* Only define _mimalloc_thread_state in Py_GIL_DISABLED builds
Fix error check on mmap(2)
It should check MAP_FAILED instead of NULL for error.
On mmap(2) man page:
RETURN VALUE
On success, mmap() returns a pointer to the mapped area.
On error, the value MAP_FAILED (that is, (void *) -1) is
returned, and errno is set to indicate the error.
It was raised in two cases:
* in the import statement when looking up __import__
* in pickling some builtin type when looking up built-ins iter, getattr, etc.
We need the TracebackException of uncaught exceptions for a single purpose: the error display. Thus we only need to pass the formatted error display between interpreters. Passing a pickled TracebackException is overkill.
The `PyThreadState_Clear()` function must only be called with the GIL
held and must be called from the same interpreter as the passed in
thread state. Otherwise, any Python objects on the thread state may be
destroyed using the wrong interpreter, leading to memory corruption.
This is also important for `Py_GIL_DISABLED` builds because free lists
will be associated with PyThreadStates and cleared in
`PyThreadState_Clear()`.
This fixes two places that called `PyThreadState_Clear()` from the wrong
interpreter and adds an assertion to `PyThreadState_Clear()`.
When an exception is uncaught in Interpreter.exec_sync(), it helps to show that exception's error display if uncaught in the calling interpreter. We do so here by generating a TracebackException in the subinterpreter and passing it between interpreters using pickle.
This fixes a recently introduced bug where the deferred count is being unnecessarily decremented to counteract an increment elsewhere that is no longer happening. This caused the values to flip around to "very large" 64-bit numbers.
glibc-2.34 implements closefrom(3) using the same semantics as on BSD.
Check for closefrom() in configure and use the check result in
fileutils.c, rather than hardcoding a FreeBSD check.
Some implementations of closefrom() return an int. Explicitly discard
the return value by casting it to void, to avoid future compiler
warnings.
Signed-off-by: Sam James <sam@gentoo.org>
This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize.
This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.
Every PyThreadState instance is now actually a _PyThreadStateImpl.
It is safe to cast from `PyThreadState*` to `_PyThreadStateImpl*` and back.
The _PyThreadStateImpl will contain fields that we do not want to expose
in the public C API.
This updates `dtoa.c` to avoid using the Bigint free-list in --disable-gil builds and
to pre-computes the needed powers of 5 during interpreter initialization.
* gh-111962: Make dtoa thread-safe in `--disable-gil` builds.
This avoids using the Bigint free-list in `--disable-gil` builds
and pre-computes the needed powers of 5 during interpreter initialization.
* Fix size of cached powers of 5 array.
We need the powers of 5 up to 5**512 because we only jump straight to
underflow when the exponent is less than -512 (or larger than 308).
* Rename Py_NOGIL to Py_GIL_DISABLED
* Changes from review
* Fix assertion placement
If the input prompt to the builtin input function on terminal has any null
character, then raise ValueError instead of silently truncating it.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Previously arbitrary errors could be cleared during formatting error
messages for ImportError or AttributeError for modules. Now all
unexpected errors are reported.
* Implement _Py_HashPointerRaw() as a static inline function.
* Add Py_HashPointer() tests to test_capi.test_hash.
* Keep _Py_HashPointer() function as an alias to Py_HashPointer().
Change the declaration of the keywords parameter in functions
PyArg_ParseTupleAndKeywords() and PyArg_VaParseTupleAndKeywords() from `char **`
to `char * const *` in C and `const char * const *` in C++.
It makes these functions compatible with argument of type `const char * const *`,
`const char **` or `char * const *` in C++ and `char * const *` in C
without explicit type cast.
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Use a fraction internally in the _PyTime API to reduce the risk of
integer overflow: simplify the fraction using Greatest Common
Divisor (GCD). The fraction API is used by time functions:
perf_counter(), monotonic() and process_time().
For example, QueryPerformanceFrequency() usually returns 10 MHz on
Windows 10 and newer. The fraction SEC_TO_NS / frequency =
1_000_000_000 / 10_000_000 can be simplified to 100 / 1.
* Add _PyTimeFraction type.
* Add functions:
* _PyTimeFraction_Set()
* _PyTimeFraction_Mul()
* _PyTimeFraction_Resolution()
* No longer check "numer * denom <= _PyTime_MAX" in
_PyTimeFraction_Set(). _PyTimeFraction_Mul() uses _PyTime_Mul()
which handles integer overflow.
* Move _PyRuntimeState.time to _posixstate.ticks_per_second and
time_module_state.ticks_per_second.
* Add time_module_state.clocks_per_second.
* Rename _PyTime_GetClockWithInfo() to py_clock().
* Rename _PyTime_GetProcessTimeWithInfo() to py_process_time().
* Add process_time_times() helper function, called by
py_process_time().
* os.times() is now always built: no longer rely on HAVE_TIMES.
This makes the Tier 2 interpreter a little faster.
I calculated by about 3%,
though I hesitate to claim an exact number.
This starts by doubling the trace size limit (to 512),
making it more likely that loops fit in a trace.
The rest of the approach is to only load
`oparg` and `operand` in cases that use them.
The code generator know when these are used.
For `oparg`, it will conditionally emit
```
oparg = CURRENT_OPARG();
```
at the top of the case block.
(The `oparg` variable may be referenced multiple times
by the instructions code block, so it must be in a variable.)
For `operand`, it will use `CURRENT_OPERAND()` directly
instead of referencing the `operand` variable,
which no longer exists.
(There is only one place where this will be used.)
This uses the new mechanism whereby certain uops
are replaced by others during translation,
using the `_PyUop_Replacements` table.
We further special-case the `_FOR_ITER_TIER_TWO` uop
to update the deoptimization target to point
just past the corresponding `END_FOR` opcode.
Two tiny code cleanups are also part of this PR.
- Ensure that `assert(type_version != 0);` always comes *before* using `type_version`
Also:
- In cases_generator, rename `-v` to from `--verbose` to `--viable`
- Double max trace size to 256
- Add a dependency on executor_cases.c.h for ceval.o
- Mark `_SPECIALIZE_UNPACK_SEQUENCE` as `TIER_ONE_ONLY`
- Add debug output back showing the optimized trace
- Bunch of cleanups to Tools/cases_generator/
* Replace jumps with deopts in tier 2
* Fewer special cases of uop names
* Add target field to uop IR
* Remove more redundant SET_IP and _CHECK_VALIDITY micro-ops
* Extend whitelist of non-escaping API functions.
_PyDict_Pop_KnownHash(): remove the default value and the return type
becomes an int.
Co-authored-by: Stefan Behnel <stefan_ml@behnel.de>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
In PGO mode, this function caused a compiler error in MSVC.
It turns out that optimizing for space only save the day, and is even faster.
However, without PGO, this is neither necessary nor slower.
Critical sections are helpers to replace the global interpreter lock
with finer grained locking. They provide similar guarantees to the GIL
and avoid the deadlock risk that plain locking involves. Critical
sections are implicitly ended whenever the GIL would be released. They
are resumed when the GIL would be acquired. Nested critical sections
behave as if the sections were interleaved.
* Revert "gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585)"
This reverts commit d9b606b3d0.
* Revert "gh-111089: Use PyUnicode_AsUTF8() in getargs.c (#111620)"
This reverts commit cde1071b2a.
* Revert "gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)"
This reverts commit d731579bfb.
* Revert "gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)"
This reverts commit d8f32be5b6.
* Revert "gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)"
This reverts commit 37e4e20eaa.
I added _Py_excinfo to the internal API (and added its functions in Python/errors.c) in gh-111530 (9322ce9). Since then I've had a nagging sense that I should have added the type and functions in its own PR. While I do plan on using _Py_excinfo outside crossinterp.c very soon (see gh-111572/gh-111573), I'd still feel more comfortable if the _Py_excinfo stuff went in as its own PR. Hence, here we are.
(FWIW, I may combine that with gh-111572, which I may, in turn, combine with gh-111573. We'll see.)
Joining a thread now ensures the underlying OS thread has exited. This is required for safer fork() in multi-threaded processes.
---------
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Replace most of calls of _PyErr_WriteUnraisableMsg() and some
calls of PyErr_WriteUnraisable(NULL) with PyErr_FormatUnraisable().
Co-authored-by: Victor Stinner <vstinner@python.org>
This moves several general internal APIs out of _xxsubinterpretersmodule.c and into the new Python/crossinterp.c (and the corresponding internal headers).
Specifically:
* _Py_excinfo, etc.: the initial implementation for non-object exception snapshots (in pycore_pyerrors.h and Python/errors.c)
* _PyXI_exception_info, etc.: helpers for passing an exception beween interpreters (wraps _Py_excinfo)
* _PyXI_namespace, etc.: helpers for copying a dict of attrs between interpreters
* _PyXI_Enter(), _PyXI_Exit(): functions that abstract out the transitions between one interpreter and a second that will do some work temporarily
Again, these were all abstracted out of _xxsubinterpretersmodule.c as generalizations. I plan on proposing these as public API at some point.
- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`,
you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging
to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
Replace PyUnicode_AsUTF8AndSize() with PyUnicode_AsUTF8() to remove
the explicit check for embedded null characters.
The change avoids to have to include explicitly <string.h> to get the
strlen() function when using a recent version of the limited C API.
This is partly to clear this stuff out of pystate.c, but also in preparation for moving some code out of _xxsubinterpretersmodule.c. This change also moves this stuff to the internal API (new: Include/internal/pycore_crossinterp.h). @vstinner did this previously and I undid it. Now I'm re-doing it. :/
* gh-106320: Re-add _PyLong_FromByteArray(), _PyLong_AsByteArray() and _PyLong_GCD() to the public header files since they are used by third-party packages and there is no efficient replacement.
See https://github.com/python/cpython/issues/111140
See https://github.com/python/cpython/issues/111139
* gh-111262: Re-add _PyDict_Pop() to have a C-API until a new public one is designed.
There were a few things I did in gh-110565 that need to be fixed. I also forgot to add tests in that PR.
(Note that this PR exposes a refleak introduced by gh-110246. I'll take care of that separately.)
Move the following private functions and structures to
pycore_modsupport.h internal C API:
* _PyArg_BadArgument()
* _PyArg_CheckPositional()
* _PyArg_NoKeywords()
* _PyArg_NoPositional()
* _PyArg_ParseStack()
* _PyArg_ParseStackAndKeywords()
* _PyArg_Parser structure
* _PyArg_UnpackKeywords()
* _PyArg_UnpackKeywordsWithVararg()
* _PyArg_UnpackStack()
* _Py_ANY_VARARGS()
Changes:
* Python/getargs.h now includes pycore_modsupport.h to export
functions.
* clinic.py now adds pycore_modsupport.h when one of these functions
is used.
* Add pycore_modsupport.h includes when a C extension uses one of
these functions.
* Define Py_BUILD_CORE_MODULE in C extensions which now include
directly or indirectly (via code generated by Argument Clinic)
pycore_modsupport.h:
* _csv
* _curses_panel
* _dbm
* _gdbm
* _multiprocessing.posixshmem
* _sqlite.row
* _statistics
* grp
* resource
* syslog
* _testcapi: bad_get() no longer uses METH_FASTCALL calling
convention but METH_VARARGS. Replace _PyArg_UnpackStack() with
PyArg_ParseTuple().
* _testcapi: add PYTESTCAPI_NEED_INTERNAL_API macro which is defined
by _testcapi sub-modules which need the internal C API
(pycore_modsupport.h): exceptions.c, float.c, vectorcall.c,
watchers.c.
* Remove Include/cpython/modsupport.h header file.
Include/modsupport.h no longer includes the removed header file.
* Fix mypy clinic.py
It already mostly worked, except in the case when invalid keyword
argument with non-ASCII name was passed to function with non-ASCII
parameter names. Then it crashed in the debug mode.
* The lexer, which include the actual lexeme producing logic, goes into
the `lexer` directory.
* The wrappers, one wrapper per input mode (file, string, utf-8, and
readline), go into the `tokenizer` directory and include logic for
creating a lexer instance and managing the buffer for different modes.
---------
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
The docs state that the space, tab, colon, and comma characters are
ignored in Py_BuildValue() format strings.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Add wrapper for timerfd_create, timerfd_settime, and timerfd_gettime to os module.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
sys.audit() now has assertions to check that the event argument is
not NULL and that the format argument does not use the "N" format.
Add tests on PySys_AuditTuple().
This adds a new field 'state' to PyThreadState that can take on one of three values: _Py_THREAD_ATTACHED, _Py_THREAD_DETACHED, or _Py_THREAD_GC. The "attached" and "detached" states correspond closely to acquiring and releasing the GIL. The "gc" state is current unused, but will be used to implement stop-the-world GC for --disable-gil builds in the near future.
We do the following:
* add a per-interpreter XID registry (PyInterpreterState.xidregistry)
* put heap types there (keep static types in _PyRuntimeState.xidregistry)
* clear the registries during interpreter/runtime finalization
* avoid duplicate entries in the registry (when _PyCrossInterpreterData_RegisterClass() is called more than once for a type)
* use Py_TYPE() instead of PyObject_Type() in _PyCrossInterpreterData_Lookup()
The per-interpreter registry helps preserve isolation between interpreters. This is important when heap types are registered, which is something we haven't been doing yet but I will likely do soon.
In Python/bytecodes.c, you now write
```
DEOPT_IF(condition);
```
The code generator expands this to
```
DEOPT_IF(condition, opcode);
```
where `opcode` is the name of the unspecialized instruction.
This works inside macro expansions too.
**CAVEAT:** The entire `DEOPT_IF(condition)` statement must be on a single line.
If it isn't, the substitution will fail; an error will be printed by the code generator
and the C compiler will report some errors.
Add PyThreadState_GetUnchecked() function: similar to
PyThreadState_Get(), but don't issue a fatal error if it is NULL. The
caller is responsible to check if the result is NULL. Previously,
this function was private and known as _PyThreadState_UncheckedGet().
In a few places we switch to another interpreter without knowing if it has a thread state associated with the current thread. For the main interpreter there wasn't much of a problem, but for subinterpreters we were *mostly* okay re-using the tstate created with the interpreter (located via PyInterpreterState_ThreadHead()). There was a good chance that tstate wasn't actually in use by another thread.
However, there are no guarantees of that. Furthermore, re-using an already used tstate is currently fragile. To address this, now we create a new thread state in each of those places and use it.
One consequence of this change is that PyInterpreterState_ThreadHead() may not return NULL (though that won't happen for the main interpreter).
The existence of background threads running on a subinterpreter was preventing interpreters from getting properly destroyed, as well as impacting the ability to run the interpreter again. It also affected how we wait for non-daemon threads to finish.
We add PyInterpreterState.threads.main, with some internal C-API functions.
This change makes sure sys.path[0] is set properly for subinterpreters. Before, it wasn't getting set at all. This PR does not address the broader concerns from gh-109853.
* Remove unused <locale.h> includes.
* Remove unused <fcntl.h> include in traceback.h.
* Remove redundant <assert.h> and <stddef.h> includes. They are already
included by "Python.h".
* Remove <object.h> include in faulthandler.c. Python.h already includes it.
* Add missing <stdbool.h> in pycore_pythread.h if HAVE_PTHREAD_STUBS
is defined.
* Fix also warnings in pthread_stubs.h: don't redefine macros if they
are already defined, like the __NEED_pthread_t macro.
* pycore_pythread.h is now the central place to make sure that
_POSIX_THREADS and _POSIX_SEMAPHORES macros are defined if
available.
* Make sure that pycore_pythread.h is included when _POSIX_THREADS
and _POSIX_SEMAPHORES macros are tested.
* PY_TIMEOUT_MAX is now defined as a constant, since its value
depends on _POSIX_THREADS, instead of being defined as a macro.
* Prevent integer overflow in the preprocessor when computing
PY_TIMEOUT_MAX_VALUE on Windows:
replace "0xFFFFFFFELL * 1000 < LLONG_MAX"
with "0xFFFFFFFELL < LLONG_MAX / 1000".
* Document the change and give hints how to fix affected code.
* Add an exception for PY_TIMEOUT_MAX name to smelly.py
* Add PY_TIMEOUT_MAX to the stable ABI
These are the most popular specializations of `LOAD_ATTR` and `STORE_ATTR`
that weren't already viable uops:
* Split LOAD_ATTR_METHOD_WITH_VALUES
* Split LOAD_ATTR_METHOD_NO_DICT
* Split LOAD_ATTR_SLOT
* Split STORE_ATTR_SLOT
* Split STORE_ATTR_INSTANCE_VALUE
Also:
* Add `-v` flag to code generator which prints a list of non-viable uops
(easter-egg: it can print execution counts -- see source)
* Double _Py_UOP_MAX_TRACE_LENGTH to 128
I had dropped one of the DEOPT_IF() calls! :-(
PyImport_GetImporter() now sets RuntimeError if it fails to get sys.path_hooks
or sys.path_importer_cache or they are not list and dict correspondingly.
Previously it could return NULL without setting error in obscure cases,
crash or raise SystemError if these attributes have wrong type.
PyMutex is a one byte lock with fast, inlineable lock and unlock functions for the common uncontended case. The design is based on WebKit's WTF::Lock.
PyMutex is built using the _PyParkingLot APIs, which provides a cross-platform futex-like API (based on WebKit's WTF::ParkingLot). This internal API will be used for building other synchronization primitives used to implement PEP 703, such as one-time initialization and events.
This also includes tests and a mini benchmark in Tools/lockbench/lockbench.py to compare with the existing PyThread_type_lock.
Uncontended acquisition + release:
* Linux (x86-64): PyMutex: 11 ns, PyThread_type_lock: 44 ns
* macOS (arm64): PyMutex: 13 ns, PyThread_type_lock: 18 ns
* Windows (x86-64): PyMutex: 13 ns, PyThread_type_lock: 38 ns
PR Overview:
The primary purpose of this PR is to implement PyMutex, but there are a number of support pieces (described below).
* PyMutex: A 1-byte lock that doesn't require memory allocation to initialize and is generally faster than the existing PyThread_type_lock. The API is internal only for now.
* _PyParking_Lot: A futex-like API based on the API of the same name in WebKit. Used to implement PyMutex.
* _PyRawMutex: A word sized lock used to implement _PyParking_Lot.
* PyEvent: A one time event. This was used a bunch in the "nogil" fork and is useful for testing the PyMutex implementation, so I've included it as part of the PR.
* pycore_llist.h: Defines common operations on doubly-linked list. Not strictly necessary (could do the list operations manually), but they come up frequently in the "nogil" fork. ( Similar to https://man.freebsd.org/cgi/man.cgi?queue)
---------
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
There is a WIP proposal to enable webassembly stack switching which have been
implemented in v8:
https://github.com/WebAssembly/js-promise-integration
It is not possible to switch stacks that contain JS frames so the Emscripten JS
trampolines that allow calling functions with the wrong number of arguments
don't work in this case. However, the js-promise-integration proposal requires
the [type reflection for Wasm/JS API](https://github.com/WebAssembly/js-types)
proposal, which allows us to actually count the number of arguments a function
expects.
For better compatibility with stack switching, this PR checks if type reflection
is available, and if so we use a switch block to decide the appropriate
signature. If type reflection is unavailable, we should use the current EMJS
trampoline.
We cache the function argument counts since when I didn't cache them performance
was negatively affected.
Co-authored-by: T. Wouters <thomas@python.org>
Co-authored-by: Brett Cannon <brett@python.org>
I must have overlooked this when refactoring the code generator.
The Tier 1 interpreter contained a few silly things like
```
goto resume_frame;
STACK_SHRINK(1);
```
(and other variations, some where the unconditional `goto` was hidden in a macro).
* Rename SAVE_IP to _SET_IP
* Rename EXIT_TRACE to _EXIT_TRACE
* Rename SAVE_CURRENT_IP to _SAVE_CURRENT_IP
* Rename INSERT to _INSERT (This is for Ken Jin's abstract interpreter)
* Rename IS_NONE to _IS_NONE
* Rename JUMP_TO_TOP to _JUMP_TO_TOP
This adds a 16-bit inline cache entry to the conditional branch instructions POP_JUMP_IF_{FALSE,TRUE,NONE,NOT_NONE} and their instrumented variants, which is used to keep track of the branch direction.
Each time we encounter these instructions we shift the cache entry left by one and set the bottom bit to whether we jumped.
Then when it's time to translate such a branch to Tier 2 uops, we use the bit count from the cache entry to decided whether to continue translating the "didn't jump" branch or the "jumped" branch.
The counter is initialized to a pattern of alternating ones and zeros to avoid bias.
The .pyc file magic number is updated. There's a new test, some fixes for existing tests, and a few miscellaneous cleanups.
Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.
thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.
Move _PyThreadState_CheckConsistency() from thread_run() to
_PyThreadState_Bind().
thread_run() of _threadmodule.c now calls
_PyThreadState_CheckConsistency() to check if tstate is a dangling
pointer when Python is built in debug mode.
Rename ceval_gil.c is_tstate_valid() to
_PyThreadState_CheckConsistency() to reuse it in _threadmodule.c.
Symbols of the C API should be prefixed by "Py_" to avoid conflict
with existing names in 3rd party C extensions on "#include <Python.h>".
test.pythoninfo now logs Py_C_RECURSION_LIMIT constant and other
_testcapi and _testinternalcapi constants.
Statistics gathering is now off by default. Use the "-X pystats"
command line option or set the new PYTHONSTATS environment variable
to 1 to turn statistics gathering on at Python startup.
Statistics are no longer dumped at exit if statistics gathering was
off or statistics have been cleared.
Changes:
* Add PYTHONSTATS environment variable.
* sys._stats_dump() now returns False if statistics are not dumped
because they are all equal to zero.
* Add PyConfig._pystats member.
* Add tests on sys functions and on setting PyConfig._pystats to 1.
* Add Include/cpython/pystats.h and Include/internal/pycore_pystats.h
header files.
* Rename '_py_stats' variable to '_Py_stats'.
* Exclude Include/cpython/pystats.h from the Py_LIMITED_API.
* Move pystats.h include from object.h to Python.h.
* Add _Py_StatsOn() and _Py_StatsOff() functions. Remove
'_py_stats_struct' variable from the API: make it static in
specialize.c.
* Document API in Include/pystats.h and Include/cpython/pystats.h.
* Complete pystats documentation in Doc/using/configure.rst.
* Don't write "all zeros" stats: if _stats_off() and _stats_clear()
or _stats_dump() were called.
* _PyEval_Fini() now always call _Py_PrintSpecializationStats() which
does nothing if stats are all zeros.
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
Move the private _PyErr_WriteUnraisableMsg() functions to the
internal C API (pycore_pyerrors.h).
Move write_unraisable_exc() from _testcapi to _testinternalcapi.
Remove <ctype.h> in C files which don't use it; only sre.c and
_decimal.c still use it.
Remove _PY_PORT_CTYPE_UTF8_ISSUE code from pyport.h:
* Code added by commit b5047fd019
in 2004 for MacOSX and FreeBSD.
* Test removed by commit 52ddaefb6b
in 2007, since Python str type now uses locale independent
functions like Py_ISALPHA() and Py_TOLOWER() and the Unicode
database.
Modules/_sre/sre.c replaces _PY_PORT_CTYPE_UTF8_ISSUE with new
functions: sre_isalnum(), sre_tolower(), sre_toupper().
Remove unused includes:
* _localemodule.c: remove <stdio.h>.
* getargs.c: remove <float.h>.
* dynload_win.c: remove <direct.h>, it no longer calls _getcwd()
since commit fb1f68ed7c (in 2001).
Python.h no longer includes <time.h>, <sys/select.h> and <sys/time.h>
standard header files.
* Add <time.h> include to xxsubtype.c.
* Add <sys/time.h> include to posixmodule.c and semaphore.c.
* readline.c includes <sys/select.h> instead of <sys/time.h>.
* resource.c no longer includes <time.h> and <sys/time.h>.
Replace <ctype.h> locale dependent functions with Python "pyctype.h"
locale independent functions:
* Replace isalpha() with Py_ISALPHA().
* Replace isdigit() with Py_ISDIGIT().
* Replace isxdigit() with Py_ISXDIGIT().
* Replace tolower() with Py_TOLOWER().
Leave Modules/_sre/sre.c unchanged, it uses locale dependent
functions on purpose.
Include explicitly <ctype.h> in _decimal.c to get isascii().
pycore_create_interpreter() now returns a status, rather than
calling Py_FatalError().
* PyInterpreterState_New() now calls Py_ExitStatusException() instead
of calling Py_FatalError() directly.
* Replace Py_FatalError() with PyStatus in init_interpreter() and
_PyObject_InitState().
* _PyErr_SetFromPyStatus() now raises RuntimeError, instead of
ValueError. It can now call PyErr_NoMemory(), raise MemoryError,
if it detects _PyStatus_NO_MEMORY() error message.
Argument Clinic now only includes pycore_gc.h if PyGC_Head is needed,
and only includes pycore_runtime.h if _Py_ID() is needed.
* Add 'condition' optional argument to Clinic.add_include().
* deprecate_keyword_use() includes pycore_runtime.h when using
the _PyID() function.
* Fix rendering of includes: comments start at the column 35.
* Mark PC/clinic/_wmimodule.cpp.h and
"Objects/stringlib/clinic/*.h.h" header files as generated in
.gitattributes.
Effects:
* 42 header files generated by AC no longer include the internal C
API, instead of 4 header files before. For example,
Modules/clinic/_abc.c.h no longer includes the internal C API.
* Fix _testclinic_depr.c.h: it now always includes pycore_runtime.h
to get _Py_ID().
Python built with "configure --with-trace-refs" (tracing references)
is now ABI compatible with Python release build and debug build.
Moreover, it now also supports the Limited API.
Change Py_TRACE_REFS build:
* Remove _PyObject_EXTRA_INIT macro.
* The PyObject structure no longer has two extra members (_ob_prev
and _ob_next).
* Use a hash table (_Py_hashtable_t) to trace references (all
objects): PyInterpreterState.object_state.refchain.
* Py_TRACE_REFS build is now ABI compatible with release build and
debug build.
* Limited C API extensions can now be built with Py_TRACE_REFS:
xxlimited, xxlimited_35, _testclinic_limited.
* No longer rename PyModule_Create2() and PyModule_FromDefAndSpec2()
functions to PyModule_Create2TraceRefs() and
PyModule_FromDefAndSpec2TraceRefs().
* _Py_PrintReferenceAddresses() is now called before
finalize_interp_delete() which deletes the refchain hash table.
* test_tracemalloc find_trace() now also filters by size to ignore
the memory allocated by _PyRefchain_Trace().
Test changes for Py_TRACE_REFS:
* Add test.support.Py_TRACE_REFS constant.
* Add test_sys.test_getobjects() to test sys.getobjects() function.
* test_exceptions skips test_recursion_normalizing_with_no_memory()
and test_memory_error_in_PyErr_PrintEx() if Python is built with
Py_TRACE_REFS.
* test_repl skips test_no_memory().
* test_capi skisp test_set_nomemory().
Move PyUnstable_ExecutableKinds and associated macros from the
internal C API to the public C API.
Rename constants: replace "PY_" prefix with "PyUnstable_" prefix.
This mis-initialization caused the executor optimization to kick in sooner than intended. It also set the lower 4 bits of the counter to `1` -- those bits are supposed to be reserved (the actual counter is in the upper 12 bits).
Also remove NOP instructions.
The "stubs" are not optimized in this fashion (their SAVE_IP should always be preserved since it's where to jump next, and they don't contain NOPs by their nature).
Remove these private functions from the public C API:
* _PyRun_AnyFileObject()
* _PyRun_InteractiveLoopObject()
* _PyRun_SimpleFileObject()
* _Py_SourceAsString()
Move them to the internal C API: add a new pycore_pythonrun.h header
file. No longer export these functions.
* Rename _PyUnstable_GetUnaryIntrinsicName() to
PyUnstable_GetUnaryIntrinsicName()
* Rename _PyUnstable_GetBinaryIntrinsicName()
to PyUnstable_GetBinaryIntrinsicName().
Functions like PyErr_SetFromErrno() and SetFromWindowsErr() should be
called immediately after using the C API which sets errno or the Windows
error code.
Move these private functions to the internal C API
(pycore_abstract.h):
* _Py_convert_optional_to_ssize_t()
* _PyNumber_Index()
Argument Clinic now emits #include "pycore_abstract.h" when these
functions are used.
The parser of the c-analyzer tool now uses a list of files which use
the limited C API, rather than a list of files using the internal C
API.
Instead of using `GO_TO_INSTRUCTION(CALL_PY_EXACT_ARGS)` we just add the macro elements of the latter to the macro for the former. This requires lengthening the uops array in struct opcode_macro_expansion. (It also required changes to stacking.py that were merged already.)
Move private functions to the internal C API (pycore_sysmodule.h):
* _PySys_GetAttr()
* _PySys_GetSizeOf()
No longer export most of these functions.
Fix also a typo in Include/cpython/optimizer.h: add a missing space.
Move private functions to the internal C API (pycore_dict.h):
* _PyDictView_Intersect()
* _PyDictView_New()
* _PyDict_ContainsId()
* _PyDict_DelItemId()
* _PyDict_DelItem_KnownHash()
* _PyDict_GetItemIdWithError()
* _PyDict_GetItem_KnownHash()
* _PyDict_HasSplitTable()
* _PyDict_NewPresized()
* _PyDict_Next()
* _PyDict_Pop()
* _PyDict_SetItemId()
* _PyDict_SetItem_KnownHash()
* _PyDict_SizeOf()
No longer export most of these functions.
Move also the _PyDictViewObject structure to the internal C API.
Move dict_getitem_knownhash() function from _testcapi to the
_testinternalcapi extension. Update test_capi.test_dict for this
change.
I was comparing the last preceding poke with the *last* peek,
rather than the *first* peek.
Unfortunately this bug obscured another bug:
When the last preceding poke is UNUSED, the first peek disappears,
leaving the variable unassigned. This is how I fixed it:
- Rename CopyEffect to CopyItem.
- Change CopyItem to contain StackItems instead of StackEffects.
- Update those StackItems when adjusting the manager higher or lower.
- Assert that those StackItems' offsets are equivalent.
- Other clever things.
---------
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Remove the internal _PyDict_GetItemStringWithError() function. It can
now be replaced with the new public PyDict_ContainsString() and
PyDict_GetItemStringRef() functions.
getargs.c now now uses a strong reference for current_arg.
find_keyword() returns a strong reference.
Replace _PyDict_GetItemStringWithError() calls with
PyDict_GetItemStringRef() which returns a strong reference to the
item.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Replace PyDict_GetItem() calls with PyDict_GetItemRef()
or PyDict_GetItemWithError() to handle errors.
* Replace PyLong_AS_LONG() with _PyLong_AsInt()
and check for errors.
* Check for PyDict_Contains() error.
* pycore_init_builtins() checks for _PyType_Lookup() failure.
Remove private _PyDict_GetItemStringWithError() function of the
public C API: the new PyDict_GetItemStringRef() can be used instead.
* Move private _PyDict_GetItemStringWithError() to the internal C API.
* _testcapi get_code_extra_index() uses PyDict_GetItemStringRef().
Avoid using private functions in _testcapi which tests the public C
API.
Such C API functions as PyErr_SetString(), PyErr_Format(),
PyErr_SetFromErrnoWithFilename() and many others no longer crash or
ignore errors if it failed to format the error message or decode the
filename. Instead, they keep a corresponding error.
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
- The `dump_stack()` method could call a `__repr__` method implemented in Python,
causing (infinite) recursion.
I rewrote it to only print out the values for some fundamental types (`int`, `str`, etc.);
for everything else it just prints `<type_name @ 0xdeadbeef>`.
- The lltrace-like feature for uops wrote to `stderr`, while the one in `ceval.c` writes to `stdout`;
I changed the uops to write to stdout as well.
Introducing a new file, stacking.py, that takes over several responsibilities related to symbolic evaluation of push/pop operations, with more generality.
The linked list of objects was a global variable, which broke isolation between interpreters, causing crashes. To solve this, we've moved the linked list to each interpreter.
gh-107184 introduced a refleak in test_import.SubinterpImportTests (specifically test_singlephase_check_with_setting_and_override and test_single_init_extension_compat). We fix it here by making sure _testsinglephase is removed from sys.modules whenever we clear the runtime's internal state for the module.
The underlying problem is strictly contained in the internal function _PyImport_ClearExtension() (AKA _testinternalcapi.clear_extension()), which is only used in tests.
(This also fixes an intermittent segfault introduced in the same place, in test_disallowed_reimport.)
There's no need to use a dummy uop to skip unused cache entries. The macro syntax lets you write `unused/1` instead.
Similarly, move `unused/5` from op `_LOAD_ATTR_INSTANCE_VALUE` to macro `LOAD_ATTR_INSTANCE_VALUE`.
This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed?
We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter.
We also remove the accommodations for "detached" thread states, which were a dubious idea to start with.
The _xxsubinterpreters module should not rely on internal API. Some of the functions it uses were recently moved there however. Here we move them back (and expose them properly).
We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict.
Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.
Move private _PyDict functions to the internal C API (pycore_dict.h):
* _PyDict_Contains_KnownHash()
* _PyDict_DebugMallocStats()
* _PyDict_DelItemIf()
* _PyDict_GetItemWithError()
* _PyDict_HasOnlyStringKeys()
* _PyDict_MaybeUntrack()
* _PyDict_MergeEx()
No longer export these functions.
Move private debug _PyObject functions to the internal C API
(pycore_object.h):
* _PyDebugAllocatorStats()
* _PyObject_CheckConsistency()
* _PyObject_DebugTypeStats()
* _PyObject_IsFreed()
No longer export most of these functions, except of
_PyObject_IsFreed().
Move test functions using _PyObject_IsFreed() from _testcapi to
_testinternalcapi. check_pyobject_is_freed() test no longer catch
_testcapi.error: the tested function cannot raise _testcapi.error.
Rename private C API constants:
* Rename PY_MONITORING_UNGROUPED_EVENTS to _PY_MONITORING_UNGROUPED_EVENTS
* Rename PY_MONITORING_EVENTS to _PY_MONITORING_EVENTS
* No longer export most private _PyHash symbols, only export the ones
which are needed by shared extensions.
* Modules/_xxtestfuzz/fuzzer.c now uses the internal C API.
By turning `assert(kwnames == NULL)` into a macro that is not in the "forbidden" list, many instructions that formerly were skipped because they contained such an assert (but no other mention of `kwnames`) are now supported in Tier 2. This covers 10 instructions in total (all specializations of `CALL` that invoke some C code):
- `CALL_NO_KW_TYPE_1`
- `CALL_NO_KW_STR_1`
- `CALL_NO_KW_TUPLE_1`
- `CALL_NO_KW_BUILTIN_O`
- `CALL_NO_KW_BUILTIN_FAST`
- `CALL_NO_KW_LEN`
- `CALL_NO_KW_ISINSTANCE`
- `CALL_NO_KW_METHOD_DESCRIPTOR_O`
- `CALL_NO_KW_METHOD_DESCRIPTOR_NOARGS`
- `CALL_NO_KW_METHOD_DESCRIPTOR_FAST`
These aren't automatically translated because (ironically)
they are macros deferring to POP_JUMP_IF_{TRUE,FALSE},
which are not viable uops (being manually translated).
The hack is that we emit IS_NONE and then set opcode and
jump to the POP_JUMP_IF_{TRUE,FALSE} translation code.
The Tier 2 opcode _IS_ITER_EXHAUSTED_LIST (and _TUPLE)
didn't set it->it_seq to NULL, causing a subtle bug
that resulted in test_exhausted_iterator in list_tests.py
to fail when running all tests with -Xuops.
The bug was introduced in gh-106696.
Added this as an explicit test.
Also fixed the dependencies for ceval.o -- it depends on executor_cases.c.h.
This moves EXIT_TRACE, SAVE_IP, JUMP_TO_TOP, and
_POP_JUMP_IF_{FALSE,TRUE} from ceval.c to bytecodes.c.
They are no less special than before, but this way
they are discoverable o the copy-and-patch tooling.
During superblock generation, a JUMP_BACKWARD instruction is translated to either a JUMP_TO_TOP micro-op (when the target of the jump is exactly the beginning of the superblock, closing the loop), or a SAVE_IP + EXIT_TRACE pair, when the jump goes elsewhere.
The new JUMP_TO_TOP instruction includes a CHECK_EVAL_BREAKER() call, so a closed loop can still be interrupted.
* Convert PyObject_DelAttr() and PyObject_DelAttrString() macros to
functions.
* Add PyObject_DelAttr() and PyObject_DelAttrString() functions to
the stable ABI.
* Replace PyObject_SetAttr(obj, name, NULL) with
PyObject_DelAttr(obj, name).
- Hand-written uops JUMP_IF_{TRUE,FALSE}.
These peek at the top of the stack.
The jump target (in superblock space) is absolute.
- Hand-written translation for POP_JUMP_IF_{TRUE,FALSE},
assuming the jump is unlikely.
Once we implement jump-likelihood profiling,
we can implement the jump-unlikely case (in another PR).
- Tests (including some test cleanup).
- Improvements to len(ex) and ex[i] to expose the whole trace.
This adds several of unspecialized opcodes to superblocks:
TO_BOOL, BINARY_SUBSCR, STORE_SUBSCR,
UNPACK_SEQUENCE, LOAD_GLOBAL, LOAD_ATTR,
COMPARE_OP, BINARY_OP.
While we may not want that eventually, for now this helps finding bugs.
There is a rudimentary test checking for UNPACK_SEQUENCE.
Once we're ready to undo this, that would be simple:
just replace the call to variable_used_unspecialized
with a call to variable_used (as shown in a comment).
Or add individual opcdes to FORBIDDEN_NAMES_IN_UOPS.
Instead of special-casing specific instructions,
we add a few more special values to the 'size' field of expansions,
so in the future we can automatically handle
additional super-instructions in the generator.
The uops test wasn't testing anything by default,
and was failing when run with -Xuops.
Made the two executor-related context managers global,
so TestUops can use them (notably `with temporary_optimizer(opt)`).
Made clear_executor() a little more thorough.
Fixed a crash upon finalizing a uop optimizer,
by adding a `tp_dealloc` handler.
When `_PyOptimizer_BackEdge` returns `NULL`, we should restore `next_instr` (and `stack_pointer`). To accomplish this we should jump to `resume_with_error` instead of just `error`.
The problem this causes is subtle -- the only repro I have is in PR gh-106393, at commit d7df54b139bcc47f5ea094bfaa9824f79bc45adc. But the fix is real (as shown later in that PR).
While we're at it, also improve the debug output: the offsets at which traces are identified are now measured in bytes, and always show the start offset. This makes it easier to correlate executor calls with optimizer calls, and either with `dis` output.
<!-- gh-issue-number: gh-104584 -->
* Issue: gh-104584
<!-- /gh-issue-number -->
Remove private pylifecycle.h functions: move them to the internal C
API ( pycore_atexit.h, pycore_pylifecycle.h and pycore_signal.h). No
longer export most of these functions.
Move _testcapi.test_atexit() to _testinternalcapi.
Remove private _PyUnicode_TransformDecimalAndSpaceToASCII() and other
private _PyUnicode C API functions: move them to the internal C API
(pycore_unicodeobject.h). No longer most of these functions.
Replace _testcapi.unicode_transformdecimalandspacetoascii() with
_testinternal._PyUnicode_TransformDecimalAndSpaceToASCII().
Remove more private _PyUnicode C API functions:
move them to the internal C API (pycore_unicodeobject.h).
No longer export most pycore_unicodeobject.h functions.
- Tweak uops debugging output
- Fix the bug from gh-106290
- Rename `SET_IP` to `SAVE_IP` (per https://github.com/faster-cpython/ideas/issues/558)
- Add a `SAVE_IP` uop at the start of the trace (ditto)
- Allow `unbound_local_error`; this gives us uops for `LOAD_FAST_CHECK`, `LOAD_CLOSURE`, and `DELETE_FAST`
- Longer traces
- Support `STORE_FAST_LOAD_FAST`, `STORE_FAST_STORE_FAST`
- Add deps on pycore_uops.h to Makefile(.pre.in)
Remove the following functions from the C API, move them to the internal C
API: add a new pycore_modsupport.h internal header file:
* PyModule_CreateInitialized()
* _PyArg_NoKwnames()
* _Py_VaBuildStack()
No longer export these functions.
Remove private _PyThreadState and _PyInterpreterState C API
functions: move them to the internal C API (pycore_pystate.h and
pycore_interp.h). Don't export most of these functions anymore, but
still export functions used by tests.
Remove _PyThreadState_Prealloc() and _PyThreadState_Init() from the C
API, but keep it in the stable API.
Remove the "cpython/pytime.h" header file: it only contained private
functions. Move functions to the internal pycore_time.h header file.
Move tests from _testcapi to _testinternalcapi. Rename also test
methods to have the same name than tested C functions.
No longer export these functions:
* _PyTime_Add()
* _PyTime_As100Nanoseconds()
* _PyTime_FromMicrosecondsClamp()
* _PyTime_FromTimespec()
* _PyTime_FromTimeval()
* _PyTime_GetPerfCounterWithInfo()
* _PyTime_MulDiv()
Remove the following private functions of the C API:
* _PyCodecInfo_GetIncrementalDecoder()
* _PyCodecInfo_GetIncrementalEncoder()
* _PyCodec_DecodeText()
* _PyCodec_EncodeText()
* _PyCodec_Forget()
* _PyCodec_Lookup()
* _PyCodec_LookupTextEncoding()
Move these functions to a new pycore_codecs.h internal header file.
These functions are no longer exported.
* EOFError no longer overrides other errors such as MemoryError or OSError at
the start of the object.
* Raise more relevant error when the NULL object occurs as a code object
component.
* Minimize an overhead of calling PyErr_Occurred().
This produces longer traces (superblocks?).
Also improved debug output (uop names are now printed instead of numeric opcodes). This would be simpler if the numeric opcode values were generated by generate_cases.py, but that's another project.
Refactored some code in generate_cases.py so the essential algorithm for cache effects is only run once. (Deciding which effects are used and what the total cache size is, regardless of what's used.)
Remove the following private functions from the public C API:
* _Py_CheckFunctionResult()
* _PyObject_CallMethod()
* _PyObject_CallMethodId()
* _PyObject_CallMethodIdNoArgs()
* _PyObject_CallMethodIdObjArgs()
* _PyObject_CallMethodIdOneArg()
* _PyObject_MakeTpCall()
* _PyObject_VectorcallMethodId()
* _PyStack_AsDict()
Move these functions to the internal C API (pycore_call.h).
No longer export the following functions:
* _PyObject_Call()
* _PyObject_CallMethod()
* _PyObject_CallMethodId()
* _PyObject_CallMethodIdObjArgs()
* _PyObject_Call_Prepend()
* _PyObject_FastCallDictTstate()
* _PyStack_AsDict()
The following functions are still exported for stdlib shared
extensions:
* _Py_CheckFunctionResult()
* _PyObject_MakeTpCall()
Mark the following internal functions as extern:
* _PyStack_UnpackDict()
* _PyStack_UnpackDict_Free()
* _PyStack_UnpackDict_FreeNoDecRef()
This effectively reverts bb578a0, restoring the original DEOPT_IF() macro in ceval_macros.h, and redefining it in the Tier 2 interpreter. We can get rid of the PREDICTED() macros there as well!
Added a new, experimental, tracing optimizer and interpreter (a.k.a. "tier 2"). This currently pessimizes, so don't use yet -- this is infrastructure so we can experiment with optimizing passes. To enable it, pass ``-Xuops`` or set ``PYTHONUOPS=1``. To get debug output, set ``PYTHONUOPSDEBUG=N`` where ``N`` is a debug level (0-4, where 0 is no debug output and 4 is excessively verbose).
All of this code is likely to change dramatically before the 3.13 feature freeze. But this is a first step.
Remove old aliases which were kept backwards compatibility with
Python 3.8:
* _PyObject_CallMethodNoArgs()
* _PyObject_CallMethodOneArg()
* _PyObject_CallOneArg()
* _PyObject_FastCallDict()
* _PyObject_Vectorcall()
* _PyObject_VectorcallMethod()
* _PyVectorcall_Function()
Update code which used these aliases to use new names.
These functions are broken by design because they discard any exceptions raised
inside, including MemoryError and KeyboardInterrupt. They should not be
used in new code.
* PyDict_GetItem() and PyObject_HasAttr() suppress arbitrary errors and
should not be used.
* PyUnicode_CompareWithASCIIString() only works if the second argument
is ASCII string.
* Refleak in get_suggestions_for_name_error.
* Use of borrowed pointer after possible freeing (self).
* Add some missing error checks.
It now raises an exception if sys.modules doesn't hold a strong
reference to the module.
Elaborate the comment explaining why a weak reference is used to
create a borrowed reference.
* Replace PyWeakref_GET_OBJECT() with _PyWeakref_GET_REF().
* _sqlite/blob.c now holds a strong reference to the blob object
while calling close_blob().
* _xidregistry_find_type() now holds a strong reference to registered
while using it.
finalize_modules_clear_weaklist() now holds a strong reference to the
module longer than before: replace PyWeakref_GET_OBJECT() with
_PyWeakref_GET_REF().
* Add tests on PyImport_AddModuleRef(), PyImport_AddModule() and
PyImport_AddModuleObject().
* pythonrun.c: Replace Py_XNewRef(PyImport_AddModule(name)) with
PyImport_AddModuleRef(name).
Refactor PyRun_InteractiveOneObjectEx(), _PyRun_SimpleFileObject()
and PyRun_SimpleStringFlags():
* Keep a strong reference to the __main__ module while using its
dictionary (PyModule_GetDict()). Use PyImport_AddModule() with
Py_XNewRef().
* Declare variables closer to where they are defined.
* Rename variables to use name longer than 1 character.
* Add pyrun_one_parse_ast() sub-function.