This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.
As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via
the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
This implements PEP 695, Type Parameter Syntax. It adds support for:
- Generic functions (def func[T](): ...)
- Generic classes (class X[T](): ...)
- Type aliases (type X = ...)
- New scoping when the new syntax is used within a class body
- Compiler and interpreter changes to support the new syntax and scoping rules
Co-authored-by: Marc Mueller <30130371+cdce8p@users.noreply.github.com>
Co-authored-by: Eric Traut <eric@traut.com>
Co-authored-by: Larry Hastings <larry@hastings.org>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
When monitoring LINE events, instrument all instructions that can have a predecessor on a different line.
Then check that the a new line has been hit in the instrumentation code.
This brings the behavior closer to that of 3.11, simplifying implementation and porting of tools.
This PR removes `_Py_dg_stdnan` and `_Py_dg_infinity` in favour of
using the standard `NAN` and `INFINITY` macros provided by C99.
This change has the side-effect of fixing a bug on MIPS where the
hard-coded value used by `_Py_dg_stdnan` gave a signalling NaN
rather than a quiet NaN.
---------
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
This is the culmination of PEP 684 (and of my 8-year long multi-core Python project)!
Each subinterpreter may now be created with its own GIL (via Py_NewInterpreterFromConfig()). If not so configured then the interpreter will share with the main interpreter--the status quo since subinterpreters were added decades ago. The main interpreter always has its own GIL and subinterpreters from Py_NewInterpreter() will always share with the main interpreter.
We also add PyInterpreterState.ceval.own_gil to record if the interpreter actually has its own GIL.
Note that for now we don't actually respect own_gil; all interpreters still share the one GIL. However, PyInterpreterState.ceval.own_gil does reflect PyInterpreterConfig.own_gil. That lie is a temporary one that we will fix when the GIL really becomes per-interpreter.
Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules. We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
In preparation for a per-interpreter GIL, we add PyInterpreterState.ceval.gil, set it to the shared GIL for each interpreter, and use that rather than using _PyRuntime.ceval.gil directly. Note that _PyRuntime.ceval.gil is still the actual GIL.
This function no longer makes sense, since its runtime parameter is
no longer used. Use directly _PyThreadState_GET() and
_PyInterpreterState_GET() instead.
This breaks the tests, but we are keeping it as a separate commit so
that the move operation and editing of the moved files are separate, for
a cleaner history.
We also expose PyInterpreterConfig. This is part of the PEP 684 (per-interpreter GIL) implementation. We will add docs as soon as we can.
FYI, I'm adding the new config field for per-interpreter GIL in gh-99114.
This is strictly about moving the "obmalloc" runtime state from
`_PyRuntimeState` to `PyInterpreterState`. Doing so improves isolation
between interpreters, specifically most of the memory (incl. objects)
allocated for each interpreter's use. This is important for a
per-interpreter GIL, but such isolation is valuable even without it.
FWIW, a per-interpreter obmalloc is the proverbial
canary-in-the-coalmine when it comes to the isolation of objects between
interpreters. Any object that leaks (unintentionally) to another
interpreter is highly likely to cause a crash (on debug builds at
least). That's a useful thing to know, relative to interpreter
isolation.
This speeds up `super()` (by around 85%, for a simple one-level
`super().meth()` microbenchmark) by avoiding allocation of a new
single-use `super()` object on each use.
Deep-frozen code objects are cannot be shared (currently) by
interpreters, due to how adaptive specialization can modify the
bytecodes. We work around this by only using the deep-frozen objects in
the main interpreter. This does incur a performance penalty for
subinterpreters, which we may be able to resolve later.
We replace _PyRuntime.tstate_current with a thread-local variable. As part of this change, we add a _Py_thread_local macro in pyport.h (only for the core runtime) to smooth out the compiler differences. The main motivation here is in support of a per-interpreter GIL, but this change also provides some performance improvement opportunities.
Note that we do not provide a fallback to the thread-local, either falling back to the old tstate_current or to thread-specific storage (PyThread_tss_*()). If that proves problematic then we can circle back. I consider it unlikely, but will run the buildbots to double-check.
Also note that this does not change any of the code related to the GILState API, where it uses a thread state stored in thread-specific storage. I suspect we can combine that with _Py_tss_tstate (from here). However, that can be addressed separately and is not urgent (nor critical).
(While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)
This is the implementation of PEP683
Motivation:
The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.
Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
* The majority of the monitoring code is in instrumentation.c
* The new instrumentation bytecodes are in bytecodes.c
* legacy_tracing.c adapts the new API to the old sys.setrace and sys.setprofile APIs
The function is like Py_AtExit() but for a single interpreter. This is a companion to the atexit module's register() function, taking a C callback instead of a Python one.
We also update the _xxinterpchannels module to use _Py_AtExit(), which is the motivating case. (This is inspired by pain points felt while working on gh-101660.)
In gh-102744 we added is_core_module() (in Python/import.c), which relies on get_core_module_dict() (also added in that PR). The problem is that_PyImport_FixupBuiltin(), which ultimately calls is_core_module(), is called on the builtins module before interp->builtins_copyis set. Consequently, the builtins module isn't considered a "core" module while it is getting "fixed up" and its module def m_copy erroneously gets set. Under isolated interpreters this causes problems since sys and builtins are allowed even though they are still single-phase init modules. (This was discovered while working on gh-101660.)
The solution is to stop relying on get_core_module_dict() in is_core_module().
Sharing mutable (or non-immortal) objects between interpreters is generally not safe. We can work around that but not easily.
There are two restrictions that are critical for objects that break interpreter isolation.
The first is that the object's state be guarded by a global lock. For now the GIL meets this requirement, but a granular global lock is needed once we have a per-interpreter GIL.
The second restriction is that the object (and, for a container, its items) be deallocated/resized only when the interpreter in which it was allocated is the current one. This is because every interpreter has (or will have, see gh-101660) its own object allocator. Deallocating an object with a different allocator can cause crashes.
The dict for the cache of module defs is completely internal, which simplifies what we have to do to meet those requirements. To do so, we do the following:
* add a mechanism for re-using a temporary thread state tied to the main interpreter in an arbitrary thread
* add _PyRuntime.imports.extensions.main_tstate`
* add _PyThreadState_InitDetached() and _PyThreadState_ClearDetached() (pystate.c)
* add _PyThreadState_BindDetached() and _PyThreadState_UnbindDetached() (pystate.c)
* make sure the cache dict (_PyRuntime.imports.extensions.dict) and its items are all owned by the main interpreter)
* add a placeholder using for a granular global lock
Note that the cache is only used for legacy extension modules and not for multi-phase init modules.
https://github.com/python/cpython/issues/100227
This reverts commit 87be8d9.
This approach to keeping the interned strings safe is turning out to be too complex for my taste (due to obmalloc isolation). For now I'm going with the simpler solution, making the dict per-interpreter. We can revisit that later if we want a sharing solution.
This is effectively two changes. The first (the bulk of the change) is where we add _Py_AddToGlobalDict() (and _PyRuntime.cached_objects.main_tstate, etc.). The second (much smaller) change is where we update PyUnicode_InternInPlace() to use _Py_AddToGlobalDict() instead of calling PyDict_SetDefault() directly.
Basically, _Py_AddToGlobalDict() is a wrapper around PyDict_SetDefault() that should be used whenever we need to add a value to a runtime-global dict object (in the few cases where we are leaving the container global rather than moving it to PyInterpreterState, e.g. the interned strings dict). _Py_AddToGlobalDict() does all the necessary work to make sure the target global dict is shared safely between isolated interpreters. This is especially important as we move the obmalloc state to each interpreter (gh-101660), as well as, potentially, the GIL (PEP 684).
https://github.com/python/cpython/issues/100227
* Eliminate all remaining uses of Py_SIZE and Py_SET_SIZE on PyLongObject, adding asserts.
* Change layout of size/sign bits in longobject to support future addition of immortal ints and tagged medium ints.
* Add functions to hide some internals of long object, and for setting sign and digit count.
* Replace uses of IS_MEDIUM_VALUE macro with _PyLong_IsCompact().
Aside from sys and builtins, _io is the only core builtin module that hasn't been ported to multi-phase init. We may do so later (e.g. gh-101948), but in the meantime we must at least take care of the module's static types properly. (This came up while working on gh-101660.)
https://github.com/python/cpython/issues/94673
The error-handling code in new_interpreter() has been broken for a while. We hadn't noticed because those code mostly doesn't fail. (I noticed while working on gh-101660.) The problem is that we try to clear/delete the newly-created thread/interpreter using itself, which just failed. The solution is to switch back to the calling thread state first.
https://github.com/python/cpython/issues/98608
Moving it valuable with a per-interpreter GIL. However, it is also useful without one, since it allows us to identify refleaks within a single interpreter or where references are escaping an interpreter. This becomes more important as we move the obmalloc state to PyInterpreterState.
https://github.com/python/cpython/issues/102304
Prior to this change, errors in _Py_NewInterpreterFromConfig() were always fatal. Instead, callers should be able to handle such errors and keep going. That's what this change supports. (This was an oversight in the original implementation of _Py_NewInterpreterFromConfig().) Note that the existing [fatal] behavior of the public Py_NewInterpreter() is preserved.
https://github.com/python/cpython/issues/98608
The essentially eliminates the global variable, with the associated benefits. This is also a precursor to isolating this bit of state to PyInterpreterState.
Folks that currently read _Py_RefTotal directly would have to start using _Py_GetGlobalRefTotal() instead.
https://github.com/python/cpython/issues/102304
This deprecates `st_ctime` fields on Windows, with the intent to change them to contain the correct value in 3.14. For now, they should keep returning the creation time as they always have.
This behavior is optional, because in some extreme cases it
may just make debugging harder. The tool defaults it to off,
but it is on in Makefile.pre.in.
Also note that this makes diffs to generated_cases.c.h noisier,
since whenever you insert or delete a line in bytecodes.c,
all subsequent #line directives will change.
It doesn't make sense to use multi-phase init for these modules. Using a per-interpreter "m_copy" (instead of PyModuleDef.m_base.m_copy) makes this work okay. (This came up while working on gh-101660.)
Note that we might instead end up disallowing re-load for sys/builtins since they are so special.
https://github.com/python/cpython/issues/102660
* Rename local variables, names and consts, from the interpeter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions.
* Remove unused dummy variables.
Add `MS_WINDOWS_DESKTOP`, `MS_WINDOWS_APPS`, `MS_WINDOWS_SYSTEM` and `MS_WINDOWS_GAMES` preprocessor definitions to allow switching off functionality missing from particular API partitions ("partitions" are used in Windows to identify overlapping subsets of APIs).
CPython only officially supports `MS_WINDOWS_DESKTOP` and `MS_WINDOWS_SYSTEM` (APPS is included by normal desktop builds, but APPS without DESKTOP is not covered). Other configurations are a convenience for people building their own runtimes.
`MS_WINDOWS_GAMES` is for the Xbox subset of the Windows API, which is also available on client OS, but is restricted compared to `MS_WINDOWS_DESKTOP`. These restrictions may change over time, as they relate to the build headers rather than the OS support, and so we assume that Xbox builds will use the latest available version of the GDK.
Specific changes:
* move the import lock to PyInterpreterState
* move the "find_and_load" diagnostic state to PyInterpreterState
Note that the import lock exists to keep multiple imports of the same module in the same interpreter (but in different threads) from stomping on each other. Independently, we use a distinct global lock to protect globally shared import state, especially related to loaded extension modules. For now we can rely on the GIL as that lock but with a per-interpreter GIL we'll need a new global lock.
The remaining state in _PyRuntimeState.imports will (probably) continue being global.
https://github.com/python/cpython/issues/100227
Some incompatible changes had gone in, and the "ignore" lists weren't properly undated. This change fixes that. It's necessary prior to enabling test_check_c_globals, which I hope to do soon.
Note that this does include moving last_resort_memory_error to PyInterpreterState.
https://github.com/python/cpython/issues/90110
This is related to fixing the refleaks introduced by commit 096d009. I haven't been able to find the leak yet, but these changes are a consequence of that effort. This includes some cleanup, some tweaks to the existing tests, and a bunch of new test cases. The only change here that might have impact outside the tests in question is in imp.py, where I update imp.load_dynamic() to use spec_from_file_location() instead of creating a ModuleSpec directly.
Also note that I've updated the tests to only skip if we're checking for refleaks (regrtest's --huntrleaks), whereas in gh-101969 I had skipped the tests entirely. The tests will be useful for some upcoming work and I'd rather the refleaks not hold that up. (It isn't clear how quickly we'll be able to fix the leaking code, though it will certainly be done in the short term.)
https://github.com/python/cpython/issues/102251