A PyThreadState can be in one of many states in its lifecycle, represented by some status value. Those statuses haven't been particularly clear, so we're addressing that here. Specifically:
* made the distinct lifecycle statuses clear on PyThreadState
* identified expectations of how various lifecycle-related functions relate to status
* noted the various places where those expectations don't match the actual behavior
At some point we'll need to address the mismatches.
(This change also includes some cleanup.)
https://github.com/python/cpython/issues/59956
`warnings.warn()` gains the ability to skip stack frames based on code
filename prefix rather than only a numeric `stacklevel=` via a new
`skip_file_prefixes=` keyword argument.
To use this, ensure that clang support was selected in Visual Studio Installer, then set the PlatformToolset environment variable to "ClangCL" and build as normal from the command line.
It remains unsupported, but at least is possible now for experimentation.
We've factored out a struct from the two PyThreadState fields. This accomplishes two things:
* make it clear that the trashcan-related code doesn't need any other parts of PyThreadState
* allows us to use the trashcan mechanism even when there isn't a "current" thread state
We still expect the caller to hold the GIL.
https://github.com/python/cpython/issues/59956
This PR fixes object allocation in long_subtype_new to ensure that there's at least one digit in all cases, and makes sure that the value of that digit is copied over from the source long.
Needs backport to 3.11, but not any further: the change to require at least one digit was only introduced for Python 3.11.
Fixes#101037.
The objective of this change is to help make the GILState-related code easier to understand. This mostly involves moving code around and some semantically equivalent refactors. However, there are a also a small number of slight changes in structure and behavior:
* tstate_current is moved out of _PyRuntimeState.gilstate
* autoTSSkey is moved out of _PyRuntimeState.gilstate
* autoTSSkey is initialized earlier
* autoTSSkey is re-initialized (after fork) earlier
https://github.com/python/cpython/issues/59956
When executing the BUILD_LIST opcode, steal the references from the stack,
in a manner similar to the BUILD_TUPLE opcode. Implement this by offloading
the logic to a new private API, _PyList_FromArraySteal(), that works similarly
to _PyTuple_FromArraySteal().
This way, instead of performing multiple stack pointer adjustments while the
list is being initialized, the stack is adjusted only once and a fast memory
copy operation is performed in one fell swoop.
Not comprehensive, best effort warning. There are cases when threads exist on some platforms that this code cannot detect. macOS when API permissions allow and Linux with a readable /proc procfs present are the currently supported cases where a warning should show up reliably.
Starting with a DeprecationWarning for now, it is less disruptive than something like RuntimeWarning and most likely to only be seen in people's CI tests - a good place to start with this messaging.
* move _PyRuntime.global_objects.interned to _PyRuntime.cached_objects.interned_strings (and use _Py_CACHED_OBJECT())
* rename _PyRuntime.global_objects to _PyRuntime.static_objects
(This also relates to gh-96075.)
https://github.com/python/cpython/issues/90111
The Py_CLEAR(), Py_SETREF() and Py_XSETREF() macros now only evaluate
their arguments once. If an argument has side effects, these side
effects are no longer duplicated.
Use temporary variables to avoid duplicating side effects of macro
arguments. If available, use _Py_TYPEOF() to avoid type punning.
Otherwise, use memcpy() for the assignment to prevent a
miscompilation with strict aliasing caused by type punning.
Add _Py_TYPEOF() macro: __typeof__() on GCC and clang.
Add test_py_clear() and test_py_setref() unit tests to _testcapi.
* Add API to allow extensions to set callback function on creation and destruction of PyCodeObject
Co-authored-by: Ye11ow-Flash <janshah@cs.stonybrook.edu>
Convert macros to static inline functions to avoid macro pitfalls,
like duplication of side effects:
* _PyObject_SIZE()
* _PyObject_VAR_SIZE()
The result type is size_t (unsigned).
* Change _PyDict_KeysSize() and shared_keys_usable_size() return type
from signed (Py_ssize_t) to unsigned (size_t) type.
* new_values() argument type is now unsigned (size_t).
* init_inline_values() now uses size_t rather than int for the 'i'
iterator variable.
* type.__sizeof__() implementation now uses unsigned (size_t) type.
The following macros are modified to use _Py_RVALUE(), so they can no
longer be used as l-value:
* DK_LOG_SIZE()
* _PyCode_CODE()
* _PyList_ITEMS()
* _PyTuple_ITEMS()
* _Py_SLIST_HEAD()
* _Py_SLIST_ITEM_NEXT()
_PyCode_CODE() is private and other macros are part of the internal
C API.
Convert macros to static inline functions to avoid macro pitfalls,
like duplication of side effects:
* DK_ENTRIES()
* DK_UNICODE_ENTRIES()
* PyCode_GetNumFree()
* PyFloat_AS_DOUBLE()
* PyInstanceMethod_GET_FUNCTION()
* PyMemoryView_GET_BASE()
* PyMemoryView_GET_BUFFER()
* PyMethod_GET_FUNCTION()
* PyMethod_GET_SELF()
* PySet_GET_SIZE()
* _PyHeapType_GET_MEMBERS()
Changes:
* PyCode_GetNumFree() casts PyCode_GetNumFree.co_nfreevars from int
to Py_ssize_t to be future proof, and because Py_ssize_t is
commonly used in the C API.
* PyCode_GetNumFree() doesn't cast its argument: the replaced macro
already required the exact type PyCodeObject*.
* Add assertions in some functions using "CAST" macros to check
the arguments type when Python is built with assertions
(debug build).
* Remove an outdated comment in unicodeobject.h.
Newly supported interpreter definition syntax:
- `op(NAME, (input_stack_effects -- output_stack_effects)) { ... }`
- `macro(NAME) = OP1 + OP2;`
Also some other random improvements:
- Convert `WITH_EXCEPT_START` to use stack effects
- Fix lexer to balk at unrecognized characters, e.g. `@`
- Fix moved output names; support object pointers in cache
- Introduce `error()` method to print errors
- Introduce read_uint16(p) as equivalent to `*p`
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
The ``structmember.h`` header is deprecated, though it continues to be available
and there are no plans to remove it. There are no deprecation warnings. Old code
can stay unchanged (unless the extra include and non-namespaced macros bother
you greatly). Specifically, no uses in CPython are updated -- that would just be
unnecessary churn.
The ``structmember.h`` header is deprecated, though it continues to be
available and there are no plans to remove it.
Its contents are now available just by including ``Python.h``,
with a ``Py`` prefix added if it was missing:
- `PyMemberDef`, `PyMember_GetOne` and`PyMember_SetOne`
- Type macros like `Py_T_INT`, `Py_T_DOUBLE`, etc.
(previously ``T_INT``, ``T_DOUBLE``, etc.)
- The flags `Py_READONLY` (previously ``READONLY``) and
`Py_AUDIT_READ` (previously all uppercase)
Several items are not exposed from ``Python.h``:
- `T_OBJECT` (use `Py_T_OBJECT_EX`)
- `T_NONE` (previously undocumented, and pretty quirky)
- The macro ``WRITE_RESTRICTED`` which does nothing.
- The macros ``RESTRICTED`` and ``READ_RESTRICTED``, equivalents of
`Py_AUDIT_READ`.
- In some configurations, ``<stddef.h>`` is not included from ``Python.h``.
It should be included manually when using ``offsetof()``.
The deprecated header continues to provide its original
contents under the original names.
Your old code can stay unchanged, unless the extra include and non-namespaced
macros bother you greatly.
There is discussion on the issue to rename `T_PYSSIZET` to `PY_T_SSIZE` or
similar. I chose not to do that -- users will probably copy/paste that with any
spelling, and not renaming it makes migration docs simpler.
Co-Authored-By: Alexander Belopolsky <abalkin@users.noreply.github.com>
Co-Authored-By: Matthias Braun <MatzeB@users.noreply.github.com>
This is part of the effort to consolidate global variables, to make them easier to manage (and make it easier to later move some of them to PyInterpreterState).
https://github.com/python/cpython/issues/81057
Introduce the autocommit attribute to Connection and the autocommit
parameter to connect() for PEP 249-compliant transaction handling.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Géry Ogam <gery.ogam@gmail.com>
We actually don't move PyImport_Inittab. Instead, we make a copy that we keep on _PyRuntimeState and use only that after Py_Initialize(). We also prevent folks from modifying PyImport_Inittab (the best we can) after that point.
https://github.com/python/cpython/issues/81057
The global allocators were stored in 3 static global variables: _PyMem_Raw, _PyMem, and _PyObject. State for the "small block" allocator was stored in another 13. That makes a total of 16 global variables. We are moving all 16 to the _PyRuntimeState struct as part of the work for gh-81057. (If PEP 684 is accepted then we will follow up by moving them all to PyInterpreterState.)
https://github.com/python/cpython/issues/81057
As we consolidate global variables, we find some objects that are almost suitable to add to _PyRuntimeState.global_objects, but have some small/sneaky bit of per-interpreter state (e.g. a weakref list). We're adding PyInterpreterState.static_objects so we can move such objects there. (We'll removed the _not_used field once we've added others.)
https://github.com/python/cpython/issues/81057
Up until now we had a single generated initializer macro for all the statically declared global objects in _PyRuntimeState, including several one-offs (e.g. the empty tuple). The one-offs don't need to be generated, but were because we had one big initializer. Having separate initializers for set of generated global objects allows us to generate only the ones we need to. This allows us to add initializers for one-off global objects without having to generate them.
https://github.com/python/cpython/issues/81057
* Adds EXIT_INTERPRETER instruction to exit PyEval_EvalDefault()
* Simplifies RETURN_VALUE, YIELD_VALUE and RETURN_GENERATOR instructions as they no longer need to check for entry frames.
The Py_CLEAR(), Py_SETREF() and Py_XSETREF() macros now only evaluate
their argument once. If an argument has side effects, these side
effects are no longer duplicated.
Add test_py_clear() and test_py_setref() unit tests to _testcapi.
Add _PyStaticObject_CheckRefcnt() function to make
_PyStaticObjects_CheckRefcnt() shorter. Use
_PyObject_ASSERT_FAILED_MSG() to log the object causing the fatal
error.
We do the following:
* move the generated _PyUnicode_InitStaticStrings() to its own file
* move the generated _PyStaticObjects_CheckRefcnt() to its own file
* include pycore_global_objects.h in extension modules instead of pycore_runtime_init.h
These changes help us avoid including things that aren't needed.
https://github.com/python/cpython/issues/90868
Add PyFrame_GetVar() and PyFrame_GetVarString() functions to get a
frame variable by its name.
Move PyFrameObject C API tests from test_capi to test_frame.
Previously, the optional restrictions on subinterpreters were: disallow fork, subprocess, and threads. By default, we were disallowing all three for "isolated" interpreters. We always allowed all three for the main interpreter and those created through the legacy `Py_NewInterpreter()` API.
Those settings were a bit conservative, so here we've adjusted the optional restrictions to: fork, exec, threads, and daemon threads. The default for "isolated" interpreters disables fork, exec, and daemon threads. Regular threads are allowed by default. We continue always allowing everything For the main interpreter and the legacy API.
In the code, we add `_PyInterpreterConfig.allow_exec` and `_PyInterpreterConfig.allow_daemon_threads`. We also add `Py_RTFLAGS_DAEMON_THREADS` and `Py_RTFLAGS_EXEC`.
Change FOR_ITER to have the same stack effect regardless of whether it branches or not.
Performance is unchanged as FOR_ITER (and specialized forms jump over the cleanup code).
(see https://github.com/python/cpython/issues/98608)
This change does the following:
1. change the argument to a new `_PyInterpreterConfig` struct
2. rename the function to `_Py_NewInterpreterFromConfig()`, inspired by `Py_InitializeFromConfig()` (takes a `_PyInterpreterConfig` instead of `isolated_subinterpreter`)
3. split up the boolean `isolated_subinterpreter` into the corresponding multiple granular settings
* allow_fork
* allow_subprocess
* allow_threads
4. add `PyInterpreterState.feature_flags` to store those settings
5. add a function for checking if a feature is enabled on an opaque `PyInterpreterState *`
6. drop `PyConfig._isolated_interpreter`
The existing default (see `Py_NewInterpeter()` and `Py_Initialize*()`) allows fork, subprocess, and threads and the optional "isolated" interpreter (see the `_xxsubinterpreters` module) disables all three. None of that changes here; the defaults are preserved.
Note that the given `_PyInterpreterConfig` will not be used outside `_Py_NewInterpreterFromConfig()`, nor preserved. This contrasts with how `PyConfig` is currently preserved, used, and even modified outside `Py_InitializeFromConfig()`. I'd rather just avoid that mess from the start for `_PyInterpreterConfig`. We can preserve it later if we find an actual need.
This change allows us to follow up with a number of improvements (e.g. stop disallowing subprocess and support disallowing exec instead).
(Note that this PR adds "private" symbols. We'll probably make them public, and add docs, in a separate change.)
Added os.setns and os.unshare to easily switch between namespaces
on Linux.
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: CAM Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Victor Stinner <vstinner@python.org>
_Py_block_ty defines four types of block, FunctionBlock, ClassBlock, ModuleBlock and AnnotationBlock.
But _symtable_entry.ste_type only comments three of them, I think it's better both sides are consistent.
It had to live as a global outside of PyConfig for stable ABI reasons in
the pre-3.12 backports.
This removes the `_Py_global_config_int_max_str_digits` and gets rid of
the equivalent field in the internal `struct _is PyInterpreterState` as
code can just use the existing nested config struct within that.
Adds tests to verify unique settings and configs in subinterpreters.
Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)
The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.
The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```
In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$
From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
⚠️⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️⚠️
If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**.
If you have any **refinements or optimizations** please, wait until the first version is merged before starting hacking or proposing those so we can keep this PR productive.
In the limited C API with a debug build, Py_INCREF() is implemented
by calling _Py_IncRef() which calls Py_INCREF(). Only call
_Py_INCREF_STAT_INC() once.
* gh-93503: Add APIs to set profiling and tracing functions in all threads in the C-API
* Use a separate API
* Fix NEWS entry
* Add locks around the loop
* Document ignoring exceptions
* Use the new APIs in the sys module
* Update docs
We only statically initialize for core code and builtin modules. Extension modules still create
the tuple at runtime. We'll solve that part of interpreter isolation separately.
This change includes generated code. The non-generated changes are in:
* Tools/clinic/clinic.py
* Python/getargs.c
* Include/cpython/modsupport.h
* Makefile.pre.in (re-generate global strings after running clinic)
* very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c
All other changes are generated code (clinic, global strings).
* Store tp_weaklist on the interpreter state for static builtin types.
* Factor out _PyStaticType_GET_WEAKREFS_LISTPTR().
* Add _PyStaticType_ClearWeakRefs().
* Add a comment about how _PyStaticType_ClearWeakRefs() loops.
* Document the change.
* Update Doc/whatsnew/3.12.rst
* Fix a typo.
This is the last precursor to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types.
Here we add per-type storage on PyInterpreterState, but only for the static builtin types. This involves the following:
* add PyInterpreterState.types
* move PyInterpreterState.type_cache to it
* add a "num_builtins_initialized" field
* add a "builtins" field (a static array big enough for all the static builtin types)
* add _PyStaticType_GetState() to look up a static builtin type's state
* (temporarily) add PyTypeObject.tp_static_builtin_index (to hold the type's index into PyInterpreterState.types.builtins)
We will be eliminating tp_static_builtin_index in a later change.
* Add _Py_memory_repeat function to pycore_list
* Add _Py_RefcntAdd function to pycore_object
* Use the new functions in tuplerepeat, list_repeat, and list_inplace_repeat
This is the first of several precursors to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types.
We do the following:
* add `_PyStaticType_InitBuiltin()`
* add `_Py_TPFLAGS_STATIC_BUILTIN`
* set it on all static builtin types in `_PyStaticType_InitBuiltin()`
* shuffle some code around to be able to use _PyStaticType_InitBuiltin()
* rename `_PyStructSequence_InitType()` to `_PyStructSequence_InitBuiltinWithFlags()`
* add `_PyStructSequence_InitBuiltin()`.
Move the follow functions and type from frameobject.h to pyframe.h,
so the standard <Python.h> provide frame getter functions:
* PyFrame_Check()
* PyFrame_GetBack()
* PyFrame_GetBuiltins()
* PyFrame_GetGenerator()
* PyFrame_GetGlobals()
* PyFrame_GetLasti()
* PyFrame_GetLocals()
* PyFrame_Type
Remove #include "frameobject.h" from many C files. It's no longer
needed.
Deprecate global configuration variable like
Py_IgnoreEnvironmentFlag: the Py_InitializeFromConfig() API should be
instead.
Fix declaration of Py_GETENV(): use PyAPI_FUNC(), not PyAPI_DATA().