An unrecognized format character in PyUnicode_FromFormat() and
PyUnicode_FromFormatV() now sets a SystemError.
In previous versions it caused all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
This is the first of several precursors to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types.
We do the following:
* add `_PyStaticType_InitBuiltin()`
* add `_Py_TPFLAGS_STATIC_BUILTIN`
* set it on all static builtin types in `_PyStaticType_InitBuiltin()`
* shuffle some code around to be able to use _PyStaticType_InitBuiltin()
* rename `_PyStructSequence_InitType()` to `_PyStructSequence_InitBuiltinWithFlags()`
* add `_PyStructSequence_InitBuiltin()`.
This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing:
* the configure option
* the macro
* the code enabled by the macro
Avoid _PyCodec_Lookup() and PyCodec_LookupError() for most common
built-in encodings and error handlers to avoid creating a temporary
Unicode string object, whereas these encodings and error handlers are
known to be valid.
Remove the PyUnicode_InternImmortal() function and the
SSTATE_INTERNED_IMMORTAL macro.
The PyUnicode_InternImmortal() function is still exported in the
stable ABI. The function is removed from the API.
PyASCIIObject.state.interned size is now a single bit, rather than 2
bits.
Keep SSTATE_NOT_INTERNED and SSTATE_INTERNED_MORTAL macros for
backward compatibility, but no longer use them internally since the
interned member is now a single bit and so can only have two values
(interned or not interned).
Update stats of _PyUnicode_ClearInterned().
Replace "(PyCFunction)(void(*)(void))func" cast with
_PyCFunction_CAST(func).
Change generated by the command:
sed -i -e \
's!(PyCFunction)(void(\*)(void)) *\([A-Za-z0-9_]\+\)!_PyCFunction_CAST(\1)!g' \
$(find -name "*.c")
If the error handler returns position less or equal than the starting
position of non-encodable characters, most of built-in encoders didn't
properly re-size the output buffer. This led to out-of-bounds writes,
and segfaults.
The left-hand side expression of the if-check can be converted to a
constant by the compiler, but the addition on the right-hand side is
performed during runtime.
Move the addition from the right-hand side to the left-hand side by
turning it into a subtraction there. Since the values are known to
be large enough to not turn negative, this is a safe operation.
Prevents a very unlikely integer overflow on 32 bit systems.
Fixes GH-91421.
Add macros to cast objects to PyASCIIObject*, PyCompactUnicodeObject*
and PyUnicodeObject*: _PyASCIIObject_CAST(),
_PyCompactUnicodeObject_CAST() and _PyUnicodeObject_CAST(). Using
these new macros make the code more readable and check their argument
with: assert(PyUnicode_Check(op)).
Remove redundant assert(PyUnicode_Check(op)) in macros using directly
or indirectly these new CAST macros.
Replacing existing casts with these macros.
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules.
The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).
https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.
The core of the change is in:
* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers
I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config.
The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.
The following are not changed (yet):
* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init
https://bugs.python.org/issue46541
Add _PyUnicode_FiniTypes() function, called by
finalize_interp_types(). It clears these static types:
* EncodingMapType
* PyFieldNameIter_Type
* PyFormatterIter_Type
_PyStaticType_Dealloc() now does nothing if tp_subclasses
is not NULL.
Add types removed by mistake by the commit adding
_PyTypes_FiniTypes().
Move also PyBool_Type at the end, since it depends on PyLong_Type.
PyBytes_Type and PyUnicode_Type no longer depend explicitly on
PyBaseObject_Type: it's the default of PyType_Ready().
This reverts commit ea251806b8.
Keep "assert(interned == NULL);" in _PyUnicode_Fini(), but only for
the main interpreter.
Keep _PyUnicode_ClearInterned() changes avoiding the creation of a
temporary Python list object.
This change is strictly renames and moving code around. It helps in the following ways:
* ensures type-related init functions focus strictly on one of the three aspects (state, objects, types)
* passes in PyInterpreterState * to all those functions, simplifying work on moving types/objects/state to the interpreter
* consistent naming conventions help make what's going on more clear
* keeping API related to a type in the corresponding header file makes it more obvious where to look for it
https://bugs.python.org/issue46008
Move Include/longobject.h non-limited API to a new
Include/cpython/longobject.h header file.
Move the following definitions to the internal C API:
* _PyLong_DigitValue
* _PyLong_FormatAdvancedWriter()
* _PyLong_FormatWriter()
They support now splitting escape sequences between input chunks.
Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
They support now splitting escape sequences between input chunks.
Add the third parameter "final" in codecs.unicode_escape_decode().
It is True by default to match the former behavior.
Detect refcount bugs in C extensions when the empty Unicode string
singleton is destroyed by mistake.
* Move forward declarations to the top of unicodeobject.c.
* Simplifiy unicode_is_singleton().