We do the following:
* move the generated _PyUnicode_InitStaticStrings() to its own file
* move the generated _PyStaticObjects_CheckRefcnt() to its own file
* include pycore_global_objects.h in extension modules instead of pycore_runtime_init.h
These changes help us avoid including things that aren't needed.
https://github.com/python/cpython/issues/90868
The os module and the PyUnicode_FSDecoder() function no longer accept
bytes-like paths, like bytearray and memoryview types: only the exact
bytes type is accepted for bytes strings.
asciilib_count() is the same than ucs1lib_count(): the code is not
specialized for ASCII strings, so it's not worth it to have a
separated function. Remove asciilib_count() function.
An unrecognized format character in PyUnicode_FromFormat() and
PyUnicode_FromFormatV() now sets a SystemError.
In previous versions it caused all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
This is the first of several precursors to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types.
We do the following:
* add `_PyStaticType_InitBuiltin()`
* add `_Py_TPFLAGS_STATIC_BUILTIN`
* set it on all static builtin types in `_PyStaticType_InitBuiltin()`
* shuffle some code around to be able to use _PyStaticType_InitBuiltin()
* rename `_PyStructSequence_InitType()` to `_PyStructSequence_InitBuiltinWithFlags()`
* add `_PyStructSequence_InitBuiltin()`.
This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing:
* the configure option
* the macro
* the code enabled by the macro
Avoid _PyCodec_Lookup() and PyCodec_LookupError() for most common
built-in encodings and error handlers to avoid creating a temporary
Unicode string object, whereas these encodings and error handlers are
known to be valid.
Remove the PyUnicode_InternImmortal() function and the
SSTATE_INTERNED_IMMORTAL macro.
The PyUnicode_InternImmortal() function is still exported in the
stable ABI. The function is removed from the API.
PyASCIIObject.state.interned size is now a single bit, rather than 2
bits.
Keep SSTATE_NOT_INTERNED and SSTATE_INTERNED_MORTAL macros for
backward compatibility, but no longer use them internally since the
interned member is now a single bit and so can only have two values
(interned or not interned).
Update stats of _PyUnicode_ClearInterned().
Replace "(PyCFunction)(void(*)(void))func" cast with
_PyCFunction_CAST(func).
Change generated by the command:
sed -i -e \
's!(PyCFunction)(void(\*)(void)) *\([A-Za-z0-9_]\+\)!_PyCFunction_CAST(\1)!g' \
$(find -name "*.c")
If the error handler returns position less or equal than the starting
position of non-encodable characters, most of built-in encoders didn't
properly re-size the output buffer. This led to out-of-bounds writes,
and segfaults.
The left-hand side expression of the if-check can be converted to a
constant by the compiler, but the addition on the right-hand side is
performed during runtime.
Move the addition from the right-hand side to the left-hand side by
turning it into a subtraction there. Since the values are known to
be large enough to not turn negative, this is a safe operation.
Prevents a very unlikely integer overflow on 32 bit systems.
Fixes GH-91421.
Add macros to cast objects to PyASCIIObject*, PyCompactUnicodeObject*
and PyUnicodeObject*: _PyASCIIObject_CAST(),
_PyCompactUnicodeObject_CAST() and _PyUnicodeObject_CAST(). Using
these new macros make the code more readable and check their argument
with: assert(PyUnicode_Check(op)).
Remove redundant assert(PyUnicode_Check(op)) in macros using directly
or indirectly these new CAST macros.
Replacing existing casts with these macros.
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules.
The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).
https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.
The core of the change is in:
* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers
I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config.
The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.
The following are not changed (yet):
* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init
https://bugs.python.org/issue46541
Add _PyUnicode_FiniTypes() function, called by
finalize_interp_types(). It clears these static types:
* EncodingMapType
* PyFieldNameIter_Type
* PyFormatterIter_Type
_PyStaticType_Dealloc() now does nothing if tp_subclasses
is not NULL.
Add types removed by mistake by the commit adding
_PyTypes_FiniTypes().
Move also PyBool_Type at the end, since it depends on PyLong_Type.
PyBytes_Type and PyUnicode_Type no longer depend explicitly on
PyBaseObject_Type: it's the default of PyType_Ready().
This reverts commit ea251806b8.
Keep "assert(interned == NULL);" in _PyUnicode_Fini(), but only for
the main interpreter.
Keep _PyUnicode_ClearInterned() changes avoiding the creation of a
temporary Python list object.
This change is strictly renames and moving code around. It helps in the following ways:
* ensures type-related init functions focus strictly on one of the three aspects (state, objects, types)
* passes in PyInterpreterState * to all those functions, simplifying work on moving types/objects/state to the interpreter
* consistent naming conventions help make what's going on more clear
* keeping API related to a type in the corresponding header file makes it more obvious where to look for it
https://bugs.python.org/issue46008
Move Include/longobject.h non-limited API to a new
Include/cpython/longobject.h header file.
Move the following definitions to the internal C API:
* _PyLong_DigitValue
* _PyLong_FormatAdvancedWriter()
* _PyLong_FormatWriter()
They support now splitting escape sequences between input chunks.
Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
They support now splitting escape sequences between input chunks.
Add the third parameter "final" in codecs.unicode_escape_decode().
It is True by default to match the former behavior.
Detect refcount bugs in C extensions when the empty Unicode string
singleton is destroyed by mistake.
* Move forward declarations to the top of unicodeobject.c.
* Simplifiy unicode_is_singleton().
Reorganize pycore_interp_init() to initialize singletons before the
the first PyType_Ready() call. Fix an issue when Python is configured
using --without-doc-strings.
* Remove m68k-specific hack from ascii_decode
On m68k, alignments of primitives is more relaxed, with 4-byte and
8-byte types only requiring 2-byte alignment, thus using sizeof(size_t)
does not work. Instead, use the portable alternative.
Note that this is a minimal fix that only relaxes the assertion and the
condition for when to use the optimised version remains overly strict.
Such issues will be fixed tree-wide in the next commit.
NB: In C11 we could use _Alignof(size_t) instead, but for compatibility
we use autoconf.
* Optimise string routines for architectures with non-natural alignment
C only requires that sizeof(x) is a multiple of alignof(x), not that the
two are equal. Thus anywhere where we optimise based on alignment we
should be using alignof(x) not sizeof(x).
This is more annoying than it would be in C11 where we could just use
_Alignof(x) (and alignof(x) in C++11), but since we still require only
C99 we must plumb the information all the way from autoconf through the
various typedefs and defines.
Python no longer fails at startup with a fatal error if a command
line argument contains an invalid Unicode character.
The Py_DecodeLocale() function now escapes byte sequences which would
be decoded as Unicode characters outside the [U+0000; U+10ffff]
range.
Use MAX_UNICODE constant in unicodeobject.c.
Pass the current interpreter (interp) rather than the current Python
thread state (tstate) to internal functions which only use the
interpreter.
Modified functions:
* _PyXXX_Fini() and _PyXXX_ClearFreeList() functions
* _PyEval_SignalAsyncExc(), make_pending_calls()
* _PySys_GetObject(), sys_set_object(), sys_set_object_id(), sys_set_object_str()
* should_audit(), set_flags_from_config(), make_flags()
* _PyAtExit_Call()
* init_stdio_encoding()
* etc.
Make the Unicode dictionary of interned strings compatible with
subinterpreters.
Remove the INTERN_NAME_STRINGS macro in typeobject.c: names are
always now interned (even if EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
macro is defined).
_PyUnicode_ClearInterned() now uses PyDict_Next() to no longer
allocate memory, to ensure that the interned dictionary is cleared.
Make _PyUnicode_FromId() function compatible with subinterpreters.
Each interpreter now has an array of identifier objects (interned
strings decoded from UTF-8).
* Add PyInterpreterState.unicode.identifiers: array of identifiers
objects.
* Add _PyRuntimeState.unicode_ids used to allocate unique indexes
to _Py_Identifier.
* Rewrite the _Py_Identifier structure.
Microbenchmark on _PyUnicode_FromId(&PyId_a) with _Py_IDENTIFIER(a):
[ref] 2.42 ns +- 0.00 ns -> [atomic] 3.39 ns +- 0.00 ns: 1.40x slower
This change adds 1 ns per _PyUnicode_FromId() call in average.
No longer use deprecated aliases to functions:
* Replace PyObject_MALLOC() with PyObject_Malloc()
* Replace PyObject_REALLOC() with PyObject_Realloc()
* Replace PyObject_FREE() with PyObject_Free()
* Replace PyObject_Del() with PyObject_Free()
* Replace PyObject_DEL() with PyObject_Free()
No longer use deprecated aliases to functions:
* Replace PyMem_MALLOC() with PyMem_Malloc()
* Replace PyMem_REALLOC() with PyMem_Realloc()
* Replace PyMem_FREE() with PyMem_Free()
* Replace PyMem_Del() with PyMem_Free()
* Replace PyMem_DEL() with PyMem_Free()
Modify also the PyMem_DEL() macro to use directly PyMem_Free().
* UCD_Check() uses PyModule_Check()
* Simplify the internal _PyUnicode_Name_CAPI structure:
* Remove size and state members
* Remove state and self parameters of getcode() and getname()
functions
* Remove global_module_state
The private _PyUnicode_Name_CAPI structure of the PyCapsule API
unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the
structure gets a new state member which must be passed to the
getcode() and getname() functions.
* Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
* unicodedata module is now built with Py_BUILD_CORE_MODULE.
* unicodedata: move hashAPI variable into unicodedata_module_state.
Remove complex special methods __int__, __float__, __floordiv__,
__mod__, __divmod__, __rfloordiv__, __rmod__ and __rdivmod__
which always raised a TypeError.
Enable recursion checks which were disabled when get __bases__ of
non-type objects in issubclass() and isinstance() and when intern
strings. It fixes a stack overflow when getting __bases__ leads
to infinite recursion.
Originally recursion checks was disabled for PyDict_GetItem() which
silences all errors including the one raised in case of detected
recursion and can return incorrect result. But now the code uses
PyDict_GetItemWithError() and PyDict_SetDefault() instead.
Always create the empty bytes string singleton.
Optimize PyBytes_FromStringAndSize(str, 0): it no longer has to check
if the empty string singleton was created or not, it is always
available.
Add functions:
* _PyBytes_Init()
* bytes_get_empty(), bytes_new_empty()
* bytes_create_empty_string_singleton()
* unicode_create_empty_string_singleton()
_Py_unicode_state: rename empty structure member to empty_string.
Each interpreter now has its own Unicode latin1 singletons.
Remove "ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS"
and "ifdef LATIN1_SINGLETONS": always enable latin1 singletons.
Optimize unicode_result_ready(): only attempt to get a latin1
singleton for PyUnicode_1BYTE_KIND.
Functions of unicodeobject.c, like PyUnicode_New(), no longer check
if the empty Unicode singleton has been initialized or not. Consider
that it is always initialized. The Unicode API must not be used
before _PyUnicode_Init() or after _PyUnicode_Fini().
The PyObject_INIT() and PyObject_INIT_VAR() macros become aliases to,
respectively, PyObject_Init() and PyObject_InitVar() functions.
Rename _PyObject_INIT() and _PyObject_INIT_VAR() static inline
functions to, respectively, _PyObject_Init() and _PyObject_InitVar(),
and move them to pycore_object.h. Remove their return value:
their return type becomes void.
The _datetime module is now built with the Py_BUILD_CORE_MODULE macro
defined.
Remove an outdated comment on _Py_tracemalloc_config.
The PEP 353, written in 2005, introduced PY_FORMAT_SIZE_T. Python no
longer supports macOS 10.4 and Visual Studio 2010, but requires more
recent macOS and Visual Studio versions. In 2020 with Python 3.10, it
is now safe to use directly "%zu" to format size_t and "%zi" to
format Py_ssize_t.
Previously, the result could have been an instance of a subclass of int.
Also revert bpo-26202 and make attributes start, stop and step of the range
object having exact type int.
Add private function _PyNumber_Index() which preserves the old behavior
of PyNumber_Index() for performance to use it in the conversion functions
like PyLong_AsLong().
Move PyInterpreterState.fs_codec into a new
PyInterpreterState.unicode structure.
Give a name to the fs_codec structure and use this structure in
unicodeobject.c.
When Python is built in the experimental isolated subinterpreters
mode, disable Unicode singletons and Unicode interned strings since
they are shared by all interpreters.
Temporary workaround until these caches are made per-interpreter.
Added str.removeprefix and str.removesuffix methods and corresponding
bytes, bytearray, and collections.UserString methods to remove affixes
from a string if present. See PEP 616 for a full description.
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).
Add also "assert(tstate != NULL);" to the function.
Don't access PyInterpreterState.config member directly anymore, but
use new functions:
* _PyInterpreterState_GetConfig()
* _PyInterpreterState_SetConfig()
* _Py_GetConfig()
Add _PyIndex_Check() function to the internal C API: fast inlined
verson of PyIndex_Check().
Add Include/internal/pycore_abstract.h header file.
Replace PyIndex_Check() with _PyIndex_Check() in C files of Objects
and Python subdirectories.
str.encode() and str.decode() no longer check the encoding and errors
in development mode or in debug mode during Python finalization. The
codecs machinery can no longer work on very late calls to
str.encode() and str.decode().
This change should help to call _PyObject_Dump() to debug during late
Python finalization.
Move the bytes_methods.h header file to the internal C API as
pycore_bytes_methods.h: it only contains private symbols (prefixed by
"_Py"), except of the PyDoc_STRVAR_shared() macro.
gcc -Wcast-qual turns up a number of instances of casting away constness of pointers. Some of these can be safely modified, by either:
Adding the const to the type cast, as in:
- return _PyUnicode_FromUCS1((unsigned char*)s, size);
+ return _PyUnicode_FromUCS1((const unsigned char*)s, size);
or, Removing the cast entirely, because it's not necessary (but probably was at one time), as in:
- PyDTrace_FUNCTION_ENTRY((char *)filename, (char *)funcname, lineno);
+ PyDTrace_FUNCTION_ENTRY(filename, funcname, lineno);
These changes will not change code, but they will make it much easier to check for errors in consts
The bulk of this patch was generated automatically with:
for name in \
PyObject_Vectorcall \
Py_TPFLAGS_HAVE_VECTORCALL \
PyObject_VectorcallMethod \
PyVectorcall_Function \
PyObject_CallOneArg \
PyObject_CallMethodNoArgs \
PyObject_CallMethodOneArg \
;
do
echo $name
git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
done
old=_PyObject_FastCallDict
new=PyObject_VectorcallDict
git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"
and then cleaned up:
- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
Add a fast-path for UTF-8 encoding in PyUnicode_EncodeFSDefault()
and PyUnicode_DecodeFSDefaultAndSize().
Add _PyUnicode_FiniEncodings() helper function for _PyUnicode_Fini().
* Remove _Py_INC_REFTOTAL and _Py_DEC_REFTOTAL macros: modify
directly _Py_RefTotal.
* _Py_ForgetReference() is no longer defined if the Py_TRACE_REFS
macro is not defined.
* Remove _Py_NewReference() implementation from object.c:
unify the two implementations in object.h inline function.
* Fix Py_TRACE_REFS build: _Py_INC_TPALLOCS() macro has been removed.
Replace Py_FatalError() calls with _PyErr_WriteUnraisableMsg(),
_PyObject_ASSERT_FAILED_MSG() or Py_UNREACHABLE()
in unicode_dealloc() and unicode_release_interned().
bpo-36389, bpo-38376: The _PyObject_CheckConsistency() function is
now also available in release mode. For example, it can be used to
debug a crash in the visit_decref() function of the GC.
Modify the following functions to also work in release mode:
* _PyDict_CheckConsistency()
* _PyObject_CheckConsistency()
* _PyType_CheckConsistency()
* _PyUnicode_CheckConsistency()
Other changes:
* _PyMem_IsPtrFreed(ptr) now also returns 1 if ptr is NULL
(equals to 0).
* _PyBytesWriter_CheckConsistency() now returns 1 and is only used
with assert().
* Reorder _PyObject_Dump() to write safe fields first, and only
attempt to render repr() at the end.
In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values
(like in the optional third parameter of getattr). "None" should be used if None is accepted
as argument and passing None has the same effect as not passing the argument at all.
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().
By default, for best performances, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.
* The UTF-8 incremental decoders fails now fast if encounter
a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
handler decodes now a lone low surrogate with final=False.