* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization, otherwise you're possibly
interning a immortalizing copy.
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
The module itself is a thin wrapper around calls to functions in
`Python/codecs.c`, so that's where the meaningful changes happened:
- Move codecs-related state that lives on `PyInterpreterState` to a
struct declared in `pycore_codecs.h`.
- In free-threaded builds, add a mutex to `codecs_state` to synchronize
operations on `search_path`. Because `search_path_mutex` is used as a
normal mutex and not a critical section, we must be extremely careful
with operations called while holding it.
- The codec registry is explicitly initialized as part of
`_PyUnicode_InitEncodings` to simplify thread-safety.
Remove <ctype.h> in C files which don't use it; only sre.c and
_decimal.c still use it.
Remove _PY_PORT_CTYPE_UTF8_ISSUE code from pyport.h:
* Code added by commit b5047fd019
in 2004 for MacOSX and FreeBSD.
* Test removed by commit 52ddaefb6b
in 2007, since Python str type now uses locale independent
functions like Py_ISALPHA() and Py_TOLOWER() and the Unicode
database.
Modules/_sre/sre.c replaces _PY_PORT_CTYPE_UTF8_ISSUE with new
functions: sre_isalnum(), sre_tolower(), sre_toupper().
Remove unused includes:
* _localemodule.c: remove <stdio.h>.
* getargs.c: remove <float.h>.
* dynload_win.c: remove <direct.h>, it no longer calls _getcwd()
since commit fb1f68ed7c (in 2001).
Replace _PyDict_GetItemStringWithError() calls with
PyDict_GetItemStringRef() which returns a strong reference to the
item.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Remove private _PyDict_GetItemStringWithError() function of the
public C API: the new PyDict_GetItemStringRef() can be used instead.
* Move private _PyDict_GetItemStringWithError() to the internal C API.
* _testcapi get_code_extra_index() uses PyDict_GetItemStringRef().
Avoid using private functions in _testcapi which tests the public C
API.
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules.
The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).
https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.
The core of the change is in:
* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers
I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config.
The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.
The following are not changed (yet):
* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init
https://bugs.python.org/issue46541
* Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
* _ssl, _sqlite and _testcapi extensions now call the public
PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
* _lsprof extension is now built with Py_BUILD_CORE_MODULE macro
defined to get access to internal _PyObject_CallNoArgs().
Fix typo in the private _PyObject_CallNoArg() function name: rename
it to _PyObject_CallNoArgs() to be consistent with the public
function PyObject_CallNoArgs().
* UCD_Check() uses PyModule_Check()
* Simplify the internal _PyUnicode_Name_CAPI structure:
* Remove size and state members
* Remove state and self parameters of getcode() and getname()
functions
* Remove global_module_state
The private _PyUnicode_Name_CAPI structure of the PyCapsule API
unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the
structure gets a new state member which must be passed to the
getcode() and getname() functions.
* Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
* unicodedata module is now built with Py_BUILD_CORE_MODULE.
* unicodedata: move hashAPI variable into unicodedata_module_state.
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).
Add also "assert(tstate != NULL);" to the function.
Replace _PyInterpreterState_Get() function call with
_PyInterpreterState_GET_UNSAFE() macro which is more efficient but
don't check if tstate or interp is NULL.
_Py_GetConfigsAsDict() now uses _PyThreadState_GET().
The bulk of this patch was generated automatically with:
for name in \
PyObject_Vectorcall \
Py_TPFLAGS_HAVE_VECTORCALL \
PyObject_VectorcallMethod \
PyVectorcall_Function \
PyObject_CallOneArg \
PyObject_CallMethodNoArgs \
PyObject_CallMethodOneArg \
;
do
echo $name
git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
done
old=_PyObject_FastCallDict
new=PyObject_VectorcallDict
git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"
and then cleaned up:
- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
Fix codecs.lookup() to normalize the encoding name the same way
than encodings.normalize_encoding(), except that codecs.lookup()
also converts the name to lower case.
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals
Other globals are excluded (see globals.txt and check-c-globals.py).
Replace
_PyObject_CallArg1(func, arg)
with
PyObject_CallFunctionObjArgs(func, arg, NULL)
Using the _PyObject_CallArg1() macro increases the usage of the C stack, which
was unexpected and unwanted. PyObject_CallFunctionObjArgs() doesn't have this
issue.
Replace
PyObject_CallFunction(func, "O", arg)
and
PyObject_CallFunction(func, "O", arg, NULL)
with
_PyObject_CallArg1(func, arg)
Replace
PyObject_CallFunction(func, NULL)
with
_PyObject_CallNoArg(func)
_PyObject_CallNoArg() and _PyObject_CallArg1() are simpler and don't allocate
memory on the C stack.