Fix parsing a numeric literal immediately (without spaces) followed by
"not in" keywords, like in "1not in x". Now the parser only emits
a warning, not a syntax error.
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules.
The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).
https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.
The core of the change is in:
* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers
I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config.
The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.
The following are not changed (yet):
* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init
https://bugs.python.org/issue46541
@pablogsal, sorry i failed to rebase to main, so i recreated https://github.com/python/cpython/pull/22190#issuecomment-1024633392
> PyRun_InteractiveOne\*() functions allow to explicitily set fd instead of stdin.
but stdin was hardcoded in readline call.
> This patch does not fix target file for prompt unlike original bpo one : prompt fd is unrelated to tokenizer source which could be read only. It is more of a bugfix regarding the docs : actual documentation say "prompt the user" so one would expect prompt to go on stdout not a file for both PyRun_InteractiveOne\*() and PyRun_InteractiveLoop\*().
Automerge-Triggered-By: GH:pablogsal
There are two errors that this commit fixes:
* The parser was not correctly computing the offset and the string
source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
exception happened in rules involving optionals ('?', [...]) as we
always make them return valid results by using the comma operator. We
need to check first if we don't have an error before continuing.
They support now splitting escape sequences between input chunks.
Add the third parameter "final" in codecs.unicode_escape_decode().
It is True by default to match the former behavior.