Fix parsing a numeric literal immediately (without spaces) followed by
"not in" keywords, like in "1not in x". Now the parser only emits
a warning, not a syntax error.
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules.
The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).
https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.
The core of the change is in:
* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers
I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config.
The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.
The following are not changed (yet):
* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init
https://bugs.python.org/issue46541
@pablogsal, sorry i failed to rebase to main, so i recreated https://github.com/python/cpython/pull/22190#issuecomment-1024633392
> PyRun_InteractiveOne\*() functions allow to explicitily set fd instead of stdin.
but stdin was hardcoded in readline call.
> This patch does not fix target file for prompt unlike original bpo one : prompt fd is unrelated to tokenizer source which could be read only. It is more of a bugfix regarding the docs : actual documentation say "prompt the user" so one would expect prompt to go on stdout not a file for both PyRun_InteractiveOne\*() and PyRun_InteractiveLoop\*().
Automerge-Triggered-By: GH:pablogsal
There are two errors that this commit fixes:
* The parser was not correctly computing the offset and the string
source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
exception happened in rules involving optionals ('?', [...]) as we
always make them return valid results by using the comma operator. We
need to check first if we don't have an error before continuing.
They support now splitting escape sequences between input chunks.
Add the third parameter "final" in codecs.unicode_escape_decode().
It is True by default to match the former behavior.
* Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
* _ssl, _sqlite and _testcapi extensions now call the public
PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
* _lsprof extension is now built with Py_BUILD_CORE_MODULE macro
defined to get access to internal _PyObject_CallNoArgs().
Fix typo in the private _PyObject_CallNoArg() function name: rename
it to _PyObject_CallNoArgs() to be consistent with the public
function PyObject_CallNoArgs().
The traceback.c and traceback.py mechanisms now utilize the newly added code.co_positions and PyCode_Addr2Location
to print carets on the specific expressions involved in a traceback.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
This is something I noticed while (now discontinued) experimenting
with the idea of annotating operators with location information. Unfortunately
without this addition, adding any `attributes` to stuff like `unaryop`
doesn't change anything since the code assumes they are singletons and
caches all instances. This patch fixes this assumption with including
the attributes as well as constructor fields.
ASDL Generator was lack of proper annotation related to generated
module. This patch implements a MetadataVisitor that produces a
metadata object to pass to other visitors that are visiting that
same module. For the inital patch, it dynamically retrieves int
sequences (like cmpop), that was previously hardcoded. It offers
an interface that is easy to extend.
Emit a deprecation warning if the numeric literal is immediately followed by
one of keywords: and, else, for, if, in, is, or. Raise a syntax error with
more informative message if it is immediately followed by other keyword or
identifier.
Automerge-Triggered-By: GH:pablogsal
When compiling an AST object with a direct / indirect reference
cycles, on the conversion phase because of exceeding amount of
calls, a segfault was raised. This patch adds recursion guards to
places for preventing user inputs to not to crash AST but instead
raise a RecursionError.
When the parser does a second pass to check for errors, these rules can
have some small side-effects as they may advance the parser more than
the point reached in the first pass. This can cause the tokenizer to ask
for extra tokens in interactive mode causing the tokenizer to show the
prompt instead of failing instantly.
To avoid this, add a new mode to the tokenizer that is activated in the
second pass and deactivates asking for new tokens when the interactive
line is finished. As the parsing should have reached the last line in
the first pass, the second pass should not need to ask for more tokens.
The invalid assignment rules are very delicate since the parser can
easily raise an invalid assignment when a keyword argument is provided.
As they are very deep into the grammar tree, is very difficult to
specify in which contexts these rules can be used and in which don't.
For that, we need to use a different version of the rule that doesn't do
error checking in those situations where we don't want the rule to raise
(keyword arguments and generator expressions).
We also need to check if we are in left-recursive rule, as those can try
to eagerly advance the parser even if the parse will fail at the end of
the expression. Failing to do this allows the parser to start parsing a
call as a tuple and incorrectly identify a keyword argument as an
invalid assignment, before it realizes that it was not a tuple after all.
This works by not caching the handle and instead getting the handle from
the file descriptor each time, so that if the actual handle changes by
fd redirection closing/opening the console handle beneath our feet, we
will keep working correctly.