Remove the token.h header file. There was never any public tokenizer
C API. The token.h header file was only designed to be used by Python
internals.
Move Include/token.h to Include/internal/pycore_token.h. Including
this header file now requires that the Py_BUILD_CORE macro is
defined. It no longer checks for the Py_LIMITED_API macro.
Rename functions:
* PyToken_OneChar() => _PyToken_OneChar()
* PyToken_TwoChars() => _PyToken_TwoChars()
* PyToken_ThreeChars() => _PyToken_ThreeChars()
The traceback.c and traceback.py mechanisms now utilize the newly added code.co_positions and PyCode_Addr2Location
to print carets on the specific expressions involved in a traceback.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
The invalid assignment rules are very delicate since the parser can
easily raise an invalid assignment when a keyword argument is provided.
As they are very deep into the grammar tree, is very difficult to
specify in which contexts these rules can be used and in which don't.
For that, we need to use a different version of the rule that doesn't do
error checking in those situations where we don't want the rule to raise
(keyword arguments and generator expressions).
We also need to check if we are in left-recursive rule, as those can try
to eagerly advance the parser even if the parse will fail at the end of
the expression. Failing to do this allows the parser to start parsing a
call as a tuple and incorrectly identify a keyword argument as an
invalid assignment, before it realizes that it was not a tuple after all.
To improve the user experience understanding what part of the error messages associated with SyntaxErrors is wrong, we can highlight the whole error range and not only place the caret at the first character. In this way:
>>> foo(x, z for z in range(10), t, w)
File "<stdin>", line 1
foo(x, z for z in range(10), t, w)
^
SyntaxError: Generator expression must be parenthesized
becomes
>>> foo(x, z for z in range(10), t, w)
File "<stdin>", line 1
foo(x, z for z in range(10), t, w)
^^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized
* Add source location attributes to alias.
* Move alias star construction to pegen helper.
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Remove parser functions using the "struct _mod" type, because the
AST C API was removed:
* PyParser_ASTFromFile()
* PyParser_ASTFromFileObject()
* PyParser_ASTFromFilename()
* PyParser_ASTFromString()
* PyParser_ASTFromStringObject()
These functions were undocumented and excluded from the limited C
API.
Add pycore_parser.h internal header file. Rename functions:
* PyParser_ASTFromFileObject() => _PyParser_ASTFromFile()
* PyParser_ASTFromStringObject() => _PyParser_ASTFromString()
These functions are no longer exported (replace PyAPI_FUNC() with
extern).
Remove also _PyPegen_run_parser_from_file() function. Update
test_peg_generator to use _PyPegen_run_parser_from_file_pointer()
instead.
These functions were undocumented and excluded from the limited C
API.
Most names defined by these header files were not prefixed by "Py"
and so could create names conflicts. For example, Python-ast.h
defined a "Yield" macro which was conflict with the "Yield" name used
by the Windows <winbase.h> header.
Use the Python ast module instead.
* Move Include/asdl.h to Include/internal/pycore_asdl.h.
* Move Include/Python-ast.h to Include/internal/pycore_ast.h.
* Remove ast.h header file.
* pycore_symtable.h no longer includes Python-ast.h.
* Add to the peg generator a new directive ('&&') that allows to expect
a token and hard fail the parsing if the token is not found. This
allows to quickly emmit syntax errors for missing tokens.
* Use the new grammar element to hard-fail if the ':' is missing before
suites.
* Implement running the parser a second time for the errors messages
The first parser run is only responsible for detecting whether
there is a `SyntaxError` or not. If there isn't the AST gets returned.
Otherwise, the parser is run a second time with all the `invalid_*`
rules enabled so that all the customized error messages get produced.
* Add new capability to the PEG parser to type variable assignments. For instance:
```
| a[asdl_stmt_seq*]=';'.small_stmt+ [';'] NEWLINE { a }
```
* Add new sequence types from the asdl definition (automatically generated)
* Make `asdl_seq` type a generic aliasing pointer type.
* Create a new `asdl_generic_seq` for the generic case using `void*`.
* The old `asdl_seq_GET`/`ast_seq_SET` macros now are typed.
* New `asdl_seq_GET_UNTYPED`/`ast_seq_SET_UNTYPED` macros for dealing with generic sequences.
* Changes all possible `asdl_seq` types to use specific versions everywhere.
This program can segfault the parser by stack overflow:
```
import ast
code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
print("Ready!")
ast.parse(code)
```
the reason is that the rule for arguments has a simple recursion when collecting args:
args[expr_ty]:
[...]
| a=named_expression b=[',' c=args { c }] {
[...] }
`GET_INVALID_TARGET` might unexpectedly return `NULL`, which if not
caught will cause a SEGFAULT. Therefore, this commit introduces a new
inline function `RAISE_SYNTAX_ERROR_INVALID_TARGET` that always
checks for `GET_INVALID_TARGET` returning NULL and can be used in
the grammar, replacing the long C ternary operation used till now.
The following error messages get produced:
- `cannot delete ...` for invalid `del` targets
- `... is an illegal 'for' target` for invalid targets in for
statements
- `... is an illegal 'with' target` for invalid targets in
with statements
Additionally, a few `cut`s were added in various places before the
invocation of the `invalid_*` rule, in order to speed things
up.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
This commit removes the old parser, the deprecated parser module, the old parser compatibility flags and environment variables and all associated support code and documentation.