When trying to extract the error line for the error message there
are two distinct cases:
1. The input comes from a file, which means that we can extract the
error line by using `PyErr_ProgramTextObject` and which we already
do.
2. The input does not come from a file, at which point we need to get
the source code from the tokenizer:
* If the tokenizer's current line number is the same with the line
of the error, we get the line from `tok->buf` and we're ready.
* Else, we can extract the error line from the source code in the
following two ways:
* If the input comes from a string we have all the input
in `tok->str` and we can extract the error line from it.
* If the input comes from stdin, i.e. the interactive prompt, we
do not have access to the previous line. That's why a new
field `tok->stdin_content` is added which holds the whole input for the
current (multiline) statement or expression. We can then extract the
error line from `tok->stdin_content` like we do in the string case above.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* Implement running the parser a second time for the errors messages
The first parser run is only responsible for detecting whether
there is a `SyntaxError` or not. If there isn't the AST gets returned.
Otherwise, the parser is run a second time with all the `invalid_*`
rules enabled so that all the customized error messages get produced.
- Use the proper asdl sequence when creating empty arguments
- Remove reduntant casts (thanks to new typed asdl_sequences)
- Remove MarshalPrototypeVisitor and some utilities from asdl generator
- Fix the header of `Python/ast.c` (kept from pgen times)
Automerge-Triggered-By: @pablogsal
* Add new capability to the PEG parser to type variable assignments. For instance:
```
| a[asdl_stmt_seq*]=';'.small_stmt+ [';'] NEWLINE { a }
```
* Add new sequence types from the asdl definition (automatically generated)
* Make `asdl_seq` type a generic aliasing pointer type.
* Create a new `asdl_generic_seq` for the generic case using `void*`.
* The old `asdl_seq_GET`/`ast_seq_SET` macros now are typed.
* New `asdl_seq_GET_UNTYPED`/`ast_seq_SET_UNTYPED` macros for dealing with generic sequences.
* Changes all possible `asdl_seq` types to use specific versions everywhere.
This program can segfault the parser by stack overflow:
```
import ast
code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
print("Ready!")
ast.parse(code)
```
the reason is that the rule for arguments has a simple recursion when collecting args:
args[expr_ty]:
[...]
| a=named_expression b=[',' c=args { c }] {
[...] }
This will improve the debug experience if something fails in the produced AST. Previously, errors in the produced AST can be felt much later like in the garbage collector or the compiler, making debugging them much more difficult.
This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed.
The following error messages get produced:
- `cannot delete ...` for invalid `del` targets
- `... is an illegal 'for' target` for invalid targets in for
statements
- `... is an illegal 'with' target` for invalid targets in
with statements
Additionally, a few `cut`s were added in various places before the
invocation of the `invalid_*` rule, in order to speed things
up.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
This commit removes the old parser, the deprecated parser module, the old parser compatibility flags and environment variables and all associated support code and documentation.