Commit Graph

53 Commits

Author SHA1 Message Date
Pablo Galindo bafe0aade5
bpo-44335: Ensure the tokenizer doesn't go into Python with the error set (GH-26608) 2021-06-08 20:02:03 +01:00
Pablo Galindo d334c73b56
bpo-44335: Fix a regression when identifying invalid characters in syntax errors (GH-26589) 2021-06-08 12:25:22 +01:00
Serhiy Storchaka 39dd141a4b
bpo-44273: Improve syntax error message for assigning to "..." (GH-26477)
Use "ellipsis" instead of "Ellipsis" in syntax error messages to eliminate confusion with built-in variable Ellipsis.
2021-06-01 12:07:05 +01:00
Pablo Galindo bd7476dae3
bpo-44201: Avoid side effects of "invalid_*" rules in the REPL (GH-26298)
When the parser does a second pass to check for errors, these rules can
have some small side-effects as they may advance the parser more than
the point reached in the first pass. This can cause the tokenizer to ask
for extra tokens in interactive mode causing the tokenizer to show the
prompt instead of failing instantly.

To avoid this, add a new mode to the tokenizer that is activated in the
second pass and deactivates asking for new tokens when the interactive
line is finished. As the parsing should have reached the last line in
the first pass, the second pass should not need to ask for more tokens.
2021-05-22 23:05:00 +01:00
Pablo Galindo c878a97968
bpo-44180: Fix edge cases in invalid assigment rules in the parser (GH-26283)
The invalid assignment rules are very delicate since the parser can
easily raise an invalid assignment when a keyword argument is provided.
As they are very deep into the grammar tree, is very difficult to
specify in which contexts these rules can be used and in which don't.
For that, we need to use a different version of the rule that doesn't do
error checking in those situations where we don't want the rule to raise
(keyword arguments and generator expressions).

We also need to check if we are in left-recursive rule, as those can try
to eagerly advance the parser even if the parse will fail at the end of
the expression. Failing to do this allows the parser to start parsing a
call as a tuple and incorrectly identify a keyword argument as an
invalid assignment, before it realizes that it was not a tuple after all.
2021-05-21 18:34:54 +01:00
Pablo Galindo b51081c1a8
bpo-44180: Report generic syntax errors in the furthest position reached in the first parser pass (GH-26253) 2021-05-21 16:09:51 +01:00
Pablo Galindo 80b089179f
bpo-44143: Fix crash in the parser when raising tokenizer errors with an exception set (GH-26144) 2021-05-15 17:58:02 +01:00
Pablo Galindo 9142088e74
bpo-43822: Prioritize tokenizer errors over custom syntax errors when raising parser exceptions (GH-25866) 2021-05-04 01:32:46 +01:00
Brandt Bucher dbe60ee09d
bpo-43892: Validate the first term of complex literal value patterns (GH-25735) 2021-04-29 17:19:28 -07:00
Nick Coghlan 1e7b858575
bpo-43892: Make match patterns explicit in the AST (GH-25585)
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
2021-04-28 22:58:44 -07:00
Pablo Galindo a77aac4fca
bpo-43914: Highlight invalid ranges in SyntaxErrors (#25525)
To improve the user experience understanding what part of the error messages associated with SyntaxErrors is wrong, we can highlight the whole error range and not only place the caret at the first character. In this way:

>>> foo(x, z for z in range(10), t, w)
  File "<stdin>", line 1
    foo(x, z for z in range(10), t, w)
           ^
SyntaxError: Generator expression must be parenthesized

becomes

>>> foo(x, z for z in range(10), t, w)
  File "<stdin>", line 1
    foo(x, z for z in range(10), t, w)
           ^^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized
2021-04-23 14:27:05 +01:00
Pablo Galindo b280248be8
bpo-43822: Improve syntax errors for missing commas (GH-25377) 2021-04-15 21:38:45 +01:00
Pablo Galindo b86ed8e3bb
bpo-43797: Improve syntax error for invalid comparisons (#25317)
* bpo-43797: Improve syntax error for invalid comparisons

* Update Lib/test/test_fstring.py

Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>

* Apply review comments

* can't -> cannot

Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
2021-04-12 16:59:30 +01:00
Matthew Suozzo 75a06f067b
bpo-43798: Add source location attributes to alias (GH-25324)
* Add source location attributes to alias.
* Move alias star construction to pegen helper.

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2021-04-10 22:56:28 +02:00
Pablo Galindo d00a449d6d
Simplify _PyPegen_fill_token in pegen.c (GH-25295) 2021-04-09 01:32:25 +01:00
Pablo Galindo 58bafe42ab
Sanitize macros and debug functions in pegen.c (GH-25291) 2021-04-09 01:17:31 +01:00
Pablo Galindo 4f642dae4e
Break down some complex functions in pegen.c for readability (GH-25292) 2021-04-09 00:48:53 +01:00
Erlend Egeberg Aasland c0e11a3ceb
Fix possible refleak involving _PyArena_AddPyObject (GH-25289) 2021-04-09 00:05:44 +01:00
Victor Stinner d27f8d2e07
bpo-43244: Rename pycore_ast.h functions to _PyAST_xxx() (GH-25252)
Rename AST functions of pycore_ast.h to use the "_PyAST_" prefix.
Remove macros creating aliases without prefix. For example, Module()
becomes _PyAST_Module(). Update Grammar/python.gram to use
_PyAST_xxx() functions.
2021-04-07 21:34:22 +02:00
Victor Stinner 8370e07e1e
bpo-43244: Remove the pyarena.h header (GH-25007)
Remove the pyarena.h header file with functions:

* PyArena_New()
* PyArena_Free()
* PyArena_Malloc()
* PyArena_AddPyObject()

These functions were undocumented, excluded from the limited C API,
and were only used internally by the compiler.

Add pycore_pyarena.h header. Rename functions:

* PyArena_New() => _PyArena_New()
* PyArena_Free() => _PyArena_Free()
* PyArena_Malloc() => _PyArena_Malloc()
* PyArena_AddPyObject() => _PyArena_AddPyObject()
2021-03-24 02:23:01 +01:00
Victor Stinner 57364ce34e
bpo-43244: Remove parser_interface.h header file (GH-25001)
Remove parser functions using the "struct _mod" type, because the
AST C API was removed:

* PyParser_ASTFromFile()
* PyParser_ASTFromFileObject()
* PyParser_ASTFromFilename()
* PyParser_ASTFromString()
* PyParser_ASTFromStringObject()

These functions were undocumented and excluded from the limited C
API.

Add pycore_parser.h internal header file. Rename functions:

* PyParser_ASTFromFileObject() => _PyParser_ASTFromFile()
* PyParser_ASTFromStringObject() => _PyParser_ASTFromString()

These functions are no longer exported (replace PyAPI_FUNC() with
extern).

Remove also _PyPegen_run_parser_from_file() function. Update
test_peg_generator to use _PyPegen_run_parser_from_file_pointer()
instead.
2021-03-24 01:29:09 +01:00
Victor Stinner 94faa0724f
bpo-43244: Remove ast.h, asdl.h, Python-ast.h headers (GH-24933)
These functions were undocumented and excluded from the limited C
API.

Most names defined by these header files were not prefixed by "Py"
and so could create names conflicts. For example, Python-ast.h
defined a "Yield" macro which was conflict with the "Yield" name used
by the Windows <winbase.h> header.

Use the Python ast module instead.

* Move Include/asdl.h to Include/internal/pycore_asdl.h.
* Move Include/Python-ast.h to Include/internal/pycore_ast.h.
* Remove ast.h header file.
* pycore_symtable.h no longer includes Python-ast.h.
2021-03-23 20:47:40 +01:00
Pablo Galindo 96eeff5162
bpo-43555: Report the column offset for invalid line continuation character (GH-24939) 2021-03-22 17:28:11 +00:00
Pablo Galindo 123ff266cd
bpo-43591: Fix error location in interactive mode for errors at the end of the line (GH-24973)
Co-authored-by: Erlend Egeberg Aasland
2021-03-22 16:24:39 +00:00
Victor Stinner eec8e61992
bpo-43244: Remove the PyAST_Validate() function (GH-24911)
Remove the PyAST_Validate() function. It is no longer possible to
build a AST object (mod_ty type) with the public C API. The function
was already excluded from the limited C API (PEP 384).

Rename PyAST_Validate() function to _PyAST_Validate(), move it to the
internal C API, and don't export it anymore (replace PyAPI_FUNC with
extern).

The function was added in bpo-12575 by
the commit 832bfe2ebd.
2021-03-18 14:57:49 +01:00
Victor Stinner e0bf70d08c
bpo-43244: Fix test_peg_generator for PyAST_Validate() (GH-24912)
test_peg_generator now defines _Py_TEST_PEGEN macro when building C
code to not call PyAST_Validate() in Parser/pegen.c. Moreover, it
defines Py_BUILD_CORE_MODULE macro to get access to the internal
C API.

Remove "global_ast_state" from Python-ast.c when it's built by
test_peg_generator: always get the AST state from the current interpreter.
2021-03-18 02:46:06 +01:00
Pablo Galindo cd8dcbc851
bpo-43410: Fix crash in the parser when producing syntax errors when reading from stdin (GH-24763) 2021-03-14 04:38:40 +01:00
Pablo Galindo 58fb156edd
bpo-42997: Improve error message for missing : before suites (GH-24292)
* Add to the peg generator a new directive ('&&') that allows to expect
  a token and hard fail the parsing if the token is not found. This
  allows to quickly emmit syntax errors for missing tokens.

* Use the new grammar element to hard-fail if the ':' is missing before
  suites.
2021-02-02 19:54:22 +00:00
Pablo Galindo 4090151816
bpo-42986: Fix parser crash when reporting syntax errors in f-string with newlines (GH-24279) 2021-01-31 22:48:23 +00:00
Batuhan Taskaya a698d52c39
bpo-40176: Improve error messages for unclosed string literals (GH-19346)
Automerge-Triggered-By: GH:isidentical
2021-01-20 13:38:47 -08:00
Pablo Galindo c3f167d7b2
bpo-42864: Simplify the tokenizer exceptions after generic SyntaxError (GH-24273)
Automerge-Triggered-By: GH:pablogsal
2021-01-20 11:11:56 -08:00
Pablo Galindo d6d6371447
bpo-42864: Improve error messages regarding unclosed parentheses (GH-24161) 2021-01-19 23:59:33 +00:00
Lysandros Nikolaou e5fe509054
bpo-42827: Fix crash on SyntaxError in multiline expressions (GH-24140)
When trying to extract the error line for the error message there
are two distinct cases:

1. The input comes from a file, which means that we can extract the
   error line by using `PyErr_ProgramTextObject` and which we already
   do.
2. The input does not come from a file, at which point we need to get
   the source code from the tokenizer:
   * If the tokenizer's current line number is the same with the line
     of the error, we get the line from `tok->buf` and we're ready.
   * Else, we can extract the error line from the source code in the
     following two ways:
     * If the input comes from a string we have all the input
       in `tok->str` and we can extract the error line from it.
     * If the input comes from stdin, i.e. the interactive prompt, we
       do not have access to the previous line. That's why a new
       field `tok->stdin_content` is added which holds the whole input for the
       current (multiline) statement or expression. We can then extract the
       error line from `tok->stdin_content` like we do in the string case above.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2021-01-14 21:36:30 +00:00
Pablo Galindo 06f8c3328d
bpo-42214: Fix check for NOTEQUAL token in the PEG parser for the barry_as_flufl rule (GH-23048) 2020-10-30 23:48:42 +00:00
Batuhan Taskaya 3af4b58552
bpo-42206: Propagate and raise errors from PyAST_Validate in the parser (GH-23035) 2020-10-30 11:48:41 +00:00
Lysandros Nikolaou bca7014032
bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111)
* Implement running the parser a second time for the errors messages

The first parser run is only responsible for detecting whether
there is a `SyntaxError` or not. If there isn't the AST gets returned.
Otherwise, the parser is run a second time with all the `invalid_*`
rules enabled so that all the customized error messages get produced.
2020-10-27 00:42:04 +02:00
Pablo Galindo e68c67805e
bpo-42150: Avoid buffer overflow in the new parser (GH-22978) 2020-10-25 23:03:41 +00:00
Batuhan Taskaya 02a1603f91
bpo-42000: Cleanup the AST related C-code (GH-22641)
- Use the proper asdl sequence when creating empty arguments
- Remove reduntant casts (thanks to new typed asdl_sequences)
- Remove MarshalPrototypeVisitor and some utilities from asdl generator
- Fix the header of `Python/ast.c` (kept from pgen times)

Automerge-Triggered-By: @pablogsal
2020-10-10 10:14:59 -07:00
Pablo Galindo a5634c4067
bpo-41746: Add type information to asdl_seq objects (GH-22223)
* Add new capability to the PEG parser to type variable assignments. For instance:
```
       | a[asdl_stmt_seq*]=';'.small_stmt+ [';'] NEWLINE { a }
```

* Add new sequence types from the asdl definition (automatically generated)
* Make `asdl_seq` type a generic aliasing pointer type.
* Create a new `asdl_generic_seq` for the generic case using `void*`.
* The old `asdl_seq_GET`/`ast_seq_SET` macros now are typed.
* New `asdl_seq_GET_UNTYPED`/`ast_seq_SET_UNTYPED` macros for dealing with generic sequences.
* Changes all possible `asdl_seq` types to use specific versions everywhere.
2020-09-16 19:42:00 +01:00
Pablo Galindo 315a61f7a9
bpo-41697: Correctly handle KeywordOrStarred when parsing arguments in the parser (GH-22077) 2020-09-03 15:29:32 +01:00
Pablo Galindo 4a97b1517a
bpo-41690: Use a loop to collect args in the parser instead of recursion (GH-22053)
This program can segfault the parser by stack overflow:

```
import ast

code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
print("Ready!")
ast.parse(code)
```

the reason is that the rule for arguments has a simple recursion when collecting args:

args[expr_ty]:
    [...]
    | a=named_expression b=[',' c=args { c }] {
        [...] }
2020-09-02 17:44:19 +01:00
Pablo Galindo 1332226b32
Validate the AST produced by the parser in debug mode (GH-21643)
This will improve the debug experience if something fails in the produced AST. Previously, errors in the produced AST can be felt much later like in the garbage collector or the compiler, making debugging them much more difficult.
2020-07-27 23:46:59 +01:00
Lysandros Nikolaou 782f44b8fb
bpo-41215: Make assertion in the new parser more strict (GH-21364) 2020-07-07 01:42:21 +03:00
Pablo Galindo 1ac0cbca36
bpo-41215: Don't use NULL by default in the PEG parser keyword list (GH-21355)
Automerge-Triggered-By: @lysnikolaou
2020-07-06 12:31:16 -07:00
Guido van Rossum 9d197c7d48
bpo-35975: Only use cf_feature_version if PyCF_ONLY_AST in cf_flags (#21021) 2020-06-27 17:33:49 -07:00
Lysandros Nikolaou 1f0f4abb11
bpo-41076: Pre-feed the parser with the f-string expression location (GH-21054)
This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed.
2020-06-28 00:41:48 +01:00
Lysandros Nikolaou 6dcbc2422d
bpo-41132: Use pymalloc allocator in the f-string parser (GH-21173) 2020-06-27 18:47:00 +01:00
Lysandros Nikolaou 2e0a920e9e
bpo-41084: Adjust message when an f-string expression causes a SyntaxError (GH-21084)
Prefix the error message with `fstring: `, when parsing an f-string expression throws a `SyntaxError`.
2020-06-26 12:24:05 +01:00
Lysandros Nikolaou 861efc6e8f
bpo-40958: Avoid 'possible loss of data' warning on Windows (GH-20970) 2020-06-20 05:57:27 -07:00
Lysandros Nikolaou 01ece63d42
bpo-40334: Produce better error messages on invalid targets (GH-20106)
The following error messages get produced:
- `cannot delete ...` for invalid `del` targets
- `... is an illegal 'for' target` for invalid targets in for
  statements
- `... is an illegal 'with' target` for invalid targets in
  with statements

Additionally, a few `cut`s were added in various places before the
invocation of the `invalid_*` rule, in order to speed things
up.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-06-19 00:10:43 +01:00