Commit Graph

822 Commits

Author SHA1 Message Date
Serhiy Storchaka 74ea6b5a75
bpo-40593: Improve syntax errors for invalid characters in source code. (GH-20033) 2020-05-12 12:42:04 +03:00
Shantanu 27c0d9b54a
bpo-40334: produce specialized errors for invalid del targets (GH-19911) 2020-05-11 14:53:58 -07:00
Pablo Galindo 5b956ca42d
bpo-40585: Normalize error messages in codeop when comparing them (GH-20030)
With the new parser, the error message always contains the trailing
newlines, causing the comparison of the repr of the error messages
in codeop to fail. This commit makes the new parser mirror the old parser's
behaviour regarding trailing newlines.
2020-05-11 01:41:26 +01:00
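As a rough illustration of the public behaviour that 5b956ca42d protects, the sketch below classifies input with `codeop.compile_command`. The `classify` helper is hypothetical and only wraps the documented return values (a code object, `None` for incomplete input, or a raised `SyntaxError`); it does not reproduce the internal repr comparison.

```python
import codeop


def classify(source: str) -> str:
    """Classify interactive input roughly the way a REPL front end might."""
    try:
        code = codeop.compile_command(source, "<input>", "single")
    except SyntaxError:
        # The input can never become valid by typing more lines.
        return "invalid"
    # compile_command returns None when more input is needed.
    return "incomplete" if code is None else "complete"


for text in ("x = 1", "if x:", "x = )"):
    print(repr(text), "->", classify(text))
```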
Pablo Galindo ac7a92cc0a
bpo-40334: Avoid collisions between parser variables and grammar variables (GH-19987)
This is for the C generator:
- Disallow rule and variable names starting with `_`
- Rename most local variable names generated by the parser to start with `_`

Exceptions:
- Renaming `p` to `_p` will be a separate PR
- There are still some names that might clash, e.g.
  - anything starting with `Py`
  - C reserved words (`if` etc.)
  - Macros like `EXTRA` and `CHECK`
2020-05-09 21:34:50 -07:00
Joannah Nanjekye d10091aa17
bpo-40502: Initialize n->n_col_offset (GH-19988)
* initialize n->n_col_offset

* 📜🤖 Added by blurb_it.

* Move initialization

Co-authored-by: nanjekyejoannah <joannah.nanjekye@ibm.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
2020-05-08 17:58:28 -03:00
Pablo Galindo db9163ceef
bpo-40555: Check for p->error_indicator in loop rules after the main loop is done (GH-19986) 2020-05-08 03:38:44 +01:00
Lysandros Nikolaou 4638c64295
bpo-40334: Error message for invalid default args in function call (GH-19973)
When parsing something like `f(g()=2)`, where the name of a default arg
is not a NAME, but an arbitrary expression, a specialised error message
is emitted.
2020-05-07 11:44:06 +01:00
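A minimal way to observe the case described in 4638c64295 is to compile the offending call and inspect the resulting error. The helper name `syntax_error_message` is an assumption made for this sketch, and the exact wording of the message differs between Python versions, so it is printed rather than compared.

```python
def syntax_error_message(source):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# The keyword name is an arbitrary expression, which is rejected.
print(syntax_error_message("f(g()=2)"))
# A plain NAME as the keyword is fine.
print(syntax_error_message("f(g=2)"))
```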
Lysandros Nikolaou 2f37c355ab
bpo-40334: Fix error location upon parsing an invalid string literal (GH-19962)
When parsing a string with an invalid escape, the old parser used to
point to the beginning of the invalid string. This commit changes the new
parser to match that behaviour, since it's currently pointing to the
end of the string (or to be more precise, to the beginning of the next
token).
2020-05-07 11:37:51 +01:00
Pablo Galindo 470aac4d8e
bpo-40334: Generate comments in the parser code to improve debugging (GH-19966) 2020-05-06 23:14:43 +01:00
Pablo Galindo 99db2a1db7
bpo-40334: Allow trailing comma in parenthesised context managers (GH-19964) 2020-05-06 22:54:34 +01:00
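The syntax enabled by 99db2a1db7 might look like the sketch below. It assumes an interpreter that uses the PEG parser; the source is compiled from a string so that the surrounding file still parses on older interpreters, and `nullcontext` merely stands in for real context managers.

```python
from contextlib import nullcontext

# Parenthesised context managers with a trailing comma after the last item.
source = (
    "with (\n"
    "    nullcontext(1) as a,\n"
    "    nullcontext(2) as b,\n"
    "):\n"
    "    total = a + b\n"
)

namespace = {"nullcontext": nullcontext}
exec(compile(source, "<example>", "exec"), namespace)
print(namespace["total"])
```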
Lysandros Nikolaou 999ec9ab6a
bpo-40334: Add type to the assignment rule in the grammar file (GH-19963) 2020-05-06 19:11:04 +01:00
Batuhan Taskaya 091951a67c
bpo-40528: Improve and clear several aspects of the ASDL definition code for the AST (GH-19952) 2020-05-06 15:29:32 +01:00
Lysandros Nikolaou 846d8b28ab
bpo-40246: Revert reporting of invalid string prefixes (GH-19888)
Commit 41d5b94af4 has to be reverted due to backwards compatibility concerns: with it, keywords immediately followed by a string without whitespace between them (like in `bg="#d00" if clear else"#fca"`) would fail to parse.
2020-05-04 12:32:18 +01:00
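The compatibility concern behind 846d8b28ab can be checked directly: the snippet quoted in the commit message should keep compiling. The `evaluate` helper is a hypothetical wrapper used only for this sketch.

```python
# The keyword `else` is immediately followed by a string literal.
SOURCE = 'bg="#d00" if clear else"#fca"'


def evaluate(clear):
    """Run SOURCE with the given value of `clear` and return `bg`."""
    namespace = {"clear": clear}
    exec(compile(SOURCE, "<example>", "exec"), namespace)
    return namespace["bg"]


print(evaluate(True), evaluate(False))
```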
Lysandros Nikolaou e10e7c771b
bpo-40334: Specialized error message for invalid args after bare '*' (GH-19865)
When parsing things like `def f(*): pass` the old parser used to output `SyntaxError: named arguments must follow bare *`, which the new parser wasn't able to do.
2020-05-04 11:58:31 +01:00
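To see the bare `*` rule from e10e7c771b in action, one can compare a signature with nothing after `*` against one that has a keyword-only argument. This is only a sketch; the `is_valid` helper is hypothetical and the message text is printed rather than asserted.

```python
import ast


def is_valid(source):
    """Return True if `source` parses, printing the error message otherwise."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        print(f"{source!r}: {exc.msg}")
        return False
    return True


print(is_valid("def f(*): pass"))     # nothing follows the bare *
print(is_valid("def f(*, a): pass"))  # a keyword-only argument follows
```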
Shantanu c3f001461d
bpo-40491: Fix typo in syntax error for numeric literals (GH-19893) 2020-05-04 11:13:30 +03:00
Shantanu 603d354626
bpo-40493: fix function type comment parsing (GH-19894)
The grammar for func_type_input rejected things like `(*t1) ->t2`. This fixes that.

Automerge-Triggered-By: @gvanrossum
2020-05-03 22:08:14 -07:00
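The input mentioned in 603d354626 can be fed to `ast.parse` in `func_type` mode. This sketch assumes an interpreter that includes the fix and only inspects the resulting `ast.FunctionType` node.

```python
import ast

# A function type comment signature with a starred argument type.
tree = ast.parse("(*t1) ->t2", mode="func_type")

print(type(tree).__name__)
print([arg.id for arg in tree.argtypes], tree.returns.id)
```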
Lysandros Nikolaou 7f06af684a
bpo-40334: Set error_indicator in _PyPegen_raise_error (GH-19887)
Because PyErr_Occurred is not called at the beginning of each rule, we need to set the error indicator, so that rules do not get expanded after an exception has been thrown.
2020-05-04 01:20:09 +01:00
Lysandros Nikolaou 03b7642265
bpo-40334: Make the PyPegen* and PyParser* APIs more consistent (GH-19839)
This commit makes both APIs more consistent by doing the following:
- Remove the `PyPegen_CodeObjectFrom*` functions, which weren't used 
  and will probably not be needed. Functions like `Py_CompileStringObject`
  can be used instead.
- Include a `const char *filename` parameter in `PyPegen_ASTFromString`.
- Rename `PyPegen_ASTFromFile` to `PyPegen_ASTFromFilename`, because
  its signature is not the same as that of `PyParser_ASTFromFile`.
2020-05-01 18:30:51 +01:00
Guido van Rossum d9d6eadf00
Ensure that tok->type_comments is set on every path (GH-19828) 2020-05-01 17:42:32 +01:00
Guido van Rossum 3941d9700b
bpo-40334: Refactor lambda_parameters similar to parameters (GH-19830) 2020-05-01 17:42:03 +01:00
Pablo Galindo d955241469
bpo-40334: Correct return value of func_type_comment (GH-19833) 2020-05-01 08:32:09 -07:00
Batuhan Taskaya 76c1b4d5c5
bpo-40334: Improve column offsets for thrown syntax errors by Pegen (GH-19782) 2020-05-01 14:13:43 +01:00
Pablo Galindo b796b3fb48
bpo-40334: Simplify type handling in the PEG c_generator (GH-19818) 2020-05-01 12:32:26 +01:00
Lysandros Nikolaou 3e0a6f37df
bpo-40334: Add support for feature_version in new PEG parser (GH-19827)
`ast.parse` and `compile` support a `feature_version` parameter that
tells the parser to parse the input string as if it were written in
an older Python version.
The `feature_version` is propagated to the tokenizer, which uses it
to handle the three different stages of support for `async` and
`await`. Additionally, it disallows the following at parser level:
- The '@' operator in < 3.5
- Async functions in < 3.5
- Async comprehensions in < 3.6
- Underscores in numeric literals in < 3.6
- Await expression in < 3.5
- Variable annotations in < 3.6
- Async for-loops in < 3.5
- Async with-statements in < 3.5
- F-strings in < 3.6

Closes we-like-parsers/cpython#124.
2020-04-30 20:27:52 -07:00
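A hedged sketch of how `feature_version` from 3e0a6f37df can be probed through `ast.parse` follows. The `parses_as` helper is hypothetical. Later Python releases raised the minimum supported `feature_version`, so the result for `(3, 4)` depends on the interpreter and is printed rather than stated here.

```python
import ast
import sys


def parses_as(source, version):
    """Return True if `source` parses under the given feature_version."""
    try:
        ast.parse(source, feature_version=version)
    except (SyntaxError, ValueError):
        # SyntaxError: construct not allowed for that version.
        # ValueError: the interpreter rejects the version itself.
        return False
    return True


current = sys.version_info[:2]
for version in [(3, 4), current]:
    # The '@' operator is listed above as disallowed in < 3.5.
    print(version, parses_as("a @ b", version))
```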
Guido van Rossum c001c09e90
bpo-40334: Support type comments (GH-19780)
This implements full support for # type: <type> comments, # type: ignore <stuff> comments, and the func_type parsing mode for ast.parse() and compile().

Closes https://github.com/we-like-parsers/cpython/issues/95.

(For now, you need to use the master branch of mypy, since another issue unique to 3.9 had to be fixed there, and there's no mypy release yet.)

The only thing missing is `feature_version=N`, which is being tracked in https://github.com/we-like-parsers/cpython/issues/124.
2020-04-30 12:12:19 -07:00
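The three pieces named in c001c09e90 can be exercised from Python roughly as follows. This is a small sketch against the public `ast` API, not a description of the parser internals.

```python
import ast

source = (
    "def f(a):\n"
    "    # type: (int) -> str\n"
    "    return str(a)\n"
    "x = f(1)  # type: ignore\n"
)

# Type comments are only recorded when type_comments=True is passed.
tree = ast.parse(source, type_comments=True)
func = tree.body[0]
print(func.type_comment)
print([ignore.lineno for ignore in tree.type_ignores])

# The func_type mode parses a signature-style type comment on its own.
sig = ast.parse("(int) -> str", mode="func_type")
print([arg.id for arg in sig.argtypes], sig.returns.id)
```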
Pablo Galindo 4db245ee9d
bpo-40334: refactor and cleanup for the PEG generators (GH-19775) 2020-04-29 10:42:21 +01:00
Lysandros Nikolaou 6d65087655
bpo-40334: Disallow invalid single statements in the new parser (GH-19774)
After parsing is done in single statement mode, the tokenizer buffer has to be checked for additional lines, and a `SyntaxError` must be raised if there are any.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-04-29 02:42:27 +01:00
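The rule from 6d65087655 is visible through `compile` in `"single"` mode. The sketch below uses a hypothetical `compiles_as_single` helper and only reports whether compilation succeeds.

```python
def compiles_as_single(source):
    """Return True if `source` is accepted in single statement mode."""
    try:
        compile(source, "<example>", "single")
    except SyntaxError:
        return False
    return True


print(compiles_as_single("a = 1\n"))         # one statement
print(compiles_as_single("a = 1\nb = 2\n"))  # a second line follows
```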
Pablo Galindo 2208134918
bpo-40334: Explicitly cast to int in pegen.c to fix a compiler warning (GH-19779) 2020-04-29 02:04:06 +01:00
Lysandros Nikolaou 37af21b667
bpo-40334: Fix shifting of nested f-strings in the new parser (GH-19771)
`JoinedStr`s and `FormattedValue`s also need to be shifted, in order to correctly compute the location information of nested f-strings.
2020-04-29 01:43:50 +01:00
Lysandros Nikolaou d55133f49f
bpo-40334: Catch E_EOF error, when the tokenizer returns ERRORTOKEN (GH-19743)
Before this commit, an E_EOF error was only caught after the parser exited. There are some cases, though, where the tokenizer returns ERRORTOKEN *and* has set an E_EOF error (like when EOF directly follows a line continuation character), which weren't correctly handled before.
2020-04-28 01:23:35 +01:00
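The situation described in d55133f49f, where EOF directly follows a line continuation character, can be reproduced from Python as sketched below. Only the exception type is checked, since the message text varies between versions.

```python
# The backslash is the last character of the input; no newline follows it.
source = "x = 1 + \\"

try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    outcome = ("SyntaxError", exc.msg)
else:
    outcome = ("ok", None)

print(outcome)
```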
Pablo Galindo b94dbd7ac3
bpo-40334: Support PyPARSE_DONT_IMPLY_DEDENT in the new parser (GH-19736) 2020-04-27 18:35:58 +01:00
Pablo Galindo 2b74c835a7
bpo-40334: Support CO_FUTURE_BARRY_AS_BDFL in the new parser (GH-19721)
This commit also allows passing flags to the new parser in all interfaces and fixes a bug in the parser generator that was causing rules with actions to be inlined, making them disappear.
2020-04-27 18:02:07 +01:00
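From Python, the flag supported by 2b74c835a7 is reachable through the `__future__` module. The sketch below passes it to `compile` explicitly; the `compiles` helper is hypothetical.

```python
import __future__

# Compiler flag behind `from __future__ import barry_as_FLUFL`.
FLAG = __future__.barry_as_FLUFL.compiler_flag


def compiles(source, flags=0):
    """Return True if `source` compiles as an expression with `flags`."""
    try:
        compile(source, "<example>", "eval", flags=flags)
    except SyntaxError:
        return False
    return True


print(compiles("1 <> 2"))        # rejected without the flag
print(compiles("1 <> 2", FLAG))  # accepted with the flag
result = eval(compile("1 <> 2", "<example>", "eval", flags=FLAG))
print(result)
```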
Pablo Galindo 9f27dd3e16
Use Py_ssize_t instead of ssize_t (GH-19685) 2020-04-24 01:13:33 +01:00
Lysandros Nikolaou ebebb6429c
bpo-40334: Improve various PEG-Parser related stuff (GH-19669)
The changes in this commit are all related to @vstinner's original review comments of the initial PEP 617 implementation PR.
2020-04-23 16:36:06 +01:00
Pablo Galindo 1df5a9e88c
bpo-40334: Fix build errors and warnings in test_peg_generator (GH-19672) 2020-04-23 12:42:13 +01:00
Pablo Galindo ee40e4b856
bpo-40334: Don't downcast from Py_ssize_t to int (GH-19671) 2020-04-23 03:43:08 +01:00
Pablo Galindo 0b7829e089
Compile extensions in test_peg_generator with C99 (GH-19668) 2020-04-23 03:24:25 +01:00
Pablo Galindo 458004bf79
bpo-40334: Fix errors in parse_string.c with old compilers (GH-19666) 2020-04-23 00:13:47 +01:00
Pablo Galindo c5fc156852
bpo-40334: PEP 617 implementation: New PEG parser for CPython (GH-19503)
Co-authored-by: Guido van Rossum <guido@python.org>
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
2020-04-22 23:29:27 +01:00
Pablo Galindo 11a7f158ef
bpo-40335: Correctly handle multi-line strings in tokenize error scenarios (GH-19619)
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
2020-04-21 01:53:04 +01:00
Lysandros Nikolaou 9a4b38f66b
bpo-40267: Fix message when last input character produces a SyntaxError (GH-19521)
When there is a SyntaxError after reading the last input character from
the tokenizer and no newline follows it, the error message used to be
`unexpected EOF while parsing`, which is wrong.
2020-04-15 11:22:10 -07:00
Victor Stinner 4a21e57fe5
bpo-40268: Remove unused structmember.h includes (GH-19530)
If only offsetof() is needed: include stddef.h instead.

When structmember.h is used, add a comment explaining that
PyMemberDef is used.
2020-04-15 02:35:41 +02:00
Victor Stinner 62183b8d6d
bpo-40268: Remove explicit pythread.h includes (#19529)
Remove explicit pythread.h includes: it is always included
by Python.h.
2020-04-15 02:04:42 +02:00
Victor Stinner e5014be049
bpo-40268: Remove a few pycore_pystate.h includes (GH-19510) 2020-04-14 17:52:15 +02:00
Victor Stinner 81a7be3fa2
bpo-40268: Rename _PyInterpreterState_GET_UNSAFE() (GH-19509)
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(helps to fit into 80 columns).

Also add "assert(tstate != NULL);" to the function.
2020-04-14 15:14:01 +02:00
Victor Stinner 4a3fe08353
bpo-40268: Include explicitly pycore_interp.h (GH-19505)
pycore_pystate.h no longer includes pycore_interp.h:
it's now included explicitly in files accessing PyInterpreterState.
2020-04-14 14:26:24 +02:00
Lysandros Nikolaou 41d5b94af4
bpo-40246: Report a better error message for invalid string prefixes (GH-19476) 2020-04-12 19:21:00 +01:00
Pablo Galindo 168660b547
bpo-40141: Add line and column information to ast.keyword nodes (GH-19283) 2020-04-02 00:47:39 +01:00
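The location attributes added in 168660b547 can be read off an `ast.keyword` node as sketched here, assuming an interpreter that includes the change.

```python
import ast

# Parse a call with one keyword argument and pick out the keyword node.
call = ast.parse("f(a=1)", mode="eval").body
keyword = call.keywords[0]

print(keyword.arg, keyword.lineno, keyword.col_offset)
```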
Alexander Riccio 51e3e450fb
bpo-40020: Fix realloc leak on failure in growable_comment_array_add (GH-19083)
Fix a leak and subsequent crash in parsetok.c caused by realloc misuse on a rare codepath. 

Realloc returns a null pointer on failure, and then growable_comment_array_deallocate crashes later when it dereferences it.
2020-03-30 23:15:59 +02:00
Victor Stinner 87d3b9db4a
bpo-39882: Add _Py_FatalErrorFormat() function (GH-19157) 2020-03-25 19:27:36 +01:00