Emit a deprecation warning if the numeric literal is immediately followed by
one of keywords: and, else, for, if, in, is, or. Raise a syntax error with
more informative message if it is immediately followed by other keyword or
identifier.
Automerge-Triggered-By: GH:pablogsal
When compiling an AST object with a direct / indirect reference
cycles, on the conversion phase because of exceeding amount of
calls, a segfault was raised. This patch adds recursion guards to
places for preventing user inputs to not to crash AST but instead
raise a RecursionError.
When the parser does a second pass to check for errors, these rules can
have some small side-effects as they may advance the parser more than
the point reached in the first pass. This can cause the tokenizer to ask
for extra tokens in interactive mode causing the tokenizer to show the
prompt instead of failing instantly.
To avoid this, add a new mode to the tokenizer that is activated in the
second pass and deactivates asking for new tokens when the interactive
line is finished. As the parsing should have reached the last line in
the first pass, the second pass should not need to ask for more tokens.
The invalid assignment rules are very delicate since the parser can
easily raise an invalid assignment when a keyword argument is provided.
As they are very deep into the grammar tree, is very difficult to
specify in which contexts these rules can be used and in which don't.
For that, we need to use a different version of the rule that doesn't do
error checking in those situations where we don't want the rule to raise
(keyword arguments and generator expressions).
We also need to check if we are in left-recursive rule, as those can try
to eagerly advance the parser even if the parse will fail at the end of
the expression. Failing to do this allows the parser to start parsing a
call as a tuple and incorrectly identify a keyword argument as an
invalid assignment, before it realizes that it was not a tuple after all.
This works by not caching the handle and instead getting the handle from
the file descriptor each time, so that if the actual handle changes by
fd redirection closing/opening the console handle beneath our feet, we
will keep working correctly.
To improve the user experience understanding what part of the error messages associated with SyntaxErrors is wrong, we can highlight the whole error range and not only place the caret at the first character. In this way:
>>> foo(x, z for z in range(10), t, w)
File "<stdin>", line 1
foo(x, z for z in range(10), t, w)
^
SyntaxError: Generator expression must be parenthesized
becomes
>>> foo(x, z for z in range(10), t, w)
File "<stdin>", line 1
foo(x, z for z in range(10), t, w)
^^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized
* Add source location attributes to alias.
* Move alias star construction to pegen helper.
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Rename AST functions of pycore_ast.h to use the "_PyAST_" prefix.
Remove macros creating aliases without prefix. For example, Module()
becomes _PyAST_Module(). Update Grammar/python.gram to use
_PyAST_xxx() functions.
* pycore_ast.h no longer defines the Yield macro.
* Fix a compiler warning on Windows: "warning C4005: 'Yield': macro
redefinition".
* Python-ast.c now defines directly functions with their real
_Py_xxx() name, rather than xxx().
* Remove "#undef Yield" in C files including pycore_ast.h.
Remove the pyarena.h header file with functions:
* PyArena_New()
* PyArena_Free()
* PyArena_Malloc()
* PyArena_AddPyObject()
These functions were undocumented, excluded from the limited C API,
and were only used internally by the compiler.
Add pycore_pyarena.h header. Rename functions:
* PyArena_New() => _PyArena_New()
* PyArena_Free() => _PyArena_Free()
* PyArena_Malloc() => _PyArena_Malloc()
* PyArena_AddPyObject() => _PyArena_AddPyObject()
Remove parser functions using the "struct _mod" type, because the
AST C API was removed:
* PyParser_ASTFromFile()
* PyParser_ASTFromFileObject()
* PyParser_ASTFromFilename()
* PyParser_ASTFromString()
* PyParser_ASTFromStringObject()
These functions were undocumented and excluded from the limited C
API.
Add pycore_parser.h internal header file. Rename functions:
* PyParser_ASTFromFileObject() => _PyParser_ASTFromFile()
* PyParser_ASTFromStringObject() => _PyParser_ASTFromString()
These functions are no longer exported (replace PyAPI_FUNC() with
extern).
Remove also _PyPegen_run_parser_from_file() function. Update
test_peg_generator to use _PyPegen_run_parser_from_file_pointer()
instead.
These functions were undocumented and excluded from the limited C
API.
Most names defined by these header files were not prefixed by "Py"
and so could create names conflicts. For example, Python-ast.h
defined a "Yield" macro which was conflict with the "Yield" name used
by the Windows <winbase.h> header.
Use the Python ast module instead.
* Move Include/asdl.h to Include/internal/pycore_asdl.h.
* Move Include/Python-ast.h to Include/internal/pycore_ast.h.
* Remove ast.h header file.
* pycore_symtable.h no longer includes Python-ast.h.
Remove the PyAST_Validate() function. It is no longer possible to
build a AST object (mod_ty type) with the public C API. The function
was already excluded from the limited C API (PEP 384).
Rename PyAST_Validate() function to _PyAST_Validate(), move it to the
internal C API, and don't export it anymore (replace PyAPI_FUNC with
extern).
The function was added in bpo-12575 by
the commit 832bfe2ebd.
test_peg_generator now defines _Py_TEST_PEGEN macro when building C
code to not call PyAST_Validate() in Parser/pegen.c. Moreover, it
defines Py_BUILD_CORE_MODULE macro to get access to the internal
C API.
Remove "global_ast_state" from Python-ast.c when it's built by
test_peg_generator: always get the AST state from the current interpreter.
Include/{odictobject.h,parser_interface.h,picklebufobject.h,pydebug.h,pyfpe.h}
into Include/cpython/.
Parser: peg_api: include Python.h instead of parser_interface.h.
* Add to the peg generator a new directive ('&&') that allows to expect
a token and hard fail the parsing if the token is not found. This
allows to quickly emmit syntax errors for missing tokens.
* Use the new grammar element to hard-fail if the ':' is missing before
suites.
When trying to extract the error line for the error message there
are two distinct cases:
1. The input comes from a file, which means that we can extract the
error line by using `PyErr_ProgramTextObject` and which we already
do.
2. The input does not come from a file, at which point we need to get
the source code from the tokenizer:
* If the tokenizer's current line number is the same with the line
of the error, we get the line from `tok->buf` and we're ready.
* Else, we can extract the error line from the source code in the
following two ways:
* If the input comes from a string we have all the input
in `tok->str` and we can extract the error line from it.
* If the input comes from stdin, i.e. the interactive prompt, we
do not have access to the previous line. That's why a new
field `tok->stdin_content` is added which holds the whole input for the
current (multiline) statement or expression. We can then extract the
error line from `tok->stdin_content` like we do in the string case above.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
This is only there so that alternative implementations written in statically-typed languages can use this grammar without
having type errors in the way.
Automerge-Triggered-By: GH:lysnikolaou
No longer use deprecated aliases to functions:
* Replace PyMem_MALLOC() with PyMem_Malloc()
* Replace PyMem_REALLOC() with PyMem_Realloc()
* Replace PyMem_FREE() with PyMem_Free()
* Replace PyMem_Del() with PyMem_Free()
* Replace PyMem_DEL() with PyMem_Free()
Modify also the PyMem_DEL() macro to use directly PyMem_Free().
Currently walruses are not allowerd in set literals and set comprehensions:
>>> {y := 4, 4**2, 3**3}
File "<stdin>", line 1
{y := 4, 4**2, 3**3}
^
SyntaxError: invalid syntax
but they should be allowed as well per PEP 572
Call _PyAST_Fini() on all interpreters, not only on the main
interpreter. Also, call it ealier to fix a reference leak.
Python types contain a reference to themselves in in their
PyTypeObject.tp_mro member. _PyAST_Fini() must called before the last
GC collection to destroy AST types.
_PyInterpreterState_Clear() now calls _PyAST_Fini(). It now also
calls _PyWarnings_Fini() on subinterpeters, not only on the main
interpreter.
Add an assertion in AST init_types() to ensure that the _ast module
is no longer used after _PyAST_Fini() has been called.
The ast module internal state is now per interpreter.
* Rename "astmodulestate" to "struct ast_state"
* Add pycore_ast.h internal header: the ast_state structure is now
declared in pycore_ast.h.
* Add PyInterpreterState.ast (struct ast_state)
* Remove get_ast_state()
* Rename get_global_ast_state() to get_ast_state()
* PyAST_obj2mod() now handles get_ast_state() failures
Left-recursive rules need to check for errors explicitly, since
even if the rule returns NULL, the parsing might continue and lead
to long-distance failures.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* Implement running the parser a second time for the errors messages
The first parser run is only responsible for detecting whether
there is a `SyntaxError` or not. If there isn't the AST gets returned.
Otherwise, the parser is run a second time with all the `invalid_*`
rules enabled so that all the customized error messages get produced.
- Use the proper asdl sequence when creating empty arguments
- Remove reduntant casts (thanks to new typed asdl_sequences)
- Remove MarshalPrototypeVisitor and some utilities from asdl generator
- Fix the header of `Python/ast.c` (kept from pgen times)
Automerge-Triggered-By: @pablogsal
* Add new capability to the PEG parser to type variable assignments. For instance:
```
| a[asdl_stmt_seq*]=';'.small_stmt+ [';'] NEWLINE { a }
```
* Add new sequence types from the asdl definition (automatically generated)
* Make `asdl_seq` type a generic aliasing pointer type.
* Create a new `asdl_generic_seq` for the generic case using `void*`.
* The old `asdl_seq_GET`/`ast_seq_SET` macros now are typed.
* New `asdl_seq_GET_UNTYPED`/`ast_seq_SET_UNTYPED` macros for dealing with generic sequences.
* Changes all possible `asdl_seq` types to use specific versions everywhere.
Partially revert commit ac46eb4ad6662cf6d771b20d8963658b2186c48c:
"bpo-38113: Update the Python-ast.c generator to PEP384 (gh-15957)".
Using a module state per module instance is causing subtle practical
problems.
For example, the Mercurial project replaces the __import__() function
to implement lazy import, whereas Python expected that "import _ast"
always return a fully initialized _ast module.
Add _PyAST_Fini() to clear the state at exit.
The _ast module has no state (set _astmodule.m_size to 0). Remove
astmodule_traverse(), astmodule_clear() and astmodule_free()
functions.
This program can segfault the parser by stack overflow:
```
import ast
code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
print("Ready!")
ast.parse(code)
```
the reason is that the rule for arguments has a simple recursion when collecting args:
args[expr_ty]:
[...]
| a=named_expression b=[',' c=args { c }] {
[...] }
This consolidates the handling of my_fgets return values, so that interrupts are always handled, even if they come after EOF.
I believe PyOS_StdioReadline is still buggy in that I/O errors will not result in a proper Python exception being set. However, that is a separate issue.
This will improve the debug experience if something fails in the produced AST. Previously, errors in the produced AST can be felt much later like in the garbage collector or the compiler, making debugging them much more difficult.
GCC says
```
../cpython/Parser/string_parser.c: In function ‘fstring_find_expr’:
../cpython/Parser/string_parser.c:404:93: warning: ‘cols’ may be used uninitialized in this function [-Wmaybe-uninitialized]
404 | p2->starting_col_offset = p->tok->first_lineno == p->tok->lineno ? t->col_offset + cols : cols;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
../cpython/Parser/string_parser.c:384:16: note: ‘cols’ was declared here
384 | int lines, cols;
| ^~~~
../cpython/Parser/string_parser.c:403:45: warning: ‘lines’ may be used uninitialized in this function [-Wmaybe-uninitialized]
403 | p2->starting_lineno = t->lineno + lines - 1;
| ~~~~~~~~~~~~~~~~~~^~~
../cpython/Parser/string_parser.c:384:9: note: ‘lines’ was declared here
384 | int lines, cols;
| ^~~~~
```
and, indeed, if `PyBytes_AsString` somehow fails, lines & cols will not be initialized.
Fix a crash in the _ast module: it can no longer be loaded more than
once. It now uses a global state rather than a module state.
* Move _ast module state: use a global state instead.
* Set _astmodule.m_size to -1, so the extension cannot be loaded more
than once.
Rework asdl_c.py to pass the module state to functions in
Python-ast.c, instead of using astmodulestate_global.
Handle also PyState_AddModule() failure in init_types().
This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed.
`GET_INVALID_TARGET` might unexpectedly return `NULL`, which if not
caught will cause a SEGFAULT. Therefore, this commit introduces a new
inline function `RAISE_SYNTAX_ERROR_INVALID_TARGET` that always
checks for `GET_INVALID_TARGET` returning NULL and can be used in
the grammar, replacing the long C ternary operation used till now.
The following error messages get produced:
- `cannot delete ...` for invalid `del` targets
- `... is an illegal 'for' target` for invalid targets in for
statements
- `... is an illegal 'with' target` for invalid targets in
with statements
Additionally, a few `cut`s were added in various places before the
invocation of the `invalid_*` rule, in order to speed things
up.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
On Windows, #include "pyerrors.h" no longer defines "snprintf" and
"vsnprintf" macros.
PyOS_snprintf() and PyOS_vsnprintf() should be used to get portable
behavior.
Replace snprintf() calls with PyOS_snprintf() and replace vsnprintf()
calls with PyOS_vsnprintf().
This commit removes the old parser, the deprecated parser module, the old parser compatibility flags and environment variables and all associated support code and documentation.
It no longer serves a purpose (there's only one parser) and having "new" in any name will eventually look odd. Also, it impinges on a potential sub-namespace, `__new_...__`.
A line with only a line continuation character should be considered
a blank line at tokenizer level so that only a single NEWLINE token
gets emitted. The old parser was working around the issue, but the
new parser threw a `SyntaxError` for valid input. For example,
an empty line following a line continuation character was interpreted
as a `SyntaxError`.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
my_fgets() now calls _PyOS_InterruptOccurred(tstate) to check for
pending signals, rather calling PyOS_InterruptOccurred().
my_fgets() is called with the GIL released, whereas
PyOS_InterruptOccurred() must be called with the GIL held.
test_repl: use text=True and avoid SuppressCrashReport in
test_multiline_string_parsing().
Fix my_fgets() on Windows: fgets(fp) does crash if fileno(fp) is closed.
Fix GIL usage in PyOS_Readline(): lock the GIL to set an exception.
Pass tstate to my_fgets() and _PyOS_WindowsConsoleReadline(). Cleanup
these functions.
These are like keywords but they only work in context; they are not reserved except when there is an exact match.
This would enable things like match statements without reserving `match` (which would be bad for the `re.match()` function and probably lots of other places).
Automerge-Triggered-By: @gvanrossum
When a `SyntaxError` in the expression part of a fstring is found,
the filename attribute of the `SyntaxError` is always `<fstring>`.
With this commit, it gets changed to always have the name of the file
the fstring resides in.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
The error message, generated for a non-parenthesized generator expression
in function calls, was still the generic `invalid syntax`, when the generator expression wasn't appearing as the first argument in the call. With this patch, even on input like `f(a, b, c for c in d, e)`, the correct error message gets produced.
- Switch from getopt to argparse.
- Removed the limitation of not being able to produce both C and H simultaneously.
This will make it run faster since it parses the asdl definition once and uses the generated tree to generate both the header and the C source.
The following improvements are implemented in this commit:
- `p->error_indicator` is set, in case malloc or realloc fail.
- Avoid memory leaks in the case that realloc fails.
- Call `PyErr_NoMemory()` instead of `PyErr_Format()`, because it requires no memory.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
This commit fixes the new parser to disallow invalid targets in the
following scenarios:
- Augmented assignments must only accept a single target (Name,
Attribute or Subscript), but no tuples or lists.
- `except` clauses should only accept a single `Name` as a target.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
This commit fixes SyntaxError locations when the caret is not displayed,
by doing the following:
- `col_number` always gets set to the location of the offending
node/expr. When no caret is to be displayed, this gets achieved
by setting the object holding the error line to None.
- Introduce a new function `_PyPegen_raise_error_known_location`,
which can be called, when an arbitrary `lineno`/`col_offset`
needs to be passed. This function then gets used in the grammar
(through some new macros and inline functions) so that SyntaxError
locations of the new parser match that of the old.
With the new parser, the error message contains always the trailing
newlines, causing the comparison of the repr of the error messages
in codeop to fail. This commit makes the new parser mirror the old parser's
behaviour regarding trailing newlines.
This is for the C generator:
- Disallow rule and variable names starting with `_`
- Rename most local variable names generated by the parser to start with `_`
Exceptions:
- Renaming `p` to `_p` will be a separate PR
- There are still some names that might clash, e.g.
- anything starting with `Py`
- C reserved words (`if` etc.)
- Macros like `EXTRA` and `CHECK`
When parsing something like `f(g()=2)`, where the name of a default arg
is not a NAME, but an arbitrary expression, a specialised error message
is emitted.
When parsing a string with an invalid escape, the old parser used to
point to the beginning of the invalid string. This commit changes the new
parser to match that behaviour, since it's currently pointing to the
end of the string (or to be more precise, to the beginning of the next
token).
Due to backwards compatibility concerns regarding keywords immediately followed by a string without whitespace between them (like in `bg="#d00" if clear else"#fca"`) will fail to parse,
commit 41d5b94af4 has to be reverted.
When parsing things like `def f(*): pass` the old parser used to output `SyntaxError: named arguments must follow bare *`, which the new parser wasn't able to do.
Due to PyErr_Occurred not being called at the beginning of each rule, we need to set the error indicator, so that rules do not get expanded after an exception has been thrown
This commit makes both APIs more consistent by doing the following:
- Remove the `PyPegen_CodeObjectFrom*` functions, which weren't used
and will probably not be needed. Functions like `Py_CompileStringObject`
can be used instead.
- Include a `const char *filename` parameter in `PyPegen_ASTFromString`.
- Rename `PyPegen_ASTFromFile` to `PyPegen_ASTFromFilename`, because
its signature is not the same with `PyParser_ASTFromFile`.
`ast.parse` and `compile` support a `feature_version` parameter that
tells the parser to parse the input string, as if it were written in
an older Python version.
The `feature_version` is propagated to the tokenizer, which uses it
to handle the three different stages of support for `async` and
`await`. Additionally, it disallows the following at parser level:
- The '@' operator in < 3.5
- Async functions in < 3.5
- Async comprehensions in < 3.6
- Underscores in numeric literals in < 3.6
- Await expression in < 3.5
- Variable annotations in < 3.6
- Async for-loops in < 3.5
- Async with-statements in < 3.5
- F-strings in < 3.6
Closeswe-like-parsers/cpython#124.
This implements full support for # type: <type> comments, # type: ignore <stuff> comments, and the func_type parsing mode for ast.parse() and compile().
Closes https://github.com/we-like-parsers/cpython/issues/95.
(For now, you need to use the master branch of mypy, since another issue unique to 3.9 had to be fixed there, and there's no mypy release yet.)
The only thing missing is `feature_version=N`, which is being tracked in https://github.com/we-like-parsers/cpython/issues/124.
After parsing is done in single statement mode, the tokenizer buffer has to be checked for additional lines and a `SyntaxError` must be raised, in case there are any.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
An E_EOF error was only being caught after the parser exited before this commit. There are some cases though, where the tokenizer returns ERRORTOKEN *and* has set an E_EOF error (like when EOF directly follows a line continuation character) which weren't correctly handled before.
This commit also allows to pass flags to the new parser in all interfaces and fixes a bug in the parser generator that was causing to inline rules with actions, making them disappear.
When there is a SyntaxError after reading the last input character from
the tokenizer and if no newline follows it, the error message used to be
`unexpected EOF while parsing`, which is wrong.
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).
Add also "assert(tstate != NULL);" to the function.
Fix a leak and subsequent crash in parsetok.c caused by realloc misuse on a rare codepath.
Realloc returns a null pointer on failure, and then growable_comment_array_deallocate crashes later when it dereferences it.
* Re-add removed classes Suite, slice, Param, AugLoad and AugStore.
* Add docstrings for dummy classes.
* Add docstrings for attribute aliases.
* Set __module__ to "ast" instead of "_ast".
* Remove the slice type.
* Make Slice a kind of the expr type instead of the slice type.
* Replace ExtSlice(slices) with Tuple(slices, Load()).
* Replace Index(value) with a value itself.
All non-terminal nodes in AST for expressions are now of the expr type.
The Py_FatalError() function is replaced with a macro which logs
automatically the name of the current function, unless the
Py_LIMITED_API macro is defined.
Changes:
* Add _Py_FatalErrorFunc() function.
* Remove the function name from the message of Py_FatalError() calls
which included the function name.
* Update tests.
The AST "Suite" node is no longer used and it can be removed from the ASDL definition and related structures (compiler, visitors, ...).
Co-Authored-By: Victor Stinner <vstinner@python.org>
Co-authored-by: Brett Cannon <54418+brettcannon@users.noreply.github.com>
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:
/* XXX: constify members. */
This patch addresses that.
In the tok_state struct:
* end and start were non-const but could be made const
* str and input were const but should have been non-const
Changes to support this include:
* decode_str() now returns a char * since it is allocated.
* PyTokenizer_FromString() and PyTokenizer_FromUTF8() each creates a
new char * for an allocate string instead of reusing the input
const char *.
* PyTokenizer_Get() and tok_get() now take const char ** arguments.
* Various local vars are const or non-const accordingly.
I was able to remove five casts that cast away constness.
Summary: This mostly migrates Python-ast.c to PEP384 and removes all statics from the whole file. This modifies the generator itself that generates the Python-ast.c. It leaves in the usage of _PyObject_LookupAttr even though it's not fully PEP384 compatible (this could always be shimmed in by anyone who needs it).
This is the converse of GH-15353 -- in addition to plenty of
scripts in the tree that are marked with the executable bit
(and so can be directly executed), there are a few that have
a leading `#!` which could let them be executed, but it doesn't
do anything because they don't have the executable bit set.
Here's a command which finds such files and marks them. The
first line finds files in the tree with a `#!` line *anywhere*;
the next-to-last step checks that the *first* line is actually of
that form. In between we filter out files that already have the
bit set, and some files that are meant as fragments to be
consumed by one or another kind of preprocessor.
$ git grep -l '^#!' \
| grep -vxFf <( \
git ls-files --stage \
| perl -lane 'print $F[3] if (!/^100644/)' \
) \
| grep -ve '\.in$' -e '^Doc/includes/' \
| while read f; do
head -c2 "$f" | grep -qxF '#!' \
&& chmod a+x "$f"; \
done
* Refactor Parser/pgen and add documentation and explanations
To improve the readability and maintainability of the parser
generator perform the following transformations:
* Separate the metagrammar parser in its own class to simplify
the parser generator logic.
* Create separate classes for DFAs and NFAs and move methods that
act exclusively on them from the parser generator to these
classes.
* Add docstrings and comment documenting the process to go from
the grammar file into NFAs and then DFAs. Detail some of the
algorithms and give some background explanations of some concepts
that will helps readers not familiar with the parser generation
process.
* Select more descriptive names for some variables and variables.
* PEP8 formatting and quote-style homogenization.
The output of the parser generator remains the same (Include/graminit.h
and Python/graminit.c remain untouched by running the new parser generator).
When using the "=" debug functionality of f-strings, use another Constant node (or a merged constant node) instead of adding expr_text to the FormattedValue node.
This disallows things like `# type: ignoreé`, which seems wrong.
Also switch to using Py_ISALNUM for the alnum check, for consistency
with other code (and maybe correctness re: locale issues?).
https://bugs.python.org/issue36878
GH-13238 made extra text after a # type: ignore accepted by the parser.
This finishes the job and actually plumbs the extra text through the
parser and makes it available in the AST.
This makes the parser consistent with the tokenize module (already the case
in `pypy`).
sample
------
```python
x = 5\
```
before
------
```console
$ python3 t.py
$ python3 -mtokenize t.py
t.py:2:0: error: EOF in multi-line statement
```
after
-----
```console
$ ./python t.py
File "t.py", line 3
x = 5\
^
SyntaxError: unexpected EOF while parsing
$ ./python -m tokenize t.py
t.py:2:0: error: EOF in multi-line statement
```
https://bugs.python.org/issue2180
In the parser, when using the type_comments=True option, recognize
a TYPE_IGNORE as anything containing `# type: ignore` followed by
a non-alphanumeric character. This is to allow ignores such as
`# type: ignore[E1000]`.
If a "=" is specified a the end of an f-string expression, the f-string will evaluate to the text of the expression, followed by '=', followed by the repr of the value of the expression.
This commit contains the implementation of PEP570: Python positional-only parameters.
* Update Grammar/Grammar with new typedarglist and varargslist
* Regenerate grammar files
* Update and regenerate AST related files
* Update code object
* Update marshal.c
* Update compiler and symtable
* Regenerate importlib files
* Update callable objects
* Implement positional-only args logic in ceval.c
* Regenerate frozen data
* Update standard library to account for positional-only args
* Add test file for positional-only args
* Update other test files to account for positional-only args
* Add News entry
* Update inspect module and related tests
Now that the parser generator is written in Python (Parser/pgen) we can make use of it to regenerate the Lib/keyword file that contains the language keywords instead of parsing the autogenerated grammar files. This also allows checking in the CI that the autogenerated files are up to date.
Currently, when arguments on Parser/asdl_c.py are parsed
``ìf`` sentence is used. This PR Propose to use ``elif``
to avoid multiple evaluting of the ifs.
https://bugs.python.org/issue36385
The value is a string for string and byte literals, None otherwise.
It is 'u' for u"..." literals, 'b' for b"..." literals, '' for "..." literals.
The 'r' (raw) prefix is ignored.
Does not apply to f-strings.
This appears sufficient to make mypy capable of using the stdlib ast module instead of typed_ast (assuming a mypy patch I'm working on).
WIP: I need to make the tests pass. @ilevkivskyi @serhiy-storchaka
https://bugs.python.org/issue36280
d_initial, the first state of a particular DFA in the parser has always been initialized to 0 in the old pgen as well as the new pgen. As this value is not used and the first state of each DFA is assumed to be the first element in the array representing it, remove d_initial from the parser to reduce complexity.
This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allow tweaking the parser to support older versions of the grammar. In particular if `feature_version` is 5 or 6, the hacks for the `async` and `await` keyword from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.)
https://bugs.python.org/issue35975
Pgen is the oldest piece of technology in the CPython repository, building it requires various #if[n]def PGEN hacks in other parts of the code and it also depends more and more on CPython internals. This commit removes the old pgen C code and replaces it for a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatibles with the current parser.
This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN) that can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution that uses Python3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so is always kept in sync and avoids having to maintain duplicate token definitions.
* Add tokenization of :=
- Add token to Include/token.h. Add token to documentation in Doc/library/token.rst.
- Run `./python Lib/token.py` to regenerate Lib/token.py.
- Update Parser/tokenizer.c: add case to handle `:=`.
* Add initial usage of := in grammar.
* Update Python.asdl to match the grammar updates. Regenerated Include/Python-ast.h and Python/Python-ast.c
* Update AST and compiler files in Python/ast.c and Python/compile.c. Basic functionality, this isn't scoped properly
* Regenerate Lib/symbol.py using `./python Lib/symbol.py`
* Tests - Fix failing tests in test_parser.py due to changes in token numbers for internal representation
* Tests - Add simple test for := token
* Tests - Add simple tests for named expressions using expr and suite
* Tests - Update number of levels for nested expressions to prevent stack overflow
* Update symbol table to handle NamedExpr
* Update Grammar to allow assignment expressions in if statements.
Regenerate Python/graminit.c accordingly using `make regen-grammar`
* Tests - Add additional tests for named expressions in RoundtripLegalSyntaxTestCase, based on examples and information directly from PEP 572
Note: failing tests are currently commented out (4 out of 24 tests currently fail)
* Tests - Add temporary syntax test failure tests in test_parser.py
Note: There is an outstanding TODO for this -- syntax tests need to be
moved to a different file (presumably test_syntax.py), but this is
covering what needs to be tested at the moment, and it's more convenient
to run a single test for the time being
* Add support for allowing assignment expressions as function argument annotations. Uncomment tests for these cases because they all pass now!
* Tests - Move existing syntax tests out of test_parser.py and into test_named_expressions.py. Refactor syntax tests to use unittest
* Add TargetScopeError exception to extend SyntaxError
Note: This simply creates the TargetScopeError exception, it is not yet
used anywhere
* Tests - Update tests per PEP 572
Continue refactoring test suite:
The named expression test suite now checks for any invalid cases that
throw exceptions (no longer limited to SyntaxErrors), assignment tests
to ensure that variables are properly assigned, and scope tests to
ensure that variable availability and values are correct
Note:
- There are still tests that are marked to skip, as they are not yet
implemented
- There are approximately 300 lines of the PEP that have not yet been
addressed, though these may be deferred
* Documentation - Small updates to XXX/todo comments
- Remove XXX from child description in ast.c
- Add comment with number of previously supported nested expressions for
3.7.X in test_parser.py
* Fix assert in seq_for_testlist()
* Cleanup - Denote "Not implemented -- No keyword args" on failing test case. Fix PEP8 error for blank lines at beginning of test classes in test_parser.py
* Tests - Wrap all file opens in `with...as` to ensure files are closed
* WIP: handle f(a := 1)
* Tests and Cleanup - No longer skips keyword arg test. Keyword arg test now uses a simpler test case and does not rely on an external file. Remove print statements from ast.c
* Tests - Refactor last remaining test case that relied on on external file to use a simpler test case without the dependency
* Tests - Add better description of remaning skipped tests. Add test checking scope when using assignment expression in a function argument
* Tests - Add test for nested comprehension, testing value and scope. Fix variable name in skipped comprehension scope test
* Handle restriction of LHS for named expressions - can only assign to LHS of type NAME. Specifically, restrict assignment to tuples
This adds an alternative set_context specifically for named expressions,
set_namedexpr_context. Thus, context is now set differently for standard
assignment versus assignment for named expressions in order to handle
restrictions.
* Tests - Update negative test case for assigning to lambda to match new error message. Add negative test case for assigning to tuple
* Tests - Reorder test cases to group invalid syntax cases and named assignment target errors
* Tests - Update test case for named expression in function argument - check that result and variable are set correctly
* Todo - Add todo for TargetScopeError based on Guido's comment (2b3acd37bd (r30472562))
* Tests - Add named expression tests for assignment operator in function arguments
Note: One of two tests are skipped, as function arguments are currently treating
an assignment expression inside of parenthesis as one child, which does
not properly catch the named expression, nor does it count arguments
properly
* Add NamedStore to expr_context. Regenerate related code with `make regen-ast`
* Add usage of NamedStore to ast_for_named_expr in ast.c. Update occurances of checking for Store to also handle NamedStore where appropriate
* Add ste_comprehension to _symtable_entry to track if the namespace is a comprehension. Initialize ste_comprehension to 0. Set set_comprehension to 1 in symtable_handle_comprehension
* s/symtable_add_def/symtable_add_def_helper. Add symtable_add_def to handle grabbing st->st_cur and passing it to symtable_add_def_helper. This now allows us to call the original code from symtable_add_def by instead calling symtable_add_def_helper with a different ste.
* Refactor symtable_record_directive to take lineno and col_offset as arguments instead of stmt_ty. This allows symtable_record_directive to be used for stmt_ty and expr_ty
* Handle elevating scope for named expressions in comprehensions.
* Handle error for usage of named expression inside a class block
* Tests - No longer skip scope tests. Add additional scope tests
* Cleanup - Update error message for named expression within a comprehension within a class. Update comments. Add assert for symtable_extend_namedexpr_scope to validate that we always find at least a ModuleScope if we don't find a Class or FunctionScope
* Cleanup - Add missing case for NamedStore in expr_context_name. Remove unused var in set_namedexpr_content
* Refactor - Consolidate set_context and set_namedexpr_context to reduce duplicated code. Special cases for named expressions are handled by checking if ctx is NamedStore
* Cleanup - Add additional use cases for ast_for_namedexpr in usage comment. Fix multiple blank lines in test_named_expressions
* Tests - Remove unnecessary test case. Renumber test case function names
* Remove TargetScopeError for now. Will add back if needed
* Cleanup - Small comment nit for consistency
* Handle positional argument check with named expression
* Add TargetScopeError exception definition. Add documentation for TargetScopeError in c-api docs. Throw TargetScopeError instead of SyntaxError when using a named expression in a comprehension within a class scope
* Increase stack size for parser by 200. This is a minimal change (approx. 5kb) and should not have an impact on any systems. Update parser test to allow 99 nested levels again
* Add TargetScopeError to exception_hierarchy.txt for test_baseexception.py_
* Tests - Major update for named expression tests, both in test_named_expressions and test_parser
- Add test for TargetScopeError
- Add tests for named expressions in comprehension scope and edge cases
- Add tests for named expressions in function arguments (declarations
and call sites)
- Reorganize tests to group them more logically
* Cleanup - Remove unnecessary comment
* Cleanup - Comment nitpicks
* Explicitly disallow assignment expressions to a name inside parentheses, e.g.: ((x) := 0)
- Add check for LHS types to detect a parenthesis then a name (see note)
- Add test for this scenario
- Update tests for changed error message for named assignment to a tuple
(also, see note)
Note: This caused issues with the previous error handling for named assignment
to a LHS that contained an expression, such as a tuple. Thus, the check
for the LHS of a named expression must be changed to be more specific if
we wish to maintain the previous error messages
* Cleanup - Wrap lines more strictly in test file
* Revert "Explicitly disallow assignment expressions to a name inside parentheses, e.g.: ((x) := 0)"
This reverts commit f1531400ca7d7a2d148830c8ac703f041740896d.
* Add NEWS.d entry
* Tests - Fix error in test_pickle.test_exceptions by adding TargetScopeError to list of exceptions
* Tests - Update error message tests to reflect improved messaging convention (s/can't/cannot)
* Remove cases that cannot be reached in compile.c. Small linting update.
* Update Grammar/Tokens to add COLONEQUAL. Regenerate all files
* Update TargetScopeError PRE_INIT and POST_INIT, as this was purposefully left out when fixing rebase conflicts
* Add NamedStore back and regenerate files
* Pass along line number and end col info for named expression
* Simplify News entry
* Fix compiler warning and explicity mark fallthrough
The majority of this PR is tediously passing `end_lineno` and `end_col_offset` everywhere. Here are non-trivial points:
* It is not possible to reconstruct end positions in AST "on the fly", some information is lost after an AST node is constructed, so we need two more attributes for every AST node `end_lineno` and `end_col_offset`.
* I add end position information to both CST and AST. Although it may be technically possible to avoid adding end positions to CST, the code becomes more cumbersome and less efficient.
* Since the end position is not known for non-leaf CST nodes while the next token is added, this requires a bit of extra care (see `_PyNode_FinalizeEndPos`). Unless I made some mistake, the algorithm should be linear.
* For statements, I "trim" the end position of suites to not include the terminal newlines and dedent (this seems to be what people would expect), for example in
```python
class C:
pass
pass
```
the end line and end column for the class definition is (2, 8).
* For `end_col_offset` I use the common Python convention for indexing, for example for `pass` the `end_col_offset` is 4 (not 3), so that `[0:4]` gives one the source code that corresponds to the node.
* I added a helper function `ast.get_source_segment()`, to get source text segment corresponding to a given AST node. It is also useful for testing.
An (inevitable) downside of this PR is that AST now takes almost 25% more memory. I think however it is probably justified by the benefits.
"Include/token.h", "Lib/token.py" (containing now some data moved from
"Lib/tokenize.py") and new files "Parser/token.c" (containing the code
moved from "Parser/tokenizer.c") and "Doc/library/token-list.inc" (included
in "Doc/library/token.rst") are now generated from "Grammar/Tokens" by
"Tools/scripts/generate_token.py". The script overwrites files only if
needed and can be used on the read-only sources tree.
"Lib/symbol.py" is now generated by "Tools/scripts/generate_symbol_py.py"
instead of been executable itself.
Added new make targets "regen-token" and "regen-symbol" which are now
dependencies of "regen-all".
The documentation contains now strings for operators and punctuation tokens.
* ast.h now includes Python-ast.h and node.h
* parsetok.h now includes node.h and grammar.h
* symtable.h now includes Python-ast.h
* Modify asdl_c.py to enhance Python-ast.h:
* Add #ifndef/#define Py_PYTHON_AST_H to be able to include the header
twice
* Add "extern { ... }" for C++
* Undefine "Yield" macro conflicting with winbase.h
* Remove "#undef Yield" from C files, it's now done in Python-ast.h
* Remove now useless includes in C files
If Py_BUILD_CORE is defined, the PyThreadState_GET() macro access
_PyRuntime which comes from the internal pycore_state.h header.
Public headers must not require internal headers.
Move PyThreadState_GET() and _PyInterpreterState_GET_UNSAFE() from
Include/pystate.h to Include/internal/pycore_state.h, and rename
PyThreadState_GET() to _PyThreadState_GET() there.
The PyThreadState_GET() macro of pystate.h is now redefined when
pycore_state.h is included, to use the fast _PyThreadState_GET().
Changes:
* Add _PyThreadState_GET() macro
* Replace "PyThreadState_GET()->interp" with
_PyInterpreterState_GET_UNSAFE()
* Replace PyThreadState_GET() with _PyThreadState_GET() in internal C
files (compiled with Py_BUILD_CORE defined), but keep
PyThreadState_GET() in the public header files.
* _testcapimodule.c: replace PyThreadState_GET() with
PyThreadState_Get(); the module is not compiled with Py_BUILD_CORE
defined.
* pycore_state.h now requires Py_BUILD_CORE to be defined.
Add more fields to _PyCoreConfig:
* _check_hash_pycs_mode
* bytes_warning
* debug
* inspect
* interactive
* legacy_windows_fs_encoding
* legacy_windows_stdio
* optimization_level
* quiet
* unbuffered_stdio
* user_site_directory
* verbose
* write_bytecode
Changes:
* Remove pymain_get_global_config() and pymain_set_global_config()
which became useless. These functions have been replaced by
_PyCoreConfig_GetGlobalConfig() and
_PyCoreConfig_SetGlobalConfig().
* sys.flags.dont_write_bytecode value is now restricted to 1 even if
-B option is specified multiple times on the command line.
* PyThreadState_Clear() now uses the config from the current
interpreter rather than using global Py_VerboseFlag
Remove the docstring attribute of AST types and restore docstring
expression as a first stmt in their body.
Co-authored-by: INADA Naoki <methane@users.noreply.github.com>
bpo-32096, bpo-30860: Partially revert the commit
2ebc5ce42a8a9e047e790aefbf9a94811569b2b6:
* Move structures back from Include/internal/mem.h to
Objects/obmalloc.c
* Remove _PyObject_Initialize() and _PyMem_Initialize()
* Remove Include/internal/pymalloc.h
* Add test_capi.test_pre_initialization_api():
Make sure that it's possible to call Py_DecodeLocale(), and then call
Py_SetProgramName() with the decoded string, before Py_Initialize().
PyMem_RawMalloc() and Py_DecodeLocale() can be called again before
_PyRuntimeState_Init().
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
Remove the following fields from tok_state structure which are now
used unused:
* altwarning: "Issue warning if alternate tabs don't match"
* alterror: "Issue error if alternate tabs don't match"
* alttabsize: "Alternate tab spacing"
Replace alttabsize variable with ALTTABSIZE define.
* Don't use "Python runtime" anymore to parse command line options or
to get environment variables: pymain_init() is now a strict
separation.
* Use an error message rather than "crashing" directly with
Py_FatalError(). Limit the number of calls to Py_FatalError(). It
prepares the code to handle errors more nicely later.
* Warnings options (-W, PYTHONWARNINGS) and "XOptions" (-X) are now
only added to the sys module once Python core is properly
initialized.
* _PyMain is now the well identified owner of some important strings
like: warnings options, XOptions, and the "program name". The
program name string is now properly freed at exit.
pymain_free() is now responsible to free the "command" string.
* Rename most methods in Modules/main.c to use a "pymain_" prefix to
avoid conflits and ease debug.
* Replace _Py_CommandLineDetails_INIT with memset(0)
* Reorder a lot of code to fix the initialization ordering. For
example, initializing standard streams now comes before parsing
PYTHONWARNINGS.
* Py_Main() now handles errors when adding warnings options and
XOptions.
* Add _PyMem_GetDefaultRawAllocator() private function.
* Cleanup _PyMem_Initialize(): remove useless global constants: move
them into _PyMem_Initialize().
* Call _PyRuntime_Initialize() as soon as possible:
_PyRuntime_Initialize() now returns an error message on failure.
* Add _PyInitError structure and following macros:
* _Py_INIT_OK()
* _Py_INIT_ERR(msg)
* _Py_INIT_USER_ERR(msg): "user" error, don't abort() in that case
* _Py_INIT_FAILED(err)