This disallows things like `# type: ignoreé`, which seems wrong.
Also switch to using Py_ISALNUM for the alnum check, for consistency
with other code (and maybe correctness re: locale issues?).
https://bugs.python.org/issue36878
GH-13238 made extra text after a # type: ignore accepted by the parser.
This finishes the job and actually plumbs the extra text through the
parser and makes it available in the AST.
This makes the parser consistent with the tokenize module (already the case
in `pypy`).
sample
------
```python
x = 5\
```
before
------
```console
$ python3 t.py
$ python3 -mtokenize t.py
t.py:2:0: error: EOF in multi-line statement
```
after
-----
```console
$ ./python t.py
File "t.py", line 3
x = 5\
^
SyntaxError: unexpected EOF while parsing
$ ./python -m tokenize t.py
t.py:2:0: error: EOF in multi-line statement
```
https://bugs.python.org/issue2180
In the parser, when using the type_comments=True option, recognize
a TYPE_IGNORE as anything containing `# type: ignore` followed by
a non-alphanumeric character. This is to allow ignores such as
`# type: ignore[E1000]`.
This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allow tweaking the parser to support older versions of the grammar. In particular if `feature_version` is 5 or 6, the hacks for the `async` and `await` keyword from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.)
https://bugs.python.org/issue35975
Pgen is the oldest piece of technology in the CPython repository, building it requires various #if[n]def PGEN hacks in other parts of the code and it also depends more and more on CPython internals. This commit removes the old pgen C code and replaces it for a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatibles with the current parser.
This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN) that can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution that uses Python3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so is always kept in sync and avoids having to maintain duplicate token definitions.
"Include/token.h", "Lib/token.py" (containing now some data moved from
"Lib/tokenize.py") and new files "Parser/token.c" (containing the code
moved from "Parser/tokenizer.c") and "Doc/library/token-list.inc" (included
in "Doc/library/token.rst") are now generated from "Grammar/Tokens" by
"Tools/scripts/generate_token.py". The script overwrites files only if
needed and can be used on the read-only sources tree.
"Lib/symbol.py" is now generated by "Tools/scripts/generate_symbol_py.py"
instead of been executable itself.
Added new make targets "regen-token" and "regen-symbol" which are now
dependencies of "regen-all".
The documentation contains now strings for operators and punctuation tokens.
Remove the following fields from tok_state structure which are now
used unused:
* altwarning: "Issue warning if alternate tabs don't match"
* alterror: "Issue error if alternate tabs don't match"
* alttabsize: "Alternate tab spacing"
Replace alttabsize variable with ALTTABSIZE define.
* add test to check if were modifying token
* copy list so import tokenize doesnt have side effects on token
* shorten line
* add tokenize tokens to token.h to get them to show up in token
* move ERRORTOKEN back to its previous location, and fix nitpick
* copy comments from token.h automatically
* fix whitespace and make more pythonic
* change to fix comments from @haypo
* update token.rst and Misc/NEWS
* change wording
* some more wording changes
This commit simplifies async/await tokenization in tokenizer.c,
tokenize.py & lib2to3/tokenize.py. Previous solution was to keep
a stack of async-def & def blocks, whereas the new approach is just
to remember position of the outermost async-def block.
This change won't bring any parsing performance improvements, but
it makes the code much easier to read and validate.
This commit fixes how one-line async-defs and defs are tracked
by tokenizer. It allows to correctly parse invalid code such
as:
>>> async def f():
... def g(): pass
... async = 10
and valid code such as:
>>> async def f():
... async def g(): pass
... await z
As a consequence, is is now possible to have one-line
'async def foo(): await ..' functions:
>>> async def foo(): return await bar()
* The first line of Python script could be executed twice when the source
encoding (not equal to 'utf-8') was specified on the second line.
* Now the source encoding declaration on the second line isn't effective if
the first line contains anything except a comment.
* As a consequence, 'python -x' works now again with files with the source
encoding declarations specified on the second file, and can be used again
to make Python batch files on Windows.
* The tokenize module now ignore the source encoding declaration on the second
line if the first line contains anything except a comment.
* IDLE now ignores the source encoding declaration on the second line if the
first line contains anything except a comment.
* 2to3 and the findnocoding.py script now ignore the source encoding
declaration on the second line if the first line contains anything except
a comment.
* The first line of Python script could be executed twice when the source
encoding (not equal to 'utf-8') was specified on the second line.
* Now the source encoding declaration on the second line isn't effective if
the first line contains anything except a comment.
* As a consequence, 'python -x' works now again with files with the source
encoding declarations specified on the second file, and can be used again
to make Python batch files on Windows.
* The tokenize module now ignore the source encoding declaration on the second
line if the first line contains anything except a comment.
* IDLE now ignores the source encoding declaration on the second line if the
first line contains anything except a comment.
* 2to3 and the findnocoding.py script now ignore the source encoding
declaration on the second line if the first line contains anything except
a comment.
On Windows, set the binary mode on stdin, stdout, stderr and all
io.FileIO objects (to not translate newlines, \r\n <=> \n). The Python parser
translates newlines (\r\n => \n).
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79723 | benjamin.peterson | 2010-04-03 17:48:51 -0500 (Sat, 03 Apr 2010) | 1 line
ensure that the locale does not affect the tokenization of identifiers
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r78826 | victor.stinner | 2010-03-10 23:30:19 +0100 (mer., 10 mars 2010) | 5 lines
Issue #3137: Don't ignore errors at startup, especially a keyboard interrupt
(SIGINT). If an error occurs while importing the site module, the error is
printed and Python exits. Initialize the GIL before importing the site
module.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r78603 | victor.stinner | 2010-03-03 00:20:02 +0100 (mer., 03 mars 2010) | 5 lines
Issue #7820: The parser tokenizer restores all bytes in the right if the BOM
check fails.
Fix an assertion in pydebug mode.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r76052 | gregory.p.smith | 2009-11-01 20:02:38 -0600 (Sun, 01 Nov 2009) | 5 lines
see issue1006238, this merges in the following patch to ease cross
compiling the printf %zd check.
http://sources.gentoo.org/viewcvs.py/gentoo-x86/dev-lang/python/files/python-2.5-cross-printf.patch?rev=1.1&view=markup
........
r76522 | barry.warsaw | 2009-11-25 12:38:32 -0600 (Wed, 25 Nov 2009) | 2 lines
Add mktime_tz to __all__. It's documented as being available in email.utils.
........
r76591 | benjamin.peterson | 2009-11-29 16:26:26 -0600 (Sun, 29 Nov 2009) | 4 lines
now that deepcopy can handle instance methods, this hack can be removed #7409
Thanks Robert Collins
........
r76689 | benjamin.peterson | 2009-12-06 11:37:48 -0600 (Sun, 06 Dec 2009) | 1 line
rewrite translate_newlines for clarity
........
r76697 | benjamin.peterson | 2009-12-06 15:24:30 -0600 (Sun, 06 Dec 2009) | 2 lines
fix test_parser from tokenizer tweak
........
r76733 | benjamin.peterson | 2009-12-09 21:37:59 -0600 (Wed, 09 Dec 2009) | 1 line
substitute PyDict_Check() for PyObject_IsInstance
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r75149 | gregory.p.smith | 2009-09-29 16:56:31 -0500 (Tue, 29 Sep 2009) | 3 lines
Mention issue6972 in extractall docs about overwriting things outside of
the supplied path.
........
r75260 | andrew.kuchling | 2009-10-05 16:24:20 -0500 (Mon, 05 Oct 2009) | 1 line
Wording fix
........
r75261 | andrew.kuchling | 2009-10-05 16:24:35 -0500 (Mon, 05 Oct 2009) | 1 line
Fix narkup
........
r75262 | andrew.kuchling | 2009-10-05 16:25:03 -0500 (Mon, 05 Oct 2009) | 1 line
Document 'skip' parameter to constructor
........
r75263 | andrew.kuchling | 2009-10-05 16:25:35 -0500 (Mon, 05 Oct 2009) | 1 line
Note side benefit of socket.create_connection()
........
r75265 | andrew.kuchling | 2009-10-05 17:31:11 -0500 (Mon, 05 Oct 2009) | 1 line
Reword sentence
........
r75266 | andrew.kuchling | 2009-10-05 17:32:48 -0500 (Mon, 05 Oct 2009) | 1 line
Use standard comma punctuation; reword some sentences in the docs
........
r75267 | andrew.kuchling | 2009-10-05 17:42:56 -0500 (Mon, 05 Oct 2009) | 1 line
Backport r73983: Document the thousands separator.
........
r75292 | benjamin.peterson | 2009-10-08 22:11:36 -0500 (Thu, 08 Oct 2009) | 1 line
death to old CVS keyword
........
r75300 | benjamin.peterson | 2009-10-09 16:48:14 -0500 (Fri, 09 Oct 2009) | 1 line
fix some coding style
........
r75376 | benjamin.peterson | 2009-10-11 20:26:07 -0500 (Sun, 11 Oct 2009) | 1 line
platform we don't care about
........
r75405 | neil.schemenauer | 2009-10-14 12:17:14 -0500 (Wed, 14 Oct 2009) | 4 lines
Issue #1754094: Improve the stack depth calculation in the compiler.
There should be no other effect than a small decrease in memory use.
Patch by Christopher Tur Lesniewski-Laas.
........
r75429 | benjamin.peterson | 2009-10-14 20:47:28 -0500 (Wed, 14 Oct 2009) | 1 line
pep8ify if blocks
........
r75430 | benjamin.peterson | 2009-10-14 20:49:37 -0500 (Wed, 14 Oct 2009) | 1 line
use floor division and add a test that exercises the tabsize codepath
........
r75431 | benjamin.peterson | 2009-10-14 20:56:25 -0500 (Wed, 14 Oct 2009) | 1 line
change test to what I intended
........
r75432 | benjamin.peterson | 2009-10-14 22:05:39 -0500 (Wed, 14 Oct 2009) | 1 line
some cleanups
........
r75433 | benjamin.peterson | 2009-10-14 22:06:55 -0500 (Wed, 14 Oct 2009) | 1 line
make inspect.isabstract() always return a boolean; add a test for it, too #7069
........
r75437 | benjamin.peterson | 2009-10-15 10:44:46 -0500 (Thu, 15 Oct 2009) | 1 line
only clear a module's __dict__ if the module is the only one with a reference to it #7140
........
r75445 | vinay.sajip | 2009-10-16 09:06:44 -0500 (Fri, 16 Oct 2009) | 1 line
Issue #7120: logging: Removed import of multiprocessing which is causing crash in GAE.
........
r75501 | antoine.pitrou | 2009-10-18 13:37:11 -0500 (Sun, 18 Oct 2009) | 3 lines
Add a comment about unreachable code, and fix a typo
........
r75551 | benjamin.peterson | 2009-10-19 22:14:10 -0500 (Mon, 19 Oct 2009) | 1 line
use property api
........
r75572 | benjamin.peterson | 2009-10-20 16:55:17 -0500 (Tue, 20 Oct 2009) | 1 line
clarify buffer arg #7178
........
r75589 | benjamin.peterson | 2009-10-21 21:26:47 -0500 (Wed, 21 Oct 2009) | 1 line
whitespace
........
r75590 | benjamin.peterson | 2009-10-21 21:36:47 -0500 (Wed, 21 Oct 2009) | 1 line
rewrite to be nice to other implementations
........
r75591 | benjamin.peterson | 2009-10-21 21:50:38 -0500 (Wed, 21 Oct 2009) | 4 lines
rewrite for style, clarify, and comments
Also, use the hasattr() like scheme of allowing BaseException exceptions through.
........
r75657 | antoine.pitrou | 2009-10-24 07:41:27 -0500 (Sat, 24 Oct 2009) | 3 lines
Fix compilation error in debug mode.
........
r75742 | benjamin.peterson | 2009-10-26 17:51:16 -0500 (Mon, 26 Oct 2009) | 1 line
use 'is' instead of id()
........
r75868 | benjamin.peterson | 2009-10-27 15:59:18 -0500 (Tue, 27 Oct 2009) | 1 line
test expect base classes
........
r75952 | georg.brandl | 2009-10-29 15:38:32 -0500 (Thu, 29 Oct 2009) | 1 line
Use the correct function name in docstring.
........
r75953 | georg.brandl | 2009-10-29 15:39:50 -0500 (Thu, 29 Oct 2009) | 1 line
Remove mention of the old -X command line switch.
........
r75954 | georg.brandl | 2009-10-29 15:53:00 -0500 (Thu, 29 Oct 2009) | 1 line
Use constants instead of magic integers for test result. Do not re-run with --verbose3 for environment changing tests.
........
r75955 | georg.brandl | 2009-10-29 15:54:03 -0500 (Thu, 29 Oct 2009) | 1 line
Use a single style for all the docstrings in the math module.
........
r75956 | georg.brandl | 2009-10-29 16:16:34 -0500 (Thu, 29 Oct 2009) | 1 line
I do not think the "railroad" program mentioned is still available.
........
r75957 | georg.brandl | 2009-10-29 16:44:56 -0500 (Thu, 29 Oct 2009) | 1 line
Fix constant name.
........
r76057 | benjamin.peterson | 2009-11-02 09:06:45 -0600 (Mon, 02 Nov 2009) | 1 line
prevent a rather unlikely segfault
........
r76105 | georg.brandl | 2009-11-04 01:38:12 -0600 (Wed, 04 Nov 2009) | 1 line
#7259: show correct equivalent for operator.i* operations in docstring; fix minor issues in operator docs.
........
r76139 | benjamin.peterson | 2009-11-06 19:04:38 -0600 (Fri, 06 Nov 2009) | 1 line
spelling
........
r76143 | georg.brandl | 2009-11-07 02:26:07 -0600 (Sat, 07 Nov 2009) | 1 line
#7271: fix typo.
........
r76162 | benjamin.peterson | 2009-11-08 22:10:53 -0600 (Sun, 08 Nov 2009) | 1 line
discuss how to use -p
........
r76223 | georg.brandl | 2009-11-12 02:29:46 -0600 (Thu, 12 Nov 2009) | 1 line
Give the profile module a module directive.
........