There are two copies of the grammar -- the one used by Python itself as
Grammar/Grammar, and the one used by lib2to3 which has necessarily diverged at
Lib/lib2to3/Grammar.txt because it needs to support older syntax an we want it
to be reasonable stable to avoid requiring fixer rewrites.
This brings suport for syntax like `if x:= foo():` to match what the live
Python grammar does.
This should've been added at the time of the walrus operator itself, but lib2to3 being
independent is often overlooked. So we do consider this a bugfix rather than enhancement.
New tests also added.
I also made the comments in line with the builtin Grammar/Grammar. PEP 306 was
withdrawn, Kees Blom's railroad program has been lost to the sands of time for
at least 16 years now (I found a python-dev post from people looking for it).
Fix two (in my opinion) spurious failure conditions in the lib2to3.tests.test_parser.TestParserIdempotency test_parser test.
Use the same encoding found in the initial file to write a temp file for a diff. This retains the BOM if the encoding was initially utf-8-sig.
If the file cannot be parsed using the normal grammar, try again with no print statement which should succeed for valid files using future print_function
For case (1), the driver was correctly handling a BOM in a utf-8 file, but then the test was not writing a comparison file using 'utf-8-sig' to diff against, so the BOM got removed. I don't think that is the fault of the parser, and lib2to3 will retain the BOM.
For case (2), lib2to3 pre-detects the use of from __future__ import print_function or allows the user to force this interpretation with a -p flag, and then selects a different grammar with the print statement removed. That makes the test cases unfair to this test as the driver itself doesn't know which grammar to use. As a minimal fix, the test will try using a grammar with the print statement, and if that fails fall back on a grammar without it. A more thorough handling of the idempotency test would to be to parse all files using both grammars and ignore if one of the two failed but otherwise check both. I didn't think this was necessary but can change.
This is more complicated than it should be because we need to preserve the
useful mtime-based regeneration feature that lib2to3.pgen2.driver.load_grammar
has. We only look for the pickled grammar file with pkgutil.get_data and only if
the source file does not exist.
Note: this doesn't unpack f-strings into the underlying JoinedStr AST.
Ideally we'd fully implement JoinedStr here but given its additional
complexity, I think this is worth bandaiding as is. This unblocks tools like
https://github.com/google/yapf to format 3.6 syntax using f-strings.
* Allow underscores in numeric literals in lib2to3.
* Stricter literal parsing for Python 3.6 in lib2to3.pgen2.tokenize.
* Add test case for underscores in literals in Python 3.
This commit simplifies async/await tokenization in tokenizer.c,
tokenize.py & lib2to3/tokenize.py. Previous solution was to keep
a stack of async-def & def blocks, whereas the new approach is just
to remember position of the outermost async-def block.
This change won't bring any parsing performance improvements, but
it makes the code much easier to read and validate.