This commit simplifies async/await tokenization in tokenizer.c,
tokenize.py & lib2to3/tokenize.py. Previous solution was to keep
a stack of async-def & def blocks, whereas the new approach is just
to remember position of the outermost async-def block.
This change won't bring any parsing performance improvements, but
it makes the code much easier to read and validate.
This commit fixes how one-line async-defs and defs are tracked
by tokenizer. It allows to correctly parse invalid code such
as:
>>> async def f():
... def g(): pass
... async = 10
and valid code such as:
>>> async def f():
... async def g(): pass
... await z
As a consequence, is is now possible to have one-line
'async def foo(): await ..' functions:
>>> async def foo(): return await bar()
The new parser does not rely on Spark (which is now removed from our repo),
uses modern 3.x idioms and is significantly smaller and simpler.
It generates exactly the same AST files (.h and .c), so in practice no builds
should be affected.
* The first line of Python script could be executed twice when the source
encoding (not equal to 'utf-8') was specified on the second line.
* Now the source encoding declaration on the second line isn't effective if
the first line contains anything except a comment.
* As a consequence, 'python -x' works now again with files with the source
encoding declarations specified on the second file, and can be used again
to make Python batch files on Windows.
* The tokenize module now ignore the source encoding declaration on the second
line if the first line contains anything except a comment.
* IDLE now ignores the source encoding declaration on the second line if the
first line contains anything except a comment.
* 2to3 and the findnocoding.py script now ignore the source encoding
declaration on the second line if the first line contains anything except
a comment.
* The first line of Python script could be executed twice when the source
encoding (not equal to 'utf-8') was specified on the second line.
* Now the source encoding declaration on the second line isn't effective if
the first line contains anything except a comment.
* As a consequence, 'python -x' works now again with files with the source
encoding declarations specified on the second file, and can be used again
to make Python batch files on Windows.
* The tokenize module now ignore the source encoding declaration on the second
line if the first line contains anything except a comment.
* IDLE now ignores the source encoding declaration on the second line if the
first line contains anything except a comment.
* 2to3 and the findnocoding.py script now ignore the source encoding
declaration on the second line if the first line contains anything except
a comment.
The GIL must be held to call PyMem_Malloc(), whereas PyOS_Readline() releases
the GIL to read input.
The result of the C callback PyOS_ReadlineFunctionPointer must now be a string
allocated by PyMem_RawMalloc() or PyMem_RawRealloc() (or NULL if an error
occurred), instead of a string allocated by PyMem_Malloc() or PyMem_Realloc().
Fixing this issue was required to setup a hook on PyMem_Malloc(), for example
using the tracemalloc module.
PyOS_Readline() copies the result of PyOS_ReadlineFunctionPointer() into a new
buffer allocated by PyMem_Malloc(). So the public API of PyOS_Readline() does
not change.