When there is a SyntaxError after reading the last input character from
the tokenizer and if no newline follows it, the error message used to be
`unexpected EOF while parsing`, which is wrong.
Fix a leak and subsequent crash in parsetok.c caused by realloc misuse on a rare codepath.
Realloc returns a null pointer on failure, and then growable_comment_array_deallocate crashes later when it dereferences it.
The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:
/* XXX: constify members. */
This patch addresses that.
In the tok_state struct:
* end and start were non-const but could be made const
* str and input were const but should have been non-const
Changes to support this include:
* decode_str() now returns a char * since it is allocated.
* PyTokenizer_FromString() and PyTokenizer_FromUTF8() each creates a
new char * for an allocate string instead of reusing the input
const char *.
* PyTokenizer_Get() and tok_get() now take const char ** arguments.
* Various local vars are const or non-const accordingly.
I was able to remove five casts that cast away constness.
GH-13238 made extra text after a # type: ignore accepted by the parser.
This finishes the job and actually plumbs the extra text through the
parser and makes it available in the AST.
This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allow tweaking the parser to support older versions of the grammar. In particular if `feature_version` is 5 or 6, the hacks for the `async` and `await` keyword from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.)
https://bugs.python.org/issue35975
Pgen is the oldest piece of technology in the CPython repository, building it requires various #if[n]def PGEN hacks in other parts of the code and it also depends more and more on CPython internals. This commit removes the old pgen C code and replaces it for a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatibles with the current parser.
This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN) that can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution that uses Python3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so is always kept in sync and avoids having to maintain duplicate token definitions.
The majority of this PR is tediously passing `end_lineno` and `end_col_offset` everywhere. Here are non-trivial points:
* It is not possible to reconstruct end positions in AST "on the fly", some information is lost after an AST node is constructed, so we need two more attributes for every AST node `end_lineno` and `end_col_offset`.
* I add end position information to both CST and AST. Although it may be technically possible to avoid adding end positions to CST, the code becomes more cumbersome and less efficient.
* Since the end position is not known for non-leaf CST nodes while the next token is added, this requires a bit of extra care (see `_PyNode_FinalizeEndPos`). Unless I made some mistake, the algorithm should be linear.
* For statements, I "trim" the end position of suites to not include the terminal newlines and dedent (this seems to be what people would expect), for example in
```python
class C:
pass
pass
```
the end line and end column for the class definition is (2, 8).
* For `end_col_offset` I use the common Python convention for indexing, for example for `pass` the `end_col_offset` is 4 (not 3), so that `[0:4]` gives one the source code that corresponds to the node.
* I added a helper function `ast.get_source_segment()`, to get source text segment corresponding to a given AST node. It is also useful for testing.
An (inevitable) downside of this PR is that AST now takes almost 25% more memory. I think however it is probably justified by the benefits.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r57820 | georg.brandl | 2007-08-31 08:59:27 +0200 (Fri, 31 Aug 2007) | 2 lines
Document new shorthand notation for index entries.
........
r57827 | georg.brandl | 2007-08-31 10:47:51 +0200 (Fri, 31 Aug 2007) | 2 lines
Fix subitem markup.
........
r57833 | martin.v.loewis | 2007-08-31 12:01:07 +0200 (Fri, 31 Aug 2007) | 1 line
Mark registry components as 64-bit on Win64.
........
r57854 | bill.janssen | 2007-08-31 21:02:23 +0200 (Fri, 31 Aug 2007) | 1 line
deprecate use of FakeSocket
........
r57855 | bill.janssen | 2007-08-31 21:02:46 +0200 (Fri, 31 Aug 2007) | 1 line
remove mentions of socket.ssl in comments
........
r57856 | bill.janssen | 2007-08-31 21:03:31 +0200 (Fri, 31 Aug 2007) | 1 line
remove use of non-existent SSLFakeSocket in apparently untested code
........
r57859 | martin.v.loewis | 2007-09-01 08:36:03 +0200 (Sat, 01 Sep 2007) | 3 lines
Bug #1737210: Change Manufacturer of Windows installer to PSF.
Will backport to 2.5.
........
r57865 | georg.brandl | 2007-09-01 09:51:24 +0200 (Sat, 01 Sep 2007) | 2 lines
Fix RST link (backport from Py3k).
........
r57876 | georg.brandl | 2007-09-01 17:49:49 +0200 (Sat, 01 Sep 2007) | 2 lines
Document sets' ">" and "<" operations (backport from py3k).
........
r57878 | skip.montanaro | 2007-09-01 19:40:03 +0200 (Sat, 01 Sep 2007) | 4 lines
Added a note and examples to explain that re.split does not split on an
empty pattern match. (issue 852532).
........
r57879 | walter.doerwald | 2007-09-01 20:18:09 +0200 (Sat, 01 Sep 2007) | 2 lines
Fix wrong function names.
........
r57880 | walter.doerwald | 2007-09-01 20:34:05 +0200 (Sat, 01 Sep 2007) | 2 lines
Fix typo.
........
r57889 | andrew.kuchling | 2007-09-01 22:31:59 +0200 (Sat, 01 Sep 2007) | 1 line
Markup fix
........
r57892 | andrew.kuchling | 2007-09-01 22:43:36 +0200 (Sat, 01 Sep 2007) | 1 line
Add various items
........
r57895 | andrew.kuchling | 2007-09-01 23:17:58 +0200 (Sat, 01 Sep 2007) | 1 line
Wording change
........
r57896 | andrew.kuchling | 2007-09-01 23:18:31 +0200 (Sat, 01 Sep 2007) | 1 line
Add more items
........
r57904 | ronald.oussoren | 2007-09-02 11:46:07 +0200 (Sun, 02 Sep 2007) | 3 lines
Macosx: this patch ensures that the value of MACOSX_DEPLOYMENT_TARGET used
by the Makefile is also used at configure-time.
........
r57925 | georg.brandl | 2007-09-03 09:16:46 +0200 (Mon, 03 Sep 2007) | 2 lines
Fix#883466: don't allow Unicode as arguments to quopri and uu codecs.
........
r57936 | matthias.klose | 2007-09-04 01:33:04 +0200 (Tue, 04 Sep 2007) | 2 lines
- Added support for linking the bsddb module against BerkeleyDB 4.6.x.
........
r57954 | mark.summerfield | 2007-09-04 10:16:15 +0200 (Tue, 04 Sep 2007) | 3 lines
Added cross-references plus a note about dict & list shallow copying.
........
r57958 | martin.v.loewis | 2007-09-04 11:51:57 +0200 (Tue, 04 Sep 2007) | 3 lines
Document that we rely on the OS to release the crypto
context. Fixes#1626801.
........
r57960 | martin.v.loewis | 2007-09-04 15:13:14 +0200 (Tue, 04 Sep 2007) | 3 lines
Patch #1388440: Add set_completion_display_matches_hook and
get_completion_type to readline.
........
r57961 | martin.v.loewis | 2007-09-04 16:19:28 +0200 (Tue, 04 Sep 2007) | 3 lines
Patch #1031213: Decode source line in SyntaxErrors back to its original
source encoding. Will backport to 2.5.
........
r57972 | matthias.klose | 2007-09-04 20:17:36 +0200 (Tue, 04 Sep 2007) | 3 lines
- Makefile.pre.in(buildbottest): Run an optional script pybuildbot.identify
to include some information about the build environment.
........
r57973 | matthias.klose | 2007-09-04 21:05:38 +0200 (Tue, 04 Sep 2007) | 2 lines
- Makefile.pre.in(buildbottest): Remove whitespace at eol.
........
r57975 | matthias.klose | 2007-09-04 22:46:02 +0200 (Tue, 04 Sep 2007) | 2 lines
- Fix libffi configure for hppa*-*-linux* | parisc*-*-linux*.
........
r57980 | bill.janssen | 2007-09-05 02:46:27 +0200 (Wed, 05 Sep 2007) | 1 line
SSL certificate distinguished names should be represented by tuples
........
r57985 | martin.v.loewis | 2007-09-05 08:39:17 +0200 (Wed, 05 Sep 2007) | 3 lines
Patch #1105: Explain that one needs to build the solution
to get dependencies right.
........
r57987 | armin.rigo | 2007-09-05 09:51:21 +0200 (Wed, 05 Sep 2007) | 4 lines
PyDict_GetItem() returns a borrowed reference.
There are probably a number of places that are open to attacks
such as the following one, in bltinmodule.c:min_max().
........
r57991 | martin.v.loewis | 2007-09-05 13:47:34 +0200 (Wed, 05 Sep 2007) | 3 lines
Patch #786737: Allow building in a tree of symlinks pointing to
a readonly source.
........
r57993 | georg.brandl | 2007-09-05 15:36:44 +0200 (Wed, 05 Sep 2007) | 2 lines
Backport from Py3k: Bug #1684991: explain lookup semantics for __special__ methods (new-style classes only).
........
r58004 | armin.rigo | 2007-09-06 10:30:51 +0200 (Thu, 06 Sep 2007) | 4 lines
Patch #1733973 by peaker:
ptrace_enter_call() assumes no exception is currently set.
This assumption is broken when throwing into a generator.
........
r58006 | armin.rigo | 2007-09-06 11:30:38 +0200 (Thu, 06 Sep 2007) | 4 lines
PyDict_GetItem() returns a borrowed reference.
This attack is against ceval.c:IMPORT_NAME, which calls an
object (__builtin__.__import__) without holding a reference to it.
........
r58013 | georg.brandl | 2007-09-06 16:49:56 +0200 (Thu, 06 Sep 2007) | 2 lines
Backport from 3k: #1116: fix reference to old filename.
........
r58021 | thomas.heller | 2007-09-06 22:26:20 +0200 (Thu, 06 Sep 2007) | 1 line
Fix typo: c_float represents to C float type.
........
r58022 | skip.montanaro | 2007-09-07 00:29:06 +0200 (Fri, 07 Sep 2007) | 3 lines
If this is correct for py3k branch and it's already in the release25-maint
branch, seems like it ought to be on the trunk as well.
........
r58023 | gregory.p.smith | 2007-09-07 00:59:59 +0200 (Fri, 07 Sep 2007) | 4 lines
Apply the fix from Issue1112 to make this test more robust and keep
windows happy.
........
r58031 | brett.cannon | 2007-09-07 05:17:50 +0200 (Fri, 07 Sep 2007) | 4 lines
Make uuid1 and uuid4 tests conditional on whether ctypes can be imported;
implementation of either function depends on ctypes but uuid as a whole does
not.
........
r58032 | brett.cannon | 2007-09-07 06:18:30 +0200 (Fri, 07 Sep 2007) | 6 lines
Fix a crasher where Python code managed to infinitely recurse in C code without
ever going back out to Python code in PyObject_Call(). Required introducing a
static RuntimeError instance so that normalizing an exception there is no
reliance on a recursive call that would put the exception system over the
recursion check itself.
........
r58034 | thomas.heller | 2007-09-07 08:32:17 +0200 (Fri, 07 Sep 2007) | 1 line
Add a 'c_longdouble' type to the ctypes module.
........
r58035 | thomas.heller | 2007-09-07 11:30:40 +0200 (Fri, 07 Sep 2007) | 1 line
Remove unneeded #include.
........
r58036 | thomas.heller | 2007-09-07 11:33:24 +0200 (Fri, 07 Sep 2007) | 6 lines
Backport from py3k branch:
Add a workaround for a strange bug on win64, when _ctypes is compiled
with the SDK compiler. This should fix the failing
Lib\ctypes\test\test_as_parameter.py test.
........
r58037 | georg.brandl | 2007-09-07 16:14:40 +0200 (Fri, 07 Sep 2007) | 2 lines
Fix a wrong indentation for sublists.
........
r58043 | georg.brandl | 2007-09-07 22:10:49 +0200 (Fri, 07 Sep 2007) | 2 lines
#1095: ln -f doesn't work portably, fix in Makefile.
........
r58049 | skip.montanaro | 2007-09-08 02:34:17 +0200 (Sat, 08 Sep 2007) | 1 line
be explicit about the actual location of the missing file
........