Commit Graph

104 Commits

Author SHA1 Message Date
Lysandros Nikolaou 9a4b38f66b
bpo-40267: Fix message when last input character produces a SyntaxError (GH-19521)
When there is a SyntaxError after reading the last input character from
the tokenizer and if no newline follows it, the error message used to be
`unexpected EOF while parsing`, which is wrong.
2020-04-15 11:22:10 -07:00
Alexander Riccio 51e3e450fb
bpo-40020: Fix realloc leak on failure in growable_comment_array_add (GH-19083)
Fix a leak and subsequent crash in parsetok.c caused by realloc misuse on a rare codepath. 

Realloc returns a null pointer on failure, and then growable_comment_array_deallocate crashes later when it dereferences it.
2020-03-30 23:15:59 +02:00
Andy Lester 384f3c536d
closes bpo-39721: Fix constness of members of tok_state struct. (GH-18600)
The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:

    /* XXX: constify members. */

This patch addresses that.

In the tok_state struct:
    * end and start were non-const but could be made const
    * str and input were const but should have been non-const

Changes to support this include:
    * decode_str() now returns a char * since it is allocated.
    * PyTokenizer_FromString() and PyTokenizer_FromUTF8() each create a
        new char * for an allocated string instead of reusing the input
        const char *.
    * PyTokenizer_Get() and tok_get() now take const char ** arguments.
    * Various local vars are const or non-const accordingly.

I was able to remove five casts that cast away constness.
2020-02-27 18:44:52 -08:00
Emmanuel Arias d23f78267a Remove unused functions in Parser/parsetok.c (GH-17365) 2020-01-13 11:58:52 +00:00
Alex Henrie 7ba6f18de2 bpo-39307: Fix memory leak on error path in parsetok (GH-17953) 2020-01-13 10:35:47 +00:00
Steve Dower b82e17e626
bpo-36842: Implement PEP 578 (GH-12613)
Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
2019-05-23 08:45:22 -07:00
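A minimal sketch of the Python-level API named in this commit, assuming Python 3.8 or later. The event name `demo.parsetok` and the `seen` list are made up for illustration and are not part of CPython.

```python
import sys

seen = []  # hypothetical log used only by this sketch


def record_event(event, args):
    # Audit hooks receive the event name and a tuple of its arguments.
    # Filter on our made-up event so unrelated runtime events are ignored.
    if event == "demo.parsetok":
        seen.append((event, args))


sys.addaudithook(record_event)  # note: hooks cannot be removed once added
sys.audit("demo.parsetok", "example.py", 42)  # raise a custom audit event
print(seen[-1])
```

Because hooks stay installed for the life of the interpreter, a hook should be cheap and should filter early on the event name.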
Michael J. Sullivan 933e1509ec bpo-36878: Track extra text added to 'type: ignore' in the AST (GH-13479)
GH-13238 made extra text after a # type: ignore accepted by the parser.
This finishes the job and actually plumbs the extra text through the
parser and makes it available in the AST.
2019-05-22 15:54:20 +01:00
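As a rough illustration of what "available in the AST" means here, the sketch below assumes Python 3.8 or later, where `ast.parse()` accepts `type_comments=True` and the module node carries a `type_ignores` list. The source line is invented for the example.

```python
import ast

# Hypothetical one-line module with extra text after "# type: ignore".
source = "x = undefined_name  # type: ignore[excuse]\n"

tree = ast.parse(source, type_comments=True)
ignore = tree.type_ignores[0]

# Each TypeIgnore node records the line and the text following "ignore".
print(ignore.lineno, repr(ignore.tag))
```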
Pablo Galindo f2cf1e3e28
bpo-36623: Clean parser headers and include files (GH-12253)
After the removal of pgen, multiple headers and function prototypes that lack implementation or are unused are still lying around.
2019-04-13 17:05:14 +01:00
Guido van Rossum 495da29225 bpo-35975: Support parsing earlier minor versions of Python 3 (GH-12086)
This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allows tweaking the parser to support older versions of the grammar. In particular, if `feature_version` is 5 or 6, the hacks for the `async` and `await` keywords from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.)

https://bugs.python.org/issue35975
2019-03-07 12:38:08 -08:00
Pablo Galindo 1f24a719e7
bpo-35808: Retire pgen and use pgen2 to generate the parser (GH-11814)
Pgen is the oldest piece of technology in the CPython repository. Building it requires various #if[n]def PGEN hacks in other parts of the code, and it depends more and more on CPython internals. This commit removes the old pgen C code and replaces it with a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatible with the current parser.

This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN), which can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution, which uses Python 3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so it is always kept in sync and avoids having to maintain duplicate token definitions.
2019-03-01 15:34:44 -08:00
Pablo Galindo b9d2e97601
Fix potential memory leak in parsetok.c (GH-11832) 2019-02-13 00:45:53 +00:00
Guido van Rossum d2b4c19d53
bpo-35879: Fix type comment leaks (GH-11728)
* Fix leak for # type: ignore
* Fix the type comment leak
2019-02-01 15:28:13 -08:00
Guido van Rossum dcfcd146f8 bpo-35766: Merge typed_ast back into CPython (GH-11645) 2019-01-31 12:40:27 +01:00
Ivan Levkivskyi 9932a22897
bpo-33416: Add end positions to Python AST (GH-11605)
The majority of this PR is tediously passing `end_lineno` and `end_col_offset` everywhere. Here are non-trivial points:
* It is not possible to reconstruct end positions in the AST "on the fly" because some information is lost after an AST node is constructed, so we need two more attributes for every AST node: `end_lineno` and `end_col_offset`.
* I add end position information to both CST and AST.  Although it may be technically possible to avoid adding end positions to CST, the code becomes more cumbersome and less efficient.
* Since the end position is not known for non-leaf CST nodes while the next token is added, this requires a bit of extra care (see `_PyNode_FinalizeEndPos`). Unless I made some mistake, the algorithm should be linear.
* For statements, I "trim" the end position of suites to not include the terminal newlines and dedent (this seems to be what people would expect), for example in
  ```python
  class C:
      pass

  pass
  ```
  the end line and end column for the class definition is (2, 8).
* For `end_col_offset` I use the common Python convention for indexing, for example for `pass` the `end_col_offset` is 4 (not 3), so that `[0:4]` gives one the source code that corresponds to the node.
* I added a helper function, `ast.get_source_segment()`, to get the source text segment corresponding to a given AST node. It is also useful for testing.

An (inevitable) downside of this PR is that AST now takes almost 25% more memory. I think however it is probably justified by the benefits.
2019-01-22 11:18:22 +00:00
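The class example and the slicing convention from this message can be checked with a short sketch, assuming Python 3.8 or later (where `end_lineno`, `end_col_offset`, and `ast.get_source_segment()` are available). The variable names are illustrative only.

```python
import ast

source = "class C:\n    pass\n\npass\n"
tree = ast.parse(source)
class_def, trailing_pass = tree.body

# The class ends on line 2, column 8: the trailing blank line is trimmed.
print((class_def.end_lineno, class_def.end_col_offset))

# For the top-level `pass`, [col_offset:end_col_offset] is [0:4].
print((trailing_pass.col_offset, trailing_pass.end_col_offset))

# The helper returns the exact source text covered by a node.
segment = ast.get_source_segment(source, class_def)
print(repr(segment))
```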
Anthony Sottile 995d9b9297 bpo-16806: Fix `lineno` and `col_offset` for multi-line string tokens (GH-10021) 2019-01-13 13:05:13 +09:00
Ammar Askar 025eb98dc0 bpo-34683: Make SyntaxError column offsets consistently 1-indexed (gh-9338)
Also point to start of tokens in parsing errors.

Fixes bpo-34683
2018-09-24 14:12:49 -07:00
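A hedged sketch of how the 1-indexed convention looks from Python code. The exact column reported for a given error can differ between interpreter versions, so the example prints it rather than assuming a value.

```python
source = "x = )\n"  # deliberately invalid input for the demonstration
error = None

try:
    compile(source, "<demo>", "exec")
except SyntaxError as exc:
    error = exc  # keep a reference; `exc` is cleared after the block

# `offset` counts columns from 1, so the character it points at
# is source_line[offset - 1].
print(error.lineno, error.offset)
```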
Zackery Spytz 3e26e42c90 bpo-34400: Fix more undefined behavior in parsetok.c (GH-8833) 2018-08-20 20:11:40 -07:00
Zackery Spytz 7c4ab2afb1 closes bpo-34400: Fix undefined behavior in parsetok(). (GH-4439)
Avoid undefined pointer arithmetic with NULL.
2018-08-14 23:27:26 -07:00
Serhiy Storchaka aba24ff360
bpo-34084: Fix setting an error message for the "Barry as BDFL" easter egg. (GH-8262) 2018-07-23 23:41:11 +03:00
Benjamin Peterson ca47063998 replace Py_(u)intptr_t with the c99 standard types 2016-09-06 13:47:26 -07:00
Serhiy Storchaka 2d06e84455 Issue #25923: Added the const qualifier to static constant arrays. 2015-12-25 19:53:18 +02:00
Victor Stinner cad876d542 Fix a compiler warning on Windows 64-bit in parsetok.c
Python parser doesn't support lines longer than INT_MAX bytes yet
2013-11-18 01:09:51 +01:00
Serhiy Storchaka c679227e31 Issue #1772673: The type of `char*` arguments now changed to `const char*`. 2013-10-19 21:03:34 +03:00
Victor Stinner 14e461d5b9 Close #11619: The parser and the import machinery do not encode Unicode
filenames anymore on Windows.
2013-08-26 22:28:21 +02:00
Victor Stinner 526daabf34 Issue #18408: parsetok() must not write into stderr on memory allocation error
The caller gets an error code and can raise a classic Python exception.
2013-07-11 23:17:33 +02:00
Victor Stinner 3bf5f530d9 Issue #18408: parsetok() must not write into stderr on memory allocation error
The caller gets an error code and can raise a classic Python exception.
2013-07-11 22:52:19 +02:00
Benjamin Peterson cff9237d57 check after comments, too (#13832) 2012-01-19 17:46:13 -05:00
Benjamin Peterson 188bee5873 don't leak node 2012-01-19 08:48:18 -05:00
Benjamin Peterson 79c1f96438 only check this when parsing python 2012-01-19 08:48:11 -05:00
Meador Inge fa21bf015d Issue #12705: Raise SyntaxError when compiling multiple statements as single interactive statement 2012-01-19 01:08:41 -06:00
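The behavior described in this commit can be observed through the built-in `compile()` in `"single"` mode. This is a small sketch; the filename `<demo>` is arbitrary.

```python
# One statement in "single" mode compiles normally.
compile("x = 1\n", "<demo>", "single")

message = None
try:
    # Two statements in "single" mode are rejected.
    compile("x = 1\ny = 2\n", "<demo>", "single")
except SyntaxError as exc:
    message = exc.msg

print(message)
```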
Antoine Pitrou f364e7b598 Fix memory leak with FLUFL-related syntax errors (!) 2011-11-13 01:02:02 +01:00
Antoine Pitrou 9ec2593bda Fix memory leak with FLUFL-related syntax errors (!) 2011-11-13 01:01:23 +01:00
Benjamin Peterson 758888d437 don't restrict unexpected EOF errors to the first line (closes #12216) 2011-05-30 11:12:38 -05:00
Victor Stinner 7f2fee3640 Issue #10785: Store the filename as Unicode in the Python parser. 2011-04-05 00:39:01 +02:00
Brett Cannon b94767ff44 Issue #8914: fix various warnings from the Clang static analyzer v254. 2011-02-22 20:15:44 +00:00
Antoine Pitrou f95a1b3c53 Recorded merge of revisions 81029 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81029 | antoine.pitrou | 2010-05-09 16:46:46 +0200 (dim., 09 mai 2010) | 3 lines

  Untabify C files. Will watch buildbots.
........
2010-05-09 15:52:27 +00:00
Benjamin Peterson aeaa592516 Merged revisions 76230 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r76230 | benjamin.peterson | 2009-11-12 17:39:44 -0600 (Thu, 12 Nov 2009) | 2 lines

  fix several compile() issues by translating newlines in the tokenizer
........
2009-11-13 00:17:59 +00:00
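One visible effect of newline translation in the tokenizer, sketched under the assumption of a current Python 3 interpreter: source strings with Windows (`\r\n`) or old Mac (`\r`) line endings compile the same way as `\n`-terminated source.

```python
namespace = {}

# Mixed line endings: "\r\n" after the first statement, bare "\r" after
# the second, and "\r\n" at the end.
source = "x = 1\r\ny = 2\rz = x + y\r\n"

exec(compile(source, "<demo>", "exec"), namespace)
print(namespace["z"])
```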
Kristján Valur Jónsson ae4cfb1bb3 http://bugs.python.org/issue6836
Merging revisions 75103,75104 from trunk to py3k
2009-09-28 13:45:02 +00:00
Brett Cannon e3944a5e1e The BDFL has retired! Long live the FLUFL (Friendly Language Uncle For Life)! 2009-04-01 05:08:41 +00:00
Benjamin Peterson f5b52246ed ignore the coding cookie in compile(), exec(), and eval() if the source is a string #4626 2009-03-02 23:31:26 +00:00
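A sketch contrasting string and bytes input on a current Python 3 interpreter. The coding cookie is ignored when the source is already a `str`, but it is honored when the source is `bytes`. The deliberately wrong `latin-1` cookie makes the difference visible.

```python
text = '# -*- coding: latin-1 -*-\nvalue = "é"\n'

# str source: the cookie is ignored, the text is used as-is.
from_str = {}
exec(compile(text, "<demo>", "exec"), from_str)

# bytes source: the UTF-8 bytes are decoded as latin-1 per the cookie,
# so the two-byte encoding of "é" turns into two characters.
from_bytes = {}
exec(compile(text.encode("utf-8"), "<demo>", "exec"), from_bytes)

print(repr(from_str["value"]), repr(from_bytes["value"]))
```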
Georg Brandl e1b5ac6408 Remove meaning of -ttt, but still accept -t option on cmdline for compatibility. 2008-06-04 13:06:58 +00:00
Georg Brandl a26f8ca668 Revert r63934 -- it was mixing two patches. 2008-06-04 13:01:30 +00:00
Georg Brandl f954c4b9fb Remove meaning of -ttt, but still accept -t option on cmdline for compatibility. 2008-06-04 11:41:32 +00:00
Christian Heimes b1b3efc504 Merged revisions 61954,61956-61957 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r61954 | christian.heimes | 2008-03-26 23:20:26 +0100 (Wed, 26 Mar 2008) | 1 line

  Surround p_flags access with #ifdef PY_PARSER_REQUIRES_FUTURE_KEYWORD
........
  r61956 | christian.heimes | 2008-03-26 23:51:58 +0100 (Wed, 26 Mar 2008) | 1 line

  Initialize PyCompilerFlags cf_flags with 0
........
  r61957 | christian.heimes | 2008-03-26 23:55:31 +0100 (Wed, 26 Mar 2008) | 1 line

  I forgot to svn add the future test
........
2008-03-26 23:24:27 +00:00
Christian Heimes 4d6ec85a02 Merged revisions 61952-61953 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r61952 | mark.dickinson | 2008-03-26 22:41:36 +0100 (Wed, 26 Mar 2008) | 2 lines

  Typo: "objects reference count" -> "object's reference count"
........
  r61953 | christian.heimes | 2008-03-26 23:01:37 +0100 (Wed, 26 Mar 2008) | 4 lines

  Patch #2477: Added from __future__ import unicode_literals

  The new PyParser_*Ex() functions are based on Neal's suggestion and initial patch. The new __future__ feature makes all '' and r'' unicode strings. b'' and br'' stay (byte) strings.
........
2008-03-26 22:34:47 +00:00
Martin v. Löwis 2593146227 Bug #2301: Don't try decoding the source code into the original
encoding for syntax errors.
2008-03-17 20:43:42 +00:00
Georg Brandl 52d168a995 Remove traces of Py_USING_UNICODE and Unicode-specific conditionals in configure.
Rename --enable-unicode to --with-wide-unicode; the default is still not wide.
2008-01-07 18:10:24 +00:00
Thomas Wouters 89d996e5c2 Merged revisions 57778-58052 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r57820 | georg.brandl | 2007-08-31 08:59:27 +0200 (Fri, 31 Aug 2007) | 2 lines

  Document new shorthand notation for index entries.
........
  r57827 | georg.brandl | 2007-08-31 10:47:51 +0200 (Fri, 31 Aug 2007) | 2 lines

  Fix subitem markup.
........
  r57833 | martin.v.loewis | 2007-08-31 12:01:07 +0200 (Fri, 31 Aug 2007) | 1 line

  Mark registry components as 64-bit on Win64.
........
  r57854 | bill.janssen | 2007-08-31 21:02:23 +0200 (Fri, 31 Aug 2007) | 1 line

  deprecate use of FakeSocket
........
  r57855 | bill.janssen | 2007-08-31 21:02:46 +0200 (Fri, 31 Aug 2007) | 1 line

  remove mentions of socket.ssl in comments
........
  r57856 | bill.janssen | 2007-08-31 21:03:31 +0200 (Fri, 31 Aug 2007) | 1 line

  remove use of non-existent SSLFakeSocket in apparently untested code
........
  r57859 | martin.v.loewis | 2007-09-01 08:36:03 +0200 (Sat, 01 Sep 2007) | 3 lines

  Bug #1737210: Change Manufacturer of Windows installer to PSF.
  Will backport to 2.5.
........
  r57865 | georg.brandl | 2007-09-01 09:51:24 +0200 (Sat, 01 Sep 2007) | 2 lines

  Fix RST link (backport from Py3k).
........
  r57876 | georg.brandl | 2007-09-01 17:49:49 +0200 (Sat, 01 Sep 2007) | 2 lines

  Document sets' ">" and "<" operations (backport from py3k).
........
  r57878 | skip.montanaro | 2007-09-01 19:40:03 +0200 (Sat, 01 Sep 2007) | 4 lines

  Added a note and examples to explain that re.split does not split on an
  empty pattern match. (issue 852532).
........
  r57879 | walter.doerwald | 2007-09-01 20:18:09 +0200 (Sat, 01 Sep 2007) | 2 lines

  Fix wrong function names.
........
  r57880 | walter.doerwald | 2007-09-01 20:34:05 +0200 (Sat, 01 Sep 2007) | 2 lines

  Fix typo.
........
  r57889 | andrew.kuchling | 2007-09-01 22:31:59 +0200 (Sat, 01 Sep 2007) | 1 line

  Markup fix
........
  r57892 | andrew.kuchling | 2007-09-01 22:43:36 +0200 (Sat, 01 Sep 2007) | 1 line

  Add various items
........
  r57895 | andrew.kuchling | 2007-09-01 23:17:58 +0200 (Sat, 01 Sep 2007) | 1 line

  Wording change
........
  r57896 | andrew.kuchling | 2007-09-01 23:18:31 +0200 (Sat, 01 Sep 2007) | 1 line

  Add more items
........
  r57904 | ronald.oussoren | 2007-09-02 11:46:07 +0200 (Sun, 02 Sep 2007) | 3 lines

  Macosx: this patch ensures that the value of MACOSX_DEPLOYMENT_TARGET used
  by the Makefile is also used at configure-time.
........
  r57925 | georg.brandl | 2007-09-03 09:16:46 +0200 (Mon, 03 Sep 2007) | 2 lines

  Fix #883466: don't allow Unicode as arguments to quopri and uu codecs.
........
  r57936 | matthias.klose | 2007-09-04 01:33:04 +0200 (Tue, 04 Sep 2007) | 2 lines

  - Added support for linking the bsddb module against BerkeleyDB 4.6.x.
........
  r57954 | mark.summerfield | 2007-09-04 10:16:15 +0200 (Tue, 04 Sep 2007) | 3 lines

  Added cross-references plus a note about dict & list shallow copying.
........
  r57958 | martin.v.loewis | 2007-09-04 11:51:57 +0200 (Tue, 04 Sep 2007) | 3 lines

  Document that we rely on the OS to release the crypto
  context. Fixes #1626801.
........
  r57960 | martin.v.loewis | 2007-09-04 15:13:14 +0200 (Tue, 04 Sep 2007) | 3 lines

  Patch #1388440: Add set_completion_display_matches_hook and
  get_completion_type to readline.
........
  r57961 | martin.v.loewis | 2007-09-04 16:19:28 +0200 (Tue, 04 Sep 2007) | 3 lines

  Patch #1031213: Decode source line in SyntaxErrors back to its original
  source encoding. Will backport to 2.5.
........
  r57972 | matthias.klose | 2007-09-04 20:17:36 +0200 (Tue, 04 Sep 2007) | 3 lines

  - Makefile.pre.in(buildbottest): Run an optional script pybuildbot.identify
    to include some information about the build environment.
........
  r57973 | matthias.klose | 2007-09-04 21:05:38 +0200 (Tue, 04 Sep 2007) | 2 lines

  - Makefile.pre.in(buildbottest): Remove whitespace at eol.
........
  r57975 | matthias.klose | 2007-09-04 22:46:02 +0200 (Tue, 04 Sep 2007) | 2 lines

  - Fix libffi configure for hppa*-*-linux* | parisc*-*-linux*.
........
  r57980 | bill.janssen | 2007-09-05 02:46:27 +0200 (Wed, 05 Sep 2007) | 1 line

  SSL certificate distinguished names should be represented by tuples
........
  r57985 | martin.v.loewis | 2007-09-05 08:39:17 +0200 (Wed, 05 Sep 2007) | 3 lines

  Patch #1105: Explain that one needs to build the solution
  to get dependencies right.
........
  r57987 | armin.rigo | 2007-09-05 09:51:21 +0200 (Wed, 05 Sep 2007) | 4 lines

  PyDict_GetItem() returns a borrowed reference.
  There are probably a number of places that are open to attacks
  such as the following one, in bltinmodule.c:min_max().
........
  r57991 | martin.v.loewis | 2007-09-05 13:47:34 +0200 (Wed, 05 Sep 2007) | 3 lines

  Patch #786737: Allow building in a tree of symlinks pointing to
  a readonly source.
........
  r57993 | georg.brandl | 2007-09-05 15:36:44 +0200 (Wed, 05 Sep 2007) | 2 lines

  Backport from Py3k: Bug #1684991: explain lookup semantics for __special__ methods (new-style classes only).
........
  r58004 | armin.rigo | 2007-09-06 10:30:51 +0200 (Thu, 06 Sep 2007) | 4 lines

  Patch #1733973 by peaker:
  ptrace_enter_call() assumes no exception is currently set.
  This assumption is broken when throwing into a generator.
........
  r58006 | armin.rigo | 2007-09-06 11:30:38 +0200 (Thu, 06 Sep 2007) | 4 lines

  PyDict_GetItem() returns a borrowed reference.
  This attack is against ceval.c:IMPORT_NAME, which calls an
  object (__builtin__.__import__) without holding a reference to it.
........
  r58013 | georg.brandl | 2007-09-06 16:49:56 +0200 (Thu, 06 Sep 2007) | 2 lines

  Backport from 3k: #1116: fix reference to old filename.
........
  r58021 | thomas.heller | 2007-09-06 22:26:20 +0200 (Thu, 06 Sep 2007) | 1 line

  Fix typo:  c_float represents the C float type.
........
  r58022 | skip.montanaro | 2007-09-07 00:29:06 +0200 (Fri, 07 Sep 2007) | 3 lines

  If this is correct for py3k branch and it's already in the release25-maint
  branch, seems like it ought to be on the trunk as well.
........
  r58023 | gregory.p.smith | 2007-09-07 00:59:59 +0200 (Fri, 07 Sep 2007) | 4 lines

  Apply the fix from Issue1112 to make this test more robust and keep
  windows happy.
........
  r58031 | brett.cannon | 2007-09-07 05:17:50 +0200 (Fri, 07 Sep 2007) | 4 lines

  Make uuid1 and uuid4 tests conditional on whether ctypes can be imported;
  implementation of either function depends on ctypes but uuid as a whole does
  not.
........
  r58032 | brett.cannon | 2007-09-07 06:18:30 +0200 (Fri, 07 Sep 2007) | 6 lines

  Fix a crasher where Python code managed to infinitely recurse in C code without
  ever going back out to Python code in PyObject_Call().  Required introducing a
  static RuntimeError instance so that normalizing an exception there is no
  reliance on a recursive call that would put the exception system over the
  recursion check itself.
........
  r58034 | thomas.heller | 2007-09-07 08:32:17 +0200 (Fri, 07 Sep 2007) | 1 line

  Add a 'c_longdouble' type to the ctypes module.
........
  r58035 | thomas.heller | 2007-09-07 11:30:40 +0200 (Fri, 07 Sep 2007) | 1 line

  Remove unneeded #include.
........
  r58036 | thomas.heller | 2007-09-07 11:33:24 +0200 (Fri, 07 Sep 2007) | 6 lines

  Backport from py3k branch:

  Add a workaround for a strange bug on win64, when _ctypes is compiled
  with the SDK compiler.  This should fix the failing
  Lib\ctypes\test\test_as_parameter.py test.
........
  r58037 | georg.brandl | 2007-09-07 16:14:40 +0200 (Fri, 07 Sep 2007) | 2 lines

  Fix a wrong indentation for sublists.
........
  r58043 | georg.brandl | 2007-09-07 22:10:49 +0200 (Fri, 07 Sep 2007) | 2 lines

  #1095: ln -f doesn't work portably, fix in Makefile.
........
  r58049 | skip.montanaro | 2007-09-08 02:34:17 +0200 (Sat, 08 Sep 2007) | 1 line

  be explicit about the actual location of the missing file
........
2007-09-08 17:39:28 +00:00
Brett Cannon fdc1a567ec Cast away const qualifier to silence a compiler warning about it. 2007-09-05 20:35:46 +00:00
Martin v. Löwis 85bcc66bb4 Convert code from sys.stdin.encoding to UTF-8 in
interactive mode. Fixes #1100.
2007-09-04 09:18:06 +00:00