cpython

Commit Graph

Author	SHA1	Message	Date
Michael J. Sullivan	d8a82e2897	bpo-36878: Only allow text after `# type: ignore` if first character ASCII (GH-13504) This disallows things like `# type: ignoreé`, which seems wrong. Also switch to using Py_ISALNUM for the alnum check, for consistency with other code (and maybe correctness re: locale issues?). https://bugs.python.org/issue36878	2019-05-22 13:43:36 -07:00
Michael J. Sullivan	933e1509ec	bpo-36878: Track extra text added to 'type: ignore' in the AST (GH-13479) GH-13238 made extra text after a # type: ignore accepted by the parser. This finishes the job and actually plumbs the extra text through the parser and makes it available in the AST.	2019-05-22 15:54:20 +01:00
Anthony Sottile	abea73bf4a	bpo-2180: Treat line continuation at EOF as a `SyntaxError` (GH-13401) This makes the parser consistent with the tokenize module (already the case in `pypy`). Sample (t.py): `x = 5\`. Before: `python3 t.py` ran without error, while `python3 -mtokenize t.py` reported `t.py:2:0: error: EOF in multi-line statement`. After: `./python t.py` reports `File "t.py", line 3` with `SyntaxError: unexpected EOF while parsing`, and `./python -m tokenize t.py` still reports `t.py:2:0: error: EOF in multi-line statement`. https://bugs.python.org/issue2180	2019-05-18 11:27:16 -07:00
Michael J. Sullivan	d8320ecb86	bpo-36878: Allow extra text after `# type: ignore` comments (GH-13238) In the parser, when using the type_comments=True option, recognize a TYPE_IGNORE as anything containing `# type: ignore` followed by a non-alphanumeric character. This is to allow ignores such as `# type: ignore[E1000]`.	2019-05-11 19:17:24 +01:00
Pablo Galindo	f2cf1e3e28	bpo-36623: Clean parser headers and include files (GH-12253) After the removal of pgen, multiple header and function prototypes that lack implementation or are unused are still lying around.	2019-04-13 17:05:14 +01:00
Zackery Spytz	cda139d1de	bpo-36459: Fix a possible double PyMem_FREE() due to tokenizer.c's tok_nextc() (12601) Remove the PyMem_FREE() call added in `cb90c89`. The buffer will be freed when PyTokenizer_Free() is called on the tokenizer state.	2019-03-28 15:53:00 +02:00
Pablo Galindo	cb90c89de1	bpo-36367: Free buffer if realloc fails in tokenize.c (GH-12442)	2019-03-19 17:17:58 +00:00
Guido van Rossum	495da29225	bpo-35975: Support parsing earlier minor versions of Python 3 (GH-12086) This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allows tweaking the parser to support older versions of the grammar. In particular if `feature_version` is 5 or 6, the hacks for the `async` and `await` keyword from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.) https://bugs.python.org/issue35975	2019-03-07 12:38:08 -08:00
Pablo Galindo	1f24a719e7	bpo-35808: Retire pgen and use pgen2 to generate the parser (GH-11814) Pgen is the oldest piece of technology in the CPython repository, building it requires various #if[n]def PGEN hacks in other parts of the code and it also depends more and more on CPython internals. This commit removes the old pgen C code and replaces it with a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatible with the current parser. This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN) that can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution that uses Python3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so it is always kept in sync and avoids having to maintain duplicate token definitions.	2019-03-01 15:34:44 -08:00
Guido van Rossum	dcfcd146f8	bpo-35766: Merge typed_ast back into CPython (GH-11645)	2019-01-31 12:40:27 +01:00
Anthony Sottile	995d9b9297	bpo-16806: Fix `lineno` and `col_offset` for multi-line string tokens (GH-10021)	2019-01-13 13:05:13 +09:00
Serhiy Storchaka	8ac658114d	bpo-30455: Generate all token related code and docs from Grammar/Tokens. (GH-10370) "Include/token.h", "Lib/token.py" (containing now some data moved from "Lib/tokenize.py") and new files "Parser/token.c" (containing the code moved from "Parser/tokenizer.c") and "Doc/library/token-list.inc" (included in "Doc/library/token.rst") are now generated from "Grammar/Tokens" by "Tools/scripts/generate_token.py". The script overwrites files only if needed and can be used on the read-only sources tree. "Lib/symbol.py" is now generated by "Tools/scripts/generate_symbol_py.py" instead of being executable itself. Added new make targets "regen-token" and "regen-symbol" which are now dependencies of "regen-all". The documentation now contains strings for operators and punctuation tokens.	2018-12-22 11:18:40 +02:00
Serhiy Storchaka	94cf308ee2	bpo-33306: Improve SyntaxError messages for unbalanced parentheses. (GH-6516)	2018-12-17 17:34:14 +02:00
Zackery Spytz	4c49da0cb7	bpo-35436: Add missing PyErr_NoMemory() calls and other minor bug fixes. (GH-11015) Set MemoryError when appropriate, add missing failure checks, and fix some potential leaks.	2018-12-07 12:11:30 +02:00
Zackery Spytz	5061a74a4c	Remove unneeded PyUnicode_READY() in tokenizer.c (GH-9114)	2018-09-10 09:27:31 +03:00
Victor Stinner	c884616390	Fix Windows compiler warning in tokenize.c (GH-8359) Fix the following warning on Windows: parser\tokenizer.c(1297): warning C4244: 'function': conversion from '__int64' to 'int', possible loss of data.	2018-07-21 03:36:06 +02:00
Serhiy Storchaka	cf7303ed2a	bpo-33305: Improve SyntaxError for invalid numerical literals. (GH-6517)	2018-07-09 15:09:35 +03:00
Victor Stinner	f2ddc6ac93	tokenizer: Remove unused tabs options (#4422) Remove the following fields from tok_state structure which are now unused: * altwarning: "Issue warning if alternate tabs don't match" * alterror: "Issue error if alternate tabs don't match" * alttabsize: "Alternate tab spacing" Replace alttabsize variable with ALTTABSIZE define.	2017-11-17 01:25:47 -08:00
Jelle Zijlstra	ac317700ce	bpo-30406: Make async and await proper keywords (#1669) Per PEP 492, 'async' and 'await' should become proper keywords in 3.7.	2017-10-05 23:24:46 -04:00
Albert-Jan Nijburg	c9ccacea3f	bpo-25324: add missing comma in Parser/tokenizer.c (GH-1910)	2017-06-01 13:51:27 -07:00
Albert-Jan Nijburg	fc354f0785	bpo-25324: copy tok_name before changing it (#1608) * add test to check if were modifying token * copy list so import tokenize doesnt have side effects on token * shorten line * add tokenize tokens to token.h to get them to show up in token * move ERRORTOKEN back to its previous location, and fix nitpick * copy comments from token.h automatically * fix whitespace and make more pythonic * change to fix comments from @haypo * update token.rst and Misc/NEWS * change wording * some more wording changes	2017-05-31 16:00:21 +02:00
Berker Peksag	d2f4404bbb	Issue #28489: Merge from 3.6	2017-02-05 04:33:11 +03:00
Berker Peksag	6f80562862	Issue #28489: Fix comment in tokenizer.c Patch by Ryan Gonzalez.	2017-02-05 04:32:39 +03:00
Victor Stinner	a5ed5f000a	Use _PyObject_CallNoArg() Replace: PyObject_CallObject(callable, NULL) with: _PyObject_CallNoArg(callable)	2016-12-06 18:45:50 +01:00
Serhiy Storchaka	06515833fe	Replaced outdated macros _PyUnicode_AsString and _PyUnicode_AsStringAndSize with PyUnicode_AsUTF8 and PyUnicode_AsUTF8AndSize.	2016-11-20 09:13:07 +02:00
Benjamin Peterson	f5e8e8fc2b	merge 3.5 (#24022)	2016-09-18 23:44:02 -07:00
Benjamin Peterson	57bda335e1	merge 3.4	2016-09-18 23:43:18 -07:00
Benjamin Peterson	26d998cfdd	properly handle the single null-byte file (closes #24022)	2016-09-18 23:41:11 -07:00
Benjamin Peterson	5a715cfc57	merge 3.5 (#27981)	2016-09-12 22:07:14 -07:00
Benjamin Peterson	35ee948fa5	restructure fp_setreadl so as to avoid refleaks (closes #27981)	2016-09-12 22:06:58 -07:00
Brett Cannon	a721abac29	Issue #26331: Implement the parsing part of PEP 515. Thanks to Georg Brandl for the patch.	2016-09-09 14:57:09 -07:00
Christian Heimes	c6cc23d0b9	Skip unused value in tokenizer code In the case of an escape character, c is never read. tok_next() is used to advance the pointer. CID 1225097	2016-09-09 00:09:45 +02:00
Serhiy Storchaka	ec39756960	Issue #22570: Renamed Py_SETREF to Py_XSETREF.	2016-04-06 09:50:03 +03:00
Serhiy Storchaka	48842714b9	Issue #22570: Renamed Py_SETREF to Py_XSETREF.	2016-04-06 09:45:48 +03:00
Benjamin Peterson	7285d520e0	remove duplicated check for fractions and complex numbers (closes #26076) Patch by Oren Milman.	2016-03-24 22:43:23 -07:00
Serhiy Storchaka	a051bf3afb	Issue #26581: Use the first coding cookie on a line, not the last one.	2016-03-20 23:47:48 +02:00
Serhiy Storchaka	e431d3c9aa	Issue #26581: Use the first coding cookie on a line, not the last one.	2016-03-20 23:36:29 +02:00
Serhiy Storchaka	ef1585eb9a	Issue #25923: Added more const qualifiers to signatures of static and private functions.	2015-12-25 20:01:53 +02:00
Serhiy Storchaka	f006940351	Issue #20440: Massive replacing unsafe attribute setting code with special macro Py_SETREF.	2015-12-24 10:39:57 +02:00
Serhiy Storchaka	5a57ade58e	Issue #20440: Massive replacing unsafe attribute setting code with special macro Py_SETREF.	2015-12-24 10:35:59 +02:00
Serhiy Storchaka	0304729ec4	Issue #25388: Fixed tokenizer crash when processing undecodable source code with a null byte.	2015-11-14 15:12:04 +02:00
Serhiy Storchaka	7e2b870b85	Issue #25388: Fixed tokenizer crash when processing undecodable source code with a null byte.	2015-11-14 15:11:17 +02:00
Serhiy Storchaka	0d441119f5	Issue #25388: Fixed tokenizer crash when processing undecodable source code with a null byte.	2015-11-14 15:10:35 +02:00
Eric V. Smith	235a6f0984	Issue #24965: Implement PEP 498 "Literal String Interpolation". Documentation is still needed, I'll open an issue for that.	2015-09-19 14:51:32 -04:00
Eric V. Smith	6408dc82fa	Fixed indentation.	2015-09-12 18:53:36 -04:00
Yury Selivanov	96ec934e75	Issue #24619: Simplify async/await tokenization. This commit simplifies async/await tokenization in tokenizer.c, tokenize.py & lib2to3/tokenize.py. Previous solution was to keep a stack of async-def & def blocks, whereas the new approach is just to remember position of the outermost async-def block. This change won't bring any parsing performance improvements, but it makes the code much easier to read and validate.	2015-07-23 15:01:58 +03:00
Yury Selivanov	8fb307cd65	Issue #24619: New approach for tokenizing async/await. This commit fixes how one-line async-defs and defs are tracked by tokenizer. It allows to correctly parse invalid code such as: >>> async def f(): ... def g(): pass ... async = 10 and valid code such as: >>> async def f(): ... async def g(): pass ... await z As a consequence, it is now possible to have one-line 'async def foo(): await ..' functions: >>> async def foo(): return await bar()	2015-07-22 13:33:45 +03:00
Yury Selivanov	8085b80c18	Issue 24226: Fix parsing of many sequential one-line 'def' statements.	2015-05-18 12:50:52 -04:00
Yury Selivanov	7544508f02	PEP 0492 -- Coroutines with async and await syntax. Issue #24017.	2015-05-11 22:57:16 -04:00
Benjamin Peterson	273a720f87	merge 3.4 (#24022)	2015-04-21 12:07:06 -04:00
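
The bpo-36878 entries above (d8320ecb86, 933e1509ec, d8a82e2897) describe the parser accepting extra text after `# type: ignore` and exposing it in the AST. A minimal sketch of how that might look from Python, assuming Python 3.8 or later, where `ast.parse()` accepts `type_comments=True` and each `ast.TypeIgnore` node has a `tag` attribute. The source string and the `E1000` code are made up for illustration:

```python
import ast

# Hypothetical one-line module; the error code E1000 is only an illustration.
source = "x = undefined_name  # type: ignore[E1000]\n"

# type_comments=True asks the parser to record type comments and type ignores.
tree = ast.parse(source, type_comments=True)

# Each recognised `# type: ignore` comment becomes a TypeIgnore entry on the
# module; `tag` holds whatever text followed the word "ignore".
ignores = [(ti.lineno, ti.tag) for ti in tree.type_ignores]
print(ignores)
```

Without `type_comments=True` the comment is treated as an ordinary comment and `tree.type_ignores` stays empty.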


250 Commits
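
The bpo-2180 entry (abea73bf4a) changes how a backslash continuation at end of file is handled. A small sketch of checking that behaviour through the built-in `compile()`, assuming an interpreter that includes this change (Python 3.8 or later); the helper name `raises_syntax_error` is hypothetical, and `t.py` is reused from the commit message only as a label for the source:

```python
def raises_syntax_error(source: str) -> bool:
    """Return True if compiling `source` as a module raises SyntaxError."""
    try:
        compile(source, "t.py", "exec")
    except SyntaxError:
        return True
    return False


# The sample from the commit message: a statement ending in a backslash
# with nothing after it but the end of the file.
dangling = "x = 5\\\n"
complete = "x = 5\n"

print(raises_syntax_error(dangling))
print(raises_syntax_error(complete))
```

On interpreters that predate the change, the first call would be expected to return `False`, matching the "before" behaviour described in the commit.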