316 Commits

Author SHA1 Message Date
Victor Stinner e822e37946
bpo-36020: Remove snprintf macro in pyerrors.h (GH-20889)
On Windows, #include "pyerrors.h" no longer defines "snprintf" and
"vsnprintf" macros.

PyOS_snprintf() and PyOS_vsnprintf() should be used to get portable
behavior.

Replace snprintf() calls with PyOS_snprintf() and replace vsnprintf()
calls with PyOS_vsnprintf().
2020-06-15 21:59:47 +02:00
Lysandros Nikolaou 896f4cf63f
bpo-40847: Consider a line with only a LINECONT a blank line (GH-20769)
A line with only a line continuation character should be considered
a blank line at tokenizer level so that only a single NEWLINE token
gets emitted. The old parser was working around the issue, but the
new parser threw a `SyntaxError` for valid input. For example,
an empty line following a line continuation character was interpreted
as a `SyntaxError`.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-06-11 00:56:08 +01:00
Ammar Askar a2bbedc8b1
Fix peg_generator compiler warnings under MSVC (GH-20405) 2020-05-26 05:33:35 +01:00
Serhiy Storchaka 74ea6b5a75
bpo-40593: Improve syntax errors for invalid characters in source code. (GH-20033) 2020-05-12 12:42:04 +03:00
Lysandros Nikolaou 846d8b28ab
bpo-40246: Revert reporting of invalid string prefixes (GH-19888)
Due to backwards compatibility concerns, commit 41d5b94af4 has to be reverted:
with it, keywords immediately followed by a string without whitespace between them (like in `bg="#d00" if clear else"#fca"`) fail to parse.
2020-05-04 12:32:18 +01:00
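The compatibility concern can be checked from Python. This is a minimal sketch, assuming an interpreter that includes this revert; `pick_color` and `SOURCE` are hypothetical names introduced only for illustration.

```python
# The statement quoted in the commit message: the keyword `else` is
# immediately followed by a string literal, with no whitespace between them.
SOURCE = 'bg = "#d00" if clear else"#fca"'


def pick_color(clear):
    """Hypothetical helper: run SOURCE with the given `clear` value."""
    namespace = {"clear": clear}
    exec(compile(SOURCE, "<example>", "exec"), namespace)
    return namespace["bg"]


print(pick_color(True))
print(pick_color(False))
```

If the tokenizer treated `else"` as an invalid string prefix, `compile()` would raise a `SyntaxError` instead of returning a code object.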
Pablo Galindo 11a7f158ef
bpo-40335: Correctly handle multi-line strings in tokenize error scenarios (GH-19619)
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
2020-04-21 01:53:04 +01:00
Lysandros Nikolaou 41d5b94af4
bpo-40246: Report a better error message for invalid string prefixes (GH-19476) 2020-04-12 19:21:00 +01:00
Victor Stinner 87d3b9db4a
bpo-39882: Add _Py_FatalErrorFormat() function (GH-19157) 2020-03-25 19:27:36 +01:00
Victor Stinner 9e5d30cc99
bpo-39882: Py_FatalError() logs the function name (GH-18819)
The Py_FatalError() function is replaced with a macro which logs
automatically the name of the current function, unless the
Py_LIMITED_API macro is defined.

Changes:

* Add _Py_FatalErrorFunc() function.
* Remove the function name from the message of Py_FatalError() calls
  which included the function name.
* Update tests.
2020-03-07 00:54:20 +01:00
Andy Lester 384f3c536d
closes bpo-39721: Fix constness of members of tok_state struct. (GH-18600)
The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:

    /* XXX: constify members. */

This patch addresses that.

In the tok_state struct:
    * end and start were non-const but could be made const
    * str and input were const but should have been non-const

Changes to support this include:
    * decode_str() now returns a char * since it is allocated.
    * PyTokenizer_FromString() and PyTokenizer_FromUTF8() each creates a
        new char * for an allocated string instead of reusing the input
        const char *.
    * PyTokenizer_Get() and tok_get() now take const char ** arguments.
    * Various local vars are const or non-const accordingly.

I was able to remove five casts that cast away constness.
2020-02-27 18:44:52 -08:00
Serhiy Storchaka 0cc6b5e559
bpo-39219: Fix SyntaxError attributes in the tokenizer. (GH-17828)
* Always set the text attribute.
* Correct the offset attribute for non-ascii sources.
2020-02-12 12:17:00 +02:00
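The attributes mentioned above can be inspected from Python. This is a sketch only: `syntax_error_info` is a hypothetical helper, and the exact value of `offset` is deliberately not asserted because it may differ between interpreter versions.

```python
def syntax_error_info(source):
    """Compile `source`; return (lineno, offset, text) of the SyntaxError, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return err.lineno, err.offset, err.text
    return None


# An unterminated string literal that follows non-ASCII text on the same line.
info = syntax_error_info("s = 'é' + 'abc\n")
print(info)
```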
Victor Stinner f3e7ea5b8c
bpo-39500: Document PyUnicode_IsIdentifier() function (GH-18397)
PyUnicode_IsIdentifier() does not call Py_FatalError() anymore if the
string is not ready.
2020-02-11 14:29:33 +01:00
Pablo Galindo 5ec91f78d5
bpo-39209: Manage correctly multi-line tokens in interactive mode (GH-17860) 2020-01-06 15:59:09 +00:00
Batuhan Taşkaya 109fc2792a bpo-38673: don't switch to ps2 if the line starts with comment or whitespace (GH-17421)
https://bugs.python.org/issue38673
2019-12-08 20:36:27 -08:00
Hansraj Das 69f37bcb28 Indent code inside if block. (GH-15284)
Without indentation, the strcpy line appears to be at the same level as the `if` condition.
2019-08-15 09:19:07 -07:00
Anthony Sottile 5b94f3578c Fix `SyntaxError` indicator printing too many spaces for multi-line strings (GH-14433) 2019-07-29 14:59:13 +01:00
Michael J. Sullivan d8a82e2897 bpo-36878: Only allow text after `# type: ignore` if first character ASCII (GH-13504)
This disallows things like `# type: ignoreé`, which seems wrong.

Also switch to using Py_ISALNUM for the alnum check, for consistency
with other code (and maybe correctness re: locale issues?).


https://bugs.python.org/issue36878
2019-05-22 13:43:36 -07:00
Michael J. Sullivan 933e1509ec bpo-36878: Track extra text added to 'type: ignore' in the AST (GH-13479)
GH-13238 made extra text after a # type: ignore accepted by the parser.
This finishes the job and actually plumbs the extra text through the
parser and makes it available in the AST.
2019-05-22 15:54:20 +01:00
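The extra text is exposed on the `TypeIgnore` nodes of the module. A minimal sketch, assuming Python 3.8 or later, where `ast.parse()` accepts `type_comments=True`; the variable names are illustrative.

```python
import ast

# One statement carrying a `# type: ignore` comment with extra text.
source = "x = []  # type: ignore[E1000]\n"

tree = ast.parse(source, type_comments=True)

# Each TypeIgnore node records the line number and the text after "ignore".
ignores = [(ti.lineno, ti.tag) for ti in tree.type_ignores]
print(ignores)
```

Without `type_comments=True` the comment is discarded like any other comment and `tree.type_ignores` stays empty.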
Anthony Sottile abea73bf4a bpo-2180: Treat line continuation at EOF as a `SyntaxError` (GH-13401)
This makes the parser consistent with the tokenize module (already the case
in `pypy`).

sample
------

```python
x = 5\
```

before
------

```console
$ python3 t.py
$ python3 -mtokenize t.py
t.py:2:0: error: EOF in multi-line statement
```

after
-----

```console
$ ./python t.py
  File "t.py", line 3
    x = 5\

         ^
SyntaxError: unexpected EOF while parsing
$ ./python -m tokenize t.py
t.py:2:0: error: EOF in multi-line statement
```



https://bugs.python.org/issue2180
2019-05-18 11:27:16 -07:00
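The same behavior can be observed without a file on disk. A minimal sketch, assuming an interpreter that includes this change; `compiles` is a hypothetical helper.

```python
def compiles(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "t.py", "exec")
    except SyntaxError:
        return False
    return True


# A backslash as the very last character: nothing is left to continue onto.
print(compiles("x = 5\\"))

# A backslash followed by an actual continuation line is still fine.
print(compiles("x = 5\\\n + 1\n"))
```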
Michael J. Sullivan d8320ecb86 bpo-36878: Allow extra text after `# type: ignore` comments (GH-13238)
In the parser, when using the type_comments=True option, recognize
a TYPE_IGNORE as anything containing `# type: ignore` followed by
a non-alphanumeric character. This is to allow ignores such as
`# type: ignore[E1000]`.
2019-05-11 19:17:24 +01:00
Pablo Galindo f2cf1e3e28
bpo-36623: Clean parser headers and include files (GH-12253)
After the removal of pgen, multiple header and function prototypes that lack implementation or are unused are still lying around.
2019-04-13 17:05:14 +01:00
Zackery Spytz cda139d1de bpo-36459: Fix a possible double PyMem_FREE() due to tokenizer.c's tok_nextc() (GH-12601)
Remove the PyMem_FREE() call added in cb90c89.  The buffer will be
freed when PyTokenizer_Free() is called on the tokenizer state.
2019-03-28 15:53:00 +02:00
Pablo Galindo cb90c89de1
bpo-36367: Free buffer if realloc fails in tokenize.c (GH-12442) 2019-03-19 17:17:58 +00:00
Guido van Rossum 495da29225 bpo-35975: Support parsing earlier minor versions of Python 3 (GH-12086)
This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allows tweaking the parser to support older versions of the grammar. In particular, if `feature_version` is 5 or 6, the hacks for the `async` and `await` keywords from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.)



https://bugs.python.org/issue35975
2019-03-07 12:38:08 -08:00
Pablo Galindo 1f24a719e7
bpo-35808: Retire pgen and use pgen2 to generate the parser (GH-11814)
Pgen is the oldest piece of technology in the CPython repository. Building it requires various #if[n]def PGEN hacks in other parts of the code, and it depends more and more on CPython internals. This commit removes the old pgen C code and replaces it with a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatible with the current parser.

This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN), which can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution, which uses Python 3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so it is always kept in sync and avoids having to maintain duplicate token definitions.
2019-03-01 15:34:44 -08:00
Guido van Rossum dcfcd146f8 bpo-35766: Merge typed_ast back into CPython (GH-11645) 2019-01-31 12:40:27 +01:00
Anthony Sottile 995d9b9297 bpo-16806: Fix `lineno` and `col_offset` for multi-line string tokens (GH-10021) 2019-01-13 13:05:13 +09:00
Serhiy Storchaka 8ac658114d
bpo-30455: Generate all token related code and docs from Grammar/Tokens. (GH-10370)
"Include/token.h", "Lib/token.py" (containing now some data moved from
"Lib/tokenize.py") and new files "Parser/token.c" (containing the code
moved from "Parser/tokenizer.c") and "Doc/library/token-list.inc" (included
in "Doc/library/token.rst") are now generated from "Grammar/Tokens" by
"Tools/scripts/generate_token.py". The script overwrites files only if
needed and can be used on a read-only source tree.

"Lib/symbol.py" is now generated by "Tools/scripts/generate_symbol_py.py"
instead of being executable itself.

Added new make targets "regen-token" and "regen-symbol" which are now
dependencies of "regen-all".

The documentation now contains strings for operators and punctuation tokens.
2018-12-22 11:18:40 +02:00
Serhiy Storchaka 94cf308ee2
bpo-33306: Improve SyntaxError messages for unbalanced parentheses. (GH-6516) 2018-12-17 17:34:14 +02:00
Zackery Spytz 4c49da0cb7 bpo-35436: Add missing PyErr_NoMemory() calls and other minor bug fixes. (GH-11015)
Set MemoryError when appropriate, add missing failure checks,
and fix some potential leaks.
2018-12-07 12:11:30 +02:00
Zackery Spytz 5061a74a4c Remove unneeded PyUnicode_READY() in tokenizer.c (GH-9114) 2018-09-10 09:27:31 +03:00
Victor Stinner c884616390
Fix Windows compiler warning in tokenize.c (GH-8359)
Fix the following warning on Windows:

parser\tokenizer.c(1297): warning C4244: 'function': conversion from
'__int64' to 'int', possible loss of data.
2018-07-21 03:36:06 +02:00
Serhiy Storchaka cf7303ed2a
bpo-33305: Improve SyntaxError for invalid numerical literals. (GH-6517) 2018-07-09 15:09:35 +03:00
Victor Stinner f2ddc6ac93
tokenizer: Remove unused tabs options (#4422)
Remove the following fields from the tok_state structure, which are now
unused:

* altwarning: "Issue warning if alternate tabs don't match"
* alterror: "Issue error if alternate tabs don't match"
* alttabsize: "Alternate tab spacing"

Replace alttabsize variable with ALTTABSIZE define.
2017-11-17 01:25:47 -08:00
Jelle Zijlstra ac317700ce bpo-30406: Make async and await proper keywords (#1669)
Per PEP 492, 'async' and 'await' should become proper keywords in 3.7.
2017-10-05 23:24:46 -04:00
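The effect is visible from Python 3.7 onward. A small sketch; `is_valid_assignment_target` is a hypothetical helper used only to show that the names can no longer be used as identifiers.

```python
import keyword


def is_valid_assignment_target(name):
    """Return True if `name = 10` compiles, False on SyntaxError."""
    try:
        compile(f"{name} = 10", "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(keyword.iskeyword("async"), keyword.iskeyword("await"))
print(is_valid_assignment_target("async"))         # keyword, so rejected
print(is_valid_assignment_target("asynchronous"))  # ordinary identifier
```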
Albert-Jan Nijburg c9ccacea3f bpo-25324: add missing comma in Parser/tokenizer.c (GH-1910) 2017-06-01 13:51:27 -07:00
Albert-Jan Nijburg fc354f0785 bpo-25324: copy tok_name before changing it (#1608)
* add test to check if we're modifying token

* copy list so import tokenize doesn't have side effects on token

* shorten line

* add tokenize tokens to token.h to get them to show up in token

* move ERRORTOKEN back to its previous location, and fix nitpick

* copy comments from token.h automatically

* fix whitespace and make more pythonic

* change to fix comments from @haypo

* update token.rst and Misc/NEWS

* change wording

* some more wording changes
2017-05-31 16:00:21 +02:00
Berker Peksag d2f4404bbb Issue #28489: Merge from 3.6 2017-02-05 04:33:11 +03:00
Berker Peksag 6f80562862 Issue #28489: Fix comment in tokenizer.c
Patch by Ryan Gonzalez.
2017-02-05 04:32:39 +03:00
Victor Stinner a5ed5f000a Use _PyObject_CallNoArg()
Replace:
    PyObject_CallObject(callable, NULL)
with:
    _PyObject_CallNoArg(callable)
2016-12-06 18:45:50 +01:00
Serhiy Storchaka 06515833fe Replaced outdated macros _PyUnicode_AsString and _PyUnicode_AsStringAndSize
with PyUnicode_AsUTF8 and PyUnicode_AsUTF8AndSize.
2016-11-20 09:13:07 +02:00
Benjamin Peterson f5e8e8fc2b merge 3.5 (#24022) 2016-09-18 23:44:02 -07:00
Benjamin Peterson 57bda335e1 merge 3.4 2016-09-18 23:43:18 -07:00
Benjamin Peterson 26d998cfdd properly handle the single null-byte file (closes #24022) 2016-09-18 23:41:11 -07:00
Benjamin Peterson 5a715cfc57 merge 3.5 (#27981) 2016-09-12 22:07:14 -07:00
Benjamin Peterson 35ee948fa5 restructure fp_setreadl so as to avoid refleaks (closes #27981) 2016-09-12 22:06:58 -07:00
Brett Cannon a721abac29 Issue #26331: Implement the parsing part of PEP 515.
Thanks to Georg Brandl for the patch.
2016-09-09 14:57:09 -07:00
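PEP 515 allows single underscores between digits of numeric literals. The sketch below assumes Python 3.6 or later; `is_valid_literal` is a hypothetical helper that asks the compiler whether a piece of text is accepted as an expression.

```python
# Underscores are purely visual separators; the values are unchanged.
million = 1_000_000
mask = 0x_FF_FF
pi_ish = 3.141_592


def is_valid_literal(text):
    """Return True if `text` compiles as an expression."""
    try:
        compile(text, "<example>", "eval")
    except SyntaxError:
        return False
    return True


print(million, mask, pi_ish)
print(is_valid_literal("1_000"))   # single underscore between digits
print(is_valid_literal("1__000"))  # doubled underscore
print(is_valid_literal("1000_"))   # trailing underscore
```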
Christian Heimes c6cc23d0b9 Skip unused value in tokenizer code
In the case of an escape character, c is never read. tok_nextc() is
used to advance the pointer.

CID 1225097
2016-09-09 00:09:45 +02:00
Serhiy Storchaka ec39756960 Issue #22570: Renamed Py_SETREF to Py_XSETREF. 2016-04-06 09:50:03 +03:00
Serhiy Storchaka 48842714b9 Issue #22570: Renamed Py_SETREF to Py_XSETREF. 2016-04-06 09:45:48 +03:00
Benjamin Peterson 7285d520e0 remove duplicated check for fractions and complex numbers (closes #26076)
Patch by Oren Milman.
2016-03-24 22:43:23 -07:00
Serhiy Storchaka a051bf3afb Issue #26581: Use the first coding cookie on a line, not the last one. 2016-03-20 23:47:48 +02:00
Serhiy Storchaka e431d3c9aa Issue #26581: Use the first coding cookie on a line, not the last one. 2016-03-20 23:36:29 +02:00
Serhiy Storchaka ef1585eb9a Issue #25923: Added more const qualifiers to signatures of static and private functions. 2015-12-25 20:01:53 +02:00
Serhiy Storchaka f006940351 Issue #20440: Mass replacement of unsafe attribute-setting code with the special
macro Py_SETREF.
2015-12-24 10:39:57 +02:00
Serhiy Storchaka 5a57ade58e Issue #20440: Mass replacement of unsafe attribute-setting code with the special
macro Py_SETREF.
2015-12-24 10:35:59 +02:00
Serhiy Storchaka 0304729ec4 Issue #25388: Fixed tokenizer crash when processing undecodable source code
with a null byte.
2015-11-14 15:12:04 +02:00
Serhiy Storchaka 7e2b870b85 Issue #25388: Fixed tokenizer crash when processing undecodable source code
with a null byte.
2015-11-14 15:11:17 +02:00
Serhiy Storchaka 0d441119f5 Issue #25388: Fixed tokenizer crash when processing undecodable source code
with a null byte.
2015-11-14 15:10:35 +02:00
Eric V. Smith 235a6f0984 Issue #24965: Implement PEP 498 "Literal String Interpolation". Documentation is still needed, I'll open an issue for that. 2015-09-19 14:51:32 -04:00
Eric V. Smith 6408dc82fa Fixed indentation. 2015-09-12 18:53:36 -04:00
Yury Selivanov 96ec934e75 Issue #24619: Simplify async/await tokenization.
This commit simplifies async/await tokenization in tokenizer.c,
tokenize.py & lib2to3/tokenize.py.  Previous solution was to keep
a stack of async-def & def blocks, whereas the new approach is just
to remember position of the outermost async-def block.

This change won't bring any parsing performance improvements, but
it makes the code much easier to read and validate.
2015-07-23 15:01:58 +03:00
Yury Selivanov 8fb307cd65 Issue #24619: New approach for tokenizing async/await.
This commit fixes how one-line async-defs and defs are tracked
by the tokenizer.  It allows the tokenizer to correctly handle invalid
code such as:

>>> async def f():
...     def g(): pass
...     async = 10

and valid code such as:

>>> async def f():
...     async def g(): pass
...     await z

As a consequence, it is now possible to have one-line
'async def foo(): await ..' functions:

>>> async def foo(): return await bar()
2015-07-22 13:33:45 +03:00
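The one-line form from the commit message can be run as follows. This sketch assumes Python 3.7 or later for `asyncio.run()`; `bar` is a stand-in coroutine that the original message does not define.

```python
import asyncio


async def bar():
    # Stand-in coroutine; the commit message only names it.
    return 42


# One-line coroutine definition, in the form quoted by the commit message.
async def foo(): return await bar()


result = asyncio.run(foo())
print(result)
```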
Yury Selivanov 8085b80c18 Issue #24226: Fix parsing of many sequential one-line 'def' statements. 2015-05-18 12:50:52 -04:00
Yury Selivanov 7544508f02 PEP 0492 -- Coroutines with async and await syntax. Issue #24017. 2015-05-11 22:57:16 -04:00
Benjamin Peterson 273a720f87 merge 3.4 (#24022) 2015-04-21 12:07:06 -04:00
Benjamin Peterson d73aca769f do not call into python api if an exception is set (#24022) 2015-04-21 12:05:19 -04:00
Benjamin Peterson 3e439797ba merge 3.4 (#21642) 2014-06-07 12:39:51 -07:00
Benjamin Peterson c416162302 allow the keyword else immediately after (no space) an integer (closes #21642) 2014-06-07 12:36:39 -07:00
Benjamin Peterson d51374ed78 PEP 465: a dedicated infix operator for matrix multiplication (closes #21176) 2014-04-09 23:55:56 -04:00
Martin v. Löwis 78f1e4c865 Merge with 3.3 2014-02-28 15:43:36 +01:00
Martin v. Löwis 815b41b1cd Issue #20731: Properly position in source code files even if they
are opened in text mode. Patch by Serhiy Storchaka.
2014-02-28 15:27:29 +01:00
Serhiy Storchaka 5940b92909 Do not reset the line number because we already set file position to correct
value.

(fixes error in patch for issue #18960)
2014-01-09 20:13:52 +02:00
Serhiy Storchaka 1064a13bb0 Do not reset the line number because we already set file position to correct
value.

(fixes error in patch for issue #18960)
2014-01-09 20:12:49 +02:00
Serhiy Storchaka 7282ff6d5b Issue #18960: Fix bugs with Python source code encoding in the second line.
* The first line of Python script could be executed twice when the source
encoding (not equal to 'utf-8') was specified on the second line.

* Now the source encoding declaration on the second line isn't effective if
the first line contains anything except a comment.

* As a consequence, 'python -x' now works again with files that have the source
encoding declaration specified on the second line, and can be used again
to make Python batch files on Windows.

* The tokenize module now ignores the source encoding declaration on the second
line if the first line contains anything except a comment.

* IDLE now ignores the source encoding declaration on the second line if the
first line contains anything except a comment.

* 2to3 and the findnocoding.py script now ignore the source encoding
declaration on the second line if the first line contains anything except
a comment.
2014-01-09 18:41:59 +02:00
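The second-line rule can be observed by compiling bytes, which makes the compiler honor the encoding declaration. A sketch under stated assumptions: `decoded_length` is a hypothetical helper, and the two bytes 0xC3 0xA9 are chosen because they are one character ("é") in UTF-8 but two characters in Latin-1, so the length of the resulting string shows which encoding was applied.

```python
def decoded_length(source_bytes):
    """Execute `source_bytes` and return len(s) as the compiler decoded it."""
    namespace = {}
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return len(namespace["s"])


# The same string literal is appended to both sources.
literal = b"s = '\xc3\xa9'\n"

# First line is only a comment: the declaration on line 2 takes effect.
honoured = b"# just a comment\n# -*- coding: latin-1 -*-\n" + literal

# First line contains code: the declaration on line 2 is ignored (UTF-8 applies).
ignored = b"x = 1\n# -*- coding: latin-1 -*-\n" + literal

print(decoded_length(honoured))
print(decoded_length(ignored))
```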
Serhiy Storchaka 768c16ce02 Issue #18960: Fix bugs with Python source code encoding in the second line.
* The first line of Python script could be executed twice when the source
encoding (not equal to 'utf-8') was specified on the second line.

* Now the source encoding declaration on the second line isn't effective if
the first line contains anything except a comment.

* As a consequence, 'python -x' now works again with files that have the source
encoding declaration specified on the second line, and can be used again
to make Python batch files on Windows.

* The tokenize module now ignores the source encoding declaration on the second
line if the first line contains anything except a comment.

* IDLE now ignores the source encoding declaration on the second line if the
first line contains anything except a comment.

* 2to3 and the findnocoding.py script now ignore the source encoding
declaration on the second line if the first line contains anything except
a comment.
2014-01-09 18:36:09 +02:00
Serhiy Storchaka c679227e31 Issue #1772673: The type of `char*` arguments now changed to `const char*`. 2013-10-19 21:03:34 +03:00
Victor Stinner daf455554b Issue #18571: Implementation of the PEP 446: file descriptors and file handles
are now created non-inheritable; add functions os.get/set_inheritable(),
os.get/set_handle_inheritable() and socket.socket.get/set_inheritable().
2013-08-28 00:53:59 +02:00
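A short sketch of the functions named above, assuming Python 3.4 or later. It uses a pipe because `os.pipe()` is documented to return non-inheritable descriptors; the variable names are illustrative.

```python
import os

read_fd, write_fd = os.pipe()
try:
    # Newly created descriptors start out non-inheritable.
    default_flag = os.get_inheritable(read_fd)

    # Inheritance has to be requested explicitly.
    os.set_inheritable(read_fd, True)
    updated_flag = os.get_inheritable(read_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(default_flag, updated_flag)
```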
Antoine Pitrou 9ed5f27266 Issue #18722: Remove uses of the "register" keyword in C code. 2013-08-13 20:18:52 +02:00
Benjamin Peterson cb2226cb69 merge 3.3 2013-07-15 20:50:25 -07:00
Benjamin Peterson 265fba40c8 move declaration to top of block 2013-07-15 20:50:22 -07:00
Benjamin Peterson fd9c0203de merge 3.3 (closes #18470) 2013-07-15 20:47:47 -07:00
Benjamin Peterson 2dbfd88245 check the return value of new_string() (closes #18470) 2013-07-15 19:15:34 -07:00
Serhiy Storchaka 9670543a00 Issue #18038: SyntaxError raised during compilation of sources with an illegal
encoding now always contains an encoding name.
2013-06-09 16:53:55 +03:00
Serhiy Storchaka 3af14aaba5 Issue #18038: SyntaxError raised during compilation of sources with an illegal
encoding now always contains an encoding name.
2013-06-09 16:51:52 +03:00
Victor Stinner 796977360f Issue #9566: Fix compiler warning on Windows 64-bit 2013-06-05 00:44:00 +02:00
Benjamin Peterson d0845588b8 make _PyParser_TokenNames const 2012-10-24 08:21:52 -07:00
Christian Heimes 0b3847de6d Issue #15096: Drop support for the ur string prefix 2012-06-20 11:17:58 +02:00
Armin Ronacher 6ecf77b3f8 Basic support for PEP 414 without docs or tests. 2012-03-04 12:04:06 +00:00
Antoine Pitrou 3a5d4cb940 Issue #13748: Raw bytes literals can now be written with the `rb` prefix as well as `br`. 2012-01-12 22:46:19 +01:00
Martin v. Löwis bd928fef42 Rename _Py_identifier to _Py_IDENTIFIER. 2011-10-14 10:20:37 +02:00
Martin v. Löwis 1ee1b6fe0d Use identifier API for PyObject_GetAttrString. 2011-10-10 18:11:30 +02:00
Martin v. Löwis afe55bba33 Add API for static strings, primarily good for identifiers.
Thanks to Konrad Schöbel and Jasper Schulz for helping with the mass-editing.
2011-10-09 10:38:36 +02:00
Martin v. Löwis d63a3b8beb Implement PEP 393. 2011-09-28 07:41:54 +02:00
Jesus Cea c1935d2abf Revert bb62908896fe, but keep the test 2011-04-25 04:03:58 +02:00
Jesus Cea 88f7841be7 Correctly merging #9319 into 3.3? 2011-04-25 03:46:43 +02:00
Victor Stinner c68b6aaec8 Issue #9319: Fix a crash on parsing a Python source code without encoding
cookie and not valid in UTF-8: use "<file>" as the filename instead of
reading from NULL.
2011-04-23 00:41:19 +02:00
Victor Stinner fe7c5b5bdf Issue #9319: Include the filename in "Non-UTF8 code ..." syntax error. 2011-04-05 01:48:03 +02:00
Victor Stinner 7f2fee3640 Issue #10785: Store the filename as Unicode in the Python parser. 2011-04-05 00:39:01 +02:00
Victor Stinner 034c7537d8 Issue #10841: don't translate newlines for pgen 2011-01-07 18:56:19 +00:00