cpython

Commit Graph

Author	SHA1	Message	Date
Pablo Galindo Salgado	4f006a789a	Ensure the str member of the tokenizer is always initialised (GH-29681)	2021-11-21 02:06:39 +00:00
Pablo Galindo Salgado	81f4e116ef	bpo-45811: Improve error message when source code contains invisible control characters (GH-29654)	2021-11-20 18:28:28 +00:00
Pablo Galindo Salgado	25835c518a	bpo-45738: Fix computation of error location for invalid continuation characters in the parser (GH-29550)	2021-11-14 01:06:41 +00:00
Pablo Galindo Salgado	cdc7a58277	bpo-45562: Ensure all tokenizer debug messages are printed to stderr (GH-29270)	2021-10-28 18:06:15 +01:00
Pablo Galindo Salgado	10bbd41ba8	bpo-45562: Print tokenizer debug messages to stderr (GH-29250)	2021-10-27 14:27:34 -07:00
Nikita Sobolev	4bc5473a42	bpo-45574: fix warning about `print_escape` being unused (GH-29172) It used to be like this: (screenshot omitted). A quick `grep` shows that it is used in just one place, under `Py_DEBUG`: `f6e8b80d20/Parser/tokenizer.c (L1047-L1051)`. I am not sure, but it also looks like a private thing, so it should not affect other users. Automerge-Triggered-By: GH:pablogsal	2021-10-22 14:57:24 -07:00
Pablo Galindo Salgado	86dfb55d2e	bpo-45562: Only show debug output from the parser in debug builds (GH-29140)	2021-10-22 01:52:24 -07:00
Victor Stinner	713bb19356	bpo-45434: Mark the PyTokenizer C API as private (GH-28924) Rename PyTokenize functions to mark them as private: * PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename() * PyTokenizer_FromString() => _PyTokenizer_FromString() * PyTokenizer_FromFile() => _PyTokenizer_FromFile() * PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8() * PyTokenizer_Free() => _PyTokenizer_Free() * PyTokenizer_Get() => _PyTokenizer_Get() Remove the unused PyTokenizer_FindEncoding() function. import.c: remove unused #include "errcode.h".	2021-10-13 17:22:14 +02:00
Victor Stinner	d943d19172	bpo-45439: Move _PyObject_CallNoArgs() to pycore_call.h (GH-28895) * Move _PyObject_CallNoArgs() to pycore_call.h (internal C API). * _ssl, _sqlite and _testcapi extensions now call the public PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs(). * _lsprof extension is now built with Py_BUILD_CORE_MODULE macro defined to get access to internal _PyObject_CallNoArgs().	2021-10-12 08:38:19 +02:00
Victor Stinner	ce3489cfdb	bpo-45439: Rename _PyObject_CallNoArg() to _PyObject_CallNoArgs() (GH-28891) Fix typo in the private _PyObject_CallNoArg() function name: rename it to _PyObject_CallNoArgs() to be consistent with the public function PyObject_CallNoArgs().	2021-10-12 00:42:23 +02:00
Noah Kantrowitz	be42c06bb0	Update URLs in comments and metadata to use HTTPS (GH-27458)	2021-07-30 15:54:46 +02:00
Pablo Galindo Salgado	f24777c2b3	bpo-44317: Improve tokenizer errors with more informative locations (GH-26555)	2021-07-10 01:29:29 +01:00
Binbin	17b16e13bb	Fix typos in multiple files (GH-26689) Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>	2021-06-12 22:47:44 -04:00
Pablo Galindo	a342cc5891	bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676) Automerge-Triggered-By: GH:pablogsal	2021-06-12 10:53:49 -07:00
Serhiy Storchaka	2ea6d89028	bpo-43833: Emit warnings for numeric literals followed by keyword (GH-25466) Emit a deprecation warning if the numeric literal is immediately followed by one of the keywords: and, else, for, if, in, is, or. Raise a syntax error with a more informative message if it is immediately followed by another keyword or identifier. Automerge-Triggered-By: GH:pablogsal	2021-06-08 16:31:10 -07:00
Pablo Galindo	bd7476dae3	bpo-44201: Avoid side effects of "invalid_*" rules in the REPL (GH-26298) When the parser does a second pass to check for errors, these rules can have some small side-effects as they may advance the parser more than the point reached in the first pass. This can cause the tokenizer to ask for extra tokens in interactive mode causing the tokenizer to show the prompt instead of failing instantly. To avoid this, add a new mode to the tokenizer that is activated in the second pass and deactivates asking for new tokens when the interactive line is finished. As the parsing should have reached the last line in the first pass, the second pass should not need to ask for more tokens.	2021-05-22 23:05:00 +01:00
Pablo Galindo	92a02c1f7e	Fix tokenizer error when raw decoding null bytes (GH-25080)	2021-03-30 00:24:49 +01:00
Pablo Galindo	261a452a13	bpo-25643: Refactor the C tokenizer into smaller, logical units (GH-25050)	2021-03-28 23:48:05 +01:00
Pablo Galindo	cd8dcbc851	bpo-43410: Fix crash in the parser when producing syntax errors when reading from stdin (GH-24763)	2021-03-14 04:38:40 +01:00
Batuhan Taskaya	a698d52c39	bpo-40176: Improve error messages for unclosed string literals (GH-19346) Automerge-Triggered-By: GH:isidentical	2021-01-20 13:38:47 -08:00
Pablo Galindo	ae7d3cd980	bpo-42864: Fix compiler warning in the tokenizer with the new paren stack for column numbers (GH-24266)	2021-01-20 12:53:52 +00:00
Pablo Galindo	d6d6371447	bpo-42864: Improve error messages regarding unclosed parentheses (GH-24161)	2021-01-19 23:59:33 +00:00
Lysandros Nikolaou	e5fe509054	bpo-42827: Fix crash on SyntaxError in multiline expressions (GH-24140) When trying to extract the error line for the error message there are two distinct cases: 1. The input comes from a file, which means that we can extract the error line by using `PyErr_ProgramTextObject` and which we already do. 2. The input does not come from a file, at which point we need to get the source code from the tokenizer: * If the tokenizer's current line number is the same with the line of the error, we get the line from `tok->buf` and we're ready. * Else, we can extract the error line from the source code in the following two ways: * If the input comes from a string we have all the input in `tok->str` and we can extract the error line from it. * If the input comes from stdin, i.e. the interactive prompt, we do not have access to the previous line. That's why a new field `tok->stdin_content` is added which holds the whole input for the current (multiline) statement or expression. We can then extract the error line from `tok->stdin_content` like we do in the string case above. Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>	2021-01-14 21:36:30 +00:00
Victor Stinner	00d7abd7ef	bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586) No longer use deprecated aliases to functions: * Replace PyMem_MALLOC() with PyMem_Malloc() * Replace PyMem_REALLOC() with PyMem_Realloc() * Replace PyMem_FREE() with PyMem_Free() * Replace PyMem_Del() with PyMem_Free() * Replace PyMem_DEL() with PyMem_Free() Modify also the PyMem_DEL() macro to use directly PyMem_Free().	2020-12-01 09:56:42 +01:00
Victor Stinner	e822e37946	bpo-36020: Remove snprintf macro in pyerrors.h (GH-20889) On Windows, #include "pyerrors.h" no longer defines "snprintf" and "vsnprintf" macros. PyOS_snprintf() and PyOS_vsnprintf() should be used to get portable behavior. Replace snprintf() calls with PyOS_snprintf() and replace vsnprintf() calls with PyOS_vsnprintf().	2020-06-15 21:59:47 +02:00
Lysandros Nikolaou	896f4cf63f	bpo-40847: Consider a line with only a LINECONT a blank line (GH-20769) A line with only a line continuation character should be considered a blank line at tokenizer level so that only a single NEWLINE token gets emitted. The old parser was working around the issue, but the new parser threw a `SyntaxError` for valid input. For example, an empty line following a line continuation character was interpreted as a `SyntaxError`. Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>	2020-06-11 00:56:08 +01:00
Ammar Askar	a2bbedc8b1	Fix peg_generator compiler warnings under MSVC (GH-20405)	2020-05-26 05:33:35 +01:00
Serhiy Storchaka	74ea6b5a75	bpo-40593: Improve syntax errors for invalid characters in source code. (GH-20033)	2020-05-12 12:42:04 +03:00
Lysandros Nikolaou	846d8b28ab	bpo-40246: Revert reporting of invalid string prefixes (GH-19888) Commit `41d5b94af4` has to be reverted due to backwards compatibility concerns: with it, keywords immediately followed by a string without whitespace between them (like in `bg="#d00" if clear else"#fca"`) fail to parse.	2020-05-04 12:32:18 +01:00
Pablo Galindo	11a7f158ef	bpo-40335: Correctly handle multi-line strings in tokenize error scenarios (GH-19619) Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>	2020-04-21 01:53:04 +01:00
Lysandros Nikolaou	41d5b94af4	bpo-40246: Report a better error message for invalid string prefixes (GH-19476)	2020-04-12 19:21:00 +01:00
Victor Stinner	87d3b9db4a	bpo-39882: Add _Py_FatalErrorFormat() function (GH-19157)	2020-03-25 19:27:36 +01:00
Victor Stinner	9e5d30cc99	bpo-39882: Py_FatalError() logs the function name (GH-18819) The Py_FatalError() function is replaced with a macro which logs automatically the name of the current function, unless the Py_LIMITED_API macro is defined. Changes: * Add _Py_FatalErrorFunc() function. * Remove the function name from the message of Py_FatalError() calls which included the function name. * Update tests.	2020-03-07 00:54:20 +01:00
Andy Lester	384f3c536d	closes bpo-39721: Fix constness of members of tok_state struct. (GH-18600) The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment: /* XXX: constify members. */ This patch addresses that. In the tok_state struct: * end and start were non-const but could be made const * str and input were const but should have been non-const Changes to support this include: * decode_str() now returns a char * since it is allocated. * PyTokenizer_FromString() and PyTokenizer_FromUTF8() each creates a new char * for an allocated string instead of reusing the input const char *. * PyTokenizer_Get() and tok_get() now take const char ** arguments. * Various local vars are const or non-const accordingly. I was able to remove five casts that cast away constness.	2020-02-27 18:44:52 -08:00
Serhiy Storchaka	0cc6b5e559	bpo-39219: Fix SyntaxError attributes in the tokenizer. (GH-17828) * Always set the text attribute. * Correct the offset attribute for non-ascii sources.	2020-02-12 12:17:00 +02:00
Victor Stinner	f3e7ea5b8c	bpo-39500: Document PyUnicode_IsIdentifier() function (GH-18397) PyUnicode_IsIdentifier() does not call Py_FatalError() anymore if the string is not ready.	2020-02-11 14:29:33 +01:00
Pablo Galindo	5ec91f78d5	bpo-39209: Manage correctly multi-line tokens in interactive mode (GH-17860)	2020-01-06 15:59:09 +00:00
Batuhan Taşkaya	109fc2792a	bpo-38673: don't switch to ps2 if the line starts with comment or whitespace (GH-17421) https://bugs.python.org/issue38673	2019-12-08 20:36:27 -08:00
Hansraj Das	69f37bcb28	Indent code inside if block. (GH-15284) Without indentation, the strcpy line looks parallel to the `if` condition.	2019-08-15 09:19:07 -07:00
Anthony Sottile	5b94f3578c	Fix `SyntaxError` indicator printing too many spaces for multi-line strings (GH-14433)	2019-07-29 14:59:13 +01:00
Michael J. Sullivan	d8a82e2897	bpo-36878: Only allow text after `# type: ignore` if first character ASCII (GH-13504) This disallows things like `# type: ignoreé`, which seems wrong. Also switch to using Py_ISALNUM for the alnum check, for consistency with other code (and maybe correctness re: locale issues?). https://bugs.python.org/issue36878	2019-05-22 13:43:36 -07:00
Michael J. Sullivan	933e1509ec	bpo-36878: Track extra text added to 'type: ignore' in the AST (GH-13479) GH-13238 made extra text after a # type: ignore accepted by the parser. This finishes the job and actually plumbs the extra text through the parser and makes it available in the AST.	2019-05-22 15:54:20 +01:00
Anthony Sottile	abea73bf4a	bpo-2180: Treat line continuation at EOF as a `SyntaxError` (GH-13401) This makes the parser consistent with the tokenize module (already the case in `pypy`). Sample (`t.py`): `x = 5\`. Before: `python3 t.py` printed nothing, while `python3 -mtokenize t.py` printed `t.py:2:0: error: EOF in multi-line statement`. After: `./python t.py` prints `File "t.py", line 3` followed by `SyntaxError: unexpected EOF while parsing`, and `./python -m tokenize t.py` still prints `t.py:2:0: error: EOF in multi-line statement`. https://bugs.python.org/issue2180	2019-05-18 11:27:16 -07:00
Michael J. Sullivan	d8320ecb86	bpo-36878: Allow extra text after `# type: ignore` comments (GH-13238) In the parser, when using the type_comments=True option, recognize a TYPE_IGNORE as anything containing `# type: ignore` followed by a non-alphanumeric character. This is to allow ignores such as `# type: ignore[E1000]`.	2019-05-11 19:17:24 +01:00
Pablo Galindo	f2cf1e3e28	bpo-36623: Clean parser headers and include files (GH-12253) After the removal of pgen, multiple header and function prototypes that lack implementation or are unused are still lying around.	2019-04-13 17:05:14 +01:00
Zackery Spytz	cda139d1de	bpo-36459: Fix a possible double PyMem_FREE() due to tokenizer.c's tok_nextc() (12601) Remove the PyMem_FREE() call added in `cb90c89`. The buffer will be freed when PyTokenizer_Free() is called on the tokenizer state.	2019-03-28 15:53:00 +02:00
Pablo Galindo	cb90c89de1	bpo-36367: Free buffer if realloc fails in tokenize.c (GH-12442)	2019-03-19 17:17:58 +00:00
Guido van Rossum	495da29225	bpo-35975: Support parsing earlier minor versions of Python 3 (GH-12086) This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allow tweaking the parser to support older versions of the grammar. In particular if `feature_version` is 5 or 6, the hacks for the `async` and `await` keyword from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.) https://bugs.python.org/issue35975	2019-03-07 12:38:08 -08:00
Pablo Galindo	1f24a719e7	bpo-35808: Retire pgen and use pgen2 to generate the parser (GH-11814) Pgen is the oldest piece of technology in the CPython repository, building it requires various #if[n]def PGEN hacks in other parts of the code and it also depends more and more on CPython internals. This commit removes the old pgen C code and replaces it with a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatible with the current parser. This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN) that can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution that uses Python3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so it is always kept in sync and avoids having to maintain duplicate token definitions.	2019-03-01 15:34:44 -08:00
Guido van Rossum	dcfcd146f8	bpo-35766: Merge typed_ast back into CPython (GH-11645)	2019-01-31 12:40:27 +01:00
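
The `# type: ignore` commits in the table above (bpo-36878) describe extra text after the comment being accepted and then exposed in the AST. A minimal sketch of how that can be observed from Python, assuming Python 3.8 or later where `ast.parse()` accepts `type_comments=True`; the helper name `collect_type_ignores` and the sample source are illustrative only and do not come from the commits:

```python
import ast


def collect_type_ignores(source):
    """Return (lineno, tag) pairs for every `# type: ignore` comment found."""
    # type_comments=True asks the parser to keep type comments and ignores.
    tree = ast.parse(source, type_comments=True)
    return [(ignore.lineno, ignore.tag) for ignore in tree.type_ignores]


# Hypothetical sample: a bare ignore, an ignore with extra text, a plain comment.
SAMPLE = (
    "x = 1  # type: ignore\n"
    "y = 2  # type: ignore[E1000]\n"
    "z = 3  # an ordinary comment\n"
)

if __name__ == "__main__":
    for lineno, tag in collect_type_ignores(SAMPLE):
        print(lineno, repr(tag))
```

The `tag` attribute carries whatever text followed `# type: ignore`, which is the "extra text" that GH-13479 plumbs through to the AST.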


290 Commits
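
Two of the commits listed above change how the tokenizer treats backslash line continuations: bpo-2180 (GH-13401) makes a continuation at end of input a `SyntaxError`, and bpo-40847 (GH-20769) treats a line holding only a continuation character as blank. A minimal sketch that exercises both through the built-in `compile()`, assuming an interpreter that includes both changes; the helper `compiles` and the constant names are illustrative only:

```python
def compiles(source):
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


# A backslash as the very last character: nothing follows the continuation.
CONTINUATION_AT_EOF = "x = 5\\"

# A line containing only a continuation character, followed by an empty line.
LINECONT_THEN_BLANK = "\\\npass\n        \\\n\npass\n"

if __name__ == "__main__":
    print("continuation at EOF compiles:", compiles(CONTINUATION_AT_EOF))
    print("linecont + blank line compiles:", compiles(LINECONT_THEN_BLANK))
```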