cpython

Commit Graph

Author	SHA1	Message	Date
Eric Snow	91a8e002c2	gh-81057: Move More Globals to _PyRuntimeState (gh-100092) https://github.com/python/cpython/issues/81057	2022-12-07 15:56:31 -07:00
Victor Stinner	4ce2a202c7	gh-99300: Use Py_NewRef() in Parser/ directory (#99330) Replace Py_INCREF() with Py_NewRef() in C files of the Parser/ directory and in the PEG generator.	2022-11-10 15:30:05 +01:00
Lysandros Nikolaou	cbf0afd8a1	gh-97973: Return all necessary information from the tokenizer (GH-97984) Right now, the tokenizer only returns type and two pointers to the start and end of the token. This PR modifies the tokenizer to return the type and set all of the necessary information, so that the parser does not have to do this.	2022-10-06 16:07:17 -07:00
Gregory P. Smith	511ca94520	gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96499) Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds. This PR comes fresh from a pile of work done in our private PSRT security response team repo. Signed-off-by: Christian Heimes [Red Hat] <christian@python.org> Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org> Reviews via the private PSRT repo via many others (see the NEWS entry in the PR). * Issue: gh-95778 I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.	2022-09-02 09:35:08 -07:00
Honglin Zhu	b946f529ef	gh-95355: Check tokens[0] after allocating memory (GH-95356) #95355 Automerge-Triggered-By: GH:pablogsal	2022-07-28 03:00:34 -07:00
Serhiy Storchaka	6fd4c8ec77	gh-93741: Add private C API _PyImport_GetModuleAttrString() (GH-93742) It combines PyImport_ImportModule() and PyObject_GetAttrString() and saves 4-6 lines of code on every use. Add also _PyImport_GetModuleAttr() which takes Python strings as arguments.	2022-06-14 07:15:26 +03:00
Victor Stinner	5115a16831	gh-93103: Parser uses PyConfig.parser_debug instead of Py_DebugFlag (#93106) * Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the parser. * Add Parser.debug member. * Add tok_state.debug member. * Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.	2022-05-24 22:35:08 +02:00
Oleg Iarygin	a52f82baf2	bpo-46920: Remove disabled debug code added decades ago and likely unnecessary (GH-31812)	2022-03-14 17:03:21 +01:00
Pablo Galindo Salgado	e19059ecd8	Don't print rejected tokens when using the debug flags in the parser (GH-31258)	2022-02-10 14:38:27 +00:00
Pablo Galindo Salgado	390459de6d	Allow the parser to avoid nested processing of invalid rules (GH-31252)	2022-02-10 13:12:14 +00:00
Pablo Galindo Salgado	69e10976b2	bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010)	2022-02-08 11:54:37 +00:00
Pablo Galindo Salgado	6fa8b2ceee	bpo-46237: Fix the line number of tokenizer errors inside f-strings (GH-30463)	2022-01-08 00:23:40 +00:00
Pablo Galindo Salgado	dd6c35761a	bpo-46110: Restore commit `e9898bf153` This restores commit `e9898bf153`.	2022-01-03 19:54:06 +00:00
Pablo Galindo Salgado	9d35dedc5e	Revert "bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)" (GH-30363) This reverts commit `e9898bf153` temporarily as we want to confirm if this commit is the cause of a slowdown at startup time.	2022-01-03 18:29:18 +00:00
Pablo Galindo Salgado	e9898bf153	bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177) Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>	2021-12-20 15:43:26 +00:00
Kumar Aditya	41026c3155	bpo-45855: Replaced deprecated `PyImport_ImportModuleNoBlock` with PyImport_ImportModule (GH-30046)	2021-12-12 10:45:20 +02:00
Weipeng Hong	28179aac79	bpo-42918: Improve built-in function compile() in mode 'single' (GH-29934) Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>	2021-12-11 00:44:26 +01:00
Pablo Galindo Salgado	24c10d2943	bpo-45727: Only trigger the 'did you forgot a comma' error suggestion if inside parentheses (GH-29757)	2021-11-24 22:21:23 +00:00
Pablo Galindo Salgado	c9c4444d9f	Refactor parser compilation units into specific components (GH-29676)	2021-11-21 01:08:50 +00:00
Pablo Galindo Salgado	79ff0d1687	bpo-45494: Fix error location in EOF tokenizer errors (GH-29108)	2021-11-20 17:40:59 +00:00
Pablo Galindo Salgado	fdcc46d955	bpo-45848: Allow the parser to get error lines from encoded files (GH-29646)	2021-11-20 15:36:07 +01:00
Pablo Galindo Salgado	546cefcda7	bpo-45727: Make the syntax error for missing comma more consistent (GH-29427)	2021-11-19 23:11:57 +00:00
Pablo Galindo Salgado	da20d7401d	bpo-45822: Respect PEP 263's coding cookies in the parser even if flags are not provided (GH-29582)	2021-11-16 12:30:47 -08:00
Pablo Galindo Salgado	df4ae55e66	bpo-45820: Fix a segfault when the parser fails without reading any input (GH-29580)	2021-11-16 19:51:52 +00:00
Pablo Galindo Salgado	25835c518a	bpo-45738: Fix computation of error location for invalid continuation characters in the parser (GH-29550)	2021-11-14 01:06:41 +00:00
Pablo Galindo Salgado	a106343f63	bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993) There are two errors that this commit fixes: * The parser was not correctly computing the offset and the string source for E_LINECONT errors due to the incorrect usage of strtok(). * The parser was not correctly unwinding the call stack when a tokenizer exception happened in rules involving optionals ('?', [...]) as we always make them return valid results by using the comma operator. We need to check first if we don't have an error before continuing.	2021-10-19 21:24:12 +02:00
Victor Stinner	713bb19356	bpo-45434: Mark the PyTokenizer C API as private (GH-28924) Rename PyTokenize functions to mark them as private: * PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename() * PyTokenizer_FromString() => _PyTokenizer_FromString() * PyTokenizer_FromFile() => _PyTokenizer_FromFile() * PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8() * PyTokenizer_Free() => _PyTokenizer_Free() * PyTokenizer_Get() => _PyTokenizer_Get() Remove the unused PyTokenizer_FindEncoding() function. import.c: remove unused #include "errcode.h".	2021-10-13 17:22:14 +02:00
Pablo Galindo Salgado	0219017df7	bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812)	2021-10-07 22:33:05 +01:00
Pablo Galindo Salgado	e5f13ce5b4	bpo-43914: Correctly highlight SyntaxError exceptions for invalid generator expression in function calls (GH-28576)	2021-09-27 14:37:43 +01:00
Pablo Galindo Salgado	953d27261e	Update pegen to use the latest upstream developments (GH-27586)	2021-08-12 17:37:30 +01:00
Pablo Galindo Salgado	6948964ecf	bpo-34013: Generalize the invalid legacy statement error message (GH-27389)	2021-07-27 17:19:22 +01:00
Batuhan Taskaya	fbc349ff79	bpo-43950: Distinguish errors happening on character offset decoding (GH-27217)	2021-07-20 16:42:12 +01:00
Ammar Askar	5644c7b3ff	bpo-43950: Print columns in tracebacks (PEP 657) (GH-26958) The traceback.c and traceback.py mechanisms now utilize the newly added code.co_positions and PyCode_Addr2Location to print carets on the specific expressions involved in a traceback. Co-authored-by: Pablo Galindo <Pablogsal@gmail.com> Co-authored-by: Ammar Askar <ammar@ammaraskar.com> Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>	2021-07-05 00:14:33 +01:00
Pablo Galindo	0acc258fe6	bpo-44456: Improve the syntax error when mixing keyword and positional patterns (GH-26793)	2021-06-24 16:09:57 +01:00
Pablo Galindo	507ed6fa1d	bpo-44409: Fix error location in tokenizer errors that happen during initialization (GH-26712)	2021-06-14 17:46:11 +01:00
Serhiy Storchaka	be8b631b7a	Add more const modifiers. (GH-26691)	2021-06-12 16:11:59 +03:00
Pablo Galindo	457ce60fc7	bpo-44368: Ensure we don't raise incorrect custom syntax errors with soft keywords (GH-26630)	2021-06-09 22:20:01 +01:00
Pablo Galindo	9fd21f649d	bpo-44349: Fix edge case when displaying text from files with encoding in syntax errors (GH-26611)	2021-06-09 00:54:29 +01:00
Pablo Galindo	bafe0aade5	bpo-44335: Ensure the tokenizer doesn't go into Python with the error set (GH-26608)	2021-06-08 20:02:03 +01:00
Pablo Galindo	d334c73b56	bpo-44335: Fix a regression when identifying invalid characters in syntax errors (GH-26589)	2021-06-08 12:25:22 +01:00
Serhiy Storchaka	39dd141a4b	bpo-44273: Improve syntax error message for assigning to "..." (GH-26477) Use "ellipsis" instead of "Ellipsis" in syntax error messages to eliminate confusion with built-in variable Ellipsis.	2021-06-01 12:07:05 +01:00
Pablo Galindo	bd7476dae3	bpo-44201: Avoid side effects of "invalid_*" rules in the REPL (GH-26298) When the parser does a second pass to check for errors, these rules can have some small side-effects as they may advance the parser more than the point reached in the first pass. This can cause the tokenizer to ask for extra tokens in interactive mode causing the tokenizer to show the prompt instead of failing instantly. To avoid this, add a new mode to the tokenizer that is activated in the second pass and deactivates asking for new tokens when the interactive line is finished. As the parsing should have reached the last line in the first pass, the second pass should not need to ask for more tokens.	2021-05-22 23:05:00 +01:00
Pablo Galindo	c878a97968	bpo-44180: Fix edge cases in invalid assignment rules in the parser (GH-26283) The invalid assignment rules are very delicate since the parser can easily raise an invalid assignment when a keyword argument is provided. As they are very deep into the grammar tree, it is very difficult to specify in which contexts these rules can be used and in which they can't. For that, we need to use a different version of the rule that doesn't do error checking in those situations where we don't want the rule to raise (keyword arguments and generator expressions). We also need to check if we are in a left-recursive rule, as those can try to eagerly advance the parser even if the parse will fail at the end of the expression. Failing to do this allows the parser to start parsing a call as a tuple and incorrectly identify a keyword argument as an invalid assignment, before it realizes that it was not a tuple after all.	2021-05-21 18:34:54 +01:00
Pablo Galindo	b51081c1a8	bpo-44180: Report generic syntax errors in the furthest position reached in the first parser pass (GH-26253)	2021-05-21 16:09:51 +01:00
Pablo Galindo	80b089179f	bpo-44143: Fix crash in the parser when raising tokenizer errors with an exception set (GH-26144)	2021-05-15 17:58:02 +01:00
Pablo Galindo	9142088e74	bpo-43822: Prioritize tokenizer errors over custom syntax errors when raising parser exceptions (GH-25866)	2021-05-04 01:32:46 +01:00
Brandt Bucher	dbe60ee09d	bpo-43892: Validate the first term of complex literal value patterns (GH-25735)	2021-04-29 17:19:28 -07:00
Nick Coghlan	1e7b858575	bpo-43892: Make match patterns explicit in the AST (GH-25585) Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>	2021-04-28 22:58:44 -07:00
Pablo Galindo	a77aac4fca	bpo-43914: Highlight invalid ranges in SyntaxErrors (#25525 ) To improve the user experience understanding what part of the error messages associated with SyntaxErrors is wrong, we can highlight the whole error range and not only place the caret at the first character. In this way: >>> foo(x, z for z in range(10), t, w) File "<stdin>", line 1 foo(x, z for z in range(10), t, w) ^ SyntaxError: Generator expression must be parenthesized becomes >>> foo(x, z for z in range(10), t, w) File "<stdin>", line 1 foo(x, z for z in range(10), t, w) ^^^^^^^^^^^^^^^^^^^^ SyntaxError: Generator expression must be parenthesized	2021-04-23 14:27:05 +01:00
Pablo Galindo	b280248be8	bpo-43822: Improve syntax errors for missing commas (GH-25377)	2021-04-15 21:38:45 +01:00

91 Commits