cpython

Author	SHA1	Message	Date
Lysandros Nikolaou	8549559f38	gh-120317: Lock around global state in the tokenize module (#120318) Co-authored-by: Pablo Galindo <pablogsal@gmail.com>	2024-07-16 11:35:57 +02:00
Lysandros Nikolaou	4b5d3e0e72	gh-120343: Fix column offsets of multiline tokens in tokenize (#120391)	2024-06-12 20:52:55 +02:00
Lysandros Nikolaou	1b62bcee94	gh-120343: Do not reset byte_col_offset_diff after multiline tokens (#120352) Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>	2024-06-11 17:00:53 +00:00
Kirill Podoprigora	c0faade891	gh-119704: Fix reference leak in the ``Python/Python-tokenize.c`` (#119705)	2024-05-29 07:56:44 +01:00
Lysandros Nikolaou	d87b015106	gh-119118: Fix performance regression in tokenize module (#119615) * gh-119118: Fix performance regression in tokenize module - Cache line object to avoid creating a Unicode object for all of the tokens in the same line. - Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference. Co-authored-by: Pablo Galindo <pablogsal@gmail.com>	2024-05-28 19:17:49 +00:00
Brett Simmers	c2627d6eea	gh-116322: Add Py_mod_gil module slot (#116882) This PR adds the ability to enable the GIL if it was disabled at interpreter startup, and modifies the multi-phase module initialization path to enable the GIL when loading a module, unless that module's spec includes a slot indicating it can run safely without the GIL. PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148. A warning will be issued up to once per interpreter for the first GIL-using module that is loaded. If `-v` is given, a shorter message will be printed to stderr every time a GIL-using module is loaded (including the first one that issues a warning).	2024-05-03 11:30:55 -04:00
Pablo Galindo Salgado	a135a6d2c6	gh-112943: Correctly compute end offsets for multiline tokens in the tokenize module (#112949)	2023-12-11 11:44:22 +00:00
Serhiy Storchaka	a12f624a9d	Remove unnecessary includes (GH-111633)	2023-11-02 10:42:58 +02:00
Lysandros Nikolaou	01481f2dc1	gh-104169: Refactor tokenizer into lexer and wrappers (#110684) * The lexer, which include the actual lexeme producing logic, goes into the `lexer` directory. * The wrappers, one wrapper per input mode (file, string, utf-8, and readline), go into the `tokenizer` directory and include logic for creating a lexer instance and managing the buffer for different modes. --------- Co-authored-by: Pablo Galindo <pablogsal@gmail.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>	2023-10-11 15:14:44 +00:00
Pablo Galindo Salgado	da8f87b7ea	gh-107015: Remove async_hacks from the tokenizer (#107018)	2023-07-26 16:34:15 +01:00
Pablo Galindo Salgado	d7f46bcd98	gh-105564: Don't include artificial newlines in the line attribute of tokens (#105565)	2023-06-09 17:01:26 +01:00
Kirill Podoprigora	264a0110ff	gh-105390: Add explicit type cast (#105466)	2023-06-07 20:20:43 +00:00
Pablo Galindo Salgado	7279fb6408	gh-105435: Fix spurious NEWLINE token if file ends with comment without a newline (#105442)	2023-06-07 13:31:48 +01:00
Pablo Galindo Salgado	ffd2654550	gh-105390: Correctly raise TokenError instead of SyntaxError for tokenize errors (#105399)	2023-06-07 12:04:40 +01:00
Pablo Galindo Salgado	c0a6ed3934	gh-105259: Ensure we don't show newline characters for trailing NEWLINE tokens (#105364)	2023-06-06 12:52:16 +01:00
Lysandros Nikolaou	70f315c2d6	gh-105042: Disable unmatched parens syntax error in python tokenize (#105061)	2023-05-30 22:52:52 +01:00
Pablo Galindo Salgado	9216e69a87	gh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (#105070)	2023-05-30 22:43:34 +01:00
Marta Gómez Macías	96fff35325	gh-105017: Include CRLF lines in strings and column numbers (#105030) Co-authored-by: Pablo Galindo <pablogsal@gmail.com>	2023-05-28 15:15:53 +01:00
Pablo Galindo Salgado	46b52e6e2b	gh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer (#104980) Signed-off-by: Pablo Galindo <pablogsal@gmail.com>	2023-05-26 22:02:26 +01:00
Pablo Galindo Salgado	3fdb55c482	gh-104972: Ensure that line attributes in tokens in the tokenize module are correct (#104975)	2023-05-26 15:46:22 +01:00
Pablo Galindo Salgado	c8cf9b42eb	gh-104825: Remove implicit newline in the line attribute in tokens emitted in the tokenize module (#104846)	2023-05-24 09:59:18 +00:00
Marta Gómez Macías	729b252241	gh-104741: Add line number attribute to indentation error exception (#104743)	2023-05-22 11:30:18 +00:00
Marta Gómez Macías	8817886ae5	gh-102856: Tokenize performance improvement (#104731)	2023-05-22 00:29:04 +00:00
Marta Gómez Macías	6715f91edc	gh-102856: Python tokenizer implementation for PEP 701 (#104323) This commit replaces the Python implementation of the tokenize module with an implementation that reuses the real C tokenizer via a private extension module. The tokenize module now implements a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward compatibility. As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation. Co-authored-by: Pablo Galindo <pablogsal@gmail.com>	2023-05-21 01:03:02 +01:00
Eric Snow	a9c6e0618f	gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205) Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules. We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).	2023-05-05 21:11:27 +00:00
Pablo Galindo Salgado	1ef61cf71a	gh-102856: Initial implementation of PEP 701 (#102855) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com> Co-authored-by: Batuhan Taskaya <isidentical@gmail.com> Co-authored-by: Marta Gómez Macías <mgmacias@google.com> Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>	2023-04-19 11:18:16 -05:00
Lysandros Nikolaou	cbf0afd8a1	gh-97973: Return all necessary information from the tokenizer (GH-97984) Right now, the tokenizer only returns type and two pointers to the start and end of the token. This PR modifies the tokenizer to return the type and set all of the necessary information, so that the parser does not have to this.	2022-10-06 16:07:17 -07:00
Eric Snow	6f6a4e6cc5	gh-90928: Statically Initialize the Keywords Tuple in Clinic-Generated Code (gh-95860) We only statically initialize for core code and builtin modules. Extension modules still create the tuple at runtime. We'll solve that part of interpreter isolation separately. This change includes generated code. The non-generated changes are in: * Tools/clinic/clinic.py * Python/getargs.c * Include/cpython/modsupport.h * Makefile.pre.in (re-generate global strings after running clinic) * very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c All other changes are generated code (clinic, global strings).	2022-08-11 15:25:49 -06:00
Petr Viktorin	204946986f	bpo-46613: Add PyType_GetModuleByDef to the public API (GH-31081) * Make PyType_GetModuleByDef public (remove underscore) Co-authored-by: Victor Stinner <vstinner@python.org>	2022-02-11 17:22:11 +01:00
Victor Stinner	713bb19356	bpo-45434: Mark the PyTokenizer C API as private (GH-28924) Rename PyTokenize functions to mark them as private: * PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename() * PyTokenizer_FromString() => _PyTokenizer_FromString() * PyTokenizer_FromFile() => _PyTokenizer_FromFile() * PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8() * PyTokenizer_Free() => _PyTokenizer_Free() * PyTokenizer_Get() => _PyTokenizer_Get() Remove the unused PyTokenizer_FindEncoding() function. import.c: remove unused #include "errcode.h".	2021-10-13 17:22:14 +02:00
Serhiy Storchaka	a5a56154f1	Remove trailing spaces. (GH-28706)	2021-10-03 16:58:14 +03:00
Pablo Galindo Salgado	214c2e5d91	Format the Python-tokenize module and fix exit path (GH-27935)	2021-08-25 14:41:14 +02:00
Pablo Galindo Salgado	a24676bedc	Add tests for the C tokenizer and expose it as a private module (GH-27924)	2021-08-24 17:50:05 +01:00

33 Commits
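
Several of the commits listed above (for example 6715f91edc) concern the extra tokens, such as comments and non-semantic newlines, that the public tokenize module exposes on top of what the parser needs. The following is a minimal sketch of how that behavior can be observed from Python. It uses only the documented tokenize API and does not touch the private C extension module; the sample source string and the helper name token_names are illustrative assumptions, not part of the commits.

```python
import io
import tokenize

# Hypothetical sample input: one statement with a trailing comment,
# a blank line, and a second statement.
SOURCE = "x = 1  # note\n\ny = 2\n"


def token_names(source):
    """Return the token type names that tokenize yields for `source`."""
    readline = io.StringIO(source).readline
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(readline)
    ]


names = token_names(SOURCE)
print(names)
```

The printed list should contain a COMMENT entry for the trailing comment and an NL entry for the blank line, alongside the NEWLINE tokens that end each logical line.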