In very rare circumstances the JUMP opcode could be confused with the
argument of the opcode in the "then" part which doesn't end with the
JUMP opcode. This led to incorrect detection of the final JUMP opcode
and incorrect calculation of the size of the subexpression.
NOTE: Changed return value of functions _validate_inner() and
_validate_charset() in Modules/_sre/sre.c. Now they return 0 on success,
-1 on failure, and 1 if the last op is JUMP (which usually is a failure).
Previously they returned 1 on success and 0 on failure.
Functions re.sub() and re.subn() and corresponding re.Pattern methods
are now 2-3 times faster for replacement strings containing group references.
Closes#91524
Primarily authored by serhiy-storchaka Serhiy Storchaka
Minor-cleanups-by: Gregory P. Smith [Google] <greg@krypto.org>
We only statically initialize for core code and builtin modules. Extension modules still create
the tuple at runtime. We'll solve that part of interpreter isolation separately.
This change includes generated code. The non-generated changes are in:
* Tools/clinic/clinic.py
* Python/getargs.c
* Include/cpython/modsupport.h
* Makefile.pre.in (re-generate global strings after running clinic)
* very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c
All other changes are generated code (clinic, global strings).
Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)"
This reverts commit 6e3eee5c11.
Manual fixups to increase the MAGIC number and to handle conflicts with
a couple of changes that landed after that.
Thanks for reviews by Ma Lin and Serhiy Storchaka.
It combines PyImport_ImportModule() and PyObject_GetAttrString()
and saves 4-6 lines of code on every use.
Add also _PyImport_GetModuleAttr() which takes Python strings as arguments.
It was initially added to support atomic groups, but that
support was never fully implemented, and CALL was only left
in the compiler, but not interpreter and parser.
ATOMIC_GROUP is now used to support atomic groups.
Limit the maximum capturing group to 2**30-1 on 64-bit platforms
(it was 2**31-1). No change on 32-bit platforms (2**28-1).
It allows to reduce the size of SRE(match_context):
- On 32 bit platform: 36 bytes, no change. (msvc2022)
- On 64 bit platform: 72 bytes -> 56 bytes. (msvc2022/gcc9.4)
which leads to increasing the depth of backtracking.
* Move the code for generating Modules/_sre/sre_constants.h from
Lib/re/_constants.py into a separate script
Tools/scripts/generate_sre_constants.py.
* Add target `regen-sre` in the makefile.
* Make target `regen-all` depending on `regen-sre`.