Commit Graph

31 Commits

Author SHA1 Message Date
Serhiy Storchaka 8bc76ae45f
gh-111259: Optimize complementary character sets in RE (GH-120742)
Patterns like "[\s\S]" or "\s|\S" which match any character are now compiled
to the same effective code as a dot with the DOTALL modifier ("(?s:.)").
2024-06-20 07:19:32 +00:00
Victor Stinner 6ae254aaa0
gh-120417: Add #noqa to used imports in the stdlib (#120421)
Tools such as ruff can ignore "imported but unused" warnings if a
line ends with "# noqa: F401". It avoids the temptation to remove
an import which is used effectively.
2024-06-13 16:14:50 +02:00
achhina a01022af23
GH-83162: Rename re.error for better clarity. (#101677)
Renamed re.error for clarity, and kept re.error for backward compatibility.
Updated idlelib files at TJR's request.
---------

Co-authored-by: Matthias Bussonnier <mbussonnier@ucmerced.edu>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2023-12-11 15:45:08 -05:00
Serhiy Storchaka e2b3d831fd
gh-109747: Improve errors for unsupported look-behind patterns (GH-109859)
Now re.error is raised instead of OverflowError or RuntimeError for
too large width of look-behind pattern.

The limit is increased to 2**32-1 (was 2**31-1).
2023-10-14 09:13:02 +03:00
Serhiy Storchaka 882cb79afa
gh-56166: Deprecate passing confusing positional arguments in re functions (#107778)
Deprecate passing optional arguments maxsplit, count and flags in
module-level functions re.split(), re.sub() and re.subn() as positional.
They should only be passed by keyword.
2023-08-16 13:35:35 -07:00
SKO abd9cc52d9
gh-100061: Proper fix of the bug in the matching of possessive quantifiers (GH-102612)
Restore the global Input Stream pointer after trying to match a sub-pattern.

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
2023-08-16 10:43:45 +03:00
Serhiy Storchaka 7b6e34e5ba
gh-106052: Fix bug in the matching of possessive quantifiers (gh-106515)
It did not work in the case of a subpattern containing backtracking.

Temporary implement possessive quantifiers as equivalent greedy qualifiers
in atomic groups.
2023-08-09 08:47:57 +03:00
Serhiy Storchaka ed64204716
gh-106566: Optimize (?!) in regular expressions (GH-106567) 2023-08-07 18:09:56 +03:00
Serhiy Storchaka 74ec02e949
gh-106510: Fix DEBUG output for atomic group (GH-106511) 2023-07-08 14:31:25 +03:00
Nikita Sobolev 67f69dba0a
gh-105687: Remove deprecated objects from `re` module (#105688) 2023-06-14 12:26:20 +02:00
Serhiy Storchaka 75a6fadf36
gh-91524: Speed up the regular expression substitution (#91525)
Functions re.sub() and re.subn() and corresponding re.Pattern methods
are now 2-3 times faster for replacement strings containing group references.

Closes #91524

Primarily authored by serhiy-storchaka Serhiy Storchaka
Minor-cleanups-by: Gregory P. Smith [Google] <greg@krypto.org>
2022-10-23 15:57:30 -07:00
Serhiy Storchaka c11b667a1d
gh-96346: Use double caching for re._compile() (#96347) 2022-10-07 12:21:42 -07:00
Gregory P. Smith 4beee0c7b0
gh-91404: Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or allocation failure (GH-32283) (#93882)
Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)"

This reverts commit 6e3eee5c11.

Manual fixups to increase the MAGIC number and to handle conflicts with
a couple of changes that landed after that.

Thanks for reviews by Ma Lin and Serhiy Storchaka.
2022-06-17 01:19:44 -07:00
Miro Hrončok 16a7e4a0b7
gh-92728: Restore re.template, but deprecate it (GH-93161)
Revert "bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)"

This reverts commit b09184bf05.
2022-05-25 09:05:35 +03:00
Serhiy Storchaka a84a56d80f
gh-91760: More strict rules for numerical group references and group names in RE (GH-91792)
Only sequence of ASCII digits is now accepted as a numerical reference.
The group name in bytes patterns and replacement strings can now only
contain ASCII letters and digits and underscore.
2022-05-08 19:19:29 +03:00
Serhiy Storchaka 19dca04121
gh-91760: Deprecate group names and numbers which will be invalid in future (GH-91794)
Only sequence of ASCII digits will be accepted as a numerical reference.
The group name in bytes patterns and replacement strings could only
contain ASCII letters and digits and underscore.
2022-04-30 13:13:46 +03:00
Serhiy Storchaka 6d0d547033
gh-92049: Forbid pickling constants re._constants.SUCCESS etc (GH-92070)
Previously, pickling did not fail, but the result could not be unpickled.
2022-04-30 13:03:23 +03:00
Serhiy Storchaka f703c96cf0
gh-91870: Remove unsupported SRE opcode CALL (GH-91872)
It was initially added to support atomic groups, but that
support was never fully implemented, and CALL was only left
in the compiler, but not interpreter and parser.

ATOMIC_GROUP is now used to support atomic groups.
2022-04-26 21:07:25 +03:00
Serhiy Storchaka 28890427c5
RE: Pre-split the list of opcode names (GH-91859)
1. It makes them interned.
2. It allows to add comments to individual opcodes.
2022-04-23 18:49:23 +03:00
Serhiy Storchaka 130a8c386b
gh-91308: Simplify parsing inline flag "x" (verbose) (GH-91855) 2022-04-23 12:50:42 +03:00
Serhiy Storchaka f912cc0e41
gh-91575: Add a script for generating data for case-insensitive matching in re (GH-91660)
Also test that all extra cases are in BMP.
2022-04-22 21:37:46 +03:00
Serhiy Storchaka 48ec61a89a
gh-91700: Validate the group number in conditional expression in RE (GH-91702)
In expression (?(group)...) an appropriate re.error is now
raised if the group number refers to not defined group.

Previously it raised RuntimeError: invalid SRE code.
2022-04-22 19:53:10 +03:00
Serhiy Storchaka 6ccfa31421
gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665)
re.error is now raised instead of TypeError.
2022-04-22 18:35:28 +03:00
Serhiy Storchaka 1c2fcebf3c
gh-91575: Update case-insensitive matching in re to the latest Unicode version (GH-91580) 2022-04-18 12:26:30 +03:00
Serhiy Storchaka 474fdbe9e4
bpo-47152: Automatically regenerate sre_constants.h (GH-91439)
* Move the code for generating Modules/_sre/sre_constants.h from
  Lib/re/_constants.py into a separate script
  Tools/scripts/generate_sre_constants.py.
* Add target `regen-sre` in the makefile.
* Make target `regen-all` depending on `regen-sre`.
2022-04-12 18:34:06 +03:00
Serhiy Storchaka 50872dbadc
bpo-47227: Suppress expression chaining for more RE parsing errors (GH-32333) 2022-04-06 19:54:44 +03:00
Serhiy Storchaka b09184bf05
bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)
They were undocumented and never working.
2022-04-06 19:53:50 +03:00
Serhiy Storchaka ff2cf1d7d5
bpo-47152: Remove unused import in re (GH-32298) 2022-04-04 12:00:53 +03:00
Serhiy Storchaka 1578f06c1c
bpo-47152: Move sources of the _sre module into a subdirectory (GH-32290) 2022-04-04 10:53:26 +03:00
Ma Lin 6e3eee5c11
bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283) 2022-04-03 19:16:20 +03:00
Serhiy Storchaka 1be3260a90
bpo-47152: Convert the re module into a package (GH-32177)
The sre_* modules are now deprecated.
2022-04-02 11:35:13 +03:00