Commit Graph

266 Commits

Author SHA1 Message Date
Miro Hrončok fe23c0061d
gh-94675: Add a regression test for rjsmin re slowdown (GH-94685)
Adds a regression test for an re slowdown observed by rjsmin.
Uses multiprocessing to kill the test after SHORT_TIMEOUT.

Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
Co-authored-by: Christian Heimes <christian@python.org>
2022-08-03 16:19:36 -07:00
Gregory P. Smith 4beee0c7b0
gh-91404: Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or allocation failure (GH-32283) (#93882)
Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)"

This reverts commit 6e3eee5c11.

Manual fixups to increase the MAGIC number and to handle conflicts with
a couple of changes that landed after that.

Thanks for reviews by Ma Lin and Serhiy Storchaka.
2022-06-17 01:19:44 -07:00
Miro Hrončok 16a7e4a0b7
gh-92728: Restore re.template, but deprecate it (GH-93161)
Revert "bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)"

This reverts commit b09184bf05.
2022-05-25 09:05:35 +03:00
Christian Heimes 9b50585e02
gh-90473: Skip tests that don't apply to Emscripten and WASI (GH-92846) 2022-05-16 16:02:37 +02:00
Serhiy Storchaka a84a56d80f
gh-91760: More strict rules for numerical group references and group names in RE (GH-91792)
Only sequence of ASCII digits is now accepted as a numerical reference.
The group name in bytes patterns and replacement strings can now only
contain ASCII letters and digits and underscore.
2022-05-08 19:19:29 +03:00
Serhiy Storchaka 19dca04121
gh-91760: Deprecate group names and numbers which will be invalid in future (GH-91794)
Only sequence of ASCII digits will be accepted as a numerical reference.
The group name in bytes patterns and replacement strings could only
contain ASCII letters and digits and underscore.
2022-04-30 13:13:46 +03:00
Serhiy Storchaka 090721721b
Simplify testing the warning filename (GH-91868)
The context manager result has the "filename" attribute.
2022-04-24 10:23:59 +03:00
Serhiy Storchaka 6b45076bd6
RE: Add more tests for inline flag "x" and re.VERBOSE (GH-91854) 2022-04-23 12:49:06 +03:00
Serhiy Storchaka 48ec61a89a
gh-91700: Validate the group number in conditional expression in RE (GH-91702)
In expression (?(group)...) an appropriate re.error is now
raised if the group number refers to not defined group.

Previously it raised RuntimeError: invalid SRE code.
2022-04-22 19:53:10 +03:00
Serhiy Storchaka 6ccfa31421
gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665)
re.error is now raised instead of TypeError.
2022-04-22 18:35:28 +03:00
Ma Lin e4e8895ae3
gh-91616: re module, fix .fullmatch() mismatch when using Atomic Grouping or Possessive Quantifiers (GH-91681)
These jumps should use DO_JUMP0() instead of DO_JUMP():
- JUMP_POSS_REPEAT_1
- JUMP_POSS_REPEAT_2
- JUMP_ATOMIC_GROUP
2022-04-19 17:49:36 +03:00
Serhiy Storchaka 74070085da
Add more tests for group names and refs in RE (GH-91695) 2022-04-19 16:56:51 +03:00
Serhiy Storchaka 1c2fcebf3c
gh-91575: Update case-insensitive matching in re to the latest Unicode version (GH-91580) 2022-04-18 12:26:30 +03:00
Serhiy Storchaka b09184bf05
bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)
They were undocumented and never working.
2022-04-06 19:53:50 +03:00
Ma Lin 6e3eee5c11
bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283) 2022-04-03 19:16:20 +03:00
Serhiy Storchaka 1be3260a90
bpo-47152: Convert the re module into a package (GH-32177)
The sre_* modules are now deprecated.
2022-04-02 11:35:13 +03:00
Ma Lin 356997cccc
bpo-35859: Fix a few long-standing bugs in re engine (GH-12427)
In rare cases, capturing group could get wrong result.

Regular expression engines in Perl and Java have similar bugs.
The new behavior now matches the behavior of more modern
RE engines: in the regex module and in PHP, Ruby and Node.js.
2022-03-29 17:31:01 +03:00
Serhiy Storchaka 492d4109f4
bpo-42885: Optimize search for regular expressions starting with "\A" or "^" (GH-32021)
Affected functions are re.search(), re.split(), re.findall(), re.finditer()
and re.sub().
2022-03-22 17:27:55 +02:00
Serhiy Storchaka c6cd3cc93c
bpo-47081: Replace "qualifiers" with "quantifiers" in the re module documentation (GH-32028)
It is a more commonly used term.
2022-03-22 11:44:47 +02:00
Serhiy Storchaka 345b390ed6
bpo-433030: Add support of atomic grouping in regular expressions (GH-31982)
* Atomic grouping: (?>...).
* Possessive quantifiers: x++, x*+, x?+, x{m,n}+.
  Equivalent to (?>x+), (?>x*), (?>x?), (?>x{m,n}).

Co-authored-by: Jeffrey C. Jacobs <timehorse@users.sourceforge.net>
2022-03-21 18:28:22 +02:00
Serhiy Storchaka 92a6abf72e
bpo-47066: Convert a warning about flags not at the start of the regular expression into error (GH-31994) 2022-03-19 16:10:44 +02:00
Serhiy Storchaka 4142961b9f
bpo-39394: Improve warning message in the re module (GH-31988)
A warning about inline flags not at the start of the regular
expression now contains the position of the flag.
2022-03-19 14:13:31 +02:00
Christian Heimes ef1327e3b6
bpo-40280: Skip more tests on Emscripten (GH-31947)
- lchmod, lchown are not fully implemented
- skip umask tests
- cannot fstat unlinked or renamed files yet
- ignore musl libc issues that affect Emscripten
2022-03-17 12:09:57 +01:00
Erlend Egeberg Aasland fbff5387c3
bpo-43988: Use check disallow instantiation helper (GH-26392) 2021-05-27 08:43:52 +02:00
Zackery Spytz 6cc8ac9499
bpo-40736: Improve the error message for re.search() TypeError (GH-23312)
Include the invalid type in the error message.
2021-05-21 22:02:42 +01:00
Erlend Egeberg Aasland 9746cda705
bpo-43916: Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to selected types (GH-25748)
Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to the following types:

* _dbm.dbm
* _gdbm.gdbm
* _multibytecodec.MultibyteCodec
* _sre..SRE_Scanner
* _thread._localdummy
* _thread.lock
* _winapi.Overlapped
* array.arrayiterator
* functools.KeyWrapper
* functools._lru_list_elem
* pyexpat.xmlparser
* re.Match
* re.Pattern
* unicodedata.UCD
* zlib.Compress
* zlib.Decompress
2021-04-30 16:04:57 +02:00
Erlend Egeberg Aasland 5daf70b22e
bpo-43908: Make re types immutable (GH-25697)
Co-authored-by: Victor Stinner <vstinner@python.org>
2021-04-29 08:47:11 +02:00
Ethan Furman 7aaeb2a3d6
bpo-38250: [Enum] single-bit flags are canonical (GH-24215)
Flag members are now divided by one-bit verses multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags.

Iterating, repr(), and str() show members in definition order.

When constructing combined-member flags, any extra integer values are either discarded (CONFORM), turned into ints (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired:

>>> class Test(Flag, boundary=CONFORM):
...     ONE = 1
...     TWO = 2
...
>>> Test(5)
<Test.ONE: 1>

Besides the three above behaviors, there is also KEEP, which should not be used unless necessary -- for example, _convert_ specifies KEEP as there are flag sets in the stdlib that are incomplete and/or inconsistent (e.g. ssl.Options). KEEP will, as the name suggests, keep all bits; however, iterating over a flag with extra bits will only return the canonical flags contained, not the extra bits.

Iteration is now in member definition order.  If member definition order
matches increasing value order, then a more efficient method of flag
decomposition is used; otherwise, sort() is called on the results of
that method to get definition order.


``re`` module:

repr() has been modified to support as closely as possible its previous
output; the big difference is that inverted flags cannot be output as
before because the inversion operation now always returns the comparable
positive result; i.e.

   re.A|re.I|re.M|re.S is ~(re.L|re.U|re.S|re.T|re.DEBUG)

in both of the above terms, the ``value`` is 282.

re's tests have been updated to reflect the modifications to repr().
2021-01-25 14:26:19 -08:00
Erlend Egeberg Aasland a6109ef68d
bpo-1635741: Convert _sre types to heap types and establish module state (PEP 384) (GH-23393) 2020-11-20 21:36:23 +09:00
Victor Stinner 57572b103e
bpo-40443: Remove unused imports in tests (GH-19805) 2020-04-30 01:48:37 +02:00
Serhiy Storchaka 14a0e16c88
bpo-36548: Improve the repr of re flags. (GH-12715) 2019-05-31 10:39:47 +03:00
Max Bernstein ccb7ca728e bpo-36929: Modify io/re tests to allow for missing mod name (#13392)
* bpo-36929: Modify io/re tests to allow for missing mod name

For a vanishingly small number of internal types, CPython sets the
tp_name slot to mod_name.type_name, either in the PyTypeObject or the
PyType_Spec. There are a few minor places where this surfaces:

* Custom repr functions for those types (some of which ignore the
  tp_name in favor of using a string literal, such as _io.TextIOWrapper)
* Pickling error messages

The test suite only tests the former. This commit modifies the test
suite to allow Python implementations to omit the module prefix.

https://bugs.python.org/issue36929
2019-05-21 10:09:21 -07:00
Victor Stinner ab71f8b793
bpo-29571: Fix test_re.test_locale_flag() (GH-12099)
Use locale.getpreferredencoding() rather than locale.getlocale() to
get the locale encoding. With some locales, locale.getlocale()
returns the wrong encoding.

For example, on Fedora 29, locale.getlocale() returns ISO-8859-1
encoding for the "en_IN" locale, whereas
locale.getpreferredencoding() reports the correct encoding: UTF-8.
2019-03-01 00:08:03 +01:00
animalize 4a7f44a2ed bpo-34294: re module, fix wrong capturing groups in rare cases. (GH-11546)
Need to reset capturing groups between two SRE(match) callings in loops, this fixes wrong capturing groups in rare cases.

Also add a missing index in re.rst.
2019-02-18 15:26:37 +02:00
Serhiy Storchaka a445feb729
bpo-30688: Support \N{name} escapes in re patterns. (GH-5588)
Co-authored-by: Jonathan Eunice <jonathan.eunice@gmail.com>
2018-02-10 00:08:17 +02:00
Serhiy Storchaka fbb490fd2f
bpo-32308: Replace empty matches adjacent to a previous non-empty match in re.sub(). (#4846) 2018-01-04 11:06:13 +02:00
Serhiy Storchaka b748e3b258
Fix improper use of re.escape() in tests. (#4814) 2017-12-12 19:21:50 +02:00
Serhiy Storchaka 70d56fb525
bpo-25054, bpo-1647489: Added support of splitting on zerowidth patterns. (#4471)
Also fixed searching patterns that could match an empty string.
2017-12-04 14:29:05 +02:00
Serhiy Storchaka 05cb728d68
bpo-30349: Raise FutureWarning for nested sets and set operations (#1553)
in regular expressions.
2017-11-16 12:38:26 +02:00
Serhiy Storchaka 3557b05c5a bpo-31690: Allow the inline flags "a", "L", and "u" to be used as group flags for RE. (#3885) 2017-10-24 23:31:42 +03:00
Serhiy Storchaka 0b5e61ddca bpo-30397: Add re.Pattern and re.Match. (#1646) 2017-10-04 20:09:49 +03:00
Serhiy Storchaka 5075416b8f bpo-30978: str.format_map() now passes key lookup exceptions through. (#2790)
Previously any exception was replaced with a KeyError exception.
2017-08-03 11:45:23 +03:00
Roy Williams 171b9a354e bpo-30605: Fix compiling binary regexs with BytesWarnings enabled. (#2016)
Running our unit tests with `-bb` enabled triggered this failure.
2017-06-10 08:01:16 +03:00
Serhiy Storchaka c7ac7280c3 bpo-30375: Correct the stacklevel of regex compiling warnings. (#1595)
Warnings emitted when compile a regular expression now always point
to the line in the user code.  Previously they could point into inners
of the re module if emitted from inside of groups or conditionals.
2017-05-16 15:16:15 +03:00
Serhiy Storchaka 4ab6abfca4 bpo-30299: Display a bytecode when compile a regex in debug mode. (#1491)
`re.compile(..., re.DEBUG)` now displays the compiled bytecode in
human readable form.
2017-05-14 09:05:13 +03:00
Serhiy Storchaka 821a9d146b bpo-30340: Enhanced regular expressions optimization. (#1542)
This increased the performance of matching some patterns up to 25 times.
2017-05-14 08:32:33 +03:00
Serhiy Storchaka 305ccbe27e bpo-30298: Weaken the condition of deprecation warnings for inline modifiers. (#1490)
Now allowed several subsequential inline modifiers at the start of the
pattern (e.g. '(?i)(?s)...').  In verbose mode whitespaces and comments
now are allowed before and between inline modifiers (e.g.
'(?x) (?i) (?s)...').
2017-05-10 06:05:20 +03:00
Serhiy Storchaka 6d336a0279 bpo-30285: Optimize case-insensitive matching and searching (#1482)
of regular expressions.
2017-05-09 23:37:14 +03:00
Serhiy Storchaka 7186cc29be bpo-30277: Replace _sre.getlower() with _sre.ascii_tolower() and _sre.unicode_tolower(). (#1468) 2017-05-05 10:42:46 +03:00
Serhiy Storchaka 898ff03e1e bpo-30215: Make re.compile() locale agnostic. (#1361)
Compiled regular expression objects with the re.LOCALE flag no longer
depend on the locale at compile time.  Only the locale at matching
time affects the result of matching.
2017-05-05 08:53:40 +03:00