Commit Graph

280 Commits

Author SHA1 Message Date
Serhiy Storchaka b748e3b258 Fix improper use of re.escape() in tests. (#4814) 2017-12-12 19:21:50 +02:00
Serhiy Storchaka 70d56fb525 bpo-25054, bpo-1647489: Added support for splitting on zero-width patterns. (#4471)
Also fixed searching patterns that could match an empty string.
2017-12-04 14:29:05 +02:00
Serhiy Storchaka 05cb728d68 bpo-30349: Raise FutureWarning for nested sets and set operations (#1553)
in regular expressions.
2017-11-16 12:38:26 +02:00
Serhiy Storchaka 3557b05c5a bpo-31690: Allow the inline flags "a", "L", and "u" to be used as group flags for RE. (#3885) 2017-10-24 23:31:42 +03:00
Serhiy Storchaka 0b5e61ddca bpo-30397: Add re.Pattern and re.Match. (#1646) 2017-10-04 20:09:49 +03:00
Serhiy Storchaka 5075416b8f bpo-30978: str.format_map() now passes key lookup exceptions through. (#2790)
Previously any exception was replaced with a KeyError exception.
2017-08-03 11:45:23 +03:00
Roy Williams 171b9a354e bpo-30605: Fix compiling binary regexes with BytesWarnings enabled. (#2016)
Running our unit tests with `-bb` enabled triggered this failure.
2017-06-10 08:01:16 +03:00
Serhiy Storchaka c7ac7280c3 bpo-30375: Correct the stacklevel of regex compiling warnings. (#1595)
Warnings emitted when compiling a regular expression now always point
to the line in the user code.  Previously they could point into the internals
of the re module if emitted from inside groups or conditionals.
2017-05-16 15:16:15 +03:00
Serhiy Storchaka 4ab6abfca4 bpo-30299: Display the bytecode when compiling a regex in debug mode. (#1491)
`re.compile(..., re.DEBUG)` now displays the compiled bytecode in
human readable form.
2017-05-14 09:05:13 +03:00
Serhiy Storchaka 821a9d146b bpo-30340: Enhanced regular expressions optimization. (#1542)
This increased the performance of matching some patterns up to 25 times.
2017-05-14 08:32:33 +03:00
Serhiy Storchaka 305ccbe27e bpo-30298: Weaken the condition of deprecation warnings for inline modifiers. (#1490)
Several consecutive inline modifiers are now allowed at the start of the
pattern (e.g. '(?i)(?s)...').  In verbose mode, whitespace and comments
are now allowed before and between inline modifiers (e.g.
'(?x) (?i) (?s)...').
2017-05-10 06:05:20 +03:00
Serhiy Storchaka 6d336a0279 bpo-30285: Optimize case-insensitive matching and searching (#1482)
of regular expressions.
2017-05-09 23:37:14 +03:00
Serhiy Storchaka 7186cc29be bpo-30277: Replace _sre.getlower() with _sre.ascii_tolower() and _sre.unicode_tolower(). (#1468) 2017-05-05 10:42:46 +03:00
Serhiy Storchaka 898ff03e1e bpo-30215: Make re.compile() locale agnostic. (#1361)
Compiled regular expression objects with the re.LOCALE flag no longer
depend on the locale at compile time.  Only the locale at matching
time affects the result of matching.
2017-05-05 08:53:40 +03:00
Serhiy Storchaka fdbd01151d bpo-10076: Compiled regular expression and match objects now are copyable. (#1000) 2017-04-16 10:16:03 +03:00
Serhiy Storchaka 5908300e4b bpo-29995: re.escape() now escapes only special characters. (#1007) 2017-04-13 21:06:43 +03:00
Victor Stinner d6debb24e0 bpo-29919: Remove unused imports found by pyflakes (#137)
Also make minor PEP 8 coding style fixes on modified imports.
2017-03-27 16:05:26 +02:00
Benjamin Peterson 21a74312f2 Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554)
This reverts commit ace5c0fdd9.
2017-03-07 22:48:09 -08:00
Benjamin Peterson 1e68716fd5 Revert "make the locale_flag fallback code work again (#375)" (#387)
This reverts commit 43f5df5bfa.
2017-03-01 21:53:00 -08:00
Benjamin Peterson 43f5df5bfa make the locale_flag fallback code work again (#375) 2017-02-28 23:59:12 -08:00
Nick Coghlan ace5c0fdd9 bpo-29571: Use correct locale encoding in test_re (#149)
``locale.getlocale(locale.LC_CTYPE)`` and
``locale.getpreferredencoding(False)`` may give different answers
in some cases (such as the ``en_IN`` locale).

``re.LOCALE`` uses the latter, so update the test case to match.
2017-02-18 15:01:22 +05:30
Serhiy Storchaka ef5176769d Issue #29444: Fixed out-of-bounds buffer access in the group() method of
the match object.  Based on patch by WGH.
2017-02-04 22:57:44 +02:00
Serhiy Storchaka 86e42376c2 Issue #29444: Fixed out-of-bounds buffer access in the group() method of
the match object.  Based on patch by WGH.
2017-02-04 22:55:40 +02:00
Serhiy Storchaka 7e10dbbd45 Issue #29444: Fixed out-of-bounds buffer access in the group() method of
the match object.  Based on patch by WGH.
2017-02-04 22:53:57 +02:00
Serhiy Storchaka 70d28a184c Remove unused imports. 2016-12-16 20:00:15 +02:00
Victor Stinner 726a57d45f Issue #28765: _sre.compile() now checks the type of groupindex and indexgroup
groupindex must be a dictionary and indexgroup must be a tuple.

Previously, indexgroup was a list. Use a tuple to reduce the memory usage.
2016-11-22 23:04:39 +01:00
Serhiy Storchaka 53c53ea4c5 Issue #27030: Unknown escapes in re.sub() replacement template are allowed
again.  But they still are deprecated and will be disabled in 3.7.
2016-12-06 19:15:29 +02:00
Victor Stinner bcf4dccfa7 Issue #28727: Optimize pattern_richcompare() for a==a
A pattern is equal to itself.
2016-11-22 15:30:38 +01:00
Victor Stinner b44fb128ae Implement rich comparison for _sre.SRE_Pattern
Issue #28727: Regular expression patterns, _sre.SRE_Pattern objects created by
re.compile(), become comparable (only x==y and x!=y operators). This change
should fix the issue #18383: don't duplicate warning filters when the warnings
module is reloaded (something usually done only in unit tests).
2016-11-21 16:35:08 +01:00
Victor Stinner 8bf43e6d0b Issue #28082: Add basic unit tests on re enums 2016-11-14 12:38:43 +01:00
Serhiy Storchaka 662cef66d7 Issue #25953: re.sub() now raises an error for invalid numerical group
reference in replacement template even if the pattern is not found in
the string.  Error message for invalid group reference now includes the
group index and the position of the reference.
Based on patch by SilentGhost.
2016-10-23 12:11:19 +03:00
Serhiy Storchaka 0eb60a7cb9 Issue #11957: Restored re tests for passing count and maxsplit as positional
arguments.
2016-09-25 20:39:04 +03:00
Serhiy Storchaka b02f8fc3af Issue #11957: Restored re tests for passing count and maxsplit as positional
arguments.
2016-09-25 20:36:23 +03:00
Serhiy Storchaka abf275af58 Issue #22493: Warning message emitted by using inline flags in the middle of a
regular expression now contains a (truncated) regex pattern.
Patch by Tim Graham.
2016-09-17 01:29:58 +03:00
Eric V. Smith 605bdae078 Issue #24454: Improve the usability of the re match object named group API 2016-09-11 08:55:43 -04:00
Serhiy Storchaka bd48d27944 Issue #22493: Inline flags now should be used only at the start of the
regular expression.  A deprecation warning is emitted if they are used in the
middle of the regular expression.
2016-09-11 12:50:02 +03:00
Serhiy Storchaka cc66a6528d Backported tests for issue #28070. 2016-09-11 01:39:51 +03:00
Serhiy Storchaka d65cd091e9 Issue #28070: Fixed parsing inline verbose flag in regular expressions. 2016-09-11 01:39:01 +03:00
Serhiy Storchaka be9a4e5c85 Issue #433028: Added support of modifier spans in regular expressions. 2016-09-10 00:57:55 +03:00
R David Murray 44b548dda8 #27364: fix "incorrect" uses of escape character in the stdlib.
And most of the tools.

Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
2016-09-08 13:59:53 -04:00
Serhiy Storchaka 977b3ac1c1 Issue #27177: Match objects in the re module now support index-like objects
as group indices.  Based on patches by Jeroen Demeyer and Xiang Zhang.
2016-06-18 16:48:07 +03:00
Serhiy Storchaka 9bd85b83f6 Issue #27030: Unknown escapes consisting of ``'\'`` and an ASCII letter in
regular expressions now are errors.
2016-06-11 19:15:00 +03:00
Serhiy Storchaka 485407ce1e Issue #24580: Symbolic group references to open group in re patterns now are
explicitly forbidden as well as numeric group references.
2015-07-18 23:27:00 +03:00
Serhiy Storchaka 07360df481 Issue #14260: The groupindex attribute of regular expression pattern object
is now a non-modifiable mapping.
2015-03-30 01:01:48 +03:00
Serhiy Storchaka 632a77e6a3 Issue #22364: Improved some re error messages using regex for hints. 2015-03-25 21:03:47 +02:00
Serhiy Storchaka a54aae0683 Issue #23622: Unknown escapes in regular expressions that consist of ``'\'``
and an ASCII letter now raise a deprecation warning and will be forbidden in
Python 3.6.
2015-03-24 22:58:14 +02:00
Serhiy Storchaka 4eea62fd2e Issues #814253, #9179: Group references and conditional group references now
work in lookbehind assertions in regular expressions.
2015-02-21 10:07:35 +02:00
Serhiy Storchaka 83e802796c Issue #22818: Splitting on a pattern that could match an empty string now
raises a warning.  Patterns that can only match empty strings are now
rejected.
2015-02-03 11:04:19 +02:00
Serhiy Storchaka 22a309a434 Issue #21032: Deprecated the use of re.LOCALE flag with str patterns or
re.ASCII. It never worked.
2014-12-01 11:50:07 +02:00
Serhiy Storchaka fb028336f9 Issue #22838: All test_re tests now work with unittest test discovery. 2014-12-01 11:08:27 +02:00
Serhiy Storchaka 9cba989502 Issue #22838: All test_re tests now work with unittest test discovery. 2014-12-01 11:06:45 +02:00
Benjamin Peterson 16e802f4ae merge 3.4 (#9179) 2014-11-30 11:51:16 -05:00
Benjamin Peterson 66323415c7 backout 9fcf4008b626 (#9179) for further consideration 2014-11-30 11:49:00 -05:00
Serhiy Storchaka ab14088141 Minor code clean up and improvements in the re module. 2014-11-11 21:13:28 +02:00
Serhiy Storchaka b99c132bd9 Fixed AttributeError when the regular expression starts with an illegal escape. 2014-11-10 14:38:16 +02:00
Serhiy Storchaka ad446d57a9 Issue #22578: Added attributes to the re.error class. 2014-11-10 13:49:00 +02:00
Serhiy Storchaka 5619ab926b Issue #12728: Different Unicode characters having the same uppercase but
different lowercase are now matched in case-insensitive regular expressions.
2014-11-10 12:43:14 +02:00
Serhiy Storchaka 0c938f6d24 Issue #12728: Different Unicode characters having the same uppercase but
different lowercase are now matched in case-insensitive regular expressions.
2014-11-10 12:37:16 +02:00
Serhiy Storchaka c7f7d3897e Issue #22434: Constants in sre_constants are now named constants (enum-like). 2014-11-09 20:48:36 +02:00
Serhiy Storchaka 6276b32799 Issues #814253, #9179: Group references and conditional group references now
work in lookbehind assertions in regular expressions.
2014-11-07 21:45:17 +02:00
Serhiy Storchaka 84df7fe6a2 Issues #814253, #9179: Group references and conditional group references now
work in lookbehind assertions in regular expressions.
2014-11-07 21:43:57 +02:00
Serhiy Storchaka 4b8f8949b4 Issue #17381: Fixed handling of case-insensitive ranges in regular expressions.
Added new opcode RANGE_IGNORE.
2014-10-31 12:36:56 +02:00
Serhiy Storchaka 7cc0a1f7cb Issue #22410: Module level functions in the re module now cache compiled
locale-dependent regular expressions taking into account the locale.
2014-10-31 00:56:45 +02:00
Serhiy Storchaka 4659cc0756 Issue #22410: Module level functions in the re module now cache compiled
locale-dependent regular expressions taking into account the locale.
2014-10-31 00:53:49 +02:00
Victor Stinner 55e614a2a8 Issue #11957: Explicit parameter name when calling re.split() and re.sub() 2014-10-29 16:58:59 +01:00
Serhiy Storchaka 7438e4b56f Issue #1519638: Now unmatched groups are replaced with empty strings in re.sub()
and re.subn().
2014-10-10 11:06:31 +03:00
Serhiy Storchaka 9baa5b2de2 Issue #22437: Number of capturing groups in regular expression is no longer
limited to 100.
2014-09-29 22:49:23 +03:00
Serhiy Storchaka c563caf3a2 Issue #22362: Forbid ambiguous octal escapes outside the range 0-0o377 in
regular expressions.
2014-09-23 23:22:41 +03:00
Serhiy Storchaka cd9032d45b Fixed bytes literals in tests. 2014-09-23 23:04:21 +03:00
Serhiy Storchaka 44dae8bde3 Issue #22423: Fixed debugging output of the GROUPREF_EXISTS opcode in the re
module.
2014-09-21 22:47:55 +03:00
Serhiy Storchaka b1847e7541 Issue #17381: Fixed handling of case-insensitive ranges in regular expressions. 2014-10-31 12:37:50 +02:00
Serhiy Storchaka b85a97600a Restored re pickling test. 2014-09-15 11:33:19 +03:00
Serhiy Storchaka d9cf65f00e Use more appropriate asserts in re tests. 2014-09-14 16:20:20 +03:00
Serhiy Storchaka a25875cfd0 Fixed re tests incorrectly ported from 2.x to 3.x. 2014-09-14 15:56:27 +03:00
Serhiy Storchaka 429b59ec69 Issue #20998: Fixed re.fullmatch() of repeated single character pattern
with ignore case.  Original patch by Matthew Barnett.
2014-05-14 21:48:17 +03:00
Serhiy Storchaka a537eb45fd Issue #20283: RE pattern methods now accept the string keyword parameters
as documented.  The pattern and source keyword parameters are left as
deprecated aliases.
2014-03-06 11:36:15 +02:00
Serhiy Storchaka ccdf352370 Issue #20283: RE pattern methods now accept the string keyword parameters
as documented.  The pattern and source keyword parameters are left as
deprecated aliases.
2014-03-06 11:28:32 +02:00
Antoine Pitrou c49672f25e Issue #20426: When passing the re.DEBUG flag, re.compile() displays the debug output every time it is called, regardless of the compilation cache. 2014-02-03 21:01:35 +01:00
Antoine Pitrou d2cc743ca4 Issue #20426: When passing the re.DEBUG flag, re.compile() displays the debug output every time it is called, regardless of the compilation cache. 2014-02-03 20:59:59 +01:00
Serhiy Storchaka 32eddc1bbc Issue #16203: Add re.fullmatch() function and regex.fullmatch() method,
which anchor the pattern at both ends of the string to match.

Original patch by Matthew Barnett.
2013-11-23 23:20:30 +02:00
Serhiy Storchaka 5c24d0e504 Issue #13592: Improved the repr for regular expression pattern objects.
Based on patch by Hugo Lopes Tavares.
2013-11-23 22:42:43 +02:00
Serhiy Storchaka 9eabac68a3 Issue #18685: Restore re performance to pre-PEP 393 levels. 2013-10-26 10:45:48 +03:00
Antoine Pitrou 79aa68dfc1 Issue #19387: explain and test the sre overlap table 2013-10-25 21:36:10 +02:00
Serhiy Storchaka 8b150ecfc9 Issue #19327: Fixed handling of regular expressions with an overly large charset. 2013-10-24 22:04:37 +03:00
Serhiy Storchaka be80fc9a84 Issue #19327: Fixed handling of regular expressions with an overly large charset. 2013-10-24 22:02:58 +03:00
Serhiy Storchaka 36af10c1f7 Issue #17087: Improved the repr for regular expression match objects. 2013-10-20 13:13:31 +03:00
Serhiy Storchaka 25324971fb Issue #18468: The re.split, re.findall, and re.sub functions and the group()
and groups() methods of match object now always return a string or a bytes
object.
2013-10-16 12:46:28 +03:00
Georg Brandl daa1fa991c Back out accidentally pushed changeset b51218966201. 2013-10-13 09:32:59 +02:00
Georg Brandl 4300019e1a Add re.fullmatch() function and regex.fullmatch() method, which anchor the
pattern at both ends of the string to match.

Patch by Matthew Barnett.
Closes #16203.
2013-10-13 09:18:45 +02:00
Serhiy Storchaka 98985a1980 Issue #2537: Remove broken check that rejected valid regular expressions.
Patch by Meador Inge.

See also issue #18647.
2013-08-19 23:18:23 +03:00
Serhiy Storchaka 1f35ae0a3c Issue #17998: Fix an internal error in regular expression engine. 2013-08-03 19:18:38 +03:00
R David Murray 26dfaac9ac #17341: Include name in re error message about invalid group name.
Patch by Jason Michalski.
2013-04-14 13:00:54 -04:00
Georg Brandl 1d472b74cb Closes #14462: allow any valid Python identifier in sre group names, as documented. 2013-04-14 11:40:00 +02:00
Ezio Melotti eadece2865 #12749: add a test for non-BMP ranges in character classes. 2013-02-23 08:40:07 +02:00
Serhiy Storchaka b0c75a7dec Issue #9669: Protect re against infinite loops on zero-width matching in
non-greedy repeat.  Patch by Matthew Barnett.
2013-02-16 21:25:05 +02:00
Serhiy Storchaka fa46816915 Issue #9669: Protect re against infinite loops on zero-width matching in
non-greedy repeat.  Patch by Matthew Barnett.
2013-02-16 21:23:53 +02:00
Serhiy Storchaka a0eb809995 Issue #13169: The maximal repetition number in a regular expression has been
increased from 65534 to 2147483647 (on 32-bit platform) or 4294967294 (on
64-bit).
2013-02-16 16:54:33 +02:00
Serhiy Storchaka 70ca0210e8 Issue #13169: The maximal repetition number in a regular expression has been
increased from 65534 to 2147483647 (on 32-bit platform) or 4294967294 (on
64-bit).
2013-02-16 16:47:47 +02:00
Ezio Melotti adfbb8e8ec #13899: merge with 3.2. 2013-01-11 08:43:53 +02:00
Ezio Melotti fe8e6e7414 #13899: \A, \Z, and \B now correctly match the A, Z, and B literals when used inside character classes (e.g. [A]). Patch by Matthew Barnett. 2013-01-11 08:32:01 +02:00
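
Several of the behaviour changes named in the log above can be checked directly from Python. The following is a minimal sketch, not taken from any of the commits themselves, and it assumes Python 3.7 or newer so that all of the listed changes are present. The sample strings and variable names are illustrative only.

```python
import re

# Issue #16203: re.fullmatch() anchors the pattern at both ends of the string,
# whereas re.match() anchors it only at the start.
prefix_only = re.match(r"\d+", "123abc")
whole_string = re.fullmatch(r"\d+", "123abc")

# bpo-25054, bpo-1647489: splitting on a zero-width pattern such as \b.
words = re.split(r"\b", "Words, words, words.")

# bpo-29995: re.escape() escapes only special characters,
# so ':' and '/' are left alone while '.' is escaped.
escaped = re.escape("https://www.python.org")

# Issue #1519638: a group that did not take part in the match is replaced
# with an empty string in re.sub() instead of raising an error.
substituted = re.sub(r"(a)|(b)", r"[\1\2]", "ab")

# Issue #24454: match objects can be indexed by group number or group name.
m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Isaac Newton")
first, last = m["first"], m[2]

print(prefix_only is not None, whole_string is None)
print(words)
print(escaped)
print(substituted)
print(first, last)
```

On older interpreters some of these calls behave differently or fail (for example, `re.split()` on a zero-width pattern before 3.7), which is the point of the corresponding log entries.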