Commit Graph

111 Commits

Author SHA1 Message Date
Serhiy Storchaka 1be3260a90
bpo-47152: Convert the re module into a package (GH-32177)
The sre_* modules are now deprecated.
2022-04-02 11:35:13 +03:00
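One way to observe the package conversion is to ask the import system whether `re` has a submodule search path. This is a minimal sketch. It assumes the conversion first shipped in Python 3.11, and the version check encodes that assumption. The helper name `re_is_package` is hypothetical.

```python
import importlib.util
import sys


def re_is_package():
    """Return True if the installed 're' is a package rather than a single module."""
    spec = importlib.util.find_spec("re")
    # Packages carry a list of directories to search for submodules; plain modules carry None.
    return spec is not None and spec.submodule_search_locations is not None


# Assumption: 're' became a package in 3.11, so this prints False on older versions.
print(sys.version_info[:2], re_is_package())
```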
Serhiy Storchaka 345b390ed6
bpo-433030: Add support of atomic grouping in regular expressions (GH-31982)
* Atomic grouping: (?>...).
* Possessive quantifiers: x++, x*+, x?+, x{m,n}+.
  Equivalent to (?>x+), (?>x*), (?>x?), (?>x{m,n}).

Co-authored-by: Jeffrey C. Jacobs <timehorse@users.sourceforge.net>
2022-03-21 18:28:22 +02:00
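The practical difference between an ordinary greedy repeat and an atomic one shows up when the rest of the pattern needs the repeat to give characters back. The sketch below assumes Python 3.11 or later, where this syntax is accepted, and guards on the version.

```python
import re
import sys

if sys.version_info >= (3, 11):  # atomic groups and possessive quantifiers need 3.11+
    # Ordinary greedy a+ backtracks by one 'a' so the trailing literal 'a' can match.
    backtracking = re.fullmatch(r"a+a", "aaa")
    # The atomic group keeps everything a+ consumed, leaving nothing for the trailing 'a'.
    atomic = re.fullmatch(r"(?>a+)a", "aaa")
    # The possessive quantifier a++ is shorthand for (?>a+), so it fails the same way.
    possessive = re.fullmatch(r"a++a", "aaa")
    print(backtracking is not None, atomic, possessive)  # True None None
```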
Serhiy Storchaka 491974735c
Simplify flags checks in sre_compile.py. (GH-9718)
Flags SRE_FLAG_UNICODE and SRE_FLAG_ASCII are mutually exclusive.
2018-10-05 20:53:45 +03:00
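From the user side, the mutual exclusion means that combining the two flags on a str pattern is rejected with a `ValueError`. The exact message text may differ between versions. This is a small sketch, and the helper `compile_or_error` is hypothetical.

```python
import re


def compile_or_error(pattern, flags):
    """Return the compiled pattern, or the error message if the flags are rejected."""
    try:
        return re.compile(pattern, flags)
    except ValueError as exc:
        return str(exc)


# ASCII and UNICODE cannot be combined: compilation is refused.
print(compile_or_error(r"\w+", re.ASCII | re.UNICODE))
# Either flag on its own is accepted for a str pattern.
print(compile_or_error(r"\w+", re.ASCII))
```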
Serhiy Storchaka e0c19ddc66
bpo-34681: Rename class Pattern in sre_parse to State. (GH-9310)
Also rename corresponding attributes, parameters and variables.
2018-09-18 09:16:26 +03:00
Serhiy Storchaka 3557b05c5a bpo-31690: Allow the inline flags "a", "L", and "u" to be used as group flags for RE. (#3885) 2017-10-24 23:31:42 +03:00
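As a sketch of what the scoped "a" flag does, `(?a:...)` restricts `\w` to ASCII inside the group only. This assumes a Python version that includes this change (3.7 or later). The "L" flag is only valid with bytes patterns and is not shown.

```python
import re

text = "na\u00efve"  # "naïve", with U+00EF LATIN SMALL LETTER I WITH DIAERESIS
# For str patterns \w is Unicode-aware by default, so 'ï' counts as a word character.
print(re.fullmatch(r"\w+", text))
# Inside (?a:...) \w is ASCII-only, so the group stops before 'ï' and fullmatch fails.
print(re.fullmatch(r"(?a:\w+)", text))
# Outside the group the default Unicode behaviour applies again.
print(re.fullmatch(r"(?a:na)\w+", text))
```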
Serhiy Storchaka 4ab6abfca4 bpo-30299: Display the bytecode when compiling a regex in debug mode. (#1491)
`re.compile(..., re.DEBUG)` now displays the compiled bytecode in
human readable form.
2017-05-14 09:05:13 +03:00
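The debug output goes to standard output, so it can be captured for inspection. This minimal sketch does that. The precise layout of the dump differs between Python versions, so only its presence is relied on here.

```python
import contextlib
import io
import re

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # re.DEBUG prints the parsed pattern and, in versions with this change, the compiled code.
    re.compile(r"ab+", re.DEBUG)
debug_output = buffer.getvalue()
print(debug_output)
```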
Serhiy Storchaka 821a9d146b bpo-30340: Enhanced regular expressions optimization. (#1542)
This increased the performance of matching some patterns by up to 25 times.
2017-05-14 08:32:33 +03:00
Serhiy Storchaka 6d336a0279 bpo-30285: Optimize case-insensitive matching and searching (#1482)
of regular expressions.
2017-05-09 23:37:14 +03:00
Serhiy Storchaka 7186cc29be bpo-30277: Replace _sre.getlower() with _sre.ascii_tolower() and _sre.unicode_tolower(). (#1468) 2017-05-05 10:42:46 +03:00
Serhiy Storchaka 898ff03e1e bpo-30215: Make re.compile() locale agnostic. (#1361)
Compiled regular expression objects with the re.LOCALE flag no longer
depend on the locale at compile time.  Only the locale at matching
time affects the result of matching.
2017-05-05 08:53:40 +03:00
Victor Stinner 726a57d45f Issue #28765: _sre.compile() now checks the type of groupindex and indexgroup
groupindex must be a dictionary and indexgroup must be a tuple.

Previously, indexgroup was a list. Use a tuple to reduce the memory usage.
2016-11-22 23:04:39 +01:00
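`_sre.compile()` is internal. As far as user code is concerned, the same group bookkeeping is visible through the public `Pattern.groups` and `Pattern.groupindex` attributes. This sketch reads them. `groupindex` is wrapped in `dict()` because its concrete mapping type has changed across versions.

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})")
print(pattern.groups)            # 3: total number of capturing groups
print(dict(pattern.groupindex))  # named groups only: {'year': 1, 'month': 2}
```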
Serhiy Storchaka be9a4e5c85 Issue #433028: Added support of modifier spans in regular expressions. 2016-09-10 00:57:55 +03:00
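A modifier span applies a flag only to part of a pattern. The short sketch below turns case-insensitivity on for one character, then off for one character while the flag is set globally.

```python
import re

# (?i:...) enables case-insensitive matching only inside the span.
print(re.fullmatch(r"a(?i:b)c", "aBc"))  # matches
print(re.fullmatch(r"a(?i:b)c", "ABc"))  # None: the 'a' outside the span is case-sensitive
# (?-i:...) disables the flag inside the span when it is set for the whole pattern.
print(re.fullmatch(r"a(?-i:b)c", "AbC", re.IGNORECASE))  # matches
```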
Serhiy Storchaka 66dc4648fc Issue #24426: Fast searching optimization in regular expressions now works
for patterns that start with capturing groups.  Fast searching optimization
now can't be disabled at compile time.
2015-06-21 14:06:55 +03:00
Serhiy Storchaka 632a77e6a3 Issue #22364: Improved some re error messages using regex for hints. 2015-03-25 21:03:47 +02:00
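To see the kind of diagnostics involved, the sketch below triggers a compile error. It assumes Python 3.5 or later, where `re.error` exposes `msg`, `pattern` and `pos`. Message wording may vary between versions. The helper `compile_error` is hypothetical.

```python
import re


def compile_error(pattern):
    """Return the re.error raised while compiling pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return exc
    return None


err = compile_error(r"a(b")
print(err.msg)  # describes the unterminated group
print(err.pos)  # 1: index of the unmatched '(' in the pattern
```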
Serhiy Storchaka 83e802796c Issue #22818: Splitting on a pattern that could match an empty string now
raises a warning.  Patterns that can only match empty strings are now
rejected.
2015-02-03 11:04:19 +02:00
Serhiy Storchaka ab14088141 Minor code clean up and improvements in the re module. 2014-11-11 21:13:28 +02:00
Serhiy Storchaka eb99e51574 Got rid of the array module dependency in the re module.
The re module could be used during building before array is built.
2014-11-10 13:25:14 +02:00
Serhiy Storchaka 19e9158497 Got rid of the array module dependency in the re module.
The re module could be used during building before array is built.
2014-11-10 13:24:47 +02:00
Serhiy Storchaka 5619ab926b Issue #12728: Different Unicode characters having the same uppercase but
different lowercase are now matched in case-insensitive regular expressions.
2014-11-10 12:43:14 +02:00
Serhiy Storchaka 0c938f6d24 Issue #12728: Different Unicode characters having the same uppercase but
different lowercase are now matched in case-insensitive regular expressions.
2014-11-10 12:37:16 +02:00
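A concrete pair for this case is "ſ" (U+017F LATIN SMALL LETTER LONG S) and "s". Both uppercase to "S", but their lowercase forms differ. This sketch checks that case-insensitive matching treats them as equivalent.

```python
import re

long_s = "\u017f"  # LATIN SMALL LETTER LONG S
# Same uppercase as 's', different lowercase.
print(long_s.upper(), long_s.lower() == "s")  # S False
# Case-insensitive matching accepts either spelling.
print(re.fullmatch("s", long_s, re.IGNORECASE) is not None)  # True
print(re.fullmatch(long_s, "S", re.IGNORECASE) is not None)  # True
```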
Serhiy Storchaka 5f33677219 Merge heads 2014-11-10 10:21:03 +02:00
Raymond Hettinger df1b699447 Issue #22823: Use set literals instead of creating a set from a list 2014-11-09 15:56:33 -08:00
Serhiy Storchaka c7f7d3897e Issue #22434: Constants in sre_constants are now named constants (enum-like). 2014-11-09 20:48:36 +02:00
Serhiy Storchaka 4b8f8949b4 Issue #17381: Fixed handling of case-insensitive ranges in regular expressions.
Added new opcode RANGE_IGNORE.
2014-10-31 12:36:56 +02:00
Serhiy Storchaka 9baa5b2de2 Issue #22437: Number of capturing groups in regular expression is no longer
limited to 100.
2014-09-29 22:49:23 +03:00
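A quick check of the lifted limit is to compile and use a pattern with more than 100 capturing groups. This sketch uses 150.

```python
import re

# 150 capturing groups, more than the old limit of 100.
pattern = re.compile("(a)" * 150)
m = pattern.fullmatch("a" * 150)
print(pattern.groups, m.group(150))  # 150 a
```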
Serhiy Storchaka b1847e7541 Issue #17381: Fixed handling of case-insensitive ranges in regular expressions. 2014-10-31 12:37:50 +02:00
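A character that lies outside a range, and has no case variant inside it, must not match just because IGNORECASE is set. The underscore illustrates this. This sketch shows the fixed behaviour; it makes no claim about how the new RANGE_IGNORE opcode is emitted internally.

```python
import re

# '_' (0x5F) lies inside 9..a (0x39..0x61) but outside 9..A (0x39..0x41).
print(re.fullmatch(r"[9-a]", "_", re.IGNORECASE) is not None)  # True
# Lowercasing the range endpoints would wrongly turn 9..A into 9..a; '_' must not match.
print(re.fullmatch(r"[9-A]", "_", re.IGNORECASE) is not None)  # False
# Genuine case variants still match: 'k' uppercases to 'K', which is in J..M.
print(re.fullmatch(r"[J-M]", "k", re.IGNORECASE) is not None)  # True
```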
Victor Stinner 7fa767e517 Issue #20976: pyflakes: Remove unused imports 2014-03-20 09:16:38 +01:00
Serhiy Storchaka 68457be619 Issue #19329: Optimized compiling charsets in regular expressions. 2013-10-27 08:20:29 +02:00
Serhiy Storchaka 1985f7b133 Issue #19405: Fixed outdated comments in the _sre module. 2013-10-27 08:07:46 +02:00
Serhiy Storchaka efa5a39fa5 Issue #19405: Fixed outdated comments in the _sre module. 2013-10-27 08:04:58 +02:00
Antoine Pitrou 79aa68dfc1 Issue #19387: explain and test the sre overlap table 2013-10-25 21:36:10 +02:00
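As we understand it, the overlap table for a literal prefix is the classic Knuth-Morris-Pratt failure function. For each position it records how much of the prefix can be reused after a mismatch, so the search never rescans text. The sketch below is a standalone illustration with hypothetical names `overlap_table` and `find_literal`. It is not the compiler's own code.

```python
def overlap_table(prefix):
    """For each i, the length of the longest proper prefix of prefix[:i + 1]
    that is also a suffix of it (the KMP failure function)."""
    table = [0] * len(prefix)
    k = 0  # length of the border carried over from the previous position
    for i in range(1, len(prefix)):
        # Fall back through shorter borders until prefix[i] can extend one.
        while k > 0 and prefix[i] != prefix[k]:
            k = table[k - 1]
        if prefix[i] == prefix[k]:
            k += 1
        table[i] = k
    return table


def find_literal(text, prefix):
    """Return the index of the first occurrence of prefix in text, or -1."""
    if not prefix:
        return 0
    table = overlap_table(prefix)
    k = 0  # number of prefix characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != prefix[k]:
            k = table[k - 1]  # reuse the overlap instead of backing up in text
        if ch == prefix[k]:
            k += 1
            if k == len(prefix):
                return i - k + 1
    return -1


print(overlap_table("abcabdac"))                 # [0, 0, 0, 1, 2, 0, 1, 0]
print(find_literal("xxabcabdacyy", "abcabdac"))  # 2
```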
Serhiy Storchaka 8b150ecfc9 Issue #19327: Fixed the working of regular expressions with too big charset. 2013-10-24 22:04:37 +03:00
Serhiy Storchaka be80fc9a84 Issue #19327: Fixed the working of regular expressions with too big charset. 2013-10-24 22:02:58 +03:00
Serhiy Storchaka c8bf95cfc5 Issue #18050: Fixed an incompatibility of the re module with Python 3.3.0
binaries.
2013-09-20 21:24:39 +03:00
Serhiy Storchaka 228c194596 Issue #2537: Remove broken check which rejected valid regular expressions.
Patch by Meador Inge.

See also issue #18647.
2013-08-19 23:19:49 +03:00
Serhiy Storchaka 98985a1980 Issue #2537: Remove broken check which rejected valid regular expressions.
Patch by Meador Inge.

See also issue #18647.
2013-08-19 23:18:23 +03:00
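The patterns affected here are nested repeats whose inner part can match the empty string. The former check rejected them with "nothing to repeat". This sketch shows such a pattern compiling and matching.

```python
import re

# The inner (x|y)* can match the empty string, yet the outer * over it is valid.
pattern = re.compile(r"^((x|y)*)*")
m = pattern.match("xyyzy")
print(m.group(0))  # xyy
```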
Serhiy Storchaka 3efad82b05 Issue #18647: Temporarily disable the "nothing to repeat" check to make buildbots happy. 2013-08-03 23:47:48 +03:00
Serhiy Storchaka 5e376a7809 Issue #18647: Temporarily disable the "nothing to repeat" check to make buildbots happy. 2013-08-03 23:46:19 +03:00
Brett Cannon cd171c8e92 Issue #18200: Back out usage of ModuleNotFoundError (8d28d44f3a9a) 2013-07-04 17:43:24 -04:00
Brett Cannon 0a140668fa Issue #18200: Update the stdlib (except tests) to use
ModuleNotFoundError.
2013-06-13 20:57:26 -04:00
Victor Stinner 678ad51b38 Issue #17516: remove dead code 2013-03-26 01:14:35 +01:00
Serhiy Storchaka a0eb809995 Issue #13169: The maximal repetition number in a regular expression has been
increased from 65534 to 2147483647 (on 32-bit platforms) or 4294967294 (on
64-bit).
2013-02-16 16:54:33 +02:00
Serhiy Storchaka 70ca0210e8 Issue #13169: The maximal repetition number in a regular expression has been
increased from 65534 to 2147483647 (on 32-bit platforms) or 4294967294 (on
64-bit).
2013-02-16 16:47:47 +02:00
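The sketch below uses a repetition count just above the old 65534 cap. The true upper bound is platform-dependent, as the commit message says, so it is not probed here.

```python
import re

count = 70000  # above the old limit of 65534
pattern = re.compile(r"a{%d}" % count)
print(pattern.fullmatch("a" * count) is not None)        # True
print(pattern.fullmatch("a" * (count - 1)) is not None)  # False
```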
Ezio Melotti a9860aeb08 #13054: fix usage of sys.maxunicode after PEP-393. 2011-10-04 19:06:00 +03:00
Antoine Pitrou 1ce3eb5c5b Issue #8990: array.fromstring() and array.tostring() get renamed to
frombytes() and tobytes(), respectively, to avoid confusion.  Furthermore,
array.frombytes(), array.extend() as well as the array.array()
constructor now accept bytearray objects.  Patch by Thomas Jollans.
2010-09-01 20:29:34 +00:00
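With the renamed methods, converting between an `array` and raw bytes looks like the following short sketch. It also covers the `bytearray` inputs mentioned above.

```python
from array import array

a = array("B", b"abc")         # array of unsigned bytes
print(a.tobytes())             # b'abc'
a.frombytes(bytearray(b"de"))  # bytearray is accepted as well as bytes
print(a.tobytes())             # b'abcde'
print(array("B", bytearray(b"xy")).tolist())  # [120, 121]
```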
Benjamin Peterson 6c940d6159 Merged revisions 66894 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r66894 | benjamin.peterson | 2008-10-14 17:37:18 -0500 (Tue, 14 Oct 2008) | 1 line

  remove set compat cruft
........
2008-10-14 23:07:40 +00:00
Christian Heimes 5e69685999 Merged revisions 62194,62197-62198,62204-62205,62214,62219-62221,62227,62229-62231,62233-62235,62237-62239 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r62194 | jeffrey.yasskin | 2008-04-07 01:04:28 +0200 (Mon, 07 Apr 2008) | 7 lines

  Add enough debugging information to diagnose failures where the
  HandlerBException is ignored, and fix one such problem, where it was thrown
  during the __del__ method of the previous Popen object.

  We may want to find a better way of printing verbose information so it's not
  spammy when the test passes.
........
  r62197 | mark.hammond | 2008-04-07 03:53:39 +0200 (Mon, 07 Apr 2008) | 2 lines

  Issue #2513: enable 64bit cross compilation on windows.
........
  r62198 | mark.hammond | 2008-04-07 03:59:40 +0200 (Mon, 07 Apr 2008) | 2 lines

  correct heading underline for new "Cross-compiling on Windows" section
........
  r62204 | gregory.p.smith | 2008-04-07 08:33:21 +0200 (Mon, 07 Apr 2008) | 4 lines

  Use the new PyFile_IncUseCount & PyFile_DecUseCount calls appropriately
  within the standard library.  These modules use PyFile_AsFile and later
  release the GIL while operating on the previously returned FILE*.
........
  r62205 | mark.summerfield | 2008-04-07 09:39:23 +0200 (Mon, 07 Apr 2008) | 4 lines

  changed "2500 components" to "several thousand" since the number keeps
  growing :-)
........
  r62214 | georg.brandl | 2008-04-07 20:51:59 +0200 (Mon, 07 Apr 2008) | 2 lines

  #2525: update timezone info examples in the docs.
........
  r62219 | andrew.kuchling | 2008-04-08 01:57:07 +0200 (Tue, 08 Apr 2008) | 1 line

  Write PEP 3127 section; add items
........
  r62220 | andrew.kuchling | 2008-04-08 01:57:21 +0200 (Tue, 08 Apr 2008) | 1 line

  Typo fix
........
  r62221 | andrew.kuchling | 2008-04-08 03:33:10 +0200 (Tue, 08 Apr 2008) | 1 line

  Typographical fix: 32bit -> 32-bit, 64bit -> 64-bit
........
  r62227 | andrew.kuchling | 2008-04-08 23:22:53 +0200 (Tue, 08 Apr 2008) | 1 line

  Add items
........
  r62229 | amaury.forgeotdarc | 2008-04-08 23:27:42 +0200 (Tue, 08 Apr 2008) | 7 lines

  Issue #2564: Prevent a hang in "import test.autotest", which runs the entire test
  suite as a side-effect of importing the module.

  - in test_capi, a thread tried to import other modules
  - re.compile() imported sre_parse again on every call.
........
  r62230 | amaury.forgeotdarc | 2008-04-08 23:51:57 +0200 (Tue, 08 Apr 2008) | 2 lines

  Prevent an error when inspect.isabstract() is called with something other than a new-style class.
........
  r62231 | amaury.forgeotdarc | 2008-04-09 00:07:05 +0200 (Wed, 09 Apr 2008) | 8 lines

  Issue 2408: remove the _types module
  It was only used as a helper in types.py to access types (GetSetDescriptorType and MemberDescriptorType),
  when they can easily be obtained with python code.
  These expressions even work with Jython.

  I don't know what the future of the types module is; (cf. discussion in http://bugs.python.org/issue1605 )
  at least this change makes it simpler.
........
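The "python code" referred to can be as simple as taking the type of a function-object attribute. This sketch recovers both descriptor types that way and compares them with what the `types` module exports.

```python
import types

# __code__ is a getset descriptor on function objects; __globals__ is a member descriptor.
getset_type = type(types.FunctionType.__code__)
member_type = type(types.FunctionType.__globals__)
print(getset_type is types.GetSetDescriptorType)  # True
print(member_type is types.MemberDescriptorType)  # True
```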
  r62233 | amaury.forgeotdarc | 2008-04-09 01:10:07 +0200 (Wed, 09 Apr 2008) | 2 lines

  Add a NEWS entry for previous checkin
........
  r62234 | trent.nelson | 2008-04-09 01:47:30 +0200 (Wed, 09 Apr 2008) | 37 lines

  - Issue #2550: The approach used by client/server code for obtaining ports
    to listen on in network-oriented tests has been refined in an effort to
    facilitate running multiple instances of the entire regression test suite
    in parallel without issue.  test_support.bind_port() has been fixed such
    that it will always return a unique port -- which wasn't always the case
    with the previous implementation, especially if socket options had been
    set that affected address reuse (i.e. SO_REUSEADDR, SO_REUSEPORT).  The
    new implementation of bind_port() will actually raise an exception if it
    is passed an AF_INET/SOCK_STREAM socket with either the SO_REUSEADDR or
    SO_REUSEPORT socket option set.  Furthermore, if available, bind_port()
    will set the SO_EXCLUSIVEADDRUSE option on the socket it's been passed.
    This currently only applies to Windows.  This option prevents any other
    sockets from binding to the host/port we've bound to, thus removing the
    possibility of the 'non-deterministic' behaviour, as Microsoft puts it,
    that occurs when a second SOCK_STREAM socket binds and accepts to a
    host/port that's already been bound by another socket.  The optional
    preferred port parameter to bind_port() has been removed.  Under no
    circumstances should tests be hard coding ports!

    test_support.find_unused_port() has also been introduced, which will pass
    a temporary socket object to bind_port() in order to obtain an unused port.
    The temporary socket object is then closed and deleted, and the port is
    returned.  This method should only be used for obtaining an unused port
    in order to pass to an external program (e.g. the -accept [port] argument
    to openssl's s_server mode) or as a parameter to a server-oriented class
    that doesn't give you direct access to the underlying socket used.

    Finally, test_support.HOST has been introduced, which should be used for
    the host argument of any relevant socket calls (i.e. bind and connect).

    The following tests were updated to follow the new conventions:
      test_socket, test_smtplib, test_asyncore, test_ssl, test_httplib,
      test_poplib, test_ftplib, test_telnetlib, test_socketserver,
      test_asynchat and test_socket_ssl.

    It is now possible for multiple instances of the regression test suite to
    run in parallel without issue.
........
  r62235 | gregory.p.smith | 2008-04-09 02:25:17 +0200 (Wed, 09 Apr 2008) | 3 lines

  Fix zlib crash from zlib.decompressobj().flush(val) when val was not positive.
  It tried to allocate negative or zero memory.  That fails.
........
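In current Python 3 releases, a non-positive length passed to `flush()` is refused up front with `ValueError` rather than reaching the allocator. This sketch assumes that behaviour and checks it. The helper `flush_with_length` is hypothetical.

```python
import zlib


def flush_with_length(data, length):
    """Decompress data, then flush with an explicit output-buffer length.

    Returns the decompressed bytes, or the ValueError raised for a bad length."""
    d = zlib.decompressobj()
    out = d.decompress(data)
    try:
        return out + d.flush(length)
    except ValueError as exc:
        return exc


payload = zlib.compress(b"hello")
print(flush_with_length(payload, 1024))       # b'hello'
print(repr(flush_with_length(payload, 0)))    # a ValueError, not a crash
print(repr(flush_with_length(payload, -1)))   # likewise
```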
  r62237 | trent.nelson | 2008-04-09 02:34:53 +0200 (Wed, 09 Apr 2008) | 1 line

  Fix typo with regards to self.PORT shadowing class variables with the same name.
........
  r62238 | andrew.kuchling | 2008-04-09 03:08:32 +0200 (Wed, 09 Apr 2008) | 1 line

  Add items
........
  r62239 | jerry.seutter | 2008-04-09 07:07:58 +0200 (Wed, 09 Apr 2008) | 1 line

  Changed test so it no longer runs as a side effect of importing.
........
2008-04-09 08:37:03 +00:00
Thomas Wouters 40a088dc27 Fix 're' to work on bytes. It could do with a few more tests, though. 2008-03-18 20:19:54 +00:00
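With bytes support, a bytes pattern matches bytes input and returns bytes, while mixing str and bytes is rejected. This sketch reflects current Python 3 behaviour. The helper name `str_pattern_on_bytes_rejected` is hypothetical.

```python
import re

# Bytes pattern, bytes subject, bytes results.
print(re.findall(rb"\d+", b"a1b22c333"))  # [b'1', b'22', b'333']


def str_pattern_on_bytes_rejected():
    """Return True if applying a str pattern to bytes raises TypeError."""
    try:
        re.search(r"\d+", b"a1")
    except TypeError:
        return True
    return False


print(str_pattern_on_bytes_rejected())  # True
```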
Christian Heimes 072c0f1b7e Merged revisions 59666-59679 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r59666 | christian.heimes | 2008-01-02 19:28:32 +0100 (Wed, 02 Jan 2008) | 1 line

  Made vs9to8 Unix compatible
........
  r59669 | guido.van.rossum | 2008-01-02 20:00:46 +0100 (Wed, 02 Jan 2008) | 2 lines

  Patch #1696.  Don't attempt to close None in dry-run mode.
........
  r59671 | jeffrey.yasskin | 2008-01-03 03:21:52 +0100 (Thu, 03 Jan 2008) | 6 lines

  Backport PEP 3141 from the py3k branch to the trunk. This includes r50877 (just
  the complex_pow part), r56649, r56652, r56715, r57296, r57302, r57359, r57361,
  r57372, r57738, r57739, r58017, r58039, r58040, and r59390, and new
  documentation. The only significant difference is that round(x) returns a float
  to preserve backward-compatibility. See http://bugs.python.org/issue1689.
........
  r59672 | christian.heimes | 2008-01-03 16:41:30 +0100 (Thu, 03 Jan 2008) | 1 line

  Issue #1726: Remove Python/atof.c from PCBuild/pythoncore.vcproj
........
  r59675 | guido.van.rossum | 2008-01-03 20:12:44 +0100 (Thu, 03 Jan 2008) | 4 lines

  Issue #1700, reported by Nguyen Quan Son, fix by Fredrik Lundh:
  Regular Expression inline flags not handled correctly for some unicode
  characters.  (Forward port from 2.5.2.)
........
  r59676 | christian.heimes | 2008-01-03 21:23:15 +0100 (Thu, 03 Jan 2008) | 1 line

  Added math.isinf() and math.isnan()
........
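The two predicates classify the special IEEE 754 values, as the brief sketch below shows.

```python
import math

print(math.isinf(float("inf")), math.isinf(-math.inf))  # True True
print(math.isnan(float("nan")), math.isnan(1.0))         # True False
```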
  r59677 | christian.heimes | 2008-01-03 22:14:48 +0100 (Thu, 03 Jan 2008) | 1 line

  Some build bots don't compile mathmodule. There is an issue with the long definition of pi and euler
........
  r59678 | christian.heimes | 2008-01-03 23:16:32 +0100 (Thu, 03 Jan 2008) | 2 lines

  Modified PyImport_Import and PyImport_ImportModule to always use absolute imports by calling __import__ with an explicit level of 0
  Added a new API function PyImport_ImportModuleNoBlock. It solves the problem with deadlocks when mixing threads and imports
........
  r59679 | christian.heimes | 2008-01-03 23:32:26 +0100 (Thu, 03 Jan 2008) | 1 line

  Added copysign(x, y) function to the math module
........
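`copysign(x, y)` returns the magnitude of x with the sign of y, including the sign of a negative zero, as this short sketch shows.

```python
import math

print(math.copysign(3.0, -0.0))  # -3.0: the sign of negative zero is honoured
print(math.copysign(-2.5, 1))    # 2.5
```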
2008-01-03 23:01:04 +00:00
Guido van Rossum 3172c5d263 Patch# 1258 by Christian Heimes: kill basestring.
I like this because it makes the code shorter! :-)
2007-10-16 18:12:55 +00:00