Commit Graph

68 Commits

Author SHA1 Message Date
Serhiy Storchaka e927757df6 Issue #12728: Different Unicode characters having the same uppercase but
different lowercase are now matched in case-insensitive regular expressions.
2014-11-10 12:37:02 +02:00
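To illustrate the e927757df6 entry above, here is a minimal sketch in Python 3 syntax. It assumes an interpreter that includes this fix. The pair 'ſ' (U+017F) and 's' is our own choice of example and is not taken from the commit.

```python
import re

LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

# Both 's' and the long s uppercase to 'S', but their lowercase forms differ.
assert LONG_S.upper() == "s".upper() == "S"
assert LONG_S.lower() != "s".lower()

# With the fix, the characters match each other case-insensitively,
# whichever one appears in the pattern.
matches = [
    bool(re.match("s", LONG_S, re.IGNORECASE)),
    bool(re.match("S", LONG_S, re.IGNORECASE)),
    bool(re.match(LONG_S, "s", re.IGNORECASE)),
    bool(re.match(LONG_S, "S", re.IGNORECASE)),
]
print(matches)
```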
Serhiy Storchaka e9e54ae222 Issue #17381: Fixed ranges handling in case-insensitive regular expressions. 2014-10-31 13:53:21 +02:00
Serhiy Storchaka c04fcd40bd Backported the optimization of compiling charsets in regular expressions
(issue #19329).  This is needed to apply the patch from issue #17381.
2014-10-31 13:34:06 +02:00
Serhiy Storchaka fdb73ed486 Issue #19405: Fixed outdated comments in the _sre module. 2013-10-27 08:00:57 +02:00
Serhiy Storchaka 22fb0dec30 Issue #19327: Fixed the working of regular expressions with too big charset. 2013-10-24 22:02:42 +03:00
Serhiy Storchaka 60bf0e4daa Issue #18050: Fixed an incompatibility of the re module with Python 2.7.3
and older binaries.
2013-09-20 21:25:53 +03:00
Serhiy Storchaka 83737c632c Issue #2537: Remove broken check which prevented valid regular expressions.
Patch by Meador Inge.

See also issue #18647.
2013-08-19 23:20:07 +03:00
Serhiy Storchaka 06fbac5ea0 Issue #18647: Temporarily disable the "nothing to repeat" check to make buildbots happy. 2013-08-04 13:22:30 +03:00
Serhiy Storchaka e18e05cce9 Issue #13169: The maximal repetition number in a regular expression has been
increased from 65534 to 2147483647 (on 32-bit platform) or 4294967294 (on
64-bit).
2013-02-16 16:47:15 +02:00
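A quick way to see the effect of the e18e05cce9 entry above is to compile a pattern whose repetition count exceeds the old limit of 65534. This is a sketch only: the count 70000 is an arbitrary value chosen for illustration, and the code assumes an interpreter that includes the change.

```python
import re

# 70000 is above the old limit of 65534; the value itself is arbitrary.
COUNT = 70000
pattern = re.compile(r"a{%d}" % COUNT)

# Exactly COUNT characters match; one fewer does not.
full = pattern.fullmatch("a" * COUNT)
short = pattern.fullmatch("a" * (COUNT - 1))
print(full is not None, short is None)
```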
Benjamin Peterson 32e7aa6ab9 remove set compat cruft 2008-10-14 22:37:18 +00:00
Amaury Forgeot d'Arc 4b798bdf8a Issue #2564: Prevent a hang in "import test.autotest", which runs the entire test
suite as a side-effect of importing the module.

- in test_capi, a thread tried to import other modules
- re.compile() imported sre_parse again on every call.
2008-04-08 21:27:42 +00:00
Guido van Rossum ae04c3356e Issue #1700, reported by Nguyen Quan Son, fix by Fredrik Lundh:
Regular Expression inline flags not handled correctly for some unicode
characters.  (Forward port from 2.5.2.)
2008-01-03 19:12:44 +00:00
Neal Norwitz e3b185f966 Fix typo in comment 2007-07-06 04:13:39 +00:00
Andrew M. Kuchling c30faa812c [Bug #1177831] Fix generation of code for GROUPREF_EXISTS. Thanks to Andre Malo for the fix. 2005-06-02 13:35:52 +00:00
Raymond Hettinger 049ade2997 Complete the previous effort to factor out constant expressions
and improve the speed of the if/elif/else blocks.
2005-02-28 19:27:52 +00:00
Fredrik Lundh 5e7d51b62c make sure to check for this limit even if we're running with -O 2004-10-15 06:15:08 +00:00
Martin v. Löwis 7d9c6c7e8c Fix _sre.CODESIZE on 64-bit machines in UCS-4 mode. Fixes #931848.
Backported to 2.3.
2004-05-07 07:18:13 +00:00
Raymond Hettinger d732c95eb0 Revert 1.51 booleans so that sre will still run on old pythons. 2004-03-27 09:24:36 +00:00
Raymond Hettinger 29e383754e Remove unnecessary test. (Thanks Skip) 2004-03-26 20:16:39 +00:00
Raymond Hettinger 01c9f8c35f Simple optimizations:
* pre-build a single identity function for the fixup function
* pre-build membership tests in dictionaries instead of in-line tuples
* assign len() to a local variable
* assign append() methods to a local variable
* use xrange() instead of range()
* replace "x<<1" with "x+x"
2004-03-26 11:16:55 +00:00
Martin v. Löwis bc503d1e90 Use True/False instead of 0/1 for character classes. 2004-03-25 13:50:59 +00:00
Gustavo Niemeyer ad3fc44ccb Implemented non-recursive SRE matching. 2003-10-17 22:13:16 +00:00
Just van Rossum 74902508dc Addendum to #764548: restore 2.1 compatibility. 2003-07-02 21:37:16 +00:00
Just van Rossum 12723bacea Fix and test for bug #764548:
Use isinstance() instead of comparing types directly, to enable
subclasses of str and unicode to be used as patterns.
Blessed by /F.
2003-07-02 20:03:04 +00:00
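The behavior enabled by the 12723bacea entry above can be sketched as follows, in Python 3 syntax. `TaggedPattern` is a hypothetical subclass invented for this example; it does not appear in the commit.

```python
import re


class TaggedPattern(str):
    """Hypothetical str subclass that carries extra metadata."""

    description = "one or more digits"


# Because the type check uses isinstance(), a str subclass is accepted
# wherever a plain string pattern would be.
pattern = TaggedPattern(r"\d+")
match = re.match(pattern, "123abc")
print(match.group())
```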
Martin v. Löwis 78e2f06cc6 Fully support 32-bit codes. Enable BIGCHARSET in UCS-4 builds. 2003-04-19 12:56:08 +00:00
Guido van Rossum 41c99e7f96 SF patch #720991 by Gary Herron:
A small fix for bug #545855 and Greg Chapman's
addition of op code SRE_OP_MIN_REPEAT_ONE for
eliminating recursion on simple uses of pattern '*?' on a
long string.
2003-04-14 17:59:34 +00:00
Guido van Rossum 577fb5a1db Fix from SF patch #633359 by Greg Chapman for SF bug #610299:
The problem is in sre_compile.py: the call to
    _compile_charset near the end of _compile_info forgets to
    pass in the flags, so that the info charset is not compiled
    with re.U. (The info charset is used when searching to find
    the first character at which a match could start; it is not
    generated for patterns beginning with a repeat like '\w{1}'.)
2003-02-24 01:18:35 +00:00
Martin v. Löwis 67c4cb1f13 Disable big charsets in UCS-4 builds. Works around #599377.
Will backport to 2.2
2002-09-26 16:39:20 +00:00
Fredrik Lundh 4fb7027ec0 made the code match the comments (1.5.2 compatibility) 2002-06-27 20:08:25 +00:00
Raymond Hettinger f13eb55d59 Replace boolean test with is None. 2002-06-02 00:40:05 +00:00
Fred Drake b8f2274985 Added docstrings by Neal Norwitz. This closes SF bug #450980. 2001-09-04 19:10:20 +00:00
Tim Peters 87cc0c329e Whitespace normalization, plus:
+ test_quopri.py relied on significant trailing spaces.  Fixed.
+ test_dircache.py (still) doesn't work on Windows (directory mtime on
  Windows doesn't work like it does on Unix).
2001-07-21 01:41:30 +00:00
Martin v. Löwis 3550dd30bb Patch #442512: put block indices in the right byte order on bigendian systems. 2001-07-19 14:26:10 +00:00
Fredrik Lundh 19af43d78a added martin's BIGCHARSET patch to SRE 2.1.1. martin reports 2x
speedups for certain unicode character ranges.
2001-07-02 16:58:38 +00:00
Fredrik Lundh b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Fredrik Lundh f2989b22ff - restored 1.5.2 compatibility (sorry, eric)
- removed __all__ cruft from internal modules (sorry, skip)
- don't assume ASCII for string escapes (sorry, per)
2001-02-18 12:05:16 +00:00
Skip Montanaro 0de65807e6 bunch more __all__ lists
also modified check_all function to suppress all warnings since they aren't
relevant to what this test is doing (allows quiet checking of regsub, for
instance)
2001-02-15 22:15:14 +00:00
Fredrik Lundh 2e24044f9d from the really-stupid-bug department: uppercase literals should match
uppercase strings also when the IGNORECASE flag is set (bug #128899)

(also added test cases for recently fixed bugs to the regression suite
-- or in other words, check in re_tests.py too...)
2001-01-15 18:28:14 +00:00
Fredrik Lundh b35ffc0417 added "magic" number to the _sre module, to avoid weird errors caused
by compiler/engine mismatches
2001-01-15 12:46:09 +00:00
Fredrik Lundh 770617b23e SRE fixes for 2.1 alpha:
-- added some more docstrings
-- fixed typo in scanner class (#125531)
-- the multiline flag (?m) shouldn't affect the \Z operator (#127259)
-- fixed non-greedy backtracking bug (#123769, #127259)
-- added sre.DEBUG flag (currently dumps the parsed pattern structure)
-- fixed a couple of glitches in groupdict (the #126587 memory leak
   had already been fixed by AMK)
2001-01-14 15:06:11 +00:00
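The \Z item in the 770617b23e entry above can be checked with a short sketch in Python 3 syntax. The sample text is our own, and the code assumes an interpreter where the fix is present.

```python
import re

text = "first\nsecond"

# In multiline mode, '$' matches just before the embedded newline...
dollar = re.search(r"first$", text, re.MULTILINE)

# ...but '\Z' matches only at the very end of the string, regardless of the flag.
not_at_end = re.search(r"first\Z", text, re.MULTILINE)
at_end = re.search(r"second\Z", text, re.MULTILINE)

print(dollar is not None, not_at_end is None, at_end is not None)
```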
Fredrik Lundh 13ac9926ac Fixed too ambitious "nothing to repeat" check. Closes bug #114033. 2000-10-07 17:38:23 +00:00
Fredrik Lundh 7898c3e685 -- reset marks if repeat_one tail doesn't match
(this should fix Sjoerd's xmllib problem)
-- added skip field to INFO header
-- changed compiler to generate charset INFO header
-- changed trace messages to support post-mortem analysis
2000-08-07 20:59:04 +00:00
Fredrik Lundh e186983842 final 0.9.8 updates:
-- added REPEAT_ONE operator
-- added ANY_ALL operator (used to represent "(?s).")
2000-08-01 22:47:49 +00:00
Fredrik Lundh 2f2c67d7e5 -- fixed width calculations for alternations
-- fixed literal check in branch operator
   (this broke test_tokenize, as reported by Mark Favas)
-- added REPEAT_ONE operator (still not enabled, though)
-- added some debugging stuff (maxlevel)
2000-08-01 21:05:41 +00:00
Fredrik Lundh 29c4ba9ada SRE 0.9.8: passes the entire test suite
-- reverted REPEAT operator to use "repeat context" strategy
   (from 0.8.X), but done right this time.
-- got rid of backtracking stack; use nested SRE_MATCH calls
   instead (should probably put it back again in 0.9.9 ;-)
-- properly reset state in scanner mode
-- don't use aggressive inlining by default
2000-08-01 18:20:07 +00:00
Fredrik Lundh 8a3ebf8ca8 -- SRE 0.9.6 sync. this includes:
+ added "regs" attribute
+ fixed "pos" and "endpos" attributes
+ reset "lastindex" and "lastgroup" in scanner methods
+ removed (?P#id) syntax; the "lastindex" and "lastgroup"
  attributes are now always set
+ removed string module dependencies in sre_parse
+ better debugging support in sre_parse
+ various tweaks to build under 1.5.2
2000-07-23 21:46:17 +00:00
Fredrik Lundh 2855290b84 maintenance release:
- reorganized some code to get rid of -Wall and -W4
  warnings

- fixed default argument handling for sub/subn/split
  methods (reported by Peter Schneider-Kamp).
2000-07-05 21:14:16 +00:00
Fredrik Lundh 72b82ba16d - fixed grouping error bug
- changed "group" operator to "groupref"
2000-07-03 21:31:48 +00:00
Fredrik Lundh 6f01398236 - added lookbehind support (?<=pattern), (?<!pattern).
the pattern must have a fixed width.

- got rid of array-module dependencies; the match program
  is now stored inside the pattern object, rather
  than in an extra string buffer.

- cleaned up various potential leaks, API abuses,
  and other minor issues in the engine module.

- use mal's new isalnum macro, rather than my own
  workaround.

- untabified test_sre.py.  seems like I removed a couple
  of trailing spaces in the process...
2000-07-03 18:44:21 +00:00
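The lookbehind support added in the 6f01398236 entry above, including the fixed-width requirement, can be sketched as follows in Python 3 syntax. The sample strings and patterns are our own.

```python
import re

# Positive lookbehind: digits that are preceded by '$'.
price = re.search(r"(?<=\$)\d+", "cost: $42")

# Negative lookbehind: digits not preceded by '$' or by another digit.
plain = re.search(r"(?<![$\d])\d+", "cost: $42, qty 7")

# A variable-width lookbehind such as 'a+' is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    rejected = False
except re.error:
    rejected = True

print(price.group(), plain.group(), rejected)
```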
Fredrik Lundh c2301730b8 - experimental: added two new attributes to the match object:
"lastgroup" is the name of the last matched capturing group,
  "lastindex" is the index of the same group.  if no group was
  matched, both attributes are set to None.

  the (?P#) feature will be removed in the next release.
2000-07-02 22:25:39 +00:00
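The two match-object attributes described in the c2301730b8 entry above can be sketched as follows, in Python 3 syntax. The `token` pattern and its group names are hypothetical examples.

```python
import re

# Two named alternatives; only one of them takes part in any given match.
token = re.compile(r"(?P<word>[a-z]+)|(?P<number>\d+)")

m = token.match("42")
# lastgroup is the name of the last matched capturing group,
# lastindex is the index of that same group.
print(m.lastgroup, m.lastindex)

# With no capturing groups at all, both attributes are None.
no_groups = re.match(r"abc", "abc")
print(no_groups.lastgroup, no_groups.lastindex)
```

This makes `lastgroup` a convenient way to find out which alternative of a tokenizer-style pattern matched, without testing each group in turn.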