33 Commits

Author SHA1 Message Date
Antoine Pitrou fd036451bf #2834: Change re module semantics, so that str and bytes mixing is forbidden,
and str (unicode) patterns get full unicode matching by default. The re.ASCII
flag is also introduced to ask for ASCII matching instead.
2008-08-19 17:56:33 +00:00
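
The semantics described in the commit above can be illustrated with a small sketch against the current `re` module. The sample strings and variable names are chosen here for illustration and do not come from the commit itself.

```python
import re

# A str pattern gets full Unicode matching by default, so \w accepts "é".
unicode_match = re.fullmatch(r"\w+", "café")

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the same string fails.
ascii_match = re.fullmatch(r"\w+", "café", re.ASCII)

# Mixing a str pattern with a bytes subject is rejected with TypeError.
try:
    re.search(r"\w+", b"cafe")
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

# A bytes pattern applied to a bytes subject is still allowed.
bytes_match = re.fullmatch(rb"\w+", b"cafe")
```
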
Antoine Pitrou 22628c4d6a #3231: re.compile fails with some bytes patterns 2008-07-22 17:53:22 +00:00
Gustavo Niemeyer be733ee7fb More work on bug #672491 and patch #712900.
I've applied a modified version of Greg Chapman's patch. I've included
the fixes without introducing the reorganization mentioned, for the sake
of stability. Also, the second fix mentioned in the patch doesn't fix the
mentioned problem anymore, because of the change introduced by patch
#720991 (by Greg as well). The new fix wasn't complicated though, and is
included as well.

As a note. It seems that there are other places that require the
"protection" of LASTMARK_SAVE()/LASTMARK_RESTORE(), and are just waiting
for someone to find how to break them. Particularly, I believe that every
recursion of SRE_MATCH() should be protected by these macros. I won't
do that right now since I'm not completely sure about this, and we don't
have much time for testing until the next release.
2003-04-20 07:35:44 +00:00
Guido van Rossum 577fb5a1db Fix from SF patch #633359 by Greg Chapman for SF bug #610299:
The problem is in sre_compile.py: the call to
    _compile_charset near the end of _compile_info forgets to
    pass in the flags, so that the info charset is not compiled
    with re.U. (The info charset is used when searching to find
    the first character at which a match could start; it is not
    generated for patterns beginning with a repeat like '\w{1}'.)
2003-02-24 01:18:35 +00:00
Tim Peters f2715e0764 Whitespace normalization. 2003-02-19 02:35:07 +00:00
Neal Norwitz bb1844148a SF patch #682432, add lookbehind tests 2003-02-13 03:01:18 +00:00
Gustavo Niemeyer 4e7be06a65 Fixed bug #470582, using a modified version of patch #527371,
from Greg Chapman.

* Modules/_sre.c
  (lastmark_restore): New function, implementing algorithm to restore
  a state to a given lastmark. In addition to the similar algorithm used
  in a few places of SRE_MATCH, restore lastindex when restoring lastmark.
  (SRE_MATCH): Replace lastmark inline restoring by the lastmark_restore()
  function. Also include it where missing. In SRE_OP_MARK, set lastindex
  only if i > lastmark.

* Lib/test/re_tests.py
* Lib/test/test_sre.py
  Included regression tests for the fixed bugs.

* Misc/NEWS
  Mention fixes.
2002-11-06 14:06:53 +00:00
Fredrik Lundh 82b230732f bug #133283, #477728, #483789, #490573
backed out of broken minimal repeat patch from July

also fixed a couple of minor potential resource leaks in pattern_subx
(Guido had already fixed the big one)
2001-12-09 16:13:15 +00:00
Fredrik Lundh df781e6a3f reapplied darryl gallion's minimizing repeat fix. I'm still not 100%
sure about this one, but test #133283 now works even with the fix in
place, and so does the test suite.  we'll see what comes up...
2001-07-02 19:54:28 +00:00
Fredrik Lundh b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Fredrik Lundh c0c7ee3a65 detect attempts to repeat anchors (fixes bug #130748) 2001-02-18 21:04:48 +00:00
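
As a rough illustration of the check described in the commit above, the current `re` module refuses to compile a pattern that applies a repeat operator directly to an anchor. The pattern `^*` is an example picked here; the log does not say which pattern bug #130748 reported.

```python
import re

# Repeating an anchor is rejected at compile time with re.error.
try:
    re.compile(r"^*")
    anchor_error = None
except re.error as exc:
    anchor_error = exc

# Repeating an ordinary item after the anchor compiles normally.
ok_pattern = re.compile(r"^a*")
```
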
Fredrik Lundh 2e24044f9d from the really-stupid-bug department: uppercase literals should match
uppercase strings also when the IGNORECASE flag is set (bug #128899)

(also added test cases for recently fixed bugs to the regression suite
-- or in other words, check in re_tests.py too...)
2001-01-15 18:28:14 +00:00
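
The behaviour that the commit above restores can be checked with a minimal sketch. The literal `ABC` is an arbitrary example, not the case from bug #128899.

```python
import re

# With IGNORECASE set, an uppercase literal must still match uppercase text...
upper_vs_upper = re.match("ABC", "ABC", re.IGNORECASE)

# ...as well as the lowercase form of the same text.
upper_vs_lower = re.match("ABC", "abc", re.IGNORECASE)
```
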
Fredrik Lundh 13ac9926ac Fixed too ambitious "nothing to repeat" check. Closes bug #114033. 2000-10-07 17:38:23 +00:00
Fredrik Lundh 025468d246 SRE didn't handle character category followed by hyphen inside a
character class.  Fix provided by Andrew Kuchling.  Closes bug
#116251.
2000-10-07 10:16:19 +00:00
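
A short sketch of the case named in the commit above, a character category followed by a hyphen inside a character class. The pattern `[\w-]+` and the sample string are illustrative assumptions, not taken from bug #116251.

```python
import re

# Inside a class, a trailing hyphen after a category such as \w is a
# literal "-", so the class accepts word characters and hyphens.
match = re.fullmatch(r"[\w-]+", "foo-bar_1")

# A string containing a character outside the class does not match.
no_match = re.fullmatch(r"[\w-]+", "foo bar")
```
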
Fredrik Lundh d11b5e54f0 Recompile pattern if (?x) flag was found inside the pattern during the
first scan.  Closes bug #115040.
2000-10-03 19:22:26 +00:00
Fredrik Lundh 65d4bc616a Fixed negative lookahead/lookbehind. Closes bug #115618. 2000-10-03 16:29:23 +00:00
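
For reference, the expected semantics of negative lookahead and lookbehind look like the sketch below. The patterns are examples chosen here; the log does not state which pattern failed in bug #115618.

```python
import re

# Negative lookbehind: "b" must not be preceded by "a".
lookbehind_blocked = re.search(r"(?<!a)b", "ab")
lookbehind_allowed = re.search(r"(?<!a)b", "cb")

# Negative lookahead: "a" must not be followed by "b".
lookahead_blocked = re.search(r"a(?!b)", "ab")
lookahead_allowed = re.search(r"a(?!b)", "ac")
```
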
Fredrik Lundh 19f977ba40 - don't hang if group id is followed by whitespace (closes bug #114660) 2000-09-24 14:46:23 +00:00
Fredrik Lundh 0c4fdbaee8 closes bug #112468 (and all the other bugs that surfaced when
I fixed a bug in the regression test harness...)
2000-08-31 22:57:55 +00:00
Fredrik Lundh 8e6d571a7c -- enabled some temporarily disabled RE tests
-- added basic unicode tests to test_re
-- added test case for Sjoerd's xmllib problem to re_tests
2000-08-08 17:06:53 +00:00
Fredrik Lundh 2643b55a77 -- whitespace cleanup (real changes coming in next checkin) 2000-08-08 16:52:51 +00:00
Guido van Rossum 8430c583da AMK's latest 1998-04-03 21:47:12 +00:00
Guido van Rossum dfa6790bd6 New re version from AMK 1997-12-08 17:12:06 +00:00
Guido van Rossum cf00505325 Added tests for \b, \B (AMK). 1997-08-15 15:44:58 +00:00
Guido van Rossum 95e8053a9f 1.5a3 prerelease 1 from AMK 1997-08-13 22:34:14 +00:00
Guido van Rossum 06c0ec94e4 Several additions from Jeffrey. 1997-07-17 22:36:39 +00:00
Guido van Rossum a0e4c1bffc Jeffrey's latest -- seems to solve most problems! 1997-07-17 14:52:48 +00:00
Guido van Rossum 9ddd9dad80 Fixed a syntax error caused by a bad line in the Perl source. 1997-07-15 19:01:04 +00:00
Guido van Rossum 16bd0ff16a Merged my changes in, and added all converted Perl tests. 1997-07-15 18:45:20 +00:00
Guido van Rossum 337c6d41d4 Jeffrey's version 1997-07-15 18:42:58 +00:00
Guido van Rossum 23b8d4c15e Tweak re_tests and test_re to differentiate between
groups that have no value and groups that are out of bounds.
1997-07-15 15:49:52 +00:00
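
The distinction drawn in the commit above is still visible in the current `re` module, as this minimal sketch shows. The pattern is an example chosen here, not one of the entries in re_tests.

```python
import re

m = re.match(r"(a)|(b)", "a")

# Group 2 exists in the pattern but took no part in the match: it has no value.
unmatched_group = m.group(2)

# Group 3 does not exist in the pattern at all: it is out of bounds.
try:
    m.group(3)
    out_of_bounds_error = None
except IndexError as exc:
    out_of_bounds_error = exc
```
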
Guido van Rossum 847ed4afb5 More tweaks; re.py is nearly there... 1997-07-15 15:40:57 +00:00
Guido van Rossum 04a1d74229 Jeffrey's newest 1997-07-15 14:38:13 +00:00
Guido van Rossum 8e0ce30ce4 test suite for re.py 1997-07-11 19:34:44 +00:00