I've applied a modified version of Greg Chapman's patch. I've included
the fixes without introducing the reorganization mentioned, for the sake
of stability. Also, the second fix mentioned in the patch don't fix the
mentioned problem anymore, because of the change introduced by patch
#720991 (by Greg as well). The new fix wasn't complicated though, and is
included as well.
As a note. It seems that there are other places that require the
"protection" of LASTMARK_SAVE()/LASTMARK_RESTORE(), and are just waiting
for someone to find how to break them. Particularly, I belive that every
recursion of SRE_MATCH() should be protected by these macros. I won't
do that right now since I'm not completely sure about this, and we don't
have much time for testing until the next release.
The problem is in sre_compile.py: the call to
_compile_charset near the end of _compile_info forgets to
pass in the flags, so that the info charset is not compiled
with re.U. (The info charset is used when searching to find
the first character at which a match could start; it is not
generated for patterns beginning with a repeat like '\w{1}'.)
from Greg Chapman.
* Modules/_sre.c
(lastmark_restore): New function, implementing algorithm to restore
a state to a given lastmark. In addition to the similar algorithm used
in a few places of SRE_MATCH, restore lastindex when restoring lastmark.
(SRE_MATCH): Replace lastmark inline restoring by lastmark_restore(),
function. Also include it where missing. In SRE_OP_MARK, set lastindex
only if i > lastmark.
* Lib/test/re_tests.py
* Lib/test/test_sre.py
Included regression tests for the fixed bugs.
* Misc/NEWS
Mention fixes.
backed out of broken minimal repeat patch from July
also fixed a couple of minor potential resource leaks in pattern_subx
(Guido had already fixed the big one)
uppercase strings also when the IGNORECASE flag is set (bug #128899)
(also added test cases for recently fixed bugs to the regression suite
-- or in other words, check in re_tests.py too...)