The problem is in sre_compile.py: the call to
_compile_charset near the end of _compile_info forgets to
pass in the flags, so that the info charset is not compiled
with re.U. (The info charset is used when searching to find
the first character at which a match could start; it is not
generated for patterns beginning with a repeat like '\w{1}'.)
from Greg Chapman.
* Modules/_sre.c
(lastmark_restore): New function, implementing algorithm to restore
a state to a given lastmark. In addition to the similar algorithm used
in a few places of SRE_MATCH, restore lastindex when restoring lastmark.
(SRE_MATCH): Replace lastmark inline restoring by lastmark_restore(),
function. Also include it where missing. In SRE_OP_MARK, set lastindex
only if i > lastmark.
* Lib/test/re_tests.py
* Lib/test/test_sre.py
Included regression tests for the fixed bugs.
* Misc/NEWS
Mention fixes.
backed out of broken minimal repeat patch from July
also fixed a couple of minor potential resource leaks in pattern_subx
(Guido had already fixed the big one)
uppercase strings also when the IGNORECASE flag is set (bug #128899)
(also added test cases for recently fixed bugs to the regression suite
-- or in other words, check in re_tests.py too...)