bpo-31672: Fix string.Template accidentally matching non-ASCII identifiers (GH-3872)

The pattern `[a-z]` with the `IGNORECASE` flag can match some non-ASCII characters.

The straightforward solution is to use the `IGNORECASE | ASCII` flags.
But users may subclass `Template` and override only `idpattern`, so we want to
avoid changing `Template.flags`.

So this commit uses the local flag `-i` for `idpattern` and changes `[a-z]` to `[a-zA-Z]`.
INADA Naoki 2017-10-13 16:02:23 +09:00 committed by GitHub
parent 9255104499
commit b22273ec5d
4 changed files with 24 additions and 3 deletions
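
The behaviour described in the commit message can be reproduced with the `re` module alone. The following is a minimal sketch, not part of the commit; the variable names are illustrative, and the two code points are the ones used in the commit's new tests.

```python
import re

# The two code points exercised by the commit's new tests.
DOTLESS_I = "\u0131"       # LATIN SMALL LETTER DOTLESS I
CAPITAL_I_DOT = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Unicode-aware case-insensitive matching: [a-z] also accepts these letters.
unicode_icase = re.compile(r"[a-z]", re.IGNORECASE)

# Adding re.ASCII restricts case-insensitive matching to ASCII letters.
ascii_icase = re.compile(r"[a-z]", re.IGNORECASE | re.ASCII)

for ch in (DOTLESS_I, CAPITAL_I_DOT):
    print(
        f"U+{ord(ch):04X}",
        "unicode:", unicode_icase.fullmatch(ch) is not None,
        "ascii:", ascii_icase.fullmatch(ch) is not None,
    )
```

This illustrates why `IGNORECASE | ASCII` would have been the simplest fix, had changing `Template.flags` not been a compatibility concern.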


@@ -755,8 +755,17 @@ attributes:
 * *idpattern* -- This is the regular expression describing the pattern for
   non-braced placeholders. The default value is the regular expression
-  ``[_a-z][_a-z0-9]*``. If this is given and *braceidpattern* is ``None``
-  this pattern will also apply to braced placeholders.
+  ``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``. If this is given and *braceidpattern* is
+  ``None`` this pattern will also apply to braced placeholders.
+
+  .. note::
+
+     Since default *flags* is ``re.IGNORECASE``, pattern ``[a-z]`` can match
+     with some non-ASCII characters. That's why we use local ``-i`` flag here.
+
+     While *flags* is kept to ``re.IGNORECASE`` for backward compatibility,
+     you can override it to ``0`` or ``re.IGNORECASE | re.ASCII`` when
+     subclassing. It's simple way to avoid unexpected match like above example.

   .. versionchanged:: 3.7
      *braceidpattern* can be used to define separate patterns used inside and
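
The subclassing advice in the note above can be sketched as follows. Both subclasses and the helper function are hypothetical names introduced only for illustration. The first overrides only `idpattern` with a case-dependent pattern, which is the situation the commit message is concerned about; the second also overrides `flags` as the note suggests.

```python
import re
from string import Template


class OnlyIdPattern(Template):
    # Hypothetical subclass: overrides idpattern but inherits
    # flags = re.IGNORECASE, so [a-z] still matches some non-ASCII letters.
    idpattern = r"[_a-z][_a-z0-9]*"


class AsciiFlags(Template):
    # Hypothetical subclass: same pattern, but flags restricted to ASCII
    # as the documentation note recommends.
    idpattern = r"[_a-z][_a-z0-9]*"
    flags = re.IGNORECASE | re.ASCII


def try_substitute(template_cls):
    """Substitute "$\u0131" and report whether it was accepted as a placeholder."""
    try:
        return template_cls("$\u0131").substitute({"\u0131": "matched"})
    except ValueError:
        return "ValueError"


print(try_substitute(OnlyIdPattern))  # placeholder accepted
print(try_substitute(AsciiFlags))     # placeholder rejected as invalid
```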


@@ -79,7 +79,11 @@ class Template(metaclass=_TemplateMetaclass):
     """A string class for supporting $-substitutions."""

     delimiter = '$'
-    idpattern = r'[_a-z][_a-z0-9]*'
+    # r'[a-z]' matches to non-ASCII letters when used with IGNORECASE,
+    # but without ASCII flag. We can't add re.ASCII to flags because of
+    # backward compatibility. So we use local -i flag and [a-zA-Z] pattern.
+    # See https://bugs.python.org/issue31672
+    idpattern = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'
     braceidpattern = None
     flags = _re.IGNORECASE
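
To see what the scoped `(?-i:...)` group does, the old and new `idpattern` values can be compiled directly with `re.IGNORECASE`. This is a simplified sketch under that assumption: `Template` itself embeds `idpattern` in a larger pattern, and the names `old_style` and `new_style` are illustrative.

```python
import re

# Old default: case-insensitivity comes from the global flag.
old_style = re.compile(r"[_a-z][_a-z0-9]*", re.IGNORECASE)

# New default: (?-i:...) switches IGNORECASE off inside the group, so the
# explicit [a-zA-Z] ranges are the only letters accepted.
new_style = re.compile(r"(?-i:[_a-zA-Z][_a-zA-Z0-9]*)", re.IGNORECASE)

for name in ("who", "Who_1", "\u0131d"):
    print(
        repr(name),
        "old:", old_style.fullmatch(name) is not None,
        "new:", new_style.fullmatch(name) is not None,
    )
```

ASCII identifiers in either case are still accepted by both patterns; only the identifier starting with U+0131 is treated differently.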


@@ -270,6 +270,12 @@ class TestTemplate(unittest.TestCase):
         raises(ValueError, s.substitute, dict(who='tim'))
         s = Template('$who likes $100')
         raises(ValueError, s.substitute, dict(who='tim'))
+        # Template.idpattern should match to only ASCII characters.
+        # https://bugs.python.org/issue31672
+        s = Template("$who likes $\u0131") # (DOTLESS I)
+        raises(ValueError, s.substitute, dict(who='tim'))
+        s = Template("$who likes $\u0130") # (LATIN CAPITAL LETTER I WITH DOT ABOVE)
+        raises(ValueError, s.substitute, dict(who='tim'))

     def test_idpattern_override(self):
         class PathPattern(Template):
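
Outside the test suite, the behaviour checked by the new assertions can be observed roughly as follows. This sketch assumes a Python version that includes this fix (3.7 or later); the variable `outcome` is illustrative.

```python
from string import Template

# "$\u0131" is no longer a valid placeholder with the default idpattern.
s = Template("$who likes $\u0131")

try:
    s.substitute(who="tim")
    outcome = "substituted"
except ValueError as exc:
    outcome = f"ValueError: {exc}"

print(outcome)

# safe_substitute() leaves the invalid placeholder untouched instead of raising.
print(s.safe_substitute(who="tim"))
```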


@@ -0,0 +1,2 @@
+``idpattern`` in ``string.Template`` matched some non-ASCII characters. Now
+it uses ``-i`` regular expression local flag to avoid non-ASCII characters.