diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index 87a6b1aba59..bdf687ee455 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -289,6 +289,8 @@ Putting REs in strings keeps the Python language simpler, but has one
 disadvantage which is the topic of the next section.
 
 
+.. _the-backslash-plague:
+
 The Backslash Plague
 --------------------
 
@@ -327,6 +329,13 @@ backslashes are not handled in any special way in a string literal prefixed with
 while ``"\n"`` is a one-character string containing a newline. Regular
 expressions will often be written in Python code using this raw string notation.
 
+In addition, special escape sequences that are valid in regular expressions,
+but not valid as Python string literals, now result in a
+:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`,
+which means the sequences will be invalid if raw string notation or escaping
+the backslashes isn't used.
+
+
 +-------------------+------------------+
 | Regular String    | Raw string       |
 +===================+==================+
@@ -457,10 +466,16 @@ In actual programs, the most common style is to store the
 Two pattern methods return all of the matches for a pattern.
 :meth:`~re.Pattern.findall` returns a list of matching strings::
 
-   >>> p = re.compile('\d+')
+   >>> p = re.compile(r'\d+')
    >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
    ['12', '11', '10']
 
+The ``r`` prefix, making the literal a raw string literal, is needed in this
+example because escape sequences in a normal "cooked" string literal that are
+not recognized by Python, as opposed to regular expressions, now result in a
+:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`. See
+:ref:`the-backslash-plague`.
+
 :meth:`~re.Pattern.findall` has to create the entire list before it can be returned as the result.
 The :meth:`~re.Pattern.finditer` method returns a sequence of
 :ref:`match object <match-objects>` instances as an :term:`iterator`::
@@ -1096,11 +1111,11 @@ following calls::
 The module-level function :func:`re.split` adds the RE to be used as the first
 argument, but is otherwise the same. ::
 
-   >>> re.split('[\W]+', 'Words, words, words.')
+   >>> re.split(r'[\W]+', 'Words, words, words.')
    ['Words', 'words', 'words', '']
-   >>> re.split('([\W]+)', 'Words, words, words.')
+   >>> re.split(r'([\W]+)', 'Words, words, words.')
    ['Words', ', ', 'words', ', ', 'words', '.', '']
-   >>> re.split('[\W]+', 'Words, words, words.', 1)
+   >>> re.split(r'[\W]+', 'Words, words, words.', 1)
    ['Words', 'words, words.']
 
 
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index d4b8f8d2204..093f4454af1 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -463,7 +463,7 @@ The string in this example has the number 57 written in both Thai and
 Arabic numerals::
 
    import re
-   p = re.compile('\d+')
+   p = re.compile(r'\d+')
 
    s = "Over \u0e55\u0e57 57 flavours"
    m = p.search(s)
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 9b175f4e967..83ebe7db01a 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -345,7 +345,7 @@ The special characters are:
 
    This example looks for a word following a hyphen:
 
-      >>> m = re.search('(?<=-)\w+', 'spam-egg')
+      >>> m = re.search(r'(?<=-)\w+', 'spam-egg')
       >>> m.group(0)
       'egg'
 
diff --git a/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
new file mode 100644
index 00000000000..9e9f3e3a74d
--- /dev/null
+++ b/Misc/NEWS.d/next/Documentation/2018-02-02-07-41-57.bpo-32614.LSqzGw.rst
@@ -0,0 +1,3 @@
+Modify RE examples in documentation to use raw strings to prevent
+:exc:`DeprecationWarning` and add text to REGEX HOWTO to highlight the
+deprecation.
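
Note for reviewers: the behaviour this patch documents can be checked with a short script. The sketch below is not part of the patch, and the helper name ``emits_escape_warning`` is hypothetical. It assumes Python 3.6 or later, where an unrecognized escape such as ``\d`` in a cooked string literal is reported at compile time (as a :exc:`DeprecationWarning` in the releases this patch targets; later releases may use a different warning category, so the helper also accepts :exc:`SyntaxWarning` and :exc:`SyntaxError`).

```python
import re
import warnings


def emits_escape_warning(source: str) -> bool:
    """Compile `source` and report whether an invalid escape was flagged.

    Hypothetical helper for illustration only; not part of the patch.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            # "eval" mode only compiles the expression; nothing is executed.
            compile(source, "<example>", "eval")
        except SyntaxError:
            # The patch text says the escape will eventually be rejected.
            return True
    return any(
        issubclass(w.category, (DeprecationWarning, SyntaxWarning))
        for w in caught
    )


cooked = "re.compile('\\d+')"  # source text: re.compile('\d+')
raw = "re.compile(r'\\d+')"    # source text: re.compile(r'\d+')

print(emits_escape_warning(cooked))  # the pre-patch spelling
print(emits_escape_warning(raw))     # the post-patch spelling

# Both remedies named in the HOWTO text yield the same three-character pattern.
assert '\\d+' == r'\d+'
print(re.findall(r'\d+', '12 drummers drumming, 11 pipers piping'))
```

Under those assumptions the cooked spelling is flagged and the raw spelling is not, while the compiled pattern itself is unchanged, which is why the examples can be switched to raw strings without altering their documented output.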