From a2f837f751bd38c85636c6431a11b216228da92e Mon Sep 17 00:00:00 2001
From: Benjamin Peterson
Date: Mon, 28 Apr 2008 21:05:10 +0000
Subject: [PATCH] Document the fact that '\U' and '\u' escapes are not treated specially in 3.0 (see issue 2541)

---
 Doc/reference/lexical_analysis.rst | 14 +++-----------
 Doc/whatsnew/3.0.rst               |  5 +++++
 Misc/NEWS                          |  3 +++
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 566e90b905e..2a9fd7931c7 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -423,8 +423,9 @@ characters that otherwise have a special meaning, such as newline, backslash
 itself, or the quote character.
 
 String literals may optionally be prefixed with a letter ``'r'`` or ``'R'``;
-such strings are called :dfn:`raw strings` and use different rules for
-interpreting backslash escape sequences.
+such strings are called :dfn:`raw strings` and treat backslashes as literal
+characters. As a result, ``'\U'`` and ``'\u'`` escapes in raw strings are not
+treated specially.
 
 Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
 instance of the :class:`bytes` type instead of the :class:`str` type. They
@@ -520,15 +521,6 @@ is more easily recognized as broken.) It is also important to note that the
 escape sequences only recognized in string literals fall into the category of
 unrecognized escapes for bytes literals.
 
-When an ``'r'`` or ``'R'`` prefix is used in a string literal, then the
-``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are processed while *all other
-backslashes are left in the string*. For example, the string literal
-``r"\u0062\n"`` consists of three Unicode characters: 'LATIN SMALL LETTER B',
-'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a
-preceding backslash; however, both remain in the string. As a result,
-``\uXXXX`` escape sequences are only recognized when there is an odd number of
-backslashes.
-
 Even in a raw string, string quotes can be escaped with a backslash, but the
 backslash remains in the string; for example, ``r"\""`` is a valid string
 literal consisting of two characters: a backslash and a double quote; ``r"\"``
diff --git a/Doc/whatsnew/3.0.rst b/Doc/whatsnew/3.0.rst
index 7f8ba47b36f..11b56ccd054 100644
--- a/Doc/whatsnew/3.0.rst
+++ b/Doc/whatsnew/3.0.rst
@@ -167,6 +167,9 @@ Strings and Bytes
   explicitly convert between them, using the :meth:`str.encode` (str -> bytes)
   or :meth:`bytes.decode` (bytes -> str) methods.
 
+* All backslashes in raw strings are interpreted literally. This means that
+  Unicode escapes are not treated specially.
+
 .. XXX add bytearray
 
 * PEP 3112: Bytes literals, e.g. ``b"abc"``, create :class:`bytes` instances.
@@ -183,6 +186,8 @@ Strings and Bytes
 * The :mod:`StringIO` and :mod:`cStringIO` modules are gone. Instead, import
   :class:`io.StringIO` or :class:`io.BytesIO`.
 
+* ``'\U'`` and ``'\u'`` escapes in raw strings are not treated specially.
+
 
 PEP 3101: A New Approach to String Formatting
 =============================================
diff --git a/Misc/NEWS b/Misc/NEWS
index 5c3b875a69a..10640a11139 100644
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -26,6 +26,9 @@ Core and Builtins
   through as unmodified as possible; as a consequence, the C API
   related to command line arguments was changed to use wchar_t.
 
+- All backslashes in raw strings are interpreted literally. This means that
+  '\u' and '\U' escapes are not treated specially.
+
 
 Extension Modules
 -----------------
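
Reviewer note: the behaviour this patch documents can be checked with a short script. The following is only a minimal sketch, assuming a Python 3 interpreter; the variable names are illustrative and are not part of the patch.

```python
# Minimal sketch of the documented behaviour (assumes Python 3 semantics).

# In a regular string literal, \u0062 is decoded to the single character 'b'
# and \n to a newline.
cooked = "\u0062\n"

# In a raw string literal every backslash is kept literally, so \u0062 stays
# as six characters and \n stays as two.
raw = r"\u0062\n"

# A backslash may still precede a quote in a raw string, but the backslash
# remains part of the resulting value.
quoted = r"\""

print(len(cooked))   # 2
print(len(raw))      # 8
print(list(quoted))  # ['\\', '"']
```

Under the old wording removed by this patch, ``r"\u0062\n"`` would have been three characters; with backslashes treated literally it is eight.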