Update C API docs for PEP 393.

2011-10-07 11:19:11 +02:00 · 2011-10-07 11:19:11 +02:00 · db6c7f5c33
parent 59de0ee9e0
commit db6c7f5c33
4 changed files with 521 additions and 107 deletions
--- a/Doc/c-api/long.rst
+++ b/Doc/c-api/long.rst
@ -100,6 +100,20 @@ All integers are implemented as "long" integer objects of arbitrary size.
   string is first encoded to a byte string using :c:func:`PyUnicode_EncodeDecimal`
   and then converted using :c:func:`PyLong_FromString`.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyLong_FromUnicodeObject`.
 .. c:function:: PyObject* PyLong_FromUnicodeObject(PyObject *u, int base)
   Convert a sequence of Unicode digits in the string *u* to a Python integer
   value.  The Unicode string is first encoded to a byte string using
   :c:func:`PyUnicode_EncodeDecimal` and then converted using
   :c:func:`PyLong_FromString`.
   .. versionadded:: 3.3
 .. c:function:: PyObject* PyLong_FromVoidPtr(void *p)
--- a/Doc/c-api/module.rst
+++ b/Doc/c-api/module.rst
@ -87,7 +87,7 @@ There are only a few functions special to module objects.
   Return the name of the file from which *module* was loaded using *module*'s
   :attr:`__file__` attribute.  If this is not defined, or if it is not a
   unicode string, raise :exc:`SystemError` and return *NULL*; otherwise return
-   a reference to a :c:type:`PyUnicodeObject`.
+   a reference to a Unicode object.
   .. versionadded:: 3.2
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@ -6,38 +6,58 @@ Unicode Objects and Codecs
 --------------------------
 .. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
 .. sectionauthor:: Georg Brandl <georg@python.org>
 Unicode Objects
 ^^^^^^^^^^^^^^^
 Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
 use a variety of representations, in order to allow handling the complete range
 of Unicode characters while staying memory efficient.  There are special cases
 for strings where all code points are below 128, 256, or 65536; otherwise, code
 points must be below 1114112 (which is the full Unicode range).
 :c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
 in the Unicode object.
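As a rough illustration of the size classes described above, the following standalone sketch maps a string's largest code point to a storage width and to the rounded-up maximum of its range. The helper names `bytes_per_code_point` and `round_up_maxchar` are hypothetical and are not part of the CPython API; this is a model of the thresholds only, not the interpreter's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython function): bytes needed per code point
   for a string whose largest code point is maxchar. */
int bytes_per_code_point(uint32_t maxchar)
{
    assert(maxchar < 1114112);   /* must lie inside the Unicode range */
    if (maxchar < 256)           /* the "below 128" case also fits in 1 byte */
        return 1;
    if (maxchar < 65536)
        return 2;
    return 4;
}

/* Hypothetical helper: round maxchar up to the top of its size class,
   i.e. to one of 127, 255, 65535, 1114111. */
uint32_t round_up_maxchar(uint32_t maxchar)
{
    assert(maxchar < 1114112);
    if (maxchar < 128)
        return 127;
    if (maxchar < 256)
        return 255;
    if (maxchar < 65536)
        return 65535;
    return 1114111;
}
```

For example, a string whose largest character is U+20AC falls in the 2-byte class, and one containing U+1F600 needs 4 bytes per code point.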
 Unicode Type
 """"""""""""
 These are the basic Unicode object types used for the Unicode implementation in
 Python:
 .. c:type:: Py_UCS4
            Py_UCS2
            Py_UCS1
   These types are typedefs for unsigned integer types wide enough to contain
   characters of 32 bits, 16 bits and 8 bits, respectively.  When dealing with
   single Unicode characters, use :c:type:`Py_UCS4`.
   .. versionadded:: 3.3
 .. c:type:: Py_UNICODE
-   This type represents the storage type which is used by Python internally as
+   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
-   basis for holding Unicode ordinals.  Python's default builds use a 16-bit type
+   depending on the platform.
   for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
   possible to build a UCS4 version of Python (most recent Linux distributions come
   with UCS4 builds of Python). These builds then use a 32-bit type for
   :c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
   where :c:type:`wchar_t` is available and compatible with the chosen Python
   Unicode build variant, :c:type:`Py_UNICODE` is a typedef alias for
   :c:type:`wchar_t` to enhance native platform compatibility. On all other
   platforms, :c:type:`Py_UNICODE` is a typedef alias for either :c:type:`unsigned
   short` (UCS2) or :c:type:`unsigned long` (UCS4).
-Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
+   .. versionchanged:: 3.3
-this in mind when writing extensions or interfaces.
+      In previous versions, this was a 16-bit type or a 32-bit type depending on
      whether you selected a "narrow" or "wide" Unicode version of Python at
      build time.
-.. c:type:: PyUnicodeObject
+.. c:type:: PyASCIIObject
            PyCompactUnicodeObject
            PyUnicodeObject
-   This subtype of :c:type:`PyObject` represents a Python Unicode object.
+   These subtypes of :c:type:`PyObject` represent a Python Unicode object.  In
   almost all cases, they shouldn't be used directly, since all API functions
   that deal with Unicode objects take and return :c:type:`PyObject` pointers.
   .. versionadded:: 3.3
 .. c:var:: PyTypeObject PyUnicode_Type
@ -45,10 +65,10 @@ this in mind when writing extensions or interfaces.
   This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It
   is exposed to Python code as ``str``.
 The following APIs are really C macros and can be used to do fast checks and to
 access internal read-only data of Unicode objects:
 .. c:function:: int PyUnicode_Check(PyObject *o)
   Return true if the object *o* is a Unicode object or an instance of a Unicode
@ -63,26 +83,161 @@ access internal read-only data of Unicode objects:
 .. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
-   Return the size of the object.  *o* has to be a :c:type:`PyUnicodeObject` (not
+   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
-   checked).
+   code units (this includes surrogate pairs as 2 units).  *o* has to be a
   Unicode object (not checked).
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API; please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.
 .. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
-   Return the size of the object's internal buffer in bytes.  *o* has to be a
+   Return the size of the deprecated :c:type:`Py_UNICODE` representation in
-   :c:type:`PyUnicodeObject` (not checked).
+   bytes.  *o* has to be a Unicode object (not checked).
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API; please migrate to using
      :c:func:`PyUnicode_GET_LENGTH` or :c:func:`PyUnicode_KIND_SIZE`.
 .. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
                const char* PyUnicode_AS_DATA(PyObject *o)
-   Return a pointer to the internal :c:type:`Py_UNICODE` buffer of the object.  *o*
+   Return a pointer to a :c:type:`Py_UNICODE` representation of the object.  The
-   has to be a :c:type:`PyUnicodeObject` (not checked).
+   ``AS_DATA`` form casts the pointer to :c:type:`const char *`.  *o* has to be
   a Unicode object (not checked).
   .. versionchanged:: 3.3
      This macro is now inefficient -- because in many cases the
      :c:type:`Py_UNICODE` representation does not exist and needs to be created
      -- and can fail (return *NULL* with an exception set).  Try to port the
      code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API; please migrate to using the
      :c:func:`PyUnicode_nBYTE_DATA` family of macros.
-.. c:function:: const char* PyUnicode_AS_DATA(PyObject *o)
+.. c:function:: int PyUnicode_READY(PyObject *o)
-   Return a pointer to the internal buffer of the object. *o* has to be a
+   Ensure the string object *o* is in the "canonical" representation.  This is
-   :c:type:`PyUnicodeObject` (not checked).
+   required before using any of the access macros described below.
   .. XXX expand on when it is not required
   Returns 0 on success and -1 with an exception set on failure, which in
   particular happens if memory allocation fails.
   .. versionadded:: 3.3
 .. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
   Return the length of the Unicode string, in code points.  *o* has to be a
   Unicode object in the "canonical" representation (not checked).
   .. versionadded:: 3.3
 .. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
                Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
                Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)
   Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
   integer types for direct character access.  No check is made that the
   canonical representation actually has the matching character size; use
   :c:func:`PyUnicode_CHARACTER_SIZE` or :c:func:`PyUnicode_KIND` to select the
   right macro.  Make sure :c:func:`PyUnicode_READY` has been called before
   accessing this.
   .. versionadded:: 3.3
 .. c:macro:: PyUnicode_1BYTE_KIND
             PyUnicode_2BYTE_KIND
             PyUnicode_4BYTE_KIND
   Return values of the :c:func:`PyUnicode_KIND` macro.
   .. versionadded:: 3.3
 .. c:function:: int PyUnicode_KIND(PyObject *o)
   Return one of the PyUnicode kind constants (see above) that indicate how many
   bytes per character this Unicode object uses to store its data.  *o* has to
   be a Unicode object in the "canonical" representation (not checked).
   .. XXX document "0" return value?
   .. versionadded:: 3.3
 .. c:function:: int PyUnicode_CHARACTER_SIZE(PyObject *o)
   Return the number of bytes the string uses to represent single characters;
   this can be 1, 2 or 4.  *o* has to be a Unicode object in the "canonical"
   representation (not checked).
   .. versionadded:: 3.3
 .. c:function:: void* PyUnicode_DATA(PyObject *o)
   Return a void pointer to the raw unicode buffer.  *o* has to be a Unicode
   object in the "canonical" representation (not checked).
   .. versionadded:: 3.3
 .. c:function:: int PyUnicode_KIND_SIZE(int kind, Py_ssize_t index)
   Compute ``index * char_size`` where ``char_size`` is ``2**(kind - 1)``.  The
   index is a character index, the result is a size in bytes.
   .. versionadded:: 3.3
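The byte-size computation described for `PyUnicode_KIND_SIZE` can be modelled in plain C. This is a sketch only: the numeric kind values 1, 2 and 3 are an assumption inferred from the formula ``2**(kind - 1)`` (they are what yields 1, 2 and 4 bytes), and `kind_size` and the `MODEL_*` constants are hypothetical names, not CPython identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed kind values: chosen so that 2**(kind - 1) gives 1, 2 and 4 bytes. */
enum { MODEL_1BYTE_KIND = 1, MODEL_2BYTE_KIND = 2, MODEL_4BYTE_KIND = 3 };

/* Hypothetical model of the macro: convert a character index into a byte
   offset for a buffer of the given kind. */
ptrdiff_t kind_size(int kind, ptrdiff_t index)
{
    assert(kind >= MODEL_1BYTE_KIND && kind <= MODEL_4BYTE_KIND);
    /* 2**(kind - 1) is the same as 1 << (kind - 1) */
    return index * ((ptrdiff_t)1 << (kind - 1));
}
```

With these assumed values, character index 10 corresponds to byte offset 10, 20 or 40 for the 1-, 2- and 4-byte kinds respectively.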
 .. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
                                     Py_UCS4 value)
   Write into a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`).  This macro does not do any sanity checks and is
   intended for usage in loops.  The caller should cache the *kind* value and
   *data* pointer as obtained from other macro calls.  *index* is the index in
   the string (starts at 0) and *value* is the new code point value which should
   be written to that location.
   .. versionadded:: 3.3
 .. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)
   Read a code point from a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`).  No checks or ready calls are performed.
   .. versionadded:: 3.3
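To make the idea behind `PyUnicode_READ` and `PyUnicode_WRITE` concrete, here is a standalone model of reading and writing a code point through an untyped data pointer. It takes the character width in bytes directly rather than a kind constant, and `read_code_point` and `write_code_point` are hypothetical names. Unlike the real macros, which perform no checks, this sketch asserts that the value fits the width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: read the code point at index from a buffer whose
   elements are width bytes wide (1, 2 or 4). */
uint32_t read_code_point(int width, const void *data, size_t index)
{
    switch (width) {
    case 1:
        return ((const uint8_t *)data)[index];
    case 2:
        return ((const uint16_t *)data)[index];
    default:
        assert(width == 4);
        return ((const uint32_t *)data)[index];
    }
}

/* Hypothetical model: store value at index; the value must fit the width. */
void write_code_point(int width, void *data, size_t index, uint32_t value)
{
    switch (width) {
    case 1:
        assert(value <= 0xFF);
        ((uint8_t *)data)[index] = (uint8_t)value;
        break;
    case 2:
        assert(value <= 0xFFFF);
        ((uint16_t *)data)[index] = (uint16_t)value;
        break;
    default:
        assert(width == 4);
        ((uint32_t *)data)[index] = value;
        break;
    }
}
```

In a loop, the caller would look up the width and the data pointer once and reuse them for every index, which is the caching pattern the macro descriptions recommend.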
 .. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)
   Read a character from a Unicode object *o*, which must be in the "canonical"
   representation.  This is less efficient than :c:func:`PyUnicode_READ` if you
   do multiple consecutive reads.
   .. versionadded:: 3.3
 .. c:function:: PyUnicode_MAX_CHAR_VALUE(PyObject *o)
   Return the maximum code point that is suitable for creating another string
   based on *o*, which must be in the "canonical" representation.  This is
   always an approximation but more efficient than iterating over the string.
   .. versionadded:: 3.3
 .. c:function:: int PyUnicode_ClearFreeList()
@ -216,31 +371,45 @@ These APIs can be used to work with surrogates:
   surrogate pair.
-Plain Py_UNICODE
+Creating and accessing Unicode strings
-""""""""""""""""
+""""""""""""""""""""""""""""""""""""""
 To create Unicode objects and access their basic sequence properties, use these
 APIs:
 .. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)
-.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
+   Create a new Unicode object.  *maxchar* should be the true maximum code point
   to be placed in the string.  As an approximation, it can be rounded up to the
   nearest value in the sequence 127, 255, 65535, 1114111.
-   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
+   This is the recommended way to allocate a new Unicode object.  Objects
-   may be *NULL* which causes the contents to be undefined. It is the user's
+   created using this function are not resizable.
-   responsibility to fill in the needed data.  The buffer is copied into the new
+
-   object. If the buffer is not *NULL*, the return value might be a shared object.
+   .. versionadded:: 3.3
-   Therefore, modification of the resulting Unicode object is only allowed when *u*
+
-   is *NULL*.
+
 .. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
                                                    Py_ssize_t size)
   Create a new Unicode object with the given *kind* (possible values are
   :c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
   :c:func:`PyUnicode_KIND`).  The *buffer* must point to an array of *size*
   units of 1, 2 or 4 bytes per character, as given by the kind.
   .. versionadded:: 3.3
 .. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
-   Create a Unicode object from the char buffer *u*.  The bytes will be interpreted
+   Create a Unicode object from the char buffer *u*.  The bytes will be
-   as being UTF-8 encoded.  *u* may also be *NULL* which
+   interpreted as being UTF-8 encoded.  The buffer is copied into the new
-   causes the contents to be undefined. It is the user's responsibility to fill in
+   object. If the buffer is not *NULL*, the return value might be a shared
-   the needed data.  The buffer is copied into the new object. If the buffer is not
+   object, i.e. modification of the data is not allowed.
-   *NULL*, the return value might be a shared object. Therefore, modification of
+
-   the resulting Unicode object is only allowed when *u* is *NULL*.
+   If *u* is *NULL*, this function behaves like :c:func:`PyUnicode_FromUnicode`
   with the buffer set to *NULL*.  This usage is deprecated in favor of
   :c:func:`PyUnicode_New`.
 .. c:function:: PyObject *PyUnicode_FromString(const char *u)
@ -361,36 +530,9 @@ APIs:
   Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
   arguments.
 .. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
-   Create a Unicode object by replacing all decimal digits in
+.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
-   :c:type:`Py_UNICODE` buffer of the given *size* by ASCII digits 0--9
+                               const char *encoding, const char *errors)
   according to their decimal value.  Return *NULL* if an exception
   occurs.
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
   Return a read-only pointer to the Unicode object's internal :c:type:`Py_UNICODE`
   buffer, *NULL* if *unicode* is not a Unicode object.
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
   Create a copy of a Unicode string ending with a nul character. Return *NULL*
   and raise a :exc:`MemoryError` exception on memory allocation failure,
   otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free the
   buffer).
   .. versionadded:: 3.2
 .. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
   Return the length of the Unicode object.
 .. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)
   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.
@ -407,16 +549,158 @@ APIs:
   decref'ing the returned objects.
 .. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
   Return the length of the Unicode object, in code points.
   .. versionadded:: 3.3
 .. c:function:: int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, \
                        PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many)
   Copy characters from one Unicode object into another.  This function performs
   character conversion when necessary and falls back to :c:func:`memcpy` if
   possible.  Returns ``-1`` and sets an exception on error, otherwise returns
   ``0``.
   .. versionadded:: 3.3
 .. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
                                        Py_UCS4 character)
   Write a character to a string.  The string must have been created through
   :c:func:`PyUnicode_New`.  Since Unicode strings are supposed to be immutable,
   the string must not be shared, or have been hashed yet.
   This function checks that *unicode* is a Unicode object, that the index is
   not out of bounds, and that the object can be modified safely (i.e. that
   its reference count is one), in contrast to the macro version
   :c:func:`PyUnicode_WRITE_CHAR`.
   .. versionadded:: 3.3
 .. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
   Read a character from a string.  This function checks that *unicode* is a
   Unicode object and the index is not out of bounds, in contrast to the macro
   version :c:func:`PyUnicode_READ_CHAR`.
   .. versionadded:: 3.3
 .. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
                                              Py_ssize_t end)
   Return a substring of *str*, from character index *start* (included) to
   character index *end* (excluded).  Negative indices are not supported.
   .. versionadded:: 3.3
 .. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
                                          Py_ssize_t buflen, int copy_null)
   Copy the string *u* into a UCS4 buffer, including a null character if
   *copy_null* is set.  Returns *NULL* and sets an exception on error (in
   particular, a :exc:`ValueError` if *buflen* is smaller than the length of
   *u*).  *buffer* is returned on success.
   .. versionadded:: 3.3
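The buffer-size contract of `PyUnicode_AsUCS4` can be pictured with a small standalone model that widens a 1-byte string into a caller-supplied 32-bit buffer. This is a sketch under assumptions: `copy_to_ucs4` is a hypothetical name, it reports failure by returning NULL without any exception machinery, and counting the terminator toward the required size is an assumption, since the text above only mentions the string length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: copy len 1-byte code points from src into buffer,
   optionally followed by a null terminator.  Returns buffer on success and
   NULL if buflen is too small. */
uint32_t *copy_to_ucs4(const uint8_t *src, size_t len,
                       uint32_t *buffer, size_t buflen, int copy_null)
{
    assert(src != NULL || len == 0);
    /* Assumption: the terminator, if requested, needs its own slot. */
    size_t needed = len + (copy_null ? 1 : 0);
    if (buffer == NULL || buflen < needed)
        return NULL;
    for (size_t i = 0; i < len; i++)
        buffer[i] = src[i];          /* widen each code point to 32 bits */
    if (copy_null)
        buffer[len] = 0;
    return buffer;
}
```

Under this model, a three-character string needs a buffer of at least four elements when the terminator is requested and three when it is not.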
 .. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)
   Copy the string *u* into a new UCS4 buffer that is allocated using
   :c:func:`PyMem_Malloc`.  If this fails, *NULL* is returned with a
   :exc:`MemoryError` set.
   .. versionadded:: 3.3
 Deprecated Py_UNICODE APIs
 """"""""""""""""""""""""""
 .. deprecated-removed:: 3.3 4.0
 These API functions are deprecated with the implementation of :pep:`393`.
 Extension modules can continue using them, as they will not be removed in Python
 3.x, but need to be aware that their use can now cause performance and memory hits.
 .. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
   may be *NULL* which causes the contents to be undefined. It is the user's
   responsibility to fill in the needed data.  The buffer is copied into the new
   object.
   If the buffer is not *NULL*, the return value might be a shared object.
   Therefore, modification of the resulting Unicode object is only allowed when
   *u* is *NULL*.
   If the buffer is *NULL*, :c:func:`PyUnicode_READY` must be called once the
   string content has been filled before using any of the access macros such as
   :c:func:`PyUnicode_KIND`.
   Please migrate to using :c:func:`PyUnicode_FromKindAndData` or
   :c:func:`PyUnicode_New`.
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
   Return a read-only pointer to the Unicode object's internal
   :c:type:`Py_UNICODE` buffer, *NULL* if *unicode* is not a Unicode object.
   This will create the :c:type:`Py_UNICODE` representation of the object if it
   is not yet available.
   Please migrate to using :c:func:`PyUnicode_AsUCS4`,
   :c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new
   APIs.
 .. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
   Create a Unicode object by replacing all decimal digits in
   :c:type:`Py_UNICODE` buffer of the given *size* by ASCII digits 0--9
   according to their decimal value.  Return *NULL* if an exception occurs.
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
   Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
   array length in *size*.
   .. versionadded:: 3.3
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
   Create a copy of a Unicode string ending with a nul character. Return *NULL*
   and raise a :exc:`MemoryError` exception on memory allocation failure,
   otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free the
   buffer).
   .. versionadded:: 3.2
   Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.
 .. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
   code units (this includes surrogate pairs as 2 units).
   Please migrate to using :c:func:`PyUnicode_GetLength`.
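The difference between a size in code units and a length in code points matters when the deprecated representation uses 16-bit units, because each code point above U+FFFF then occupies a surrogate pair. The standalone sketch below counts 16-bit code units for an array of code points; `utf16_code_units` is a hypothetical helper, not a CPython function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit code units needed to hold the
   n code points in cps. */
size_t utf16_code_units(const uint32_t *cps, size_t n)
{
    size_t units = 0;
    for (size_t i = 0; i < n; i++) {
        assert(cps[i] <= 0x10FFFF);
        /* code points above U+FFFF need a surrogate pair: 2 units */
        units += (cps[i] > 0xFFFF) ? 2 : 1;
    }
    return units;
}
```

For the three code points U+0041, U+1F600 and U+20AC, the length in code points is 3 while the size in 16-bit code units is 4.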
 .. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
   throughout the interpreter whenever coercion to Unicode is needed.
 If the platform supports :c:type:`wchar_t` and provides a header file wchar.h,
 Python can interface directly to this type using the following functions.
 Support is optimized if Python's own :c:type:`Py_UNICODE` type is identical to
 the system's :c:type:`wchar_t`.
 File System Encoding
 """"""""""""""""""""
@ -526,6 +810,26 @@ wchar_t Support
   .. versionadded:: 3.2
 UCS4 Support
 """"""""""""
 .. versionadded:: 3.3
 .. XXX are these meant to be public?
 .. c:function:: size_t Py_UCS4_strlen(const Py_UCS4 *u)
                Py_UCS4* Py_UCS4_strcpy(Py_UCS4 *s1, const Py_UCS4 *s2)
                Py_UCS4* Py_UCS4_strncpy(Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
                Py_UCS4* Py_UCS4_strcat(Py_UCS4 *s1, const Py_UCS4 *s2)
                int Py_UCS4_strcmp(const Py_UCS4 *s1, const Py_UCS4 *s2)
                int Py_UCS4_strncmp(const Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
                 Py_UCS4* Py_UCS4_strchr(const Py_UCS4 *s, Py_UCS4 c)
                 Py_UCS4* Py_UCS4_strrchr(const Py_UCS4 *s, Py_UCS4 c)
   These utility functions work on strings of :c:type:`Py_UCS4` characters and
   otherwise behave like the C standard library functions with the same name.
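To show what "behave like the C standard library functions" means for 32-bit characters, here is a standalone model of the length and character-search operations over null-terminated `uint32_t` arrays. The names `ucs4_strlen` and `ucs4_strchr` are hypothetical stand-ins written for this sketch; they are not the CPython functions listed above and make no claim about how those are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: count characters before the terminating 0. */
size_t ucs4_strlen(const uint32_t *u)
{
    assert(u != NULL);
    size_t n = 0;
    while (u[n] != 0)
        n++;
    return n;
}

/* Hypothetical model: find the first occurrence of c, or NULL if absent.
   As with strchr, searching for 0 finds the terminator itself. */
const uint32_t *ucs4_strchr(const uint32_t *s, uint32_t c)
{
    assert(s != NULL);
    for (;; s++) {
        if (*s == c)
            return s;
        if (*s == 0)
            return NULL;
    }
}
```

The only difference from the `char` versions is the element type, so characters outside the Basic Multilingual Plane count as a single element.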
 .. _builtincodecs:
 Built-in Codecs
@ -560,7 +864,8 @@ Generic Codecs
 These are the generic codec APIs:
-.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
+.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
                              const char *encoding, const char *errors)
   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
@ -569,7 +874,8 @@ These are the generic codec APIs:
   the codec.
-.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)
+.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
                              const char *encoding, const char *errors)
   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
   bytes object.  *encoding* and *errors* have the same meaning as the
@ -577,8 +883,13 @@ These are the generic codec APIs:
   to be used is looked up using the Python codec registry.  Return *NULL* if an
   exception was raised by the codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsEncodedString`.
-.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)
+
 .. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
                              const char *encoding, const char *errors)
   Encode a Unicode object and return the result as Python bytes object.
   *encoding* and *errors* have the same meaning as the parameters of the same
@ -599,7 +910,8 @@ These are the UTF-8 codec APIs:
   *s*. Return *NULL* if an exception was raised by the codec.
-.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, Py_ssize_t *consumed)
   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
@ -613,6 +925,10 @@ These are the UTF-8 codec APIs:
   return a Python bytes object.  Return *NULL* if an exception was raised by
   the codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUTF8String` or :c:func:`PyUnicode_AsUTF8AndSize`.
 .. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
@ -621,13 +937,37 @@ These are the UTF-8 codec APIs:
   raised by the codec.
 .. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
   Return a pointer to the default encoding (UTF-8) of the Unicode object, and
   store the size of the encoded representation (in bytes) in *size*.  *size*
   can be *NULL*, in this case no size will be stored.
   In the case of an error, *NULL* is returned with an exception set and no
   *size* is stored.
   This caches the UTF-8 representation of the string in the Unicode object, and
   subsequent calls will return a pointer to the same buffer.  The caller is not
   responsible for deallocating the buffer.
   .. versionadded:: 3.3
 .. c:function:: char* PyUnicode_AsUTF8(PyObject *unicode)
   As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
   .. versionadded:: 3.3
 UTF-32 Codecs
 """""""""""""
 These are the UTF-32 codec APIs:
-.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
+.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder)
   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
@ -655,7 +995,8 @@ These are the UTF-32 codec APIs:
   Return *NULL* if an exception was raised by the codec.
-.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder, Py_ssize_t *consumed)
   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
@ -664,7 +1005,8 @@ These are the UTF-32 codec APIs:
   that have been decoded will be stored in *consumed*.
-.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
+.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
                              const char *errors, int byteorder)
   Return a Python bytes object holding the UTF-32 encoded value of the Unicode
   data in *s*.  Output is written according to the following byte order::
@ -681,6 +1023,10 @@ These are the UTF-32 codec APIs:
   Return *NULL* if an exception was raised by the codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUTF32String`.
 .. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
@ -695,7 +1041,8 @@ UTF-16 Codecs
 These are the UTF-16 codec APIs:
-.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
+.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder)
   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
@ -722,7 +1069,8 @@ These are the UTF-16 codec APIs:
   Return *NULL* if an exception was raised by the codec.
-.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder, Py_ssize_t *consumed)
   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
@ -731,7 +1079,8 @@ These are the UTF-16 codec APIs:
   number of bytes that have been decoded will be stored in *consumed*.
-.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
+.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
                              const char *errors, int byteorder)
   Return a Python bytes object holding the UTF-16 encoded value of the Unicode
   data in *s*.  Output is written according to the following byte order::
@ -749,6 +1098,10 @@ These are the UTF-16 codec APIs:
   Return *NULL* if an exception was raised by the codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUTF16String`.
 .. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
@ -769,7 +1122,8 @@ These are the UTF-7 codec APIs:
   *s*.  Return *NULL* if an exception was raised by the codec.
-.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, Py_ssize_t *consumed)
   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`.  If
   *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
@ -777,7 +1131,8 @@ These are the UTF-7 codec APIs:
   bytes that have been decoded will be stored in *consumed*.
-.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)
+.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
                              int base64SetO, int base64WhiteSpace, const char *errors)
   Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
   return a Python bytes object.  Return *NULL* if an exception was raised by
@ -788,6 +1143,11 @@ These are the UTF-7 codec APIs:
   nonzero, whitespace will be encoded in base-64.  Both are set to zero for the
   Python "utf-7" codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API.
   .. XXX replace with what?
 Unicode-Escape Codecs
 """""""""""""""""""""
@ -795,7 +1155,8 @@ Unicode-Escape Codecs
 These are the "Unicode Escape" codec APIs:
-.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
+.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
                              Py_ssize_t size, const char *errors)
   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*.  Return *NULL* if an exception was raised by the codec.
@ -807,6 +1168,10 @@ These are the "Unicode Escape" codec APIs:
   return a Python string object.  Return *NULL* if an exception was raised by the
   codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUnicodeEscapeString`.
 .. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
@ -821,18 +1186,24 @@ Raw-Unicode-Escape Codecs
 These are the "Raw Unicode Escape" codec APIs:
-.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
+.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
                              Py_ssize_t size, const char *errors)
   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*.  Return *NULL* if an exception was raised by the codec.
-.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
+.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
                              Py_ssize_t size, const char *errors)
   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
   and return a Python bytes object.  Return *NULL* if an exception was raised by
   the codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsRawUnicodeEscapeString`.
 .. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
@ -860,6 +1231,10 @@ ordinals and only these are accepted by the codecs during encoding.
   return a Python bytes object.  Return *NULL* if an exception was raised by
   the codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsLatin1String`.
 .. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
@ -887,6 +1262,10 @@ codes generate errors.
   return a Python bytes object.  Return *NULL* if an exception was raised by
   the codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsASCIIString`.
 .. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
@ -921,7 +1300,8 @@ characters to different code points.
 These are the mapping codec APIs:
-.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)
+.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)
   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
   the given *mapping* object.  Return *NULL* if an exception was raised by the
@ -931,12 +1311,17 @@ These are the mapping codec APIs:
   treated as "undefined mapping".
-.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
+.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)
   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
   *mapping* object and return a Python bytes object. Return *NULL* if an
   exception was raised by the codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsCharmapString`.
 .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
@ -947,7 +1332,8 @@ These are the mapping codec APIs:
 The following codec API is special in that it maps Unicode to Unicode.
-.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)
+.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *table, const char *errors)
   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
   character mapping *table* to it and return the resulting Unicode object.  Return
@ -960,6 +1346,10 @@ The following codec API is special in that maps Unicode to Unicode.
   and sequences work well.  Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API.
   .. XXX replace with what?
 MBCS codecs for Windows
@ -976,7 +1366,8 @@ the user settings on the machine running the codec.
   Return *NULL* if an exception was raised by the codec.
-.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
+.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, \
                              const char *errors, int *consumed)
   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
@ -990,6 +1381,10 @@ the user settings on the machine running the codec.
   a Python bytes object.  Return *NULL* if an exception was raised by the
   codec.
   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsMBCSString`.
 .. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
@ -1034,7 +1429,8 @@ They all return *NULL* or ``-1`` if an exception occurs.
   characters are not included in the resulting strings.
-.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
+.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, \
                              const char *errors)
   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object.
@ -1056,14 +1452,16 @@ They all return *NULL* or ``-1`` if an exception occurs.
   Unicode string.
-.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
+.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
                        Py_ssize_t start, Py_ssize_t end, int direction)
   Return 1 if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
   0 otherwise. Return ``-1`` if an error occurred.
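
The *direction* convention can be checked with a short sketch that calls the C function through ``ctypes.pythonapi`` (CPython only). The binding is hand-written and the alias ``tailmatch`` is hypothetical; the return type is declared as ``c_ssize_t`` on the assumption that current CPython headers declare ``Py_ssize_t`` for this function.

```python
import ctypes

# Hand-written binding for the C function (the alias is hypothetical).
tailmatch = ctypes.pythonapi.PyUnicode_Tailmatch
tailmatch.argtypes = [ctypes.py_object, ctypes.py_object,
                      ctypes.c_ssize_t, ctypes.c_ssize_t, ctypes.c_int]
tailmatch.restype = ctypes.c_ssize_t  # assumption: Py_ssize_t in current headers

text = "unicodeobject.h"

is_prefix = tailmatch(text, "unicode", 0, len(text), -1)  # direction -1: prefix match
is_suffix = tailmatch(text, ".h", 0, len(text), 1)        # direction 1: suffix match
no_match = tailmatch(text, ".c", 0, len(text), 1)         # suffix does not match
```

At the Python level, these calls correspond to ``str.startswith`` and ``str.endswith`` with explicit *start* and *end* arguments.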
-.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
+.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
                               Py_ssize_t start, Py_ssize_t end, int direction)
   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
@ -1072,7 +1470,8 @@ They all return *NULL* or ``-1`` if an exception occurs.
   occurred and an exception has been set.
-.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, Py_ssize_t start, Py_ssize_t end, int direction)
+.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
                               Py_ssize_t start, Py_ssize_t end, int direction)
   Return the first position of the character *ch* in ``str[start:end]`` using
   the given *direction* (*direction* == 1 means to do a forward search,
@ -1083,13 +1482,15 @@ They all return *NULL* or ``-1`` if an exception occurs.
   .. versionadded:: 3.3
-.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
+.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
                               Py_ssize_t start, Py_ssize_t end)
   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``.  Return ``-1`` if an error occurred.
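
To make "non-overlapping" concrete, the following sketch calls the C function through ``ctypes.pythonapi`` (CPython only). The binding declarations and the alias ``count`` are assumptions for illustration, not part of the documented API.

```python
import ctypes

# Hand-written binding for the C function (the alias is hypothetical).
count = ctypes.pythonapi.PyUnicode_Count
count.argtypes = [ctypes.py_object, ctypes.py_object,
                  ctypes.c_ssize_t, ctypes.c_ssize_t]
count.restype = ctypes.c_ssize_t

text = "abababa"

# "aba" occurs at offsets 0, 2 and 4, but matches may not overlap,
# so only the occurrences at 0 and 4 are counted.
whole = count(text, "aba", 0, len(text))

# Restricting the search to text[1:7] ("bababa") leaves a single match.
sliced = count(text, "aba", 1, len(text))
```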
-.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
+.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
                              PyObject *replstr, Py_ssize_t maxcount)
   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object. *maxcount* == -1 means replace all
@ -1137,8 +1538,8 @@ They all return *NULL* or ``-1`` if an exception occurs.
   Check whether *element* is contained in *container* and return true or false
   accordingly.
-   *element* has to coerce to a one element Unicode string. ``-1`` is returned if
+   *element* has to coerce to a one element Unicode string. ``-1`` is returned
-   there was an error.
+   if there was an error.
 .. c:function:: void PyUnicode_InternInPlace(PyObject **string)
@ -1157,7 +1558,6 @@ They all return *NULL* or ``-1`` if an exception occurs.
 .. c:function:: PyObject* PyUnicode_InternFromString(const char *v)
   A combination of :c:func:`PyUnicode_FromString` and
-   :c:func:`PyUnicode_InternInPlace`, returning either a new unicode string object
+   :c:func:`PyUnicode_InternInPlace`, returning either a new unicode string
-   that has been interned, or a new ("owned") reference to an earlier interned
+   object that has been interned, or a new ("owned") reference to an earlier
-   string object with the same value.
+   interned string object with the same value.
--- a/Include/unicodeobject.h
+++ b/Include/unicodeobject.h
@ -686,7 +686,7 @@ PyAPI_FUNC(PyObject*) PyUnicode_Substring(
    Py_ssize_t start,
    Py_ssize_t end);
-/* Copy the string into a UCS4 buffer including the null character is copy_null
+/* Copy the string into a UCS4 buffer including the null character if copy_null
   is set. Return NULL and raise an exception on error. Raise a ValueError if
   the buffer is smaller than the string. Return buffer on success.