#11840: Improve c-api/unicode documentation. Patch by Sandro Tosi.
parent 832c8bbe51
commit 95cd91c17f
@@ -329,8 +329,8 @@ APIs:
 incremented refcount.

 :class:`bytes`, :class:`bytearray` and other char buffer compatible objects
-are decoded according to the given encoding and using the error handling
-defined by errors. Both can be *NULL* to have the interface use the default
+are decoded according to the given *encoding* and using the error handling
+defined by *errors*. Both can be *NULL* to have the interface use the default
 values (see the next section for details).

 All other objects, including Unicode objects, cause a :exc:`TypeError` to be
@@ -390,12 +390,12 @@ used, passing :cfunc:`PyUnicode_FSConverter` as the conversion function:
 wchar_t Support
 """""""""""""""

-wchar_t support for platforms which support it:
+:ctype:`wchar_t` support for platforms which support it:

 .. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

-Create a Unicode object from the :ctype:`wchar_t` buffer *w* of the given size.
-Passing -1 as the size indicates that the function must itself compute the length,
+Create a Unicode object from the :ctype:`wchar_t` buffer *w* of the given *size*.
+Passing -1 as the *size* indicates that the function must itself compute the length,
 using wcslen.
 Return *NULL* on failure.

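The size convention described for PyUnicode_FromWideChar in the hunk above (pass -1 and the length is measured with wcslen) can be illustrated without the Python headers. What follows is a minimal standalone sketch, not CPython's implementation: `resolve_wide_length` is a hypothetical helper, and `ptrdiff_t` stands in for `Py_ssize_t` as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of the CPython API: resolves a "size"
 * argument the way the text describes for PyUnicode_FromWideChar.
 * ptrdiff_t is an assumed stand-in for Py_ssize_t. */
size_t
resolve_wide_length(const wchar_t *w, ptrdiff_t size)
{
    assert(w != NULL);
    if (size == -1)
        return wcslen(w);   /* -1: measure up to the terminating L'\0' */
    assert(size >= 0);
    return (size_t)size;    /* explicit length is used as given */
}
```

With this convention a caller holding a NUL-terminated buffer can pass -1, while a caller that already knows the length passes it explicitly and avoids a second scan of the buffer.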
@@ -419,15 +419,15 @@ Built-in Codecs
 Python provides a set of built-in codecs which are written in C for speed. All of
 these codecs are directly usable via the following functions.

-Many of the following APIs take two arguments encoding and errors. These
-parameters encoding and errors have the same semantics as the ones of the
-built-in :func:`str` string object constructor.
+Many of the following APIs take two arguments encoding and errors, and they
+have the same semantics as the ones of the built-in :func:`str` string object
+constructor.

 Setting encoding to *NULL* causes the default encoding to be used
 which is ASCII. The file system calls should use
 :cfunc:`PyUnicode_FSConverter` for encoding file names. This uses the
 variable :cdata:`Py_FileSystemDefaultEncoding` internally. This
-variable should be treated as read-only: On some systems, it will be a
+variable should be treated as read-only: on some systems, it will be a
 pointer to a static string, on others, it will change at run-time
 (such as when the application invokes setlocale).

@@ -456,7 +456,7 @@ These are the generic codec APIs:

 .. cfunction:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)

-Encode the :ctype:`Py_UNICODE` buffer of the given size and return a Python
+Encode the :ctype:`Py_UNICODE` buffer *s* of the given *size* and return a Python
 bytes object. *encoding* and *errors* have the same meaning as the
 parameters of the same name in the Unicode :meth:`encode` method. The codec
 to be used is looked up using the Python codec registry. Return *NULL* if an
@@ -494,7 +494,7 @@ These are the UTF-8 codec APIs:

 .. cfunction:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

-Encode the :ctype:`Py_UNICODE` buffer of the given size using UTF-8 and
+Encode the :ctype:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and
 return a Python bytes object. Return *NULL* if an exception was raised by
 the codec.

@@ -514,7 +514,7 @@ These are the UTF-32 codec APIs:

 .. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

-Decode *length* bytes from a UTF-32 encoded buffer string and return the
+Decode *size* bytes from a UTF-32 encoded buffer string and return the
 corresponding Unicode object. *errors* (if non-*NULL*) defines the error
 handling. It defaults to "strict".

@@ -582,7 +582,7 @@ These are the UTF-16 codec APIs:

 .. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

-Decode *length* bytes from a UTF-16 encoded buffer string and return the
+Decode *size* bytes from a UTF-16 encoded buffer string and return the
 corresponding Unicode object. *errors* (if non-*NULL*) defines the error
 handling. It defaults to "strict".

@@ -714,7 +714,7 @@ These are the "Raw Unicode Escape" codec APIs:

 .. cfunction:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

-Encode the :ctype:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
+Encode the :ctype:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
 and return a Python string object. Return *NULL* if an exception was raised by
 the codec.

@@ -741,7 +741,7 @@ ordinals and only these are accepted by the codecs during encoding.

 .. cfunction:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

-Encode the :ctype:`Py_UNICODE` buffer of the given size using Latin-1 and
+Encode the :ctype:`Py_UNICODE` buffer of the given *size* using Latin-1 and
 return a Python bytes object. Return *NULL* if an exception was raised by
 the codec.

@@ -768,7 +768,7 @@ codes generate errors.

 .. cfunction:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

-Encode the :ctype:`Py_UNICODE` buffer of the given size using ASCII and
+Encode the :ctype:`Py_UNICODE` buffer of the given *size* using ASCII and
 return a Python bytes object. Return *NULL* if an exception was raised by
 the codec.

@@ -783,8 +783,6 @@ codes generate errors.
 Character Map Codecs
 """"""""""""""""""""

-These are the mapping codec APIs:
-
 This codec is special in that it can be used to implement many different codecs
 (and this is in fact what was done to obtain most of the standard codecs
 included in the :mod:`encodings` package). The codec uses mapping to encode and
@@ -806,6 +804,7 @@ meaning that its ordinal value will be interpreted as Unicode or Latin-1 ordinal
 resp. Because of this, mappings only need to contain those mappings which map
 characters to different code points.

+These are the mapping codec APIs:

 .. cfunction:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)

@@ -819,7 +818,7 @@ characters to different code points.

 .. cfunction:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)

-Encode the :ctype:`Py_UNICODE` buffer of the given size using the given
+Encode the :ctype:`Py_UNICODE` buffer of the given *size* using the given
 *mapping* object and return a Python string object. Return *NULL* if an
 exception was raised by the codec.

@@ -835,7 +834,7 @@ The following codec API is special in that maps Unicode to Unicode.

 .. cfunction:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)

-Translate a :ctype:`Py_UNICODE` buffer of the given length by applying a
+Translate a :ctype:`Py_UNICODE` buffer of the given *size* by applying a
 character mapping *table* to it and return the resulting Unicode object. Return
 *NULL* when an exception was raised by the codec.

@@ -847,16 +846,15 @@ The following codec API is special in that maps Unicode to Unicode.
 :exc:`LookupError`) are left untouched and are copied as-is.

+
+MBCS codecs for Windows
+"""""""""""""""""""""""

 These are the MBCS codec APIs. They are currently only available on Windows and
 use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
 DBCS) is a class of encodings, not just one. The target encoding is defined by
 the user settings on the machine running the codec.

-
-MBCS codecs for Windows
-"""""""""""""""""""""""
-

 .. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

 Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
@@ -873,7 +871,7 @@ MBCS codecs for Windows

 .. cfunction:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

-Encode the :ctype:`Py_UNICODE` buffer of the given size using MBCS and return
+Encode the :ctype:`Py_UNICODE` buffer of the given *size* using MBCS and return
 a Python bytes object. Return *NULL* if an exception was raised by the
 codec.

@@ -908,7 +906,7 @@ They all return *NULL* or ``-1`` if an exception occurs.

 .. cfunction:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

-Split a string giving a list of Unicode strings. If sep is *NULL*, splitting
+Split a string giving a list of Unicode strings. If *sep* is *NULL*, splitting
 will be done at all whitespace substrings. Otherwise, splits occur at the given
 separator. At most *maxsplit* splits will be done. If negative, no limit is
 set. Separators are not included in the resulting list.
@@ -939,20 +937,20 @@ They all return *NULL* or ``-1`` if an exception occurs.

 .. cfunction:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

-Join a sequence of strings using the given separator and return the resulting
+Join a sequence of strings using the given *separator* and return the resulting
 Unicode string.


 .. cfunction:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

-Return 1 if *substr* matches *str*[*start*:*end*] at the given tail end
+Return 1 if *substr* matches ``str[start:end]`` at the given tail end
 (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
 0 otherwise. Return ``-1`` if an error occurred.


 .. cfunction:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

-Return the first position of *substr* in *str*[*start*:*end*] using the given
+Return the first position of *substr* in ``str[start:end]`` using the given
 *direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
 backward search). The return value is the index of the first match; a value of
 ``-1`` indicates that no match was found, and ``-2`` indicates that an error
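The PyUnicode_Tailmatch contract documented in the hunk above (compare *substr* against the head or the tail of ``str[start:end]`` depending on *direction*) can be modelled on plain wide strings. This is a rough standalone sketch, not CPython's code: `tailmatch` is a hypothetical helper, it assumes 0 <= start <= end <= wcslen(str), and it does not model negative indices or the ``-1`` error return.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of the CPython API: mimics the documented
 * result of PyUnicode_Tailmatch on NUL-terminated wchar_t strings.
 * Assumes 0 <= start <= end <= wcslen(str). */
int
tailmatch(const wchar_t *str, const wchar_t *substr,
          size_t start, size_t end, int direction)
{
    assert(str != NULL && substr != NULL);
    size_t sub_len = wcslen(substr);

    if (end < start || end - start < sub_len)
        return 0;                   /* slice is too short to match */

    /* direction == -1: compare at the head of the slice (prefix match);
     * direction ==  1: compare at the tail of the slice (suffix match). */
    size_t offset = (direction > 0) ? end - sub_len : start;
    return wmemcmp(str + offset, substr, sub_len) == 0;
}
```

For example, `tailmatch(L"hello world", L"world", 0, 11, 1)` checks for a suffix over the whole string, while narrowing the slice to `0, 5` makes the same call test the suffix of `hello` only.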