Update C API docs for PEP 393.
parent 59de0ee9e0
commit db6c7f5c33
|
@ -100,6 +100,20 @@ All integers are implemented as "long" integer objects of arbitrary size.
|
||||||
string is first encoded to a byte string using :c:func:`PyUnicode_EncodeDecimal`
|
string is first encoded to a byte string using :c:func:`PyUnicode_EncodeDecimal`
|
||||||
and then converted using :c:func:`PyLong_FromString`.
|
and then converted using :c:func:`PyLong_FromString`.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyLong_FromUnicodeObject`.
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: PyObject* PyLong_FromUnicodeObject(PyObject *u, int base)
|
||||||
|
|
||||||
|
Convert a sequence of Unicode digits in the string *u* to a Python integer
|
||||||
|
value. The Unicode string is first encoded to a byte string using
|
||||||
|
:c:func:`PyUnicode_EncodeDecimal` and then converted using
|
||||||
|
:c:func:`PyLong_FromString`.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyLong_FromVoidPtr(void *p)
|
.. c:function:: PyObject* PyLong_FromVoidPtr(void *p)
|
||||||
|
|
||||||
|
|
|
@ -87,7 +87,7 @@ There are only a few functions special to module objects.
|
||||||
Return the name of the file from which *module* was loaded using *module*'s
|
Return the name of the file from which *module* was loaded using *module*'s
|
||||||
:attr:`__file__` attribute. If this is not defined, or if it is not a
|
:attr:`__file__` attribute. If this is not defined, or if it is not a
|
||||||
unicode string, raise :exc:`SystemError` and return *NULL*; otherwise return
|
unicode string, raise :exc:`SystemError` and return *NULL*; otherwise return
|
||||||
a reference to a :c:type:`PyUnicodeObject`.
|
a reference to a Unicode object.
|
||||||
|
|
||||||
.. versionadded:: 3.2
|
.. versionadded:: 3.2
|
||||||
|
|
||||||
|
|
|
@ -6,38 +6,58 @@ Unicode Objects and Codecs
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
|
.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
|
||||||
|
.. sectionauthor:: Georg Brandl <georg@python.org>
|
||||||
|
|
||||||
Unicode Objects
|
Unicode Objects
|
||||||
^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
|
||||||
|
use a variety of representations, in order to allow handling the complete range
|
||||||
|
of Unicode characters while staying memory efficient. There are special cases
|
||||||
|
for strings where all code points are below 128, 256, or 65536; otherwise, code
|
||||||
|
points must be below 1114112 (which is the full Unicode range).
|
||||||
|
|
||||||
|
:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
|
||||||
|
in the Unicode object.
|
||||||
|
|
||||||
|
|
||||||
Unicode Type
|
Unicode Type
|
||||||
""""""""""""
|
""""""""""""
|
||||||
|
|
||||||
These are the basic Unicode object types used for the Unicode implementation in
|
These are the basic Unicode object types used for the Unicode implementation in
|
||||||
Python:
|
Python:
|
||||||
|
|
||||||
|
.. c:type:: Py_UCS4
|
||||||
|
Py_UCS2
|
||||||
|
Py_UCS1
|
||||||
|
|
||||||
|
These types are typedefs for unsigned integer types wide enough to contain
|
||||||
|
characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with
|
||||||
|
single Unicode characters, use :c:type:`Py_UCS4`.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
.. c:type:: Py_UNICODE
|
.. c:type:: Py_UNICODE
|
||||||
|
|
||||||
This type represents the storage type which is used by Python internally as
|
This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
|
||||||
basis for holding Unicode ordinals. Python's default builds use a 16-bit type
|
depending on the platform.
|
||||||
for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
|
|
||||||
possible to build a UCS4 version of Python (most recent Linux distributions come
|
|
||||||
with UCS4 builds of Python). These builds then use a 32-bit type for
|
|
||||||
:c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
|
|
||||||
where :c:type:`wchar_t` is available and compatible with the chosen Python
|
|
||||||
Unicode build variant, :c:type:`Py_UNICODE` is a typedef alias for
|
|
||||||
:c:type:`wchar_t` to enhance native platform compatibility. On all other
|
|
||||||
platforms, :c:type:`Py_UNICODE` is a typedef alias for either :c:type:`unsigned
|
|
||||||
short` (UCS2) or :c:type:`unsigned long` (UCS4).
|
|
||||||
|
|
||||||
Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
|
.. versionchanged:: 3.3
|
||||||
this in mind when writing extensions or interfaces.
|
In previous versions, this was a 16-bit type or a 32-bit type depending on
|
||||||
|
whether you selected a "narrow" or "wide" Unicode version of Python at
|
||||||
|
build time.
|
||||||
|
|
||||||
|
|
||||||
.. c:type:: PyUnicodeObject
|
.. c:type:: PyASCIIObject
|
||||||
|
PyCompactUnicodeObject
|
||||||
|
PyUnicodeObject
|
||||||
|
|
||||||
This subtype of :c:type:`PyObject` represents a Python Unicode object.
|
These subtypes of :c:type:`PyObject` represent a Python Unicode object. In
|
||||||
|
almost all cases, they shouldn't be used directly, since all API functions
|
||||||
|
that deal with Unicode objects take and return :c:type:`PyObject` pointers.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
.. c:var:: PyTypeObject PyUnicode_Type
|
.. c:var:: PyTypeObject PyUnicode_Type
|
||||||
|
@ -45,10 +65,10 @@ this in mind when writing extensions or interfaces.
|
||||||
This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
|
This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
|
||||||
is exposed to Python code as ``str``.
|
is exposed to Python code as ``str``.
|
||||||
|
|
||||||
|
|
||||||
The following APIs are really C macros and can be used to do fast checks and to
|
The following APIs are really C macros and can be used to do fast checks and to
|
||||||
access internal read-only data of Unicode objects:
|
access internal read-only data of Unicode objects:
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: int PyUnicode_Check(PyObject *o)
|
.. c:function:: int PyUnicode_Check(PyObject *o)
|
||||||
|
|
||||||
Return true if the object *o* is a Unicode object or an instance of a Unicode
|
Return true if the object *o* is a Unicode object or an instance of a Unicode
|
||||||
|
@ -63,26 +83,161 @@ access internal read-only data of Unicode objects:
|
||||||
|
|
||||||
.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
|
.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
|
||||||
|
|
||||||
Return the size of the object. *o* has to be a :c:type:`PyUnicodeObject` (not
|
Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
|
||||||
checked).
|
code units (this includes surrogate pairs as 2 units). *o* has to be a
|
||||||
|
Unicode object (not checked).
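To illustrate why a count in code units can differ from a count in code points, here is a minimal sketch. It assumes a platform where :c:type:`Py_UNICODE` is 16 bits wide, and ``utf16_code_units`` is a hypothetical helper written for this example, not part of the C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython function: count the 16-bit code units
   needed to store n code points given as 32-bit values. */
static size_t
utf16_code_units(const uint32_t *s, size_t n)
{
    size_t units = 0;
    for (size_t i = 0; i < n; i++) {
        assert(s[i] <= 0x10FFFF);          /* valid Unicode range only */
        units += (s[i] > 0xFFFF) ? 2 : 1;  /* non-BMP needs a surrogate pair */
    }
    return units;
}
```

For a three-character string containing one character above U+FFFF, this returns 4 rather than 3, which is the kind of discrepancy the code point based length avoids.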
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style Unicode API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_GET_LENGTH`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
|
.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
|
||||||
|
|
||||||
Return the size of the object's internal buffer in bytes. *o* has to be a
|
Return the size of the deprecated :c:type:`Py_UNICODE` representation in
|
||||||
:c:type:`PyUnicodeObject` (not checked).
|
bytes. *o* has to be a Unicode object (not checked).
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style Unicode API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_GET_LENGTH` or :c:func:`PyUnicode_KIND_SIZE`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
|
.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
|
||||||
|
const char* PyUnicode_AS_DATA(PyObject *o)
|
||||||
|
|
||||||
Return a pointer to the internal :c:type:`Py_UNICODE` buffer of the object. *o*
|
Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
|
||||||
has to be a :c:type:`PyUnicodeObject` (not checked).
|
``AS_DATA`` form casts the pointer to :c:type:`const char *`. *o* has to be
|
||||||
|
a Unicode object (not checked).
|
||||||
|
|
||||||
|
.. versionchanged:: 3.3
|
||||||
|
This macro is now inefficient -- because in many cases the
|
||||||
|
:c:type:`Py_UNICODE` representation does not exist and needs to be created
|
||||||
|
-- and can fail (return *NULL* with an exception set). Try to port the
|
||||||
|
code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
|
||||||
|
:c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style Unicode API; please migrate to using the
|
||||||
|
:c:func:`PyUnicode_nBYTE_DATA` family of macros.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: const char* PyUnicode_AS_DATA(PyObject *o)
|
.. c:function:: int PyUnicode_READY(PyObject *o)
|
||||||
|
|
||||||
Return a pointer to the internal buffer of the object. *o* has to be a
|
Ensure the string object *o* is in the "canonical" representation. This is
|
||||||
:c:type:`PyUnicodeObject` (not checked).
|
required before using any of the access macros described below.
|
||||||
|
|
||||||
|
.. XXX expand on when it is not required
|
||||||
|
|
||||||
|
Returns 0 on success and -1 with an exception set on failure, which in
|
||||||
|
particular happens if memory allocation fails.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
|
||||||
|
|
||||||
|
Return the length of the Unicode string, in code points. *o* has to be a
|
||||||
|
Unicode object in the "canonical" representation (not checked).
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
|
||||||
|
Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
|
||||||
|
Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)
|
||||||
|
|
||||||
|
Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
|
||||||
|
integer types for direct character access. No check is made that the
|
||||||
|
canonical representation has the correct character size; use
|
||||||
|
:c:func:`PyUnicode_CHARACTER_SIZE` or :c:func:`PyUnicode_KIND` to select the
|
||||||
|
right macro. Make sure :c:func:`PyUnicode_READY` has been called before
|
||||||
|
accessing this.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:macro:: PyUnicode_1BYTE_KIND
|
||||||
|
PyUnicode_2BYTE_KIND
|
||||||
|
PyUnicode_4BYTE_KIND
|
||||||
|
|
||||||
|
Return values of the :c:func:`PyUnicode_KIND` macro.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: int PyUnicode_KIND(PyObject *o)
|
||||||
|
|
||||||
|
Return one of the PyUnicode kind constants (see above) that indicate how many
|
||||||
|
bytes per character this Unicode object uses to store its data. *o* has to
|
||||||
|
be a Unicode object in the "canonical" representation (not checked).
|
||||||
|
|
||||||
|
.. XXX document "0" return value?
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: int PyUnicode_CHARACTER_SIZE(PyObject *o)
|
||||||
|
|
||||||
|
Return the number of bytes the string uses to represent single characters;
|
||||||
|
this can be 1, 2 or 4. *o* has to be a Unicode object in the "canonical"
|
||||||
|
representation (not checked).
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: void* PyUnicode_DATA(PyObject *o)
|
||||||
|
|
||||||
|
Return a void pointer to the raw unicode buffer. *o* has to be a Unicode
|
||||||
|
object in the "canonical" representation (not checked).
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: int PyUnicode_KIND_SIZE(int kind, Py_ssize_t index)
|
||||||
|
|
||||||
|
Compute ``index * char_size`` where ``char_size`` is ``2**(kind - 1)``. The
|
||||||
|
index is a character index, the result is a size in bytes.
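The arithmetic described above can be sketched as follows. ``kind_size`` is a hypothetical stand-in, not the macro itself, and it assumes the kind values 1, 2 and 3 that the formula ``2**(kind - 1)`` implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the computation described in the text.
   Assumes kind is 1, 2 or 3, giving 1, 2 or 4 bytes per character. */
static ptrdiff_t
kind_size(int kind, ptrdiff_t index)
{
    assert(kind >= 1 && kind <= 3);
    assert(index >= 0);
    return index * ((ptrdiff_t)1 << (kind - 1));
}
```

Under these assumptions, a character index of 10 corresponds to a byte offset of 10, 20 or 40 for kinds 1, 2 and 3 respectively.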
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
|
||||||
|
Py_UCS4 value)
|
||||||
|
|
||||||
|
Write into a canonical representation *data* (as obtained with
|
||||||
|
:c:func:`PyUnicode_DATA`). This macro does not do any sanity checks and is
|
||||||
|
intended for usage in loops. The caller should cache the *kind* value and
|
||||||
|
*data* pointer as obtained from other macro calls. *index* is the index in
|
||||||
|
the string (starts at 0) and *value* is the new code point value which should
|
||||||
|
be written to that location.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)
|
||||||
|
|
||||||
|
Read a code point from a canonical representation *data* (as obtained with
|
||||||
|
:c:func:`PyUnicode_DATA`). No checks or ready calls are performed.
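The idea behind kind-dispatched access can be sketched in plain C. This is an illustration under stated assumptions, not the definition of the macros: ``read_char``, ``write_char`` and the ``WIDTH_*`` tags are hypothetical names, and the tags here are simply the number of bytes per character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical width tags for this sketch: bytes per character. */
enum { WIDTH_1 = 1, WIDTH_2 = 2, WIDTH_4 = 4 };

/* Read the code point at a character index from an untyped buffer. */
static uint32_t
read_char(int width, const void *data, size_t index)
{
    switch (width) {
    case WIDTH_1: return ((const uint8_t *)data)[index];
    case WIDTH_2: return ((const uint16_t *)data)[index];
    default:
        assert(width == WIDTH_4);
        return ((const uint32_t *)data)[index];
    }
}

/* Store a code point at a character index; the value must fit the width. */
static void
write_char(int width, void *data, size_t index, uint32_t value)
{
    switch (width) {
    case WIDTH_1:
        assert(value <= 0xFF);
        ((uint8_t *)data)[index] = (uint8_t)value;
        break;
    case WIDTH_2:
        assert(value <= 0xFFFF);
        ((uint16_t *)data)[index] = (uint16_t)value;
        break;
    default:
        assert(width == WIDTH_4);
        ((uint32_t *)data)[index] = value;
        break;
    }
}
```

As with the macros, a loop would look up the width and the data pointer once and then pass them to every call, so the per-character cost is a single branch and an array access.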
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)
|
||||||
|
|
||||||
|
Read a character from a Unicode object *o*, which must be in the "canonical"
|
||||||
|
representation. This is less efficient than :c:func:`PyUnicode_READ` if you
|
||||||
|
do multiple consecutive reads.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UCS4 PyUnicode_MAX_CHAR_VALUE(PyObject *o)
|
||||||
|
|
||||||
|
Return the maximum code point that is suitable for creating another string
|
||||||
|
based on *o*, which must be in the "canonical" representation. This is
|
||||||
|
always an approximation but more efficient than iterating over the string.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: int PyUnicode_ClearFreeList()
|
.. c:function:: int PyUnicode_ClearFreeList()
|
||||||
|
@ -216,31 +371,45 @@ These APIs can be used to work with surrogates:
|
||||||
surrogate pair.
|
surrogate pair.
|
||||||
|
|
||||||
|
|
||||||
Plain Py_UNICODE
|
Creating and accessing Unicode strings
|
||||||
""""""""""""""""
|
""""""""""""""""""""""""""""""""""""""
|
||||||
|
|
||||||
To create Unicode objects and access their basic sequence properties, use these
|
To create Unicode objects and access their basic sequence properties, use these
|
||||||
APIs:
|
APIs:
|
||||||
|
|
||||||
|
.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
|
Create a new Unicode object. *maxchar* should be the true maximum code point
|
||||||
|
to be placed in the string. As an approximation, it can be rounded up to the
|
||||||
|
nearest value in the sequence 127, 255, 65535, 1114111.
|
||||||
|
|
||||||
Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
|
This is the recommended way to allocate a new Unicode object. Objects
|
||||||
may be *NULL* which causes the contents to be undefined. It is the user's
|
created using this function are not resizable.
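The rounding of *maxchar* mentioned above can be sketched like this. ``round_up_maxchar`` is a hypothetical helper for illustration only; a caller would pass its result as the *maxchar* argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a maximum code point up to the nearest value
   in the sequence 127, 255, 65535, 1114111 named in the text. */
static uint32_t
round_up_maxchar(uint32_t maxchar)
{
    assert(maxchar <= 0x10FFFF);   /* must lie in the Unicode range */
    if (maxchar <= 127)
        return 127;
    if (maxchar <= 255)
        return 255;
    if (maxchar <= 65535)
        return 65535;
    return 1114111;
}
```

For example, a string whose largest character is U+20AC would be rounded up to 65535 under this scheme.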
|
||||||
responsibility to fill in the needed data. The buffer is copied into the new
|
|
||||||
object. If the buffer is not *NULL*, the return value might be a shared object.
|
.. versionadded:: 3.3
|
||||||
Therefore, modification of the resulting Unicode object is only allowed when *u*
|
|
||||||
is *NULL*.
|
|
||||||
|
.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
|
||||||
|
Py_ssize_t size)
|
||||||
|
|
||||||
|
Create a new Unicode object with the given *kind* (possible values are
|
||||||
|
:c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
|
||||||
|
:c:func:`PyUnicode_KIND`). The *buffer* must point to an array of *size*
|
||||||
|
units of 1, 2 or 4 bytes per character, as given by the kind.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
|
.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
|
||||||
|
|
||||||
Create a Unicode object from the char buffer *u*. The bytes will be interpreted
|
Create a Unicode object from the char buffer *u*. The bytes will be
|
||||||
as being UTF-8 encoded. *u* may also be *NULL* which
|
interpreted as being UTF-8 encoded. The buffer is copied into the new
|
||||||
causes the contents to be undefined. It is the user's responsibility to fill in
|
object. If the buffer is not *NULL*, the return value might be a shared
|
||||||
the needed data. The buffer is copied into the new object. If the buffer is not
|
object, i.e. modification of the data is not allowed.
|
||||||
*NULL*, the return value might be a shared object. Therefore, modification of
|
|
||||||
the resulting Unicode object is only allowed when *u* is *NULL*.
|
If *u* is *NULL*, this function behaves like :c:func:`PyUnicode_FromUnicode`
|
||||||
|
with the buffer set to *NULL*. This usage is deprecated in favor of
|
||||||
|
:c:func:`PyUnicode_New`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject *PyUnicode_FromString(const char *u)
|
.. c:function:: PyObject *PyUnicode_FromString(const char *u)
|
||||||
|
@ -361,36 +530,9 @@ APIs:
|
||||||
Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
|
Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
|
||||||
arguments.
|
arguments.
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
|
|
||||||
|
|
||||||
Create a Unicode object by replacing all decimal digits in
|
.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
|
||||||
:c:type:`Py_UNICODE` buffer of the given *size* by ASCII digits 0--9
|
const char *encoding, const char *errors)
|
||||||
according to their decimal value. Return *NULL* if an exception
|
|
||||||
occurs.
|
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
|
|
||||||
|
|
||||||
Return a read-only pointer to the Unicode object's internal :c:type:`Py_UNICODE`
|
|
||||||
buffer, *NULL* if *unicode* is not a Unicode object.
|
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
|
|
||||||
|
|
||||||
Create a copy of a Unicode string ending with a nul character. Return *NULL*
|
|
||||||
and raise a :exc:`MemoryError` exception on memory allocation failure,
|
|
||||||
otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free the
|
|
||||||
buffer).
|
|
||||||
|
|
||||||
.. versionadded:: 3.2
|
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
|
|
||||||
|
|
||||||
Return the length of the Unicode object.
|
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)
|
|
||||||
|
|
||||||
Coerce an encoded object *obj* to a Unicode object and return a reference with
|
Coerce an encoded object *obj* to a Unicode object and return a reference with
|
||||||
incremented refcount.
|
incremented refcount.
|
||||||
|
@ -407,16 +549,158 @@ APIs:
|
||||||
decref'ing the returned objects.
|
decref'ing the returned objects.
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
|
||||||
|
|
||||||
|
Return the length of the Unicode object, in code points.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, \
|
||||||
|
PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many)
|
||||||
|
|
||||||
|
Copy characters from one Unicode object into another. This function performs
|
||||||
|
character conversion when necessary and falls back to :c:func:`memcpy` if
|
||||||
|
possible. Returns ``-1`` and sets an exception on error, otherwise returns
|
||||||
|
``0``.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
|
||||||
|
Py_UCS4 character)
|
||||||
|
|
||||||
|
Write a character to a string. The string must have been created through
|
||||||
|
:c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
|
||||||
|
the string must not be shared, or have been hashed yet.
|
||||||
|
|
||||||
|
This function checks that *unicode* is a Unicode object, that the index is
|
||||||
|
not out of bounds, and that the object can be modified safely (i.e. that
|
||||||
|
its reference count is one), in contrast to the macro version
|
||||||
|
:c:func:`PyUnicode_WRITE_CHAR`.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
|
||||||
|
|
||||||
|
Read a character from a string. This function checks that *unicode* is a
|
||||||
|
Unicode object and the index is not out of bounds, in contrast to the macro
|
||||||
|
version :c:func:`PyUnicode_READ_CHAR`.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
|
||||||
|
Py_ssize_t end)
|
||||||
|
|
||||||
|
Return a substring of *str*, from character index *start* (included) to
|
||||||
|
character index *end* (excluded). Negative indices are not supported.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
|
||||||
|
Py_ssize_t buflen, int copy_null)
|
||||||
|
|
||||||
|
Copy the string *u* into a UCS4 buffer, including a null character, if
|
||||||
|
*copy_null* is set. Returns *NULL* and sets an exception on error (in
|
||||||
|
particular, a :exc:`ValueError` if *buflen* is smaller than the length of
|
||||||
|
*u*). *buffer* is returned on success.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)
|
||||||
|
|
||||||
|
Copy the string *u* into a new UCS4 buffer that is allocated using
|
||||||
|
:c:func:`PyMem_Malloc`. If this fails, *NULL* is returned with a
|
||||||
|
:exc:`MemoryError` set.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
Deprecated Py_UNICODE APIs
|
||||||
|
""""""""""""""""""""""""""
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
|
||||||
|
These API functions are deprecated with the implementation of :pep:`393`.
|
||||||
|
Extension modules can continue using them, as they will not be removed in Python
|
||||||
|
3.x, but need to be aware that their use can now cause performance and memory hits.
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
|
||||||
|
|
||||||
|
Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
|
||||||
|
may be *NULL* which causes the contents to be undefined. It is the user's
|
||||||
|
responsibility to fill in the needed data. The buffer is copied into the new
|
||||||
|
object.
|
||||||
|
|
||||||
|
If the buffer is not *NULL*, the return value might be a shared object.
|
||||||
|
Therefore, modification of the resulting Unicode object is only allowed when
|
||||||
|
*u* is *NULL*.
|
||||||
|
|
||||||
|
If the buffer is *NULL*, :c:func:`PyUnicode_READY` must be called once the
|
||||||
|
string content has been filled before using any of the access macros such as
|
||||||
|
:c:func:`PyUnicode_KIND`.
|
||||||
|
|
||||||
|
Please migrate to using :c:func:`PyUnicode_FromKindAndData` or
|
||||||
|
:c:func:`PyUnicode_New`.
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
|
||||||
|
|
||||||
|
Return a read-only pointer to the Unicode object's internal
|
||||||
|
:c:type:`Py_UNICODE` buffer, *NULL* if *unicode* is not a Unicode object.
|
||||||
|
This will create the :c:type:`Py_UNICODE` representation of the object if it
|
||||||
|
is not yet available.
|
||||||
|
|
||||||
|
Please migrate to using :c:func:`PyUnicode_AsUCS4`,
|
||||||
|
:c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new
|
||||||
|
APIs.
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
|
||||||
|
|
||||||
|
Create a Unicode object by replacing all decimal digits in
|
||||||
|
:c:type:`Py_UNICODE` buffer of the given *size* by ASCII digits 0--9
|
||||||
|
according to their decimal value. Return *NULL* if an exception occurs.
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
|
||||||
|
|
||||||
|
Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
|
||||||
|
array length in *size*.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
|
||||||
|
|
||||||
|
Create a copy of a Unicode string ending with a nul character. Return *NULL*
|
||||||
|
and raise a :exc:`MemoryError` exception on memory allocation failure,
|
||||||
|
otherwise return a newly allocated buffer (use :c:func:`PyMem_Free` to free the
|
||||||
|
buffer).
|
||||||
|
|
||||||
|
.. versionadded:: 3.2
|
||||||
|
|
||||||
|
Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
|
||||||
|
|
||||||
|
Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
|
||||||
|
code units (this includes surrogate pairs as 2 units).
|
||||||
|
|
||||||
|
Please migrate to using :c:func:`PyUnicode_GetLength`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
|
.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
|
||||||
|
|
||||||
Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
|
Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
|
||||||
throughout the interpreter whenever coercion to Unicode is needed.
|
throughout the interpreter whenever coercion to Unicode is needed.
|
||||||
|
|
||||||
If the platform supports :c:type:`wchar_t` and provides a header file wchar.h,
|
|
||||||
Python can interface directly to this type using the following functions.
|
|
||||||
Support is optimized if Python's own :c:type:`Py_UNICODE` type is identical to
|
|
||||||
the system's :c:type:`wchar_t`.
|
|
||||||
|
|
||||||
|
|
||||||
File System Encoding
|
File System Encoding
|
||||||
""""""""""""""""""""
|
""""""""""""""""""""
|
||||||
|
@ -526,6 +810,26 @@ wchar_t Support
|
||||||
.. versionadded:: 3.2
|
.. versionadded:: 3.2
|
||||||
|
|
||||||
|
|
||||||
|
UCS4 Support
|
||||||
|
""""""""""""
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
.. XXX are these meant to be public?
|
||||||
|
|
||||||
|
.. c:function:: size_t Py_UCS4_strlen(const Py_UCS4 *u)
|
||||||
|
Py_UCS4* Py_UCS4_strcpy(Py_UCS4 *s1, const Py_UCS4 *s2)
|
||||||
|
Py_UCS4* Py_UCS4_strncpy(Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
|
||||||
|
Py_UCS4* Py_UCS4_strcat(Py_UCS4 *s1, const Py_UCS4 *s2)
|
||||||
|
int Py_UCS4_strcmp(const Py_UCS4 *s1, const Py_UCS4 *s2)
|
||||||
|
int Py_UCS4_strncmp(const Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
|
||||||
|
Py_UCS4* Py_UCS4_strchr(const Py_UCS4 *s, Py_UCS4 c)
|
||||||
|
Py_UCS4* Py_UCS4_strrchr(const Py_UCS4 *s, Py_UCS4 c)
|
||||||
|
|
||||||
|
These utility functions work on strings of :c:type:`Py_UCS4` characters and
|
||||||
|
otherwise behave like the C standard library functions with the same name.
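What "behave like the C standard library functions" means for 32-bit characters can be sketched as below. These are hypothetical reimplementations over ``uint32_t`` for illustration, not the functions shipped with Python, and they assume a zero-terminated array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of strlen() for zero-terminated 32-bit strings. */
static size_t
ucs4_strlen(const uint32_t *u)
{
    size_t n = 0;
    assert(u != NULL);
    while (u[n] != 0)
        n++;
    return n;
}

/* Hypothetical analogue of strchr(): first occurrence of c, or NULL. */
static const uint32_t *
ucs4_strchr(const uint32_t *s, uint32_t c)
{
    assert(s != NULL);
    for (;; s++) {
        if (*s == c)
            return s;       /* also matches the terminator when c == 0 */
        if (*s == 0)
            return NULL;
    }
}
```

The length is counted in characters, not bytes, so a string holding a character above U+FFFF still contributes one unit per character.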
|
||||||
|
|
||||||
|
|
||||||
.. _builtincodecs:
|
.. _builtincodecs:
|
||||||
|
|
||||||
Built-in Codecs
|
Built-in Codecs
|
||||||
|
@ -560,7 +864,8 @@ Generic Codecs
|
||||||
These are the generic codec APIs:
|
These are the generic codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
|
.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
|
||||||
|
const char *encoding, const char *errors)
|
||||||
|
|
||||||
Create a Unicode object by decoding *size* bytes of the encoded string *s*.
|
Create a Unicode object by decoding *size* bytes of the encoded string *s*.
|
||||||
*encoding* and *errors* have the same meaning as the parameters of the same name
|
*encoding* and *errors* have the same meaning as the parameters of the same name
|
||||||
|
@ -569,7 +874,8 @@ These are the generic codec APIs:
|
||||||
the codec.
|
the codec.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)
|
.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
|
||||||
|
const char *encoding, const char *errors)
|
||||||
|
|
||||||
Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
|
Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
|
||||||
bytes object. *encoding* and *errors* have the same meaning as the
|
bytes object. *encoding* and *errors* have the same meaning as the
|
||||||
|
@ -577,8 +883,13 @@ These are the generic codec APIs:
|
||||||
to be used is looked up using the Python codec registry. Return *NULL* if an
|
to be used is looked up using the Python codec registry. Return *NULL* if an
|
||||||
exception was raised by the codec.
|
exception was raised by the codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsEncodedString`.
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)
|
|
||||||
|
.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
|
||||||
|
const char *encoding, const char *errors)
|
||||||
|
|
||||||
Encode a Unicode object and return the result as Python bytes object.
|
Encode a Unicode object and return the result as Python bytes object.
|
||||||
*encoding* and *errors* have the same meaning as the parameters of the same
|
*encoding* and *errors* have the same meaning as the parameters of the same
|
||||||
|
@ -599,7 +910,8 @@ These are the UTF-8 codec APIs:
|
||||||
*s*. Return *NULL* if an exception was raised by the codec.
|
*s*. Return *NULL* if an exception was raised by the codec.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
|
.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
|
||||||
|
const char *errors, Py_ssize_t *consumed)
|
||||||
|
|
||||||
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
|
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
|
||||||
*consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
|
*consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
|
||||||
|
@ -613,6 +925,10 @@ These are the UTF-8 codec APIs:
|
||||||
return a Python bytes object. Return *NULL* if an exception was raised by
|
return a Python bytes object. Return *NULL* if an exception was raised by
|
||||||
the codec.
|
the codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsUTF8String` or :c:func:`PyUnicode_AsUTF8AndSize`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
|
.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
|
||||||
|
|
||||||
|
@ -621,13 +937,37 @@ These are the UTF-8 codec APIs:
|
||||||
raised by the codec.
|
raised by the codec.
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
|
||||||
|
|
||||||
|
Return a pointer to the default encoding (UTF-8) of the Unicode object, and
|
||||||
|
store the size of the encoded representation (in bytes) in *size*. *size*
|
||||||
|
can be *NULL*, in this case no size will be stored.
|
||||||
|
|
||||||
|
In the case of an error, *NULL* is returned with an exception set and no
|
||||||
|
*size* is stored.
|
||||||
|
|
||||||
|
This caches the UTF-8 representation of the string in the Unicode object, and
|
||||||
|
subsequent calls will return a pointer to the same buffer. The caller is not
|
||||||
|
responsible for deallocating the buffer.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
|
.. c:function:: char* PyUnicode_AsUTF8(PyObject *unicode)
|
||||||
|
|
||||||
|
As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
|
||||||
|
|
||||||
|
.. versionadded:: 3.3
|
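The size reported by :c:func:`PyUnicode_AsUTF8AndSize` counts encoded bytes, not code points. The following is a minimal Python-level sketch of that distinction only. It does not call the C function, and the buffer caching described above has no Python-level equivalent; the variable names are illustrative.

```python
# Python-level analog: the UTF-8 size is measured in bytes, not characters.
text = "na\u00efve \u20ac"          # 7 code points: n a ï v e <space> €
utf8 = text.encode("utf-8")         # what the cached UTF-8 buffer would contain

char_count = len(text)              # number of code points
byte_count = len(utf8)              # what *size* would report (in bytes)
```

Here ``ï`` takes two bytes and ``€`` takes three, so the byte count exceeds the character count.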
||||||
|
|
||||||
|
|
||||||
UTF-32 Codecs
|
UTF-32 Codecs
|
||||||
"""""""""""""
|
"""""""""""""
|
||||||
|
|
||||||
These are the UTF-32 codec APIs:
|
These are the UTF-32 codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
|
.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
|
||||||
|
const char *errors, int *byteorder)
|
||||||
|
|
||||||
Decode *size* bytes from a UTF-32 encoded buffer string and return the
|
Decode *size* bytes from a UTF-32 encoded buffer string and return the
|
||||||
corresponding Unicode object. *errors* (if non-*NULL*) defines the error
|
corresponding Unicode object. *errors* (if non-*NULL*) defines the error
|
||||||
|
@ -655,7 +995,8 @@ These are the UTF-32 codec APIs:
|
||||||
Return *NULL* if an exception was raised by the codec.
|
Return *NULL* if an exception was raised by the codec.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
|
.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
|
||||||
|
const char *errors, int *byteorder, Py_ssize_t *consumed)
|
||||||
|
|
||||||
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If
|
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If
|
||||||
*consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
|
*consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
|
||||||
|
@ -664,7 +1005,8 @@ These are the UTF-32 codec APIs:
|
||||||
that have been decoded will be stored in *consumed*.
|
that have been decoded will be stored in *consumed*.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
|
.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
|
||||||
|
const char *errors, int byteorder)
|
||||||
|
|
||||||
Return a Python bytes object holding the UTF-32 encoded value of the Unicode
|
Return a Python bytes object holding the UTF-32 encoded value of the Unicode
|
||||||
data in *s*. Output is written according to the following byte order::
|
data in *s*. Output is written according to the following byte order::
|
||||||
|
@ -681,6 +1023,10 @@ These are the UTF-32 codec APIs:
|
||||||
|
|
||||||
Return *NULL* if an exception was raised by the codec.
|
Return *NULL* if an exception was raised by the codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsUTF32String`.
|
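The effect of the byte order choice can be observed from Python with the corresponding codecs. This is a hedged sketch using the ``utf-32-le``, ``utf-32-be`` and ``utf-32`` codec names as stand-ins for an explicit little-endian, explicit big-endian and native-order request; it does not exercise the C function itself.

```python
import codecs

ch = "A"
little = ch.encode("utf-32-le")   # explicit little-endian, no BOM
big = ch.encode("utf-32-be")      # explicit big-endian, no BOM
native = ch.encode("utf-32")      # native byte order, prefixed with a BOM

# The BOM written in native mode matches the platform's byte order.
has_native_bom = native.startswith(codecs.BOM_UTF32)
```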
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
|
.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
|
||||||
|
|
||||||
|
@ -695,7 +1041,8 @@ UTF-16 Codecs
|
||||||
These are the UTF-16 codec APIs:
|
These are the UTF-16 codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
|
.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
|
||||||
|
const char *errors, int *byteorder)
|
||||||
|
|
||||||
Decode *size* bytes from a UTF-16 encoded buffer string and return the
|
Decode *size* bytes from a UTF-16 encoded buffer string and return the
|
||||||
corresponding Unicode object. *errors* (if non-*NULL*) defines the error
|
corresponding Unicode object. *errors* (if non-*NULL*) defines the error
|
||||||
|
@ -722,7 +1069,8 @@ These are the UTF-16 codec APIs:
|
||||||
Return *NULL* if an exception was raised by the codec.
|
Return *NULL* if an exception was raised by the codec.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
|
.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
|
||||||
|
const char *errors, int *byteorder, Py_ssize_t *consumed)
|
||||||
|
|
||||||
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If
|
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If
|
||||||
*consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
|
*consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
|
||||||
|
@ -731,7 +1079,8 @@ These are the UTF-16 codec APIs:
|
||||||
number of bytes that have been decoded will be stored in *consumed*.
|
number of bytes that have been decoded will be stored in *consumed*.
|
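The stateful behaviour can be sketched from Python with ``codecs.utf_16_le_decode``, whose ``final`` argument plays the role of a non-*NULL* *consumed* pointer when it is false. This is an illustration under that assumption, not a description of the C implementation.

```python
import codecs

# Two complete UTF-16-LE code units followed by one stray byte
# (half of a code unit, as might happen at a chunk boundary).
data = "hi".encode("utf-16-le") + b"\x21"

# final=False: the incomplete trailing byte is not an error; it is simply
# left undecoded and excluded from the consumed count.
text, consumed = codecs.utf_16_le_decode(data, "strict", False)
leftover = data[consumed:]          # bytes to prepend to the next chunk
```

A caller that reads from a stream would keep ``leftover`` and prepend it to the next chunk before decoding again.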
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
|
.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
|
||||||
|
const char *errors, int byteorder)
|
||||||
|
|
||||||
Return a Python bytes object holding the UTF-16 encoded value of the Unicode
|
Return a Python bytes object holding the UTF-16 encoded value of the Unicode
|
||||||
data in *s*. Output is written according to the following byte order::
|
data in *s*. Output is written according to the following byte order::
|
||||||
|
@ -749,6 +1098,10 @@ These are the UTF-16 codec APIs:
|
||||||
|
|
||||||
Return *NULL* if an exception was raised by the codec.
|
Return *NULL* if an exception was raised by the codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsUTF16String`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
|
.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
|
||||||
|
|
||||||
|
@ -769,7 +1122,8 @@ These are the UTF-7 codec APIs:
|
||||||
*s*. Return *NULL* if an exception was raised by the codec.
|
*s*. Return *NULL* if an exception was raised by the codec.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
|
.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
|
||||||
|
const char *errors, Py_ssize_t *consumed)
|
||||||
|
|
||||||
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`. If
|
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`. If
|
||||||
*consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
|
*consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
|
||||||
|
@ -777,7 +1131,8 @@ These are the UTF-7 codec APIs:
|
||||||
bytes that have been decoded will be stored in *consumed*.
|
bytes that have been decoded will be stored in *consumed*.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)
|
.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
|
||||||
|
int base64SetO, int base64WhiteSpace, const char *errors)
|
||||||
|
|
||||||
Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
|
Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
|
||||||
return a Python bytes object. Return *NULL* if an exception was raised by
|
return a Python bytes object. Return *NULL* if an exception was raised by
|
||||||
|
@ -788,6 +1143,11 @@ These are the UTF-7 codec APIs:
|
||||||
nonzero, whitespace will be encoded in base-64. Both are set to zero for the
|
nonzero, whitespace will be encoded in base-64. Both are set to zero for the
|
||||||
Python "utf-7" codec.
|
Python "utf-7" codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API.
|
||||||
|
|
||||||
|
.. XXX replace with what?
|
||||||
|
|
||||||
|
|
||||||
Unicode-Escape Codecs
|
Unicode-Escape Codecs
|
||||||
"""""""""""""""""""""
|
"""""""""""""""""""""
|
||||||
|
@ -795,7 +1155,8 @@ Unicode-Escape Codecs
|
||||||
These are the "Unicode Escape" codec APIs:
|
These are the "Unicode Escape" codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
|
.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
|
||||||
|
Py_ssize_t size, const char *errors)
|
||||||
|
|
||||||
Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
|
Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
|
||||||
string *s*. Return *NULL* if an exception was raised by the codec.
|
string *s*. Return *NULL* if an exception was raised by the codec.
|
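As a rough Python-level illustration of what the Unicode-Escape codec does, the sketch below uses the ``unicode_escape`` codec name from Python code rather than the C function. The sample bytes are made up for the example.

```python
# Bytes containing literal backslash escapes, as they might appear in source text.
raw = b"caf\\xe9 \\u20ac\\n"

decoded = raw.decode("unicode_escape")          # escapes become real characters
round_trip = decoded.encode("unicode_escape")   # and back to escaped bytes
```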
||||||
|
@ -807,6 +1168,10 @@ These are the "Unicode Escape" codec APIs:
|
||||||
return a Python bytes object. Return *NULL* if an exception was raised by the
|
return a Python bytes object. Return *NULL* if an exception was raised by the
|
||||||
codec.
|
codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsUnicodeEscapeString`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
|
.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
|
||||||
|
|
||||||
|
@ -821,18 +1186,24 @@ Raw-Unicode-Escape Codecs
|
||||||
These are the "Raw Unicode Escape" codec APIs:
|
These are the "Raw Unicode Escape" codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
|
.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
|
||||||
|
Py_ssize_t size, const char *errors)
|
||||||
|
|
||||||
Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
|
Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
|
||||||
encoded string *s*. Return *NULL* if an exception was raised by the codec.
|
encoded string *s*. Return *NULL* if an exception was raised by the codec.
|
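The Raw-Unicode-Escape codec differs from Unicode-Escape in that only ``\uXXXX`` and ``\UXXXXXXXX`` sequences are interpreted. The sketch below shows this from Python with the ``raw_unicode_escape`` codec name; it is an illustration, not a call to the C function.

```python
# "\u20ac" is interpreted, but "\n" stays as a backslash followed by "n".
raw = b"\\u20ac\\n"
decoded = raw.decode("raw_unicode_escape")
```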
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
|
.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
|
||||||
|
Py_ssize_t size, const char *errors)
|
||||||
|
|
||||||
Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
|
Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
|
||||||
and return a Python bytes object. Return *NULL* if an exception was raised by
|
and return a Python bytes object. Return *NULL* if an exception was raised by
|
||||||
the codec.
|
the codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsRawUnicodeEscapeString`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
|
.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
|
||||||
|
|
||||||
|
@ -860,6 +1231,10 @@ ordinals and only these are accepted by the codecs during encoding.
|
||||||
return a Python bytes object. Return *NULL* if an exception was raised by
|
return a Python bytes object. Return *NULL* if an exception was raised by
|
||||||
the codec.
|
the codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsLatin1String`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
|
.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
|
||||||
|
|
||||||
|
@ -887,6 +1262,10 @@ codes generate errors.
|
||||||
return a Python bytes object. Return *NULL* if an exception was raised by
|
return a Python bytes object. Return *NULL* if an exception was raised by
|
||||||
the codec.
|
the codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsASCIIString`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
|
.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
|
||||||
|
|
||||||
|
@ -921,7 +1300,8 @@ characters to different code points.
|
||||||
|
|
||||||
These are the mapping codec APIs:
|
These are the mapping codec APIs:
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)
|
.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
|
||||||
|
PyObject *mapping, const char *errors)
|
||||||
|
|
||||||
Create a Unicode object by decoding *size* bytes of the encoded string *s* using
|
Create a Unicode object by decoding *size* bytes of the encoded string *s* using
|
||||||
the given *mapping* object. Return *NULL* if an exception was raised by the
|
the given *mapping* object. Return *NULL* if an exception was raised by the
|
||||||
|
@ -931,12 +1311,17 @@ These are the mapping codec APIs:
|
||||||
treated as "undefined mapping".
|
treated as "undefined mapping".
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
|
.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
|
||||||
|
PyObject *mapping, const char *errors)
|
||||||
|
|
||||||
Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
|
Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
|
||||||
*mapping* object and return a Python bytes object. Return *NULL* if an
|
*mapping* object and return a Python bytes object. Return *NULL* if an
|
||||||
exception was raised by the codec.
|
exception was raised by the codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsCharmapString`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
|
.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
|
||||||
|
|
||||||
|
@ -947,7 +1332,8 @@ These are the mapping codec APIs:
|
||||||
The following codec API is special in that it maps Unicode to Unicode.
|
The following codec API is special in that it maps Unicode to Unicode.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)
|
.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
|
||||||
|
PyObject *table, const char *errors)
|
||||||
|
|
||||||
Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
|
Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
|
||||||
character mapping *table* to it and return the resulting Unicode object. Return
|
character mapping *table* to it and return the resulting Unicode object. Return
|
||||||
|
@ -960,6 +1346,10 @@ The following codec API is special in that maps Unicode to Unicode.
|
||||||
and sequences work well. Unmapped character ordinals (ones which cause a
|
and sequences work well. Unmapped character ordinals (ones which cause a
|
||||||
:exc:`LookupError`) are left untouched and are copied as-is.
|
:exc:`LookupError`) are left untouched and are copied as-is.
|
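The treatment of unmapped ordinals can be seen with ``str.translate``, used here as a Python-level analog of the table lookup. The table contents are illustrative, and the use of ``None`` to delete a character is standard ``str.translate`` behaviour assumed to carry over.

```python
# Keys are character ordinals; values may be strings, ordinals, or None.
table = {
    ord("a"): "A",        # map to a string
    ord("b"): None,       # None deletes the character
    ord("c"): ord("C"),   # map to another ordinal
}

# "d" has no entry: the lookup fails and the character is copied as-is.
result = "abcd".translate(table)
```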
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API.
|
||||||
|
|
||||||
|
.. XXX replace with what?
|
||||||
|
|
||||||
|
|
||||||
MBCS codecs for Windows
|
MBCS codecs for Windows
|
||||||
|
@ -976,7 +1366,8 @@ the user settings on the machine running the codec.
|
||||||
Return *NULL* if an exception was raised by the codec.
|
Return *NULL* if an exception was raised by the codec.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
|
.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, \
|
||||||
|
const char *errors, int *consumed)
|
||||||
|
|
||||||
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
|
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
|
||||||
*consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
|
*consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
|
||||||
|
@ -990,6 +1381,10 @@ the user settings on the machine running the codec.
|
||||||
a Python bytes object. Return *NULL* if an exception was raised by the
|
a Python bytes object. Return *NULL* if an exception was raised by the
|
||||||
codec.
|
codec.
|
||||||
|
|
||||||
|
.. deprecated-removed:: 3.3 4.0
|
||||||
|
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
|
||||||
|
:c:func:`PyUnicode_AsMBCSString`.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
|
.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
|
||||||
|
|
||||||
|
@ -1034,7 +1429,8 @@ They all return *NULL* or ``-1`` if an exception occurs.
|
||||||
characters are not included in the resulting strings.
|
characters are not included in the resulting strings.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
|
.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, \
|
||||||
|
const char *errors)
|
||||||
|
|
||||||
Translate a string by applying a character mapping table to it and return the
|
Translate a string by applying a character mapping table to it and return the
|
||||||
resulting Unicode object.
|
resulting Unicode object.
|
||||||
|
@ -1056,14 +1452,16 @@ They all return *NULL* or ``-1`` if an exception occurs.
|
||||||
Unicode string.
|
Unicode string.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
|
.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
|
||||||
|
Py_ssize_t start, Py_ssize_t end, int direction)
|
||||||
|
|
||||||
Return 1 if *substr* matches ``str[start:end]`` at the given tail end
|
Return 1 if *substr* matches ``str[start:end]`` at the given tail end
|
||||||
(*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
|
(*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
|
||||||
0 otherwise. Return ``-1`` if an error occurred.
|
0 otherwise. Return ``-1`` if an error occurred.
|
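For orientation, the two directions correspond to ``str.startswith`` and ``str.endswith`` with explicit bounds. The sketch below is a Python-level analog with made-up sample strings, not a call to :c:func:`PyUnicode_Tailmatch`.

```python
s = "unicode_object"

# Prefix match against s[3:9] == "code_o" (analogous to direction == -1).
prefix_match = s.startswith("code", 3, 9)

# Suffix match against s[0:7] == "unicode" (analogous to direction == 1).
suffix_match = s.endswith("code", 0, 7)

# Without the slice bounds, the whole string does not start with "code".
whole_prefix_match = s.startswith("code")
```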
||||||
|
|
||||||
|
|
||||||
.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
|
.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
|
||||||
|
Py_ssize_t start, Py_ssize_t end, int direction)
|
||||||
|
|
||||||
Return the first position of *substr* in ``str[start:end]`` using the given
|
Return the first position of *substr* in ``str[start:end]`` using the given
|
||||||
*direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
|
*direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
|
||||||
|
@ -1072,7 +1470,8 @@ They all return *NULL* or ``-1`` if an exception occurs.
|
||||||
occurred and an exception has been set.
|
occurred and an exception has been set.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, Py_ssize_t start, Py_ssize_t end, int direction)
|
.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
|
||||||
|
Py_ssize_t start, Py_ssize_t end, int direction)
|
||||||
|
|
||||||
Return the first position of the character *ch* in ``str[start:end]`` using
|
Return the first position of the character *ch* in ``str[start:end]`` using
|
||||||
the given *direction* (*direction* == 1 means to do a forward search,
|
the given *direction* (*direction* == 1 means to do a forward search,
|
||||||
|
@ -1083,13 +1482,15 @@ They all return *NULL* or ``-1`` if an exception occurs.
|
||||||
.. versionadded:: 3.3
|
.. versionadded:: 3.3
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
|
.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
|
||||||
|
Py_ssize_t start, Py_ssize_t end)
|
||||||
|
|
||||||
Return the number of non-overlapping occurrences of *substr* in
|
Return the number of non-overlapping occurrences of *substr* in
|
||||||
``str[start:end]``. Return ``-1`` if an error occurred.
|
``str[start:end]``. Return ``-1`` if an error occurred.
|
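The "non-overlapping" rule and the effect of the slice bounds can be checked with ``str.count``, which serves here as a Python-level analog of :c:func:`PyUnicode_Count`. The sample strings are illustrative.

```python
# "aa" fits at offsets 0 and 2 only; the overlapping match at offset 1 is skipped.
non_overlapping = "aaaa".count("aa")

# Counting inside s[1:9] == "bcabcabc" misses the first "abc".
windowed = "abcabcabc".count("abc", 1, 9)
```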
||||||
|
|
||||||
|
|
||||||
.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
|
.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
|
||||||
|
PyObject *replstr, Py_ssize_t maxcount)
|
||||||
|
|
||||||
Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
|
Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
|
||||||
return the resulting Unicode object. *maxcount* == -1 means replace all
|
return the resulting Unicode object. *maxcount* == -1 means replace all
|
||||||
|
@ -1137,8 +1538,8 @@ They all return *NULL* or ``-1`` if an exception occurs.
|
||||||
Check whether *element* is contained in *container* and return true or false
|
Check whether *element* is contained in *container* and return true or false
|
||||||
accordingly.
|
accordingly.
|
||||||
|
|
||||||
*element* has to coerce to a one element Unicode string. ``-1`` is returned if
|
*element* has to coerce to a one element Unicode string. ``-1`` is returned
|
||||||
there was an error.
|
if there was an error.
|
||||||
|
|
||||||
|
|
||||||
.. c:function:: void PyUnicode_InternInPlace(PyObject **string)
|
.. c:function:: void PyUnicode_InternInPlace(PyObject **string)
|
||||||
|
@ -1157,7 +1558,6 @@ They all return *NULL* or ``-1`` if an exception occurs.
|
||||||
.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)
|
.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)
|
||||||
|
|
||||||
A combination of :c:func:`PyUnicode_FromString` and
|
A combination of :c:func:`PyUnicode_FromString` and
|
||||||
:c:func:`PyUnicode_InternInPlace`, returning either a new unicode string object
|
:c:func:`PyUnicode_InternInPlace`, returning either a new unicode string
|
||||||
that has been interned, or a new ("owned") reference to an earlier interned
|
object that has been interned, or a new ("owned") reference to an earlier
|
||||||
string object with the same value.
|
interned string object with the same value.
|
||||||
|
|
||||||
|
|
|
@ -686,7 +686,7 @@ PyAPI_FUNC(PyObject*) PyUnicode_Substring(
|
||||||
Py_ssize_t start,
|
Py_ssize_t start,
|
||||||
Py_ssize_t end);
|
Py_ssize_t end);
|
||||||
|
|
||||||
/* Copy the string into a UCS4 buffer including the null character is copy_null
|
/* Copy the string into a UCS4 buffer including the null character if copy_null
|
||||||
is set. Return NULL and raise an exception on error. Raise a ValueError if
|
is set. Return NULL and raise an exception on error. Raise a ValueError if
|
||||||
the buffer is smaller than the string. Return buffer on success.
|
the buffer is smaller than the string. Return buffer on success.
|
||||||
|
|
||||||
|
|