Merge: #23088: Clarify null termination of bytes and strings in C API.
commit 812bc1b86b
@@ -64,7 +64,8 @@ Direct API functions
 .. c:function:: char* PyByteArray_AsString(PyObject *bytearray)
 
 Return the contents of *bytearray* as a char array after checking for a
-*NULL* pointer.
+*NULL* pointer. The returned array always has an extra
+null byte appended.
 
 
 .. c:function:: int PyByteArray_Resize(PyObject *bytearray, Py_ssize_t len)
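The extra null byte described in the hunk above can be observed from an interpreter session. The following is a minimal sketch, assuming a CPython interpreter where `ctypes.pythonapi` exposes `PyByteArray_AsString`; the helper name `bytearray_buffer_with_terminator` is hypothetical and only used for illustration.

```python
import ctypes

# Bind the C function. A void* return type keeps ctypes from converting the
# result into a Python bytes object, so the raw address stays available.
PyByteArray_AsString = ctypes.pythonapi.PyByteArray_AsString
PyByteArray_AsString.argtypes = [ctypes.py_object]
PyByteArray_AsString.restype = ctypes.c_void_p


def bytearray_buffer_with_terminator(ba):
    """Return len(ba) + 1 bytes read from the bytearray's internal buffer."""
    address = PyByteArray_AsString(ba)
    # Read one byte past the logical length to include the appended null.
    return ctypes.string_at(address, len(ba) + 1)


sample = bytearray(b"xyz")
raw = bytearray_buffer_with_terminator(sample)
```

Here `raw` holds the three payload bytes followed by the appended null byte. The bytearray must not be resized between obtaining the address and reading from it.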
@@ -69,8 +69,8 @@ called with a non-bytes parameter.
 +===================+===============+================================+
 | :attr:`%%`        | *n/a*         | The literal % character.       |
 +-------------------+---------------+--------------------------------+
-| :attr:`%c`        | int           | A single character,            |
-|                   |               | represented as an C int.       |
+| :attr:`%c`        | int           | A single byte,                 |
+|                   |               | represented as a C int.        |
 +-------------------+---------------+--------------------------------+
 | :attr:`%d`        | int           | Exactly equivalent to          |
 |                   |               | ``printf("%d")``.              |
@@ -109,7 +109,7 @@ called with a non-bytes parameter.
 +-------------------+---------------+--------------------------------+
 
 An unrecognized format character causes all the rest of the format string to be
-copied as-is to the result string, and any extra arguments discarded.
+copied as-is to the result object, and any extra arguments discarded.
 
 
 .. c:function:: PyObject* PyBytes_FromFormatV(const char *format, va_list vargs)
@@ -136,11 +136,13 @@ called with a non-bytes parameter.
 
 .. c:function:: char* PyBytes_AsString(PyObject *o)
 
-Return a NUL-terminated representation of the contents of *o*. The pointer
-refers to the internal buffer of *o*, not a copy. The data must not be
-modified in any way, unless the string was just created using
+Return a pointer to the contents of *o*. The pointer
+refers to the internal buffer of *o*, which consists of ``len(o) + 1``
+bytes. The last byte in the buffer is always null, regardless of
+whether there are any other null bytes. The data must not be
+modified in any way, unless the object was just created using
 ``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
-*o* is not a string object at all, :c:func:`PyBytes_AsString` returns *NULL*
+*o* is not a bytes object at all, :c:func:`PyBytes_AsString` returns *NULL*
 and raises :exc:`TypeError`.
 
 
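The ``len(o) + 1`` layout described for `PyBytes_AsString` can be checked with a short sketch. It assumes a CPython interpreter and reads the internal buffer through `ctypes.pythonapi`; the variable names are illustrative only, and the buffer is only read, never modified.

```python
import ctypes

PyBytes_AsString = ctypes.pythonapi.PyBytes_AsString
PyBytes_AsString.argtypes = [ctypes.py_object]
# void* rather than char*, so ctypes returns the address instead of a copy.
PyBytes_AsString.restype = ctypes.c_void_p

data = b"ab\x00cd"
address = PyBytes_AsString(data)

# Reading len(data) + 1 bytes shows the embedded null and the final terminator.
full_buffer = ctypes.string_at(address, len(data) + 1)

# Reading without a size stops at the first null byte, as strlen() would.
c_string_view = ctypes.string_at(address)
```

The second read illustrates why the terminator alone is not enough for C string functions: the embedded null byte hides everything after `ab`.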
@@ -151,16 +153,18 @@ called with a non-bytes parameter.
 
 .. c:function:: int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length)
 
-Return a NUL-terminated representation of the contents of the object *obj*
+Return the null-terminated contents of the object *obj*
 through the output variables *buffer* and *length*.
 
-If *length* is *NULL*, the resulting buffer may not contain NUL characters;
+If *length* is *NULL*, the bytes object
+may not contain embedded null bytes;
 if it does, the function returns ``-1`` and a :exc:`TypeError` is raised.
 
-The buffer refers to an internal string buffer of *obj*, not a copy. The data
-must not be modified in any way, unless the string was just created using
+The buffer refers to an internal buffer of *obj*, which includes an
+additional null byte at the end (not counted in *length*). The data
+must not be modified in any way, unless the object was just created using
 ``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
-*string* is not a string object at all, :c:func:`PyBytes_AsStringAndSize`
+*obj* is not a bytes object at all, :c:func:`PyBytes_AsStringAndSize`
 returns ``-1`` and raises :exc:`TypeError`.
 
 
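Both behaviours of `PyBytes_AsStringAndSize` described above (the uncounted trailing null byte, and the rejection of embedded nulls when *length* is *NULL*) can be sketched as follows. This assumes a CPython interpreter; `as_string_and_size` is a hypothetical helper. The text above names :exc:`TypeError` for the rejection, while CPython 3.5 and later raise `ValueError` instead, so the sketch accepts either.

```python
import ctypes

PyBytes_AsStringAndSize = ctypes.pythonapi.PyBytes_AsStringAndSize
PyBytes_AsStringAndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_void_p),
    ctypes.POINTER(ctypes.c_ssize_t),
]
PyBytes_AsStringAndSize.restype = ctypes.c_int


def as_string_and_size(obj, want_length=True):
    """Call PyBytes_AsStringAndSize; return (buffer address, length or None)."""
    buffer = ctypes.c_void_p()
    length = ctypes.c_ssize_t()
    # Passing None for *length* hands the C function a NULL pointer.
    length_arg = ctypes.byref(length) if want_length else None
    # ctypes.pythonapi re-raises any exception the C function sets.
    PyBytes_AsStringAndSize(obj, ctypes.byref(buffer), length_arg)
    return buffer.value, (length.value if want_length else None)


data = b"ab\x00cd"
address, length = as_string_and_size(data)
contents = ctypes.string_at(address, length + 1)

try:
    as_string_and_size(data, want_length=False)
    rejected = False
except (TypeError, ValueError):
    rejected = True
```

With a non-NULL *length* the call succeeds and reports 5 bytes, with the terminator sitting at index 5. With a NULL *length* the same object is rejected because of the embedded null byte.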
@@ -168,14 +172,14 @@ called with a non-bytes parameter.
 
 Create a new bytes object in *\*bytes* containing the contents of *newpart*
 appended to *bytes*; the caller will own the new reference. The reference to
-the old value of *bytes* will be stolen. If the new string cannot be
+the old value of *bytes* will be stolen. If the new object cannot be
 created, the old reference to *bytes* will still be discarded and the value
 of *\*bytes* will be set to *NULL*; the appropriate exception will be set.
 
 
 .. c:function:: void PyBytes_ConcatAndDel(PyObject **bytes, PyObject *newpart)
 
-Create a new string object in *\*bytes* containing the contents of *newpart*
+Create a new bytes object in *\*bytes* containing the contents of *newpart*
 appended to *bytes*. This version decrements the reference count of
 *newpart*.
 
@@ -227,7 +227,10 @@ access internal read-only data of Unicode objects:
 const char* PyUnicode_AS_DATA(PyObject *o)
 
 Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
-``AS_DATA`` form casts the pointer to :c:type:`const char *`. *o* has to be
+returned buffer is always terminated with an extra null code point. It
+may also contain embedded null code points, which would cause the string
+to be truncated when used in most C functions. The ``AS_DATA`` form
+casts the pointer to :c:type:`const char *`. The *o* argument has to be
 a Unicode object (not checked).
 
 .. versionchanged:: 3.3
@@ -650,7 +653,8 @@ APIs:
 
 Copy the string *u* into a new UCS4 buffer that is allocated using
 :c:func:`PyMem_Malloc`. If this fails, *NULL* is returned with a
-:exc:`MemoryError` set.
+:exc:`MemoryError` set. The returned buffer always has an extra
+null code point appended.
 
 .. versionadded:: 3.3
 
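The hunk above does not show the function name; assuming it documents `PyUnicode_AsUCS4Copy`, the appended null code point can be observed with the sketch below. It assumes a CPython interpreter, and `ucs4_copy_with_terminator` is a hypothetical helper. The copy is released with `PyMem_Free`, matching the `PyMem_Malloc` allocation named in the text.

```python
import ctypes

PyUnicode_AsUCS4Copy = ctypes.pythonapi.PyUnicode_AsUCS4Copy
PyUnicode_AsUCS4Copy.argtypes = [ctypes.py_object]
PyUnicode_AsUCS4Copy.restype = ctypes.c_void_p

PyMem_Free = ctypes.pythonapi.PyMem_Free
PyMem_Free.argtypes = [ctypes.c_void_p]
PyMem_Free.restype = None


def ucs4_copy_with_terminator(text):
    """Return the code points of the copied buffer, including the terminator."""
    address = PyUnicode_AsUCS4Copy(text)
    try:
        # One 32-bit slot per code point, plus one for the appended null.
        array_type = ctypes.c_uint32 * (len(text) + 1)
        return list(array_type.from_address(address))
    finally:
        # The buffer is a fresh copy owned by the caller.
        PyMem_Free(address)


code_points = ucs4_copy_with_terminator("h\u00e9\U0001F600")
```

The returned list has one entry more than the string has code points, and the last entry is zero.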
@@ -689,8 +693,9 @@ Extension modules can continue using them, as they will not be removed in Python
 Return a read-only pointer to the Unicode object's internal
 :c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the
 :c:type:`Py_UNICODE*` representation of the object if it is not yet
-available. Note that the resulting :c:type:`Py_UNICODE` string may contain
-embedded null characters, which would cause the string to be truncated when
+available. The buffer is always terminated with an extra null code point.
+Note that the resulting :c:type:`Py_UNICODE` string may also contain
+embedded null code points, which would cause the string to be truncated when
 used in most C functions.
 
 Please migrate to using :c:func:`PyUnicode_AsUCS4`,
@@ -708,8 +713,9 @@ Extension modules can continue using them, as they will not be removed in Python
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
 
 Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
-array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string
-may contain embedded null characters, which would cause the string to be
+array length (excluding the extra null terminator) in *size*.
+Note that the resulting :c:type:`Py_UNICODE*` string
+may contain embedded null code points, which would cause the string to be
 truncated when used in most C functions.
 
 .. versionadded:: 3.3
@@ -717,11 +723,11 @@ Extension modules can continue using them, as they will not be removed in Python
 
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
 
-Create a copy of a Unicode string ending with a nul character. Return *NULL*
+Create a copy of a Unicode string ending with a null code point. Return *NULL*
 and raise a :exc:`MemoryError` exception on memory allocation failure,
 otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
 the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
-contain embedded null characters, which would cause the string to be
+contain embedded null code points, which would cause the string to be
 truncated when used in most C functions.
 
 .. versionadded:: 3.2
@@ -902,10 +908,10 @@ wchar_t Support
 
 Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
 *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
-0-termination character). Return the number of :c:type:`wchar_t` characters
+null termination character). Return the number of :c:type:`wchar_t` characters
 copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
-string may or may not be 0-terminated. It is the responsibility of the caller
-to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
+string may or may not be null-terminated. It is the responsibility of the caller
+to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
 required by the application. Also, note that the :c:type:`wchar_t*` string
 might contain null characters, which would cause the string to be truncated
 when used with most C functions.
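The hunk above does not show the function name; assuming it documents `PyUnicode_AsWideChar(unicode, w, size)`, the "may or may not be null-terminated" caveat can be made concrete. The sketch below assumes a CPython interpreter and pre-fills the destination with `x` characters, so it is visible whether a terminator was written. The buffer names are illustrative only.

```python
import ctypes

PyUnicode_AsWideChar = ctypes.pythonapi.PyUnicode_AsWideChar
PyUnicode_AsWideChar.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_wchar),
    ctypes.c_ssize_t,
]
PyUnicode_AsWideChar.restype = ctypes.c_ssize_t

# Case 1: the string is longer than *size*, so only *size* characters are
# copied and no terminator is written after them.
tight_buffer = ctypes.create_unicode_buffer("xxxxx")
tight_copied = PyUnicode_AsWideChar("abcdef", tight_buffer, 3)
tight_view = tight_buffer[:4]

# Case 2: the buffer has room to spare, so a terminator follows the text.
roomy_buffer = ctypes.create_unicode_buffer("xxxxx")
roomy_copied = PyUnicode_AsWideChar("ab", roomy_buffer, 6)
roomy_view = roomy_buffer[:3]
```

In the first case the fourth slot still holds the original `x`, so a caller that needs a C string has to write the terminator itself, for example at index `tight_copied`, provided the buffer has a slot for it.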
@@ -914,8 +920,8 @@ wchar_t Support
 .. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
 
 Convert the Unicode object to a wide character string. The output string
-always ends with a nul character. If *size* is not *NULL*, write the number
-of wide characters (excluding the trailing 0-termination character) into
+always ends with a null character. If *size* is not *NULL*, write the number
+of wide characters (excluding the trailing null termination character) into
 *\*size*.
 
 Returns a buffer allocated by :c:func:`PyMem_Alloc` (use
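A minimal sketch of the `PyUnicode_AsWideCharString` contract described above, assuming a CPython interpreter: *size* excludes the trailing null character, and an embedded null truncates the string for anything that scans for a terminator. `wide_copy` is a hypothetical helper, and the sketch frees the buffer with `PyMem_Free`.

```python
import ctypes

PyUnicode_AsWideCharString = ctypes.pythonapi.PyUnicode_AsWideCharString
PyUnicode_AsWideCharString.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
PyUnicode_AsWideCharString.restype = ctypes.c_void_p

PyMem_Free = ctypes.pythonapi.PyMem_Free
PyMem_Free.argtypes = [ctypes.c_void_p]
PyMem_Free.restype = None


def wide_copy(text):
    """Return (size, buffer including terminator, view up to the first null)."""
    size = ctypes.c_ssize_t()
    address = PyUnicode_AsWideCharString(text, ctypes.byref(size))
    try:
        # size.value + 1 wide characters: the payload plus the terminator.
        full = ctypes.wstring_at(address, size.value + 1)
        # Without a length, reading stops at the first null, like wcslen().
        scanned = ctypes.wstring_at(address)
        return size.value, full, scanned
    finally:
        PyMem_Free(address)


size, full, scanned = wide_copy("ab\x00cd")
```

Passing a non-NULL *size* is what lets the caller recover the part after the embedded null. A terminator scan alone only sees `ab`.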
@@ -1045,9 +1051,11 @@ These are the UTF-8 codec APIs:
 
 .. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
 
-Return a pointer to the default encoding (UTF-8) of the Unicode object, and
-store the size of the encoded representation (in bytes) in *size*. *size*
-can be *NULL*, in this case no size will be stored.
+Return a pointer to the UTF-8 encoding of the Unicode object, and
+store the size of the encoded representation (in bytes) in *size*. The
+*size* argument can be *NULL*; in this case no size will be stored. The
+returned buffer always has an extra null byte appended (not included in
+*size*), regardless of whether there are any other null code points.
 
 In the case of an error, *NULL* is returned with an exception set and no
 *size* is stored.
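The relationship between *size* and the appended null byte in `PyUnicode_AsUTF8AndSize` can be sketched as follows, assuming a CPython interpreter. `utf8_with_terminator` is a hypothetical helper, and the buffer is read immediately while the string object is still referenced by the caller.

```python
import ctypes

PyUnicode_AsUTF8AndSize = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
PyUnicode_AsUTF8AndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p


def utf8_with_terminator(text):
    """Return (size, size + 1 bytes read from the returned UTF-8 buffer)."""
    size = ctypes.c_ssize_t()
    address = PyUnicode_AsUTF8AndSize(text, ctypes.byref(size))
    # *size* counts encoded bytes only; the terminator sits at index size.
    return size.value, ctypes.string_at(address, size.value + 1)


# U+00E9 encodes to two bytes, and the string contains an embedded null.
utf8_size, utf8_raw = utf8_with_terminator("h\u00e9\x00!")
```

The reported size is 5 encoded bytes, and the byte at index 5 is the appended terminator, present even though the data already contains a null at index 3.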