From 49f6ff719c4e0beeafd6c42edd696601acf72764 Mon Sep 17 00:00:00 2001
From: Petr Viktorin
Date: Fri, 23 Dec 2022 16:00:21 +0100
Subject: [PATCH] gh-98712: Clarify "readonly bytes-like object" semantics in C arg-parsing docs (#98710)

---
 Doc/c-api/arg.rst | 54 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 19 deletions(-)

diff --git a/Doc/c-api/arg.rst b/Doc/c-api/arg.rst
index c5be453c153..9713431688d 100644
--- a/Doc/c-api/arg.rst
+++ b/Doc/c-api/arg.rst
@@ -34,24 +34,39 @@ These formats allow accessing an object as a contiguous chunk of memory.
 You don't have to provide raw storage for the returned unicode or bytes
 area.
 
-In general, when a format sets a pointer to a buffer, the buffer is
-managed by the corresponding Python object, and the buffer shares
-the lifetime of this object. You won't have to release any memory yourself.
-The only exceptions are ``es``, ``es#``, ``et`` and ``et#``.
-
-However, when a :c:type:`Py_buffer` structure gets filled, the underlying
-buffer is locked so that the caller can subsequently use the buffer even
-inside a :c:type:`Py_BEGIN_ALLOW_THREADS` block without the risk of mutable data
-being resized or destroyed. As a result, **you have to call**
-:c:func:`PyBuffer_Release` after you have finished processing the data (or
-in any early abort case).
-
 Unless otherwise stated, buffers are not NUL-terminated.
 
-Some formats require a read-only :term:`bytes-like object`, and set a
-pointer instead of a buffer structure. They work by checking that
-the object's :c:member:`PyBufferProcs.bf_releasebuffer` field is ``NULL``,
-which disallows mutable objects such as :class:`bytearray`.
+There are three ways strings and buffers can be converted to C:
+
+* Formats such as ``y*`` and ``s*`` fill a :c:type:`Py_buffer` structure.
+  This locks the underlying buffer so that the caller can subsequently use
+  the buffer even inside a :c:type:`Py_BEGIN_ALLOW_THREADS`
+  block without the risk of mutable data being resized or destroyed.
+  As a result, **you have to call** :c:func:`PyBuffer_Release` after you have
+  finished processing the data (or in any early abort case).
+
+* The ``es``, ``es#``, ``et`` and ``et#`` formats allocate the result buffer.
+  **You have to call** :c:func:`PyMem_Free` after you have finished
+  processing the data (or in any early abort case).
+
+* .. _c-arg-borrowed-buffer:
+
+  Other formats take a :class:`str` or a read-only :term:`bytes-like object`,
+  such as :class:`bytes`, and provide a ``const char *`` pointer to
+  its buffer.
+  In this case the buffer is "borrowed": it is managed by the corresponding
+  Python object, and shares the lifetime of this object.
+  You won't have to release any memory yourself.
+
+  To ensure that the underlying buffer may be safely borrowed, the object's
+  :c:member:`PyBufferProcs.bf_releasebuffer` field must be ``NULL``.
+  This disallows common mutable objects such as :class:`bytearray`,
+  but also some read-only objects such as :class:`memoryview` of
+  :class:`bytes`.
+
+  Besides this ``bf_releasebuffer`` requirement, there is no check to verify
+  whether the input object is immutable (e.g. whether it would honor a request
+  for a writable buffer, or whether another thread can mutate the data).
 
 .. note::
 
@@ -89,7 +104,7 @@ which disallows mutable objects such as :class:`bytearray`.
    Unicode objects are converted to C strings using ``'utf-8'`` encoding.
 
 ``s#`` (:class:`str`, read-only :term:`bytes-like object`) [const char \*, :c:type:`Py_ssize_t`]
-   Like ``s*``, except that it doesn't accept mutable objects.
+   Like ``s*``, except that it provides a :ref:`borrowed buffer <c-arg-borrowed-buffer>`.
    The result is stored into two C variables,
    the first one a pointer to a C string, the second one its length.
    The string may contain embedded null bytes. Unicode objects are converted
@@ -108,8 +123,9 @@ which disallows mutable objects such as :class:`bytearray`.
    pointer is set to ``NULL``.
 
 ``y`` (read-only :term:`bytes-like object`) [const char \*]
-   This format converts a bytes-like object to a C pointer to a character
-   string; it does not accept Unicode objects. The bytes buffer must not
+   This format converts a bytes-like object to a C pointer to a
+   :ref:`borrowed <c-arg-borrowed-buffer>` character string;
+   it does not accept Unicode objects. The bytes buffer must not
    contain embedded null bytes; if it does, a :exc:`ValueError`
    exception is raised.
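
Reviewer note (not part of the patch): the locking that the new text describes for ``y*`` and ``s*`` can be observed from Python, which may help when checking the wording. The sketch below assumes that a :class:`memoryview` holds a buffer export in much the same way a filled ``Py_buffer`` does, and that ``memoryview.release()`` plays the role of ``PyBuffer_Release``. The helper name ``resize_while_exported`` is made up for this illustration.

```python
def resize_while_exported():
    """Try to resize a bytearray during and after a buffer export."""
    data = bytearray(b"spam")
    view = memoryview(data)   # takes a buffer export, similar to filling a Py_buffer
    try:
        data.append(0x21)     # resizing is refused while the export is live
        blocked = False
    except BufferError:
        blocked = True
    view.release()            # Python-level counterpart of PyBuffer_Release
    data.append(0x21)         # with no exports left, the resize succeeds
    return blocked, bytes(data)


if __name__ == "__main__":
    print(resize_while_exported())
```

This only illustrates the first of the three cases. The borrowed-buffer case has no such lock, which is why the patch spells out the ``bf_releasebuffer`` requirement.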