mirror of https://github.com/python/cpython
gh-99146 struct module documentation should have more predictable examples/warnings (GH-99141)
* nail down a couple examples to have more predictable output
* update a number of things, but this is really just a stash...
* added an applications section to describe typical uses for native and machine-independent formats
* make sure all format strings use a format prefix character
* responding to comments from @gpshead. Not likely finished yet.
* This got more involved than I expected...
* respond to several PR comments
* a lot of wordsmithing
* try and be more consistent in use of ``x`` vs ``'x'``
* expand examples a bit
* update the "see also" to be more up-to-date
* original examples relied on import * so present all examples as if
* reformat based on @gpshead comment (missed before)
* responding to comments
* missed this
* one more suggested edit
* wordsmithing
parent 5d41833cc0
commit 22d91c16bb
|
@ -12,21 +12,25 @@
|
|||
|
||||
--------------
|
||||
|
||||
This module performs conversions between Python values and C structs represented
|
||||
as Python :class:`bytes` objects. This can be used in handling binary data
|
||||
stored in files or from network connections, among other sources. It uses
|
||||
:ref:`struct-format-strings` as compact descriptions of the layout of the C
|
||||
structs and the intended conversion to/from Python values.
|
||||
This module converts between Python values and C structs represented
|
||||
as Python :class:`bytes` objects. Compact :ref:`format strings <struct-format-strings>`
|
||||
describe the intended conversions to/from Python values.
|
||||
The module's functions and objects can be used for two largely
|
||||
distinct applications: data exchange with external sources (files or
|
||||
network connections), or data transfer between the Python application
|
||||
and the C layer.
|
||||
|
||||
.. note::
|
||||
|
||||
By default, the result of packing a given C struct includes pad bytes in
|
||||
order to maintain proper alignment for the C types involved; similarly,
|
||||
alignment is taken into account when unpacking. This behavior is chosen so
|
||||
that the bytes of a packed struct correspond exactly to the layout in memory
|
||||
of the corresponding C struct. To handle platform-independent data formats
|
||||
or omit implicit pad bytes, use ``standard`` size and alignment instead of
|
||||
``native`` size and alignment: see :ref:`struct-alignment` for details.
|
||||
When no prefix character is given, native mode is the default. It
|
||||
packs or unpacks data based on the platform and compiler on which
|
||||
the Python interpreter was built.
|
||||
The result of packing a given C struct includes pad bytes which
|
||||
maintain proper alignment for the C types involved; similarly,
|
||||
alignment is taken into account when unpacking. In contrast, when
|
||||
communicating data between external sources, the programmer is
|
||||
responsible for defining byte ordering and padding between elements.
|
||||
See :ref:`struct-alignment` for details.
|
||||
|
||||
Several :mod:`struct` functions (and methods of :class:`Struct`) take a *buffer*
|
||||
argument. This refers to objects that implement the :ref:`bufferobjects` and
|
||||
|
@ -102,10 +106,13 @@ The module defines the following exception and functions:
|
|||
Format Strings
|
||||
--------------
|
||||
|
||||
Format strings are the mechanism used to specify the expected layout when
|
||||
packing and unpacking data. They are built up from :ref:`format-characters`,
|
||||
which specify the type of data being packed/unpacked. In addition, there are
|
||||
special characters for controlling the :ref:`struct-alignment`.
|
||||
Format strings describe the data layout when
|
||||
packing and unpacking data. They are built up from :ref:`format characters<format-characters>`,
|
||||
which specify the type of data being packed/unpacked. In addition,
|
||||
special characters control the :ref:`byte order, size and alignment<struct-alignment>`.
|
||||
Each format string consists of an optional prefix character which
|
||||
describes the overall properties of the data and one or more format
|
||||
characters which describe the actual data values and padding.
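As a rough illustration of the prefix-plus-format-characters structure described above, the sketch below uses a made-up layout (``'<HI4s'``: little-endian, one unsigned short, one unsigned int, one 4-byte string). The field values are arbitrary and chosen only for the example.

```python
import struct

# '<' is the prefix (little-endian, standard sizes, no implicit padding);
# 'H', 'I' and '4s' are the format characters describing the values.
fmt = '<HI4s'

packed = struct.pack(fmt, 1, 2, b'abcd')
size = struct.calcsize(fmt)           # 2 + 4 + 4 bytes in standard mode
values = struct.unpack(fmt, packed)

print(packed)   # b'\x01\x00\x02\x00\x00\x00abcd'
print(size)     # 10
print(values)   # (1, 2, b'abcd')
```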
|
||||
|
||||
|
||||
.. _struct-alignment:
|
||||
|
@ -116,6 +123,11 @@ Byte Order, Size, and Alignment
|
|||
By default, C types are represented in the machine's native format and byte
|
||||
order, and properly aligned by skipping pad bytes if necessary (according to the
|
||||
rules used by the C compiler).
|
||||
This behavior is chosen so
|
||||
that the bytes of a packed struct correspond exactly to the memory layout
|
||||
of the corresponding C struct.
|
||||
Whether to use native byte ordering
|
||||
and padding or standard formats depends on the application.
|
||||
|
||||
.. index::
|
||||
single: @ (at); in struct format strings
|
||||
|
@ -144,12 +156,10 @@ following table:
|
|||
|
||||
If the first character is not one of these, ``'@'`` is assumed.
|
||||
|
||||
Native byte order is big-endian or little-endian, depending on the host
|
||||
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
|
||||
IBM z and most legacy architectures are big-endian;
|
||||
and ARM, RISC-V and IBM Power feature switchable endianness
|
||||
(bi-endian, though the former two are nearly always little-endian in practice).
|
||||
Use ``sys.byteorder`` to check the endianness of your system.
|
||||
Native byte order is big-endian or little-endian, depending on the
|
||||
host system. For example, Intel x86, AMD64 (x86-64), and Apple M1 are
|
||||
little-endian; IBM z and many legacy architectures are big-endian.
|
||||
Use :data:`sys.byteorder` to check the endianness of your system.
|
||||
|
||||
Native size and alignment are determined using the C compiler's
|
||||
``sizeof`` expression. This is always combined with native byte order.
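A small sketch of how ``sys.byteorder`` relates to packed output. It uses the ``'='`` prefix (native byte order with standard sizes) rather than ``'@'`` so that the field is always 4 bytes wide; that choice is an assumption made to keep the comparison platform-independent.

```python
import struct
import sys

value = 0x01020304

# '=' keeps the native byte order but uses standard sizes, so 'I' is 4 bytes.
native_order = struct.pack('=I', value)
little = struct.pack('<I', value)
big = struct.pack('>I', value)

# Pick the explicit encoding that should match this machine.
expected = little if sys.byteorder == 'little' else big

print(sys.byteorder)
print(native_order == expected)   # True
```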
|
||||
|
@ -231,9 +241,9 @@ platform-dependent.
|
|||
+--------+--------------------------+--------------------+----------------+------------+
|
||||
| ``d`` | :c:expr:`double` | float | 8 | \(4) |
|
||||
+--------+--------------------------+--------------------+----------------+------------+
|
||||
| ``s`` | :c:expr:`char[]` | bytes | | |
|
||||
| ``s`` | :c:expr:`char[]` | bytes | | \(9) |
|
||||
+--------+--------------------------+--------------------+----------------+------------+
|
||||
| ``p`` | :c:expr:`char[]` | bytes | | |
|
||||
| ``p`` | :c:expr:`char[]` | bytes | | \(8) |
|
||||
+--------+--------------------------+--------------------+----------------+------------+
|
||||
| ``P`` | :c:expr:`void \*` | integer | | \(5) |
|
||||
+--------+--------------------------+--------------------+----------------+------------+
|
||||
|
@ -292,8 +302,33 @@ Notes:
|
|||
format <half precision format_>`_ for more information.
|
||||
|
||||
(7)
|
||||
For padding, ``x`` inserts null bytes.
|
||||
When packing, ``'x'`` inserts one NUL byte.
|
||||
|
||||
(8)
|
||||
The ``'p'`` format character encodes a "Pascal string", meaning a short
|
||||
variable-length string stored in a *fixed number of bytes*, given by the count.
|
||||
The first byte stored is the length of the string, or 255, whichever is
|
||||
smaller. The bytes of the string follow. If the string passed in to
|
||||
:func:`pack` is too long (longer than the count minus 1), only the leading
|
||||
``count-1`` bytes of the string are stored. If the string is shorter than
|
||||
``count-1``, it is padded with null bytes so that exactly count bytes in all
|
||||
are used. Note that for :func:`unpack`, the ``'p'`` format character consumes
|
||||
``count`` bytes, but that the string returned can never contain more than 255
|
||||
bytes.
|
||||
|
||||
(9)
|
||||
For the ``'s'`` format character, the count is interpreted as the length of the
|
||||
bytes, not a repeat count like for the other format characters; for example,
|
||||
``'10s'`` means a single 10-byte string mapping to or from a single
|
||||
Python byte string, while ``'10c'`` means 10
|
||||
separate one byte character elements (e.g., ``cccccccccc``) mapping
|
||||
to or from ten different Python byte objects. (See :ref:`struct-examples`
|
||||
for a concrete demonstration of the difference.)
|
||||
If a count is not given, it defaults to 1. For packing, the string is
|
||||
truncated or padded with null bytes as appropriate to make it fit. For
|
||||
unpacking, the resulting bytes object always has exactly the specified number
|
||||
of bytes. As a special case, ``'0s'`` means a single, empty string (while
|
||||
``'0c'`` means 0 characters).
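The padding and truncation rules in notes (8) and (9) can be sketched as follows. The format strings and byte strings are arbitrary values picked for illustration.

```python
import struct

# 's': the count is the field length; short input is NUL-padded,
# long input is truncated.
s_padded = struct.pack('>5s', b'hi')
s_truncated = struct.pack('>3s', b'abcdef')

# '0s' yields one empty bytes object, while '0c' yields no items at all.
zero_s = struct.unpack('>0s', b'')
zero_c = struct.unpack('>0c', b'')

# 'p': the first byte holds the stored length, followed by the data,
# NUL-padded (or truncated) so that exactly `count` bytes are used.
p_padded = struct.pack('>5p', b'hi')
p_truncated = struct.pack('>5p', b'abcdefgh')

print(s_padded)      # b'hi\x00\x00\x00'
print(s_truncated)   # b'abc'
print(zero_s)        # (b'',)
print(zero_c)        # ()
print(p_padded)      # b'\x02hi\x00\x00'
print(p_truncated)   # b'\x04abcd'
```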
|
||||
|
||||
A format character may be preceded by an integral repeat count. For example,
|
||||
the format string ``'4h'`` means exactly the same as ``'hhhh'``.
|
||||
|
@ -301,15 +336,6 @@ the format string ``'4h'`` means exactly the same as ``'hhhh'``.
|
|||
Whitespace characters between formats are ignored; a count and its format must
|
||||
not contain whitespace though.
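A minimal sketch of the repeat-count and whitespace rules, using arbitrary values. The last case relies on CPython rejecting a space between a count and its format character with :exc:`struct.error`.

```python
import struct

# A repeat count is shorthand for repeating the format character.
with_count = struct.pack('>4h', 1, 2, 3, 4)
spelled_out = struct.pack('>hhhh', 1, 2, 3, 4)

# Whitespace between formats is ignored.
spaced_size = struct.calcsize('>2h 2b')
compact_size = struct.calcsize('>2h2b')

# Whitespace between a count and its format character is an error.
try:
    struct.calcsize('>2 h')
    count_space_ok = True
except struct.error:
    count_space_ok = False

print(with_count == spelled_out)    # True
print(spaced_size, compact_size)    # 6 6
print(count_space_ok)               # False
```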
|
||||
|
||||
For the ``'s'`` format character, the count is interpreted as the length of the
|
||||
bytes, not a repeat count like for the other format characters; for example,
|
||||
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
|
||||
If a count is not given, it defaults to 1. For packing, the string is
|
||||
truncated or padded with null bytes as appropriate to make it fit. For
|
||||
unpacking, the resulting bytes object always has exactly the specified number
|
||||
of bytes. As a special case, ``'0s'`` means a single, empty string (while
|
||||
``'0c'`` means 0 characters).
|
||||
|
||||
When packing a value ``x`` using one of the integer formats (``'b'``,
|
||||
``'B'``, ``'h'``, ``'H'``, ``'i'``, ``'I'``, ``'l'``, ``'L'``,
|
||||
``'q'``, ``'Q'``), if ``x`` is outside the valid range for that format
|
||||
|
@ -319,17 +345,6 @@ then :exc:`struct.error` is raised.
|
|||
Previously, some of the integer formats wrapped out-of-range values and
|
||||
raised :exc:`DeprecationWarning` instead of :exc:`struct.error`.
|
||||
|
||||
The ``'p'`` format character encodes a "Pascal string", meaning a short
|
||||
variable-length string stored in a *fixed number of bytes*, given by the count.
|
||||
The first byte stored is the length of the string, or 255, whichever is
|
||||
smaller. The bytes of the string follow. If the string passed in to
|
||||
:func:`pack` is too long (longer than the count minus 1), only the leading
|
||||
``count-1`` bytes of the string are stored. If the string is shorter than
|
||||
``count-1``, it is padded with null bytes so that exactly count bytes in all
|
||||
are used. Note that for :func:`unpack`, the ``'p'`` format character consumes
|
||||
``count`` bytes, but that the string returned can never contain more than 255
|
||||
bytes.
|
||||
|
||||
.. index:: single: ? (question mark); in struct format strings
|
||||
|
||||
For the ``'?'`` format character, the return value is either :const:`True` or
|
||||
|
@ -345,18 +360,36 @@ Examples
|
|||
^^^^^^^^
|
||||
|
||||
.. note::
|
||||
All examples assume a native byte order, size, and alignment with a
|
||||
big-endian machine.
|
||||
Native byte order examples (designated by the ``'@'`` format prefix or
|
||||
lack of any prefix character) may not match what the reader's
|
||||
machine produces as
|
||||
that depends on the platform and compiler.
|
||||
|
||||
A basic example of packing/unpacking three integers::
|
||||
Pack and unpack integers of three different sizes, using big-endian
|
||||
ordering::
|
||||
|
||||
>>> from struct import *
|
||||
>>> pack('hhl', 1, 2, 3)
|
||||
b'\x00\x01\x00\x02\x00\x00\x00\x03'
|
||||
>>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
|
||||
>>> pack(">bhl", 1, 2, 3)
|
||||
b'\x01\x00\x02\x00\x00\x00\x03'
|
||||
>>> unpack('>bhl', b'\x01\x00\x02\x00\x00\x00\x03')
|
||||
(1, 2, 3)
|
||||
>>> calcsize('hhl')
|
||||
8
|
||||
>>> calcsize('>bhl')
|
||||
7
|
||||
|
||||
Attempt to pack an integer which is too large for the defined field::
|
||||
|
||||
>>> pack(">h", 99999)
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
struct.error: 'h' format requires -32768 <= number <= 32767
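In application code, the out-of-range case is usually caught rather than left to propagate. The sketch below shows one way to do that; ``try_pack_int16`` is a hypothetical helper name, not part of the :mod:`struct` module.

```python
import struct

def try_pack_int16(value):
    """Hypothetical helper: pack a big-endian signed 16-bit integer,
    returning None when the value does not fit in the field."""
    try:
        return struct.pack('>h', value)
    except struct.error:
        return None

in_range = try_pack_int16(32767)
too_large = try_pack_int16(99999)

print(in_range)    # b'\x7f\xff'
print(too_large)   # None
```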
|
||||
|
||||
Demonstrate the difference between ``'s'`` and ``'c'`` format
|
||||
characters::
|
||||
|
||||
>>> pack("@ccc", b'1', b'2', b'3')
|
||||
b'123'
|
||||
>>> pack("@3s", b'123')
|
||||
b'123'
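Both calls above produce the same bytes; the difference shows up when unpacking. A short sketch:

```python
import struct

data = b'123'

# 'ccc' produces three separate one-byte objects.
as_chars = struct.unpack('@ccc', data)

# '3s' produces a single three-byte object.
as_string = struct.unpack('@3s', data)

print(as_chars)    # (b'1', b'2', b'3')
print(as_string)   # (b'123',)
```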
|
||||
|
||||
Unpacked fields can be named by assigning them to variables or by wrapping
|
||||
the result in a named tuple::
|
||||
|
@ -369,35 +402,132 @@ the result in a named tuple::
|
|||
>>> Student._make(unpack('<10sHHb', record))
|
||||
Student(name=b'raymond ', serialnum=4658, school=264, gradelevel=8)
|
||||
|
||||
The ordering of format characters may have an impact on size since the padding
|
||||
needed to satisfy alignment requirements is different::
|
||||
The ordering of format characters may have an impact on size in native
|
||||
mode since padding is implicit. In standard mode, the user is
|
||||
responsible for inserting any desired padding.
|
||||
Note in
|
||||
the first ``pack`` call below that three NUL bytes were added after the
|
||||
packed ``'#'`` to align the following integer on a four-byte boundary.
|
||||
In this example, the output was produced on a little-endian machine::
|
||||
|
||||
>>> pack('ci', b'*', 0x12131415)
|
||||
b'*\x00\x00\x00\x12\x13\x14\x15'
|
||||
>>> pack('ic', 0x12131415, b'*')
|
||||
b'\x12\x13\x14\x15*'
|
||||
>>> calcsize('ci')
|
||||
>>> pack('@ci', b'#', 0x12131415)
|
||||
b'#\x00\x00\x00\x15\x14\x13\x12'
|
||||
>>> pack('@ic', 0x12131415, b'#')
|
||||
b'\x15\x14\x13\x12#'
|
||||
>>> calcsize('@ci')
|
||||
8
|
||||
>>> calcsize('ic')
|
||||
>>> calcsize('@ic')
|
||||
5
|
||||
|
||||
The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
|
||||
longs are aligned on 4-byte boundaries::
|
||||
The following format ``'llh0l'`` results in two pad bytes being added
|
||||
at the end, assuming the platform's longs are aligned on 4-byte boundaries::
|
||||
|
||||
>>> pack('llh0l', 1, 2, 3)
|
||||
>>> pack('@llh0l', 1, 2, 3)
|
||||
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'
|
||||
|
||||
This only works when native size and alignment are in effect; standard size and
|
||||
alignment does not enforce any alignment.
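To illustrate the point about standard mode, the sketch below compares sizes with and without the trailing ``'0l'``. The native results vary by platform, so only the standard-mode numbers are given as comments.

```python
import struct

# Standard mode: '0l' contributes nothing, because no alignment is enforced.
standard_plain = struct.calcsize('<llh')     # 4 + 4 + 2
standard_zero = struct.calcsize('<llh0l')    # still 4 + 4 + 2
standard_padded = struct.calcsize('<llh2x')  # explicit pad bytes instead

# Native mode: '0l' may round the size up to the alignment of a long.
native_plain = struct.calcsize('@llh')
native_zero = struct.calcsize('@llh0l')

print(standard_plain, standard_zero, standard_padded)   # 10 10 12
print(native_plain, native_zero)                        # platform-dependent
```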
|
||||
|
||||
|
||||
.. seealso::
|
||||
|
||||
Module :mod:`array`
|
||||
Packed binary storage of homogeneous data.
|
||||
|
||||
Module :mod:`xdrlib`
|
||||
Packing and unpacking of XDR data.
|
||||
Module :mod:`json`
|
||||
JSON encoder and decoder.
|
||||
|
||||
Module :mod:`pickle`
|
||||
Python object serialization.
|
||||
|
||||
|
||||
.. _applications:
|
||||
|
||||
Applications
|
||||
------------
|
||||
|
||||
Two main applications for the :mod:`struct` module exist: data
|
||||
interchange between Python and C code within an application or another
|
||||
application compiled using the same compiler (:ref:`native formats<struct-native-formats>`), and
|
||||
data interchange between applications using agreed upon data layout
|
||||
(:ref:`standard formats<struct-standard-formats>`). Generally speaking, the format strings
|
||||
constructed for these two domains are distinct.
|
||||
|
||||
|
||||
.. _struct-native-formats:
|
||||
|
||||
Native Formats
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
When constructing format strings which mimic native layouts, the
|
||||
compiler and machine architecture determine byte ordering and padding.
|
||||
In such cases, the ``@`` format character should be used to specify
|
||||
native byte ordering and data sizes. Internal pad bytes are normally inserted
|
||||
automatically. It is possible that a zero-repeat format code will be
|
||||
needed at the end of a format string to round up to the correct
|
||||
byte boundary for proper alignment of consecutive chunks of data.
|
||||
|
||||
Consider these two simple examples (on a 64-bit, little-endian
|
||||
machine)::
|
||||
|
||||
>>> calcsize('@lhl')
|
||||
24
|
||||
>>> calcsize('@llh')
|
||||
18
|
||||
|
||||
Data is not padded to an 8-byte boundary at the end of the second
|
||||
format string without the use of extra padding. A zero-repeat format
|
||||
code solves that problem::
|
||||
|
||||
>>> calcsize('@llh0l')
|
||||
24
|
||||
|
||||
The ``'x'`` format code can be used to add the pad bytes explicitly, but for
|
||||
native formats it is better to use a zero-repeat format like ``'0l'``.
|
||||
|
||||
By default, native byte ordering and alignment is used, but it is
|
||||
better to be explicit and use the ``'@'`` prefix character.
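One way to check that a native format string matches a C layout is to compare it against an equivalent :mod:`ctypes` structure. This is only a sketch: the ``Record`` class and its field names are made up for the example, and the sizes printed depend on the platform.

```python
import ctypes
import struct

class Record(ctypes.Structure):
    # Hypothetical C struct: { long a; long b; short c; }
    _fields_ = [
        ("a", ctypes.c_long),
        ("b", ctypes.c_long),
        ("c", ctypes.c_short),
    ]

c_size = ctypes.sizeof(Record)             # includes trailing padding
unpadded = struct.calcsize('@llh')         # no trailing padding
padded = struct.calcsize('@llh0l')         # rounded up to long alignment

# e.g. 24 18 24 on a typical 64-bit Linux build
print(c_size, unpadded, padded)
print(c_size == padded)
```

The zero-repeat ``'0l'`` is what makes the :mod:`struct` size agree with ``sizeof`` for the whole structure.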
|
||||
|
||||
|
||||
.. _struct-standard-formats:
|
||||
|
||||
Standard Formats
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
When exchanging data beyond your process, such as over a network or through storage,
|
||||
be precise. Specify the exact byte order, size, and alignment. Do
|
||||
not assume they match the native order of a particular machine.
|
||||
For example, network byte order is big-endian, while many popular CPUs
|
||||
are little-endian. By defining this explicitly, the user need not
|
||||
care about the specifics of the platform their code is running on.
|
||||
The first character should typically be ``<`` or ``>``
|
||||
(or ``!``). Padding is the responsibility of the programmer. The
|
||||
zero-repeat format character won't work. Instead, the user must
|
||||
explicitly add ``'x'`` pad bytes where needed. Revisiting the
|
||||
examples from the previous section, we have::
|
||||
|
||||
>>> calcsize('<qh6xq')
|
||||
24
|
||||
>>> pack('<qh6xq', 1, 2, 3) == pack('@lhl', 1, 2, 3)
|
||||
True
|
||||
>>> calcsize('@llh')
|
||||
18
|
||||
>>> pack('@llh', 1, 2, 3) == pack('<qqh', 1, 2, 3)
|
||||
True
|
||||
>>> calcsize('<qqh6x')
|
||||
24
|
||||
>>> calcsize('@llh0l')
|
||||
24
|
||||
>>> pack('@llh0l', 1, 2, 3) == pack('<qqh6x', 1, 2, 3)
|
||||
True
|
||||
|
||||
The above results (executed on a 64-bit machine) aren't guaranteed to
|
||||
match when executed on different machines. For example, the examples
|
||||
below were executed on a 32-bit machine::
|
||||
|
||||
>>> calcsize('<qqh6x')
|
||||
24
|
||||
>>> calcsize('@llh0l')
|
||||
12
|
||||
>>> pack('@llh0l', 1, 2, 3) == pack('<qqh6x', 1, 2, 3)
|
||||
False
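A sketch of a standard-format use case: a fixed-size message header in network byte order. The header layout (version, flags, length, sequence number) and the name ``HEADER`` are invented for the example and do not correspond to any particular protocol.

```python
import struct

# '!' selects network (big-endian) byte order with standard sizes:
# B = 1 byte, B = 1 byte, H = 2 bytes, I = 4 bytes.
HEADER = struct.Struct('!BBHI')

packed = HEADER.pack(1, 0, 8, 42)
version, flags, length, sequence = HEADER.unpack(packed)

print(HEADER.size)   # 8
print(packed)        # b'\x01\x00\x00\x08\x00\x00\x00*'
print(version, flags, length, sequence)   # 1 0 8 42
```

Because the prefix fixes the byte order and sizes, these results should be the same on 32-bit and 64-bit machines alike.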
|
||||
|
||||
|
||||
.. _struct-objects:
|
||||
|
@ -411,9 +541,9 @@ The :mod:`struct` module also defines the following type:
|
|||
.. class:: Struct(format)
|
||||
|
||||
Return a new Struct object which writes and reads binary data according to
|
||||
the format string *format*. Creating a Struct object once and calling its
|
||||
methods is more efficient than calling the :mod:`struct` functions with the
|
||||
same format since the format string only needs to be compiled once.
|
||||
the format string *format*. Creating a ``Struct`` object once and calling its
|
||||
methods is more efficient than calling module-level functions with the
|
||||
same format since the format string is only compiled once.
|
||||
|
||||
.. note::
|
||||
|
||||
|
|