Correct typos in the codecs module documentation (#15135)

Géry Ogam 2019-09-12 09:41:32 +02:00 committed by Carol Willing
parent 39de95b746
commit 891e9e3b44
1 changed files with 61 additions and 60 deletions


@ -292,7 +292,7 @@ Error Handlers
To simplify and standardize error handling,
codecs may implement different error handling schemes by
-accepting the *errors* string argument. The following string values are
+accepting the *errors* string argument. The following string values are
defined and implemented by all standard Python codecs:
.. tabularcolumns:: |l|L|
@ -301,11 +301,11 @@ defined and implemented by all standard Python codecs:
| Value | Meaning |
+=========================+===============================================+
| ``'strict'`` | Raise :exc:`UnicodeError` (or a subclass); |
-| | this is the default. Implemented in |
+| | this is the default. Implemented in |
| | :func:`strict_errors`. |
+-------------------------+-----------------------------------------------+
| ``'ignore'`` | Ignore the malformed data and continue |
-| | without further notice. Implemented in |
+| | without further notice. Implemented in |
| | :func:`ignore_errors`. |
+-------------------------+-----------------------------------------------+
@ -327,11 +327,11 @@ The following error handlers are only applicable to
| | marker; Python will use the official |
| | ``U+FFFD`` REPLACEMENT CHARACTER for the |
| | built-in codecs on decoding, and '?' on |
-| | encoding. Implemented in |
+| | encoding. Implemented in |
| | :func:`replace_errors`. |
+-------------------------+-----------------------------------------------+
| ``'xmlcharrefreplace'`` | Replace with the appropriate XML character |
-| | reference (only for encoding). Implemented |
+| | reference (only for encoding). Implemented |
| | in :func:`xmlcharrefreplace_errors`. |
+-------------------------+-----------------------------------------------+
| ``'backslashreplace'`` | Replace with backslashed escape sequences. |
@ -339,15 +339,15 @@ The following error handlers are only applicable to
| | :func:`backslashreplace_errors`. |
+-------------------------+-----------------------------------------------+
| ``'namereplace'`` | Replace with ``\N{...}`` escape sequences |
-| | (only for encoding). Implemented in |
+| | (only for encoding). Implemented in |
| | :func:`namereplace_errors`. |
+-------------------------+-----------------------------------------------+
| ``'surrogateescape'`` | On decoding, replace byte with individual |
| | surrogate code ranging from ``U+DC80`` to |
-| | ``U+DCFF``. This code will then be turned |
+| | ``U+DCFF``. This code will then be turned |
| | back into the same byte when the |
| | ``'surrogateescape'`` error handler is used |
-| | when encoding the data. (See :pep:`383` for |
+| | when encoding the data. (See :pep:`383` for |
| | more.) |
+-------------------------+-----------------------------------------------+
@ -357,7 +357,7 @@ In addition, the following error handler is specific to the given codecs:
| Value | Codecs | Meaning |
+===================+========================+===========================================+
|``'surrogatepass'``| utf-8, utf-16, utf-32, | Allow encoding and decoding of surrogate |
-| | utf-16-be, utf-16-le, | codes. These codecs normally treat the |
+| | utf-16-be, utf-16-le, | codes. These codecs normally treat the |
| | utf-32-be, utf-32-le | presence of surrogates as an error. |
+-------------------+------------------------+-------------------------------------------+
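As a rough illustration of the ``'surrogatepass'`` row, the sketch below encodes a lone surrogate with UTF-8. The code point ``U+D800`` is an arbitrary choice for the example.

```python
lone = "\ud800"  # a lone high surrogate, chosen only for illustration

# By default the UTF-8 codec treats surrogates as an error.
try:
    lone.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'surrogatepass' lets the surrogate through in both directions.
data = lone.encode("utf-8", errors="surrogatepass")
restored = data.decode("utf-8", errors="surrogatepass")
```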
@ -388,9 +388,9 @@ handler:
error handler must either raise this or a different exception, or return a
tuple with a replacement for the unencodable part of the input and a position
where encoding should continue. The replacement may be either :class:`str` or
-:class:`bytes`. If the replacement is bytes, the encoder will simply copy
+:class:`bytes`. If the replacement is bytes, the encoder will simply copy
them into the output buffer. If the replacement is a string, the encoder will
-encode the replacement. Encoding continues on original input at the
+encode the replacement. Encoding continues on original input at the
specified position. Negative position values will be treated as being
relative to the end of the input string. If the resulting position is out of
bound an :exc:`IndexError` will be raised.
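The contract described above (return a replacement plus the position at which to resume) can be sketched with ``codecs.register_error``. The handler name ``demo_bracket`` and the function ``bracket_handler`` are hypothetical names invented for this example.

```python
import codecs


def bracket_handler(exc):
    """Hypothetical handler: replace each unencodable character with '[?]'."""
    if isinstance(exc, UnicodeEncodeError):
        # exc.start and exc.end delimit the unencodable slice of exc.object.
        replacement = "[?]" * (exc.end - exc.start)
        # The str replacement is itself encoded by the codec;
        # encoding then resumes at exc.end.
        return (replacement, exc.end)
    # This sketch only handles encoding errors.
    raise exc


codecs.register_error("demo_bracket", bracket_handler)

result = "a\u20acb".encode("ascii", errors="demo_bracket")
```

Returning a :class:`bytes` replacement instead would make the encoder copy it into the output as is.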
@ -484,7 +484,7 @@ function interfaces of the stateless encoder and decoder:
.. method:: Codec.decode(input[, errors])
Decodes the object *input* and returns a tuple (output object, length
-consumed). For instance, for a :term:`text encoding`, decoding converts
+consumed). For instance, for a :term:`text encoding`, decoding converts
a bytes object encoded using a particular
character set encoding to a string object.
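The ``(output object, length consumed)`` tuple can be observed through the stateless functions that ``codecs.lookup`` returns. This is a small sketch with an arbitrary sample value.

```python
import codecs

info = codecs.lookup("utf-8")

# Stateless decode: returns the decoded str and the number of bytes consumed.
text, consumed = info.decode(b"caf\xc3\xa9")

# Stateless encode: returns the bytes and the number of characters consumed.
data, length = info.encode("caf\u00e9")
```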
@ -568,7 +568,7 @@ define in order to be compatible with the Python codec registry.
implementation should make sure that ``0`` is the most common
state. (States that are more complicated than integers can be converted
into an integer by marshaling/pickling the state and encoding the bytes
-of the resulting string into an integer).
+of the resulting string into an integer.)
.. method:: setstate(state)
@ -751,7 +751,7 @@ compatible with the Python codec registry.
number of encoded bytes or code points to read
for decoding. The decoder can modify this setting as
appropriate. The default value -1 indicates to read and decode as much as
-possible. This parameter is intended to
+possible. This parameter is intended to
prevent having to decode huge files in one step.
The *firstline* flag indicates that
@ -780,8 +780,8 @@ compatible with the Python codec registry.
Read all lines available on the input stream and return them as a list of
lines.
-Line-endings are implemented using the codec's decoder method and are
-included in the list entries if *keepends* is true.
+Line-endings are implemented using the codec's :meth:`decode` method and
+are included in the list entries if *keepends* is true.
*sizehint*, if given, is passed as the *size* argument to the stream's
:meth:`read` method.
@ -791,7 +791,7 @@ compatible with the Python codec registry.
Resets the codec buffers used for keeping state.
-Note that no stream repositioning should take place. This method is
+Note that no stream repositioning should take place. This method is
primarily intended to be able to recover from decoding errors.
@ -841,7 +841,7 @@ The design is such that one can use the factory functions returned by the
code calling :meth:`read` and :meth:`write`, while *Reader* and *Writer*
work on the backend — the data in *stream*.
-You can use these objects to do transparent transcodings from e.g. Latin-1
+You can use these objects to do transparent transcodings, e.g., from Latin-1
to UTF-8 and back.
The *stream* argument must be a file-like object.
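One way to obtain such a transcoding object is ``codecs.EncodedFile``, which wraps a stream in a ``StreamRecoder``. The sketch below assumes an in-memory ``io.BytesIO`` backend standing in for a real file.

```python
import codecs
import io

backend = io.BytesIO()  # stands in for a file that stores Latin-1 bytes

# The frontend (our code) speaks UTF-8; the backend stream holds Latin-1.
recoder = codecs.EncodedFile(backend, data_encoding="utf-8",
                             file_encoding="latin-1")

recoder.write("\u00e9".encode("utf-8"))  # write UTF-8 bytes for 'é'
stored = backend.getvalue()              # what actually reached the stream

backend.seek(0)
read_back = recoder.read()               # Latin-1 on disk, UTF-8 to the caller
```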
@ -866,10 +866,10 @@ Encodings and Unicode
---------------------
Strings are stored internally as sequences of code points in
-range ``0x0``--``0x10FFFF``. (See :pep:`393` for
+range ``0x0``--``0x10FFFF``. (See :pep:`393` for
more details about the implementation.)
Once a string object is used outside of CPU and memory, endianness
-and how these arrays are stored as bytes become an issue. As with other
+and how these arrays are stored as bytes become an issue. As with other
codecs, serialising a string into a sequence of bytes is known as *encoding*,
and recreating the string from the sequence of bytes is known as *decoding*.
There are a variety of different text serialisation codecs, which are
@ -964,7 +964,7 @@ to determine the byte order used for generating the byte sequence, but as a
signature that helps in guessing the encoding. On encoding the utf-8-sig codec
will write ``0xef``, ``0xbb``, ``0xbf`` as the first three bytes to the file. On
decoding ``utf-8-sig`` will skip those three bytes if they appear as the first
-three bytes in the file. In UTF-8, the use of the BOM is discouraged and
+three bytes in the file. In UTF-8, the use of the BOM is discouraged and
should generally be avoided.
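The behaviour of ``utf-8-sig`` described in this hunk can be checked with a short sketch. The sample string is arbitrary.

```python
import codecs

# Encoding prepends the three-byte signature 0xEF 0xBB 0xBF.
with_bom = "hi".encode("utf-8-sig")
has_signature = with_bom.startswith(codecs.BOM_UTF8)

# Decoding skips the signature if present and tolerates its absence.
skipped = with_bom.decode("utf-8-sig")
no_bom = "hi".encode("utf-8").decode("utf-8-sig")

# The plain utf-8 codec keeps the BOM as U+FEFF in the decoded text.
kept = with_bom.decode("utf-8")
```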
@ -984,7 +984,7 @@ e.g. ``'utf-8'`` is a valid alias for the ``'utf_8'`` codec.
.. impl-detail::
Some common encodings can bypass the codecs lookup machinery to
-improve performance. These optimization opportunities are only
+improve performance. These optimization opportunities are only
recognized by CPython for a limited set of (case insensitive)
aliases: utf-8, utf8, latin-1, latin1, iso-8859-1, iso8859-1, mbcs
(Windows only), ascii, us-ascii, utf-16, utf16, utf-32, utf32, and
@ -1145,7 +1145,7 @@ particular, the following variants typically exist:
| iso2022_kr | csiso2022kr, iso2022kr, | Korean |
| | iso-2022-kr | |
+-----------------+--------------------------------+--------------------------------+
-| latin_1 | iso-8859-1, iso8859-1, 8859, | West Europe |
+| latin_1 | iso-8859-1, iso8859-1, 8859, | Western Europe |
| | cp819, latin, latin1, L1 | |
+-----------------+--------------------------------+--------------------------------+
| iso8859_2 | iso-8859-2, latin2, L2 | Central and Eastern Europe |
@ -1250,11 +1250,11 @@ Python Specific Encodings
-------------------------
A number of predefined codecs are specific to Python, so their codec names have
-no meaning outside Python. These are listed in the tables below based on the
+no meaning outside Python. These are listed in the tables below based on the
expected input and output types (note that while text encodings are the most
common use case for codecs, the underlying codec infrastructure supports
-arbitrary data transforms rather than just text encodings). For asymmetric
-codecs, the stated purpose describes the encoding direction.
+arbitrary data transforms rather than just text encodings). For asymmetric
+codecs, the stated meaning describes the encoding direction.
Text Encodings
^^^^^^^^^^^^^^
@ -1266,27 +1266,27 @@ encodings.
.. tabularcolumns:: |l|p{0.3\linewidth}|p{0.3\linewidth}|
+--------------------+---------+---------------------------+
-| Codec | Aliases | Purpose |
+| Codec | Aliases | Meaning |
+====================+=========+===========================+
-| idna | | Implements :rfc:`3490`, |
+| idna | | Implement :rfc:`3490`, |
| | | see also |
| | | :mod:`encodings.idna`. |
| | | Only ``errors='strict'`` |
| | | is supported. |
+--------------------+---------+---------------------------+
-| mbcs | ansi, | Windows only: Encode |
+| mbcs | ansi, | Windows only: Encode the |
| | dbcs | operand according to the |
-| | | ANSI codepage (CP_ACP) |
+| | | ANSI codepage (CP_ACP). |
+--------------------+---------+---------------------------+
-| oem | | Windows only: Encode |
+| oem | | Windows only: Encode the |
| | | operand according to the |
-| | | OEM codepage (CP_OEMCP) |
+| | | OEM codepage (CP_OEMCP). |
| | | |
| | | .. versionadded:: 3.6 |
+--------------------+---------+---------------------------+
-| palmos | | Encoding of PalmOS 3.5 |
+| palmos | | Encoding of PalmOS 3.5. |
+--------------------+---------+---------------------------+
-| punycode | | Implements :rfc:`3492`. |
+| punycode | | Implement :rfc:`3492`. |
| | | Stateful codecs are not |
| | | supported. |
+--------------------+---------+---------------------------+
@ -1309,8 +1309,8 @@ encodings.
| | | literal in ASCII-encoded |
| | | Python source code, |
| | | except that quotes are |
-| | | not escaped. Decodes from |
-| | | Latin-1 source code. |
+| | | not escaped. Decode |
+| | | from Latin-1 source code. |
| | | Beware that Python source |
| | | code actually uses UTF-8 |
| | | by default. |
@ -1326,19 +1326,19 @@ Binary Transforms
^^^^^^^^^^^^^^^^^
The following codecs provide binary transforms: :term:`bytes-like object`
-to :class:`bytes` mappings. They are not supported by :meth:`bytes.decode`
+to :class:`bytes` mappings. They are not supported by :meth:`bytes.decode`
(which only produces :class:`str` output).
.. tabularcolumns:: |l|L|L|L|
+----------------------+------------------+------------------------------+------------------------------+
-| Codec | Aliases | Purpose | Encoder / decoder |
+| Codec | Aliases | Meaning | Encoder / decoder |
+======================+==================+==============================+==============================+
-| base64_codec [#b64]_ | base64, base_64 | Convert operand to multiline | :meth:`base64.encodebytes` / |
-| | | MIME base64 (the result | :meth:`base64.decodebytes` |
-| | | always includes a trailing | |
-| | | ``'\n'``) | |
+| base64_codec [#b64]_ | base64, base_64 | Convert the operand to | :meth:`base64.encodebytes` / |
+| | | multiline MIME base64 (the | :meth:`base64.decodebytes` |
+| | | result always includes a | |
+| | | trailing ``'\n'``). | |
| | | | |
| | | .. versionchanged:: 3.4 | |
| | | accepts any | |
@ -1346,23 +1346,23 @@ to :class:`bytes` mappings. They are not supported by :meth:`bytes.decode`
| | | as input for encoding and | |
| | | decoding | |
+----------------------+------------------+------------------------------+------------------------------+
-| bz2_codec | bz2 | Compress the operand | :meth:`bz2.compress` / |
-| | | using bz2 | :meth:`bz2.decompress` |
+| bz2_codec | bz2 | Compress the operand using | :meth:`bz2.compress` / |
+| | | bz2. | :meth:`bz2.decompress` |
+----------------------+------------------+------------------------------+------------------------------+
-| hex_codec | hex | Convert operand to | :meth:`binascii.b2a_hex` / |
+| hex_codec | hex | Convert the operand to | :meth:`binascii.b2a_hex` / |
| | | hexadecimal | :meth:`binascii.a2b_hex` |
| | | representation, with two | |
-| | | digits per byte | |
+| | | digits per byte. | |
+----------------------+------------------+------------------------------+------------------------------+
-| quopri_codec | quopri, | Convert operand to MIME | :meth:`quopri.encode` with |
-| | quotedprintable, | quoted printable | ``quotetabs=True`` / |
+| quopri_codec | quopri, | Convert the operand to MIME | :meth:`quopri.encode` with |
+| | quotedprintable, | quoted printable. | ``quotetabs=True`` / |
| | quoted_printable | | :meth:`quopri.decode` |
+----------------------+------------------+------------------------------+------------------------------+
| uu_codec | uu | Convert the operand using | :meth:`uu.encode` / |
-| | | uuencode | :meth:`uu.decode` |
+| | | uuencode. | :meth:`uu.decode` |
+----------------------+------------------+------------------------------+------------------------------+
-| zlib_codec | zip, zlib | Compress the operand | :meth:`zlib.compress` / |
-| | | using gzip | :meth:`zlib.decompress` |
+| zlib_codec | zip, zlib | Compress the operand using | :meth:`zlib.compress` / |
+| | | gzip. | :meth:`zlib.decompress` |
+----------------------+------------------+------------------------------+------------------------------+
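Because these codecs map bytes to bytes, they are reached through ``codecs.encode`` and ``codecs.decode`` rather than ``bytes.decode``. The following sketch uses an arbitrary payload to show three of the transforms from the table.

```python
import codecs

payload = b"hello"  # arbitrary sample data

hexed = codecs.encode(payload, "hex")     # two hex digits per byte
b64 = codecs.encode(payload, "base64")    # result ends with b'\n'

# Compression round trip with the zlib codec.
packed = codecs.encode(payload * 100, "zlib")
unpacked = codecs.decode(packed, "zlib")

# bytes.decode() only accepts text encodings, so it rejects 'hex'.
try:
    hexed.decode("hex")
    rejected = False
except LookupError:
    rejected = True
```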
.. [#b64] In addition to :term:`bytes-like objects <bytes-like object>`,
@ -1382,16 +1382,17 @@ Text Transforms
^^^^^^^^^^^^^^^
The following codec provides a text transform: a :class:`str` to :class:`str`
-mapping. It is not supported by :meth:`str.encode` (which only produces
+mapping. It is not supported by :meth:`str.encode` (which only produces
:class:`bytes` output).
.. tabularcolumns:: |l|l|L|
+--------------------+---------+---------------------------+
-| Codec | Aliases | Purpose |
+| Codec | Aliases | Meaning |
+====================+=========+===========================+
-| rot_13 | rot13 | Returns the Caesar-cypher |
-| | | encryption of the operand |
+| rot_13 | rot13 | Return the Caesar-cypher |
+| | | encryption of the |
+| | | operand. |
+--------------------+---------+---------------------------+
.. versionadded:: 3.2
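A short sketch of the text transform, again going through ``codecs.encode`` because ``str.encode`` only produces bytes. The sample word is arbitrary.

```python
import codecs

scrambled = codecs.encode("Hello", "rot13")   # str in, str out
restored = codecs.decode(scrambled, "rot13")  # ROT13 is its own inverse
```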
@ -1429,7 +1430,7 @@ conversion between Unicode and ACE, separating an input string into labels
based on the separator characters defined in :rfc:`section 3.1 of RFC 3490 <3490#section-3.1>`
and converting each label to ACE as required, and conversely separating an input
byte string into labels based on the ``.`` separator and converting any ACE
-labels found into unicode. Furthermore, the :mod:`socket` module
+labels found into unicode. Furthermore, the :mod:`socket` module
transparently converts Unicode host names to ACE, so that applications need not
be concerned about converting host names themselves when they pass them to the
socket module. On top of that, modules that have host names as function
@ -1438,7 +1439,7 @@ names (:mod:`http.client` then also transparently sends an IDNA hostname in the
:mailheader:`Host` field if it sends that field at all).
When receiving host names from the wire (such as in reverse name lookup), no
-automatic conversion to Unicode is performed: Applications wishing to present
+automatic conversion to Unicode is performed: applications wishing to present
such host names to the user should decode them to Unicode.
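Decoding a host name received from the wire can be sketched with the ``idna`` codec. The host name ``bücher.example`` is a made-up example.

```python
# Hypothetical host name used only for illustration.
host = "b\u00fccher.example"

ace = host.encode("idna")        # Unicode labels converted to ACE
display = ace.decode("idna")     # ACE labels converted back to Unicode
```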
The module :mod:`encodings.idna` also implements the nameprep procedure, which
@ -1470,7 +1471,7 @@ functions can be used directly if desired.
.. module:: encodings.mbcs
:synopsis: Windows ANSI codepage
-Encode operand according to the ANSI codepage (CP_ACP).
+This module implements the ANSI codepage (CP_ACP).
.. availability:: Windows only.
@ -1489,7 +1490,7 @@ Encode operand according to the ANSI codepage (CP_ACP).
:synopsis: UTF-8 codec with BOM signature
.. moduleauthor:: Walter Dörwald
-This module implements a variant of the UTF-8 codec: On encoding a UTF-8 encoded
+This module implements a variant of the UTF-8 codec. On encoding, a UTF-8 encoded
BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this
-is only done once (on the first write to the byte stream). For decoding an
+is only done once (on the first write to the byte stream). On decoding, an
optional UTF-8 encoded BOM at the start of the data will be skipped.