Brett Cannon 2013-06-16 13:14:06 -04:00
commit 645ab68f25
5 changed files with 41 additions and 15 deletions

@@ -78,7 +78,11 @@ It defines the following functions:
   reference (for encoding only)
 * ``'backslashreplace'``: replace with backslashed escape sequences (for
   encoding only)
-* ``'surrogateescape'``: replace with surrogate U+DCxx, see :pep:`383`
+* ``'surrogateescape'``: on decoding, replace with code points in the Unicode
+  Private Use Area ranging from U+DC80 to U+DCFF. These private code
+  points will then be turned back into the same bytes when the
+  ``surrogateescape`` error handler is used when encoding the data.
+  (See :pep:`383` for more.)
 as well as any other error handling name defined via :func:`register_error`.
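The decode/encode round trip that 'surrogateescape' provides can be checked with bytes.decode() and str.encode(). This is a minimal sketch, assuming UTF-8 as the codec and arbitrary sample bytes. Strictly speaking, U+DC80 to U+DCFF lie in the low-surrogate block (U+DC00 to U+DFFF), not in a Private Use Area, which is why unicodedata.category() reports them as 'Cs' (surrogate) rather than 'Co' (private use).

```python
import unicodedata

# Sample input: valid UTF-8 for "café " followed by two bytes (0xFF, 0xFE)
# that can never occur in UTF-8.
raw = b"caf\xc3\xa9 \xff\xfe"

# Decoding: each undecodable byte 0xNN becomes the lone code point U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))  # 'caf\xe9 \udcff\udcfe'

# The escaped code points are surrogates (category "Cs"), not private use ("Co").
print([unicodedata.category(ch) for ch in text[-2:]])  # ['Cs', 'Cs']

# Encoding with the same handler turns them back into the original bytes.
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw)  # True

# Without the handler, the lone surrogates cannot be encoded at all.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    print("strict encoding rejects lone surrogates")
```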

@@ -905,16 +905,36 @@ are always available. They are listed here in alphabetical order.
    the list of supported encodings.
    *errors* is an optional string that specifies how encoding and decoding
-   errors are to be handled--this cannot be used in binary mode. Pass
-   ``'strict'`` to raise a :exc:`ValueError` exception if there is an encoding
-   error (the default of ``None`` has the same effect), or pass ``'ignore'`` to
-   ignore errors. (Note that ignoring encoding errors can lead to data loss.)
-   ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
-   where there is malformed data. When writing, ``'xmlcharrefreplace'``
-   (replace with the appropriate XML character reference) or
-   ``'backslashreplace'`` (replace with backslashed escape sequences) can be
-   used. Any other error handling name that has been registered with
-   :func:`codecs.register_error` is also valid.
+   errors are to be handled--this cannot be used in binary mode.
+   A variety of standard error handlers are available, though any
+   error handling name that has been registered with
+   :func:`codecs.register_error` is also valid. The standard names
+   are:
+
+   * ``'strict'`` to raise a :exc:`ValueError` exception if there is
+     an encoding error. The default value of ``None`` has the same
+     effect.
+
+   * ``'ignore'`` ignores errors. Note that ignoring encoding errors
+     can lead to data loss.
+
+   * ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted
+     where there is malformed data.
+
+   * ``'surrogateescape'`` will represent any incorrect bytes as code
+     points in the Unicode Private Use Area ranging from U+DC80 to
+     U+DCFF. These private code points will then be turned back into
+     the same bytes when the ``surrogateescape`` error handler is used
+     when writing data. This is useful for processing files in an
+     unknown encoding.
+
+   * ``'xmlcharrefreplace'`` is only supported when writing to a file.
+     Characters not supported by the encoding are replaced with the
+     appropriate XML character reference ``&#nnn;``.
+
+   * ``'backslashreplace'`` (also only supported when writing)
+     replaces unsupported characters with Python's backslashed escape
+     sequences.
 
    .. index::
       single: universal newlines; open() built-in function
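The writing-side handlers in the list above, together with a name registered through codecs.register_error(), can be compared by writing one string through open() with an encoding that cannot represent it. This is a minimal sketch: ASCII is chosen only because it rejects both sample characters, and the handler name 'angle_escape' with its <U+XXXX> output is hypothetical, not a standard handler.

```python
import codecs
import os
import tempfile


def angle_escape(exc):
    """Hypothetical handler: write unencodable characters as <U+XXXX>."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(f"<U+{ord(ch):04X}>" for ch in bad)
    # Return the replacement text and the index at which encoding resumes.
    return replacement, exc.end


# After registration the name is accepted anywhere an errors= string is.
codecs.register_error("angle_escape", angle_escape)

text = "na\u00efve \u2603"  # "naïve ☃": neither ï nor ☃ is in ASCII
handlers = ("replace", "ignore", "xmlcharrefreplace",
            "backslashreplace", "angle_escape")
written = {}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    for handler in handlers:
        # The handler decides what reaches the file for unencodable characters.
        with open(path, "w", encoding="ascii", errors=handler) as f:
            f.write(text)
        with open(path, "rb") as f:
            written[handler] = f.read()

for handler in handlers:
    print(f"{handler:18} {written[handler]!r}")
```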

@@ -105,6 +105,7 @@ class Codec:
                     Python will use the official U+FFFD REPLACEMENT
                     CHARACTER for the builtin Unicode codecs on
                     decoding and '?' on encoding.
+         'surrogateescape' - replace with private codepoints U+DCnn.
          'xmlcharrefreplace' - Replace with the appropriate XML
                                character reference (only for encoding).
          'backslashreplace'  - Replace with backslashed escape sequences
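Each handler named in this docstring can be tried on one undecodable byte and one unencodable character. This is a minimal sketch using the ASCII codec, picked only because 0xE9 and U+00E9 both fall outside it. As the docstring says, 'replace' yields U+FFFD when decoding and '?' when encoding, while 'surrogateescape' maps byte 0xE9 to U+DCE9. 'xmlcharrefreplace' and 'backslashreplace' are exercised on the encoding side only, matching the docstring's notes.

```python
raw = b"a\xe9b"    # 0xE9 is outside ASCII, so it cannot be decoded
text = "a\u00e9b"  # "é" (U+00E9) is outside ASCII, so it cannot be encoded

# 'strict' (the default) raises; UnicodeDecodeError is a ValueError subclass.
try:
    raw.decode("ascii")
    strict_raised = False
except ValueError:
    strict_raised = True

decoded = {h: raw.decode("ascii", errors=h)
           for h in ("ignore", "replace", "surrogateescape")}
encoded = {h: text.encode("ascii", errors=h)
           for h in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")}

print("strict raised:", strict_raised)
for h, s in decoded.items():
    print(f"decode {h:18} {ascii(s)}")
for h, b in encoded.items():
    print(f"encode {h:18} {b!r}")
```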

@@ -168,8 +168,8 @@ PyDoc_STRVAR(open_doc,
 "'strict' to raise a ValueError exception if there is an encoding error\n"
 "(the default of None has the same effect), or pass 'ignore' to ignore\n"
 "errors. (Note that ignoring encoding errors can lead to data loss.)\n"
-"See the documentation for codecs.register for a list of the permitted\n"
-"encoding error strings.\n"
+"See the documentation for codecs.register or run 'help(codecs.Codec)'\n"
+"for a list of the permitted encoding error strings.\n"
 "\n"
 "newline controls how universal newlines works (it only applies to text\n"
 "mode). It can be None, '', '\\n', '\\r', and '\\r\\n'. It works as\n"

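The behaviour this docstring describes for 'strict', None and 'ignore' can be checked on a file whose bytes are not valid in the requested encoding. This is a minimal sketch: writing "café" as Latin-1 is just an arbitrary way to produce invalid UTF-8, and the exception is caught as ValueError because UnicodeDecodeError subclasses it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "latin1.txt")
    # "café" saved as Latin-1: the lone 0xE9 byte is not valid UTF-8.
    with open(path, "wb") as f:
        f.write("caf\u00e9".encode("latin-1"))

    outcomes = {}
    for errors in (None, "strict"):
        try:
            with open(path, encoding="utf-8", errors=errors) as f:
                f.read()
            outcomes[errors] = "decoded"
        except ValueError as exc:
            # UnicodeDecodeError is a subclass of ValueError.
            outcomes[errors] = type(exc).__name__

    # 'ignore' succeeds, but silently drops the undecodable byte.
    with open(path, encoding="utf-8", errors="ignore") as f:
        ignored = f.read()

print(outcomes)       # {None: 'UnicodeDecodeError', 'strict': 'UnicodeDecodeError'}
print(repr(ignored))  # 'caf'
```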
@@ -642,8 +642,9 @@ PyDoc_STRVAR(textiowrapper_doc,
 "encoding gives the name of the encoding that the stream will be\n"
 "decoded or encoded with. It defaults to locale.getpreferredencoding(False).\n"
 "\n"
-"errors determines the strictness of encoding and decoding (see the\n"
-"codecs.register) and defaults to \"strict\".\n"
+"errors determines the strictness of encoding and decoding (see\n"
+"help(codecs.Codec) or the documentation for codecs.register) and\n"
+"defaults to \"strict\".\n"
 "\n"
 "newline controls how line endings are handled. It can be None, '',\n"
 "'\\n', '\\r', and '\\r\\n'. It works as follows:\n"
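The "strict" default, and the effect of overriding it, can be seen by wrapping an in-memory byte stream with io.TextIOWrapper. This is a minimal sketch, assuming UTF-8 and an arbitrary invalid byte. It also checks that writing the decoded text back with 'surrogateescape' reproduces the original bytes.

```python
import io

raw = b"ok \xff"  # 0xFF never occurs in valid UTF-8

# errors omitted: the wrapper falls back to "strict".
strict = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
print(strict.errors)  # strict
strict_error = None
try:
    strict.read()
except UnicodeDecodeError as exc:
    strict_error = exc
print(type(strict_error).__name__)  # UnicodeDecodeError

# errors="surrogateescape": the bad byte survives as U+DCFF.
lenient = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8",
                           errors="surrogateescape")
text = lenient.read()
print(ascii(text))  # 'ok \udcff'

# Writing through a wrapper with the same handler reproduces the input bytes.
out = io.BytesIO()
writer = io.TextIOWrapper(out, encoding="utf-8", errors="surrogateescape")
writer.write(text)
writer.flush()
print(out.getvalue() == raw)  # True
```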