From 614420df9796c8a4f01e24052fc0128b4c20c5bf Mon Sep 17 00:00:00 2001 From: Inada Naoki Date: Mon, 2 May 2022 17:25:05 +0900 Subject: [PATCH] gh-85679: Recommend `encoding="utf-8"` in tutorial (GH-91778) --- Doc/tutorial/inputoutput.rst | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/Doc/tutorial/inputoutput.rst b/Doc/tutorial/inputoutput.rst index 7f83c4d4612..b50063654e2 100644 --- a/Doc/tutorial/inputoutput.rst +++ b/Doc/tutorial/inputoutput.rst @@ -279,11 +279,12 @@ Reading and Writing Files object: file :func:`open` returns a :term:`file object`, and is most commonly used with -two arguments: ``open(filename, mode)``. +two positional arguments and one keyword argument: +``open(filename, mode, encoding=None)`` :: - >>> f = open('workfile', 'w') + >>> f = open('workfile', 'w', encoding="utf-8") .. XXX str(f) is @@ -300,11 +301,14 @@ writing. The *mode* argument is optional; ``'r'`` will be assumed if it's omitted. Normally, files are opened in :dfn:`text mode`, that means, you read and write -strings from and to the file, which are encoded in a specific encoding. If -encoding is not specified, the default is platform dependent (see -:func:`open`). ``'b'`` appended to the mode opens the file in -:dfn:`binary mode`: now the data is read and written in the form of bytes -objects. This mode should be used for all files that don't contain text. +strings from and to the file, which are encoded in a specific *encoding*. +If *encoding* is not specified, the default is platform dependent +(see :func:`open`). +Because UTF-8 is the modern de-facto standard, ``encoding="utf-8"`` is +recommended unless you know that you need to use a different encoding. +Appending a ``'b'`` to the mode opens the file in :dfn:`binary mode`. +Binary mode data is read and written as :class:`bytes` objects. +You can not specify *encoding* when opening file in binary mode. In text mode, the default when reading is to convert platform-specific line endings (``\n`` on Unix, ``\r\n`` on Windows) to just ``\n``. When writing in @@ -320,7 +324,7 @@ after its suite finishes, even if an exception is raised at some point. Using :keyword:`!with` is also much shorter than writing equivalent :keyword:`try`\ -\ :keyword:`finally` blocks:: - >>> with open('workfile') as f: + >>> with open('workfile', encoding="utf-8") as f: ... read_data = f.read() >>> # We can check that the file has been automatically closed. @@ -490,11 +494,15 @@ simply serializes the object to a :term:`text file`. So if ``f`` is a json.dump(x, f) -To decode the object again, if ``f`` is a :term:`text file` object which has -been opened for reading:: +To decode the object again, if ``f`` is a :term:`binary file` or +:term:`text file` object which has been opened for reading:: x = json.load(f) +.. note:: + JSON files must be encoded in UTF-8. Use ``encoding="utf-8"`` when opening + JSON file as a :term:`text file` for both of reading and writing. + This simple serialization technique can handle lists and dictionaries, but serializing arbitrary class instances in JSON requires a bit of extra effort. The reference for the :mod:`json` module contains an explanation of this.