Issue #14674: Add a discussion of the json module's standard compliance.

Patch by Chris Rebert.
Antoine Pitrou 2012-08-24 19:46:17 +02:00
parent e991236b4d
commit f3e0a69d88
2 changed files with 119 additions and 6 deletions

@@ -7,8 +7,10 @@
 .. sectionauthor:: Bob Ippolito <bob@redivi.com>
 .. versionadded:: 2.6
-`JSON (JavaScript Object Notation) <http://json.org>`_ is a subset of JavaScript
-syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.
+`JSON (JavaScript Object Notation) <http://json.org>`_, specified by
+:rfc:`4627`, is a lightweight data interchange format based on a subset of
+`JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ syntax (`ECMA-262 3rd
+edition <http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf>`_).
 :mod:`json` exposes an API familiar to users of the standard library
 :mod:`marshal` and :mod:`pickle` modules.
@@ -106,8 +108,10 @@ Using json.tool from the shell to validate and pretty-print::
 .. note::
-   The JSON produced by this module's default settings is a subset of
-   YAML, so it may be used as a serializer for that as well.
+   JSON is a subset of `YAML <http://yaml.org/>`_ 1.2. The JSON produced by
+   this module's default settings (in particular, the default *separators*
+   value) is also a subset of YAML 1.0 and 1.1. This module can thus also be
+   used as a YAML serializer.
 Basic Usage
@@ -193,7 +197,8 @@ Basic Usage
    *object_hook* is an optional function that will be called with the result of
    any object literal decoded (a :class:`dict`). The return value of
    *object_hook* will be used instead of the :class:`dict`. This feature can be used
-   to implement custom decoders (e.g. JSON-RPC class hinting).
+   to implement custom decoders (e.g. `JSON-RPC <http://www.jsonrpc.org>`_
+   class hinting).
    *object_pairs_hook* is an optional function that will be called with the
    result of any object literal decoded with an ordered list of pairs. The
@@ -242,7 +247,7 @@ Basic Usage
    The other arguments have the same meaning as in :func:`load`.
-Encoders and decoders
+Encoders and Decoders
 ---------------------
 .. class:: JSONDecoder([encoding[, object_hook[, parse_float[, parse_int[, parse_constant[, strict[, object_pairs_hook]]]]]]])
@@ -438,3 +443,108 @@ Encoders and decoders
 for chunk in JSONEncoder().iterencode(bigobject):
     mysocket.write(chunk)
Standard Compliance
-------------------
The JSON format is specified by :rfc:`4627`. This section details this
module's level of compliance with the RFC. For simplicity,
:class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and parameters other
than those explicitly mentioned, are not considered.
This module does not comply with the RFC in a strict fashion, implementing some
extensions that are valid JavaScript but not valid JSON. In particular:
- Top-level non-object, non-array values are accepted and output;
- Infinite and NaN number values are accepted and output;
- Repeated names within an object are accepted, and only the value of the last
name-value pair is used.
Since the RFC permits RFC-compliant parsers to accept input texts that are not
RFC-compliant, this module's deserializer is technically RFC-compliant under
default settings.
Character Encodings
^^^^^^^^^^^^^^^^^^^
The RFC recommends that JSON be represented using either UTF-8, UTF-16, or
UTF-32, with UTF-8 being the default. Accordingly, this module uses UTF-8 as
the default for its *encoding* parameter.
This module's deserializer only directly works with ASCII-compatible encodings;
UTF-16, UTF-32, and other ASCII-incompatible encodings require the use of
workarounds described in the documentation for the deserializer's *encoding*
parameter.
The RFC also non-normatively describes a limited encoding detection technique
for JSON texts; this module's deserializer does not implement this or any other
kind of encoding detection.
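As a rough illustration of the detection technique the RFC describes (looking at the pattern of null octets among the first four octets of the text), a caller could decode the input before handing it to the deserializer. The helper names ``detect_json_encoding`` and ``loads_detecting_encoding`` below are hypothetical and are not part of this module. The sketch assumes the input is a byte string without a byte order mark:

```python
import json


def detect_json_encoding(data):
    """Guess the encoding of a JSON byte string from its first four octets.

    Hypothetical helper: relies on the first two characters of an
    RFC 4627 JSON text always being ASCII.
    """
    head = bytearray(data[:4])
    if len(head) < 4:
        # Too short for the four-octet pattern; fall back to the default.
        return 'utf-8'
    nulls = tuple(octet == 0 for octet in head)
    if nulls == (True, True, True, False):
        return 'utf-32-be'   # 00 00 00 xx
    if nulls == (True, False, True, False):
        return 'utf-16-be'   # 00 xx 00 xx
    if nulls == (False, True, True, True):
        return 'utf-32-le'   # xx 00 00 00
    if nulls == (False, True, False, True):
        return 'utf-16-le'   # xx 00 xx 00
    return 'utf-8'           # xx xx xx xx


def loads_detecting_encoding(data, **kwargs):
    """Decode the byte string to text first, then deserialize the text."""
    text = data.decode(detect_json_encoding(data))
    return json.loads(text, **kwargs)
```

Decoding to text before calling :func:`loads` also sidesteps the limitation that the deserializer only works directly with ASCII-compatible encodings.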
As permitted, though not required, by the RFC, this module's serializer sets
*ensure_ascii=True* by default, thus escaping the output so that the resulting
strings only contain ASCII characters.
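The effect of *ensure_ascii* can be seen by serializing a string that contains a non-ASCII character. This is a small sketch; the variable names are illustrative only:

```python
import json

text = u'caf\u00e9'

# Default (ensure_ascii=True): the non-ASCII character is written as a
# \uXXXX escape, so the output contains only ASCII characters.
escaped = json.dumps(text)

# ensure_ascii=False: the character is passed through unescaped.
unescaped = json.dumps(text, ensure_ascii=False)

# Both forms deserialize back to the same string.
assert json.loads(escaped) == json.loads(unescaped) == text
```

With the default setting, ``escaped`` is the ASCII-only text ``"caf\u00e9"`` (with a literal backslash), whereas ``unescaped`` contains the accented character itself.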
Top-level Non-Object, Non-Array Values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The RFC specifies that the top-level value of a JSON text must be either a
JSON object or array (Python :class:`dict` or :class:`list`). This module's
deserializer also accepts input texts consisting solely of a
JSON null, boolean, number, or string value::
   >>> import json
   >>> just_a_json_string = '"spam and eggs"' # Not by itself a valid JSON text
   >>> json.loads(just_a_json_string)
   u'spam and eggs'
This module itself does not include a way to request that such input texts be
regarded as illegal. Likewise, this module's serializer accepts single
Python :data:`None`, :class:`bool`, numeric, and :class:`str`
values as input and generates output texts consisting solely of a top-level
JSON null, boolean, number, or string value without raising an exception::
   >>> neither_a_list_nor_a_dict = u"spam and eggs"
   >>> json.dumps(neither_a_list_nor_a_dict) # The result is not a valid JSON text
   '"spam and eggs"'
This module's serializer does not itself include a way to enforce the
aforementioned constraint.
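A caller that needs the RFC's top-level constraint can check it around the module's functions. The wrappers below are a minimal sketch; ``strict_loads`` and ``strict_dumps`` are hypothetical names, not part of this module, and the sketch assumes that no *object_hook* or *object_pairs_hook* is replacing decoded objects with other types:

```python
import json


def strict_loads(text, **kwargs):
    """Deserialize text, rejecting top-level values other than object/array."""
    value = json.loads(text, **kwargs)
    if not isinstance(value, (dict, list)):
        raise ValueError('top-level JSON value must be an object or array')
    return value


def strict_dumps(obj, **kwargs):
    """Serialize obj, rejecting inputs that would not yield an object/array."""
    # Tuples are accepted because the serializer writes them as JSON arrays.
    if not isinstance(obj, (dict, list, tuple)):
        raise ValueError('top-level value must be a dict, list, or tuple')
    return json.dumps(obj, **kwargs)
```

With these wrappers, ``strict_loads('"spam and eggs"')`` and ``strict_dumps(u"spam and eggs")`` raise :exc:`ValueError` instead of succeeding.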
Infinite and NaN Number Values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The RFC does not permit the representation of infinite or NaN number values.
Despite that, by default, this module accepts and outputs ``Infinity``,
``-Infinity``, and ``NaN`` as if they were valid JSON number literal values::
   >>> # Neither of these calls raises an exception, but the results are not valid JSON
   >>> json.dumps(float('-inf'))
   '-Infinity'
   >>> json.dumps(float('nan'))
   'NaN'
   >>> # Same when deserializing
   >>> json.loads('-Infinity')
   -inf
   >>> json.loads('NaN')
   nan
In the serializer, passing *allow_nan=False* causes a :exc:`ValueError` to be
raised for these values instead. In the deserializer, the *parse_constant*
parameter can be used to handle or reject them.
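The following sketch shows one way to make both directions reject these values. The function names are hypothetical; only *allow_nan* and *parse_constant* are parameters of this module:

```python
import json


def reject_constant(name):
    """parse_constant hook: called with 'NaN', 'Infinity', or '-Infinity'."""
    raise ValueError('invalid JSON number literal: %s' % name)


def dumps_finite_only(obj, **kwargs):
    # With allow_nan=False the serializer raises ValueError for
    # infinite and NaN floats instead of writing them out.
    return json.dumps(obj, allow_nan=False, **kwargs)


def loads_finite_only(text, **kwargs):
    # The hook raises as soon as the deserializer meets one of the
    # three non-standard literals.
    return json.loads(text, parse_constant=reject_constant, **kwargs)
```

Ordinary numbers are unaffected: ``dumps_finite_only([1.5])`` and ``loads_finite_only('[1.5]')`` behave like the plain functions.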
Repeated Names Within an Object
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The RFC specifies that the names within a JSON object should be unique, but
does not specify how repeated names in JSON objects should be handled. By
default, this module does not raise an exception; instead, it ignores all but
the last name-value pair for a given name::
   >>> weird_json = '{"x": 1, "x": 2, "x": 3}'
   >>> json.loads(weird_json)
   {u'x': 3}
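The *object_pairs_hook* parameter can be used to alter this behavior, because
the hook receives every name-value pair, including those with repeated names,
in the order in which they appear.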
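For instance, a hook along the following lines turns repeated names into an error. ``reject_duplicate_names`` is a hypothetical name used for illustration, and raising :exc:`ValueError` is just one possible policy:

```python
import json


def reject_duplicate_names(pairs):
    """object_pairs_hook that refuses objects containing repeated names.

    ``pairs`` is the list of (name, value) tuples for one JSON object.
    """
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError('repeated name in JSON object: %r' % (name,))
        result[name] = value
    return result


def loads_unique_names(text, **kwargs):
    return json.loads(text, object_pairs_hook=reject_duplicate_names, **kwargs)
```

The hook is applied to every object in the text, so repeated names inside nested objects are rejected as well.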

@@ -373,6 +373,9 @@ Build
 Documentation
 -------------
+- Issue #14674: Add a discussion of the json module's standard compliance.
+  Patch by Chris Rebert.
 - Issue #15630: Add an example for "continue" stmt in the tutorial. Patch by
   Daniel Ellis.