Issue #14674: Add a discussion of the json module's standard compliance.

Patch by Chris Rebert.
Antoine Pitrou 2012-08-24 19:46:17 +02:00
parent e991236b4d
commit f3e0a69d88
2 changed files with 119 additions and 6 deletions


@ -7,8 +7,10 @@
.. sectionauthor:: Bob Ippolito <bob@redivi.com>
.. versionadded:: 2.6
`JSON (JavaScript Object Notation) <http://json.org>`_ is a subset of JavaScript
syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.
`JSON (JavaScript Object Notation) <http://json.org>`_, specified by
:rfc:`4627`, is a lightweight data interchange format based on a subset of
`JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ syntax (`ECMA-262 3rd
edition <http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf>`_).
:mod:`json` exposes an API familiar to users of the standard library
:mod:`marshal` and :mod:`pickle` modules.
@ -106,8 +108,10 @@ Using json.tool from the shell to validate and pretty-print::
.. note::
The JSON produced by this module's default settings is a subset of
YAML, so it may be used as a serializer for that as well.
JSON is a subset of `YAML <http://yaml.org/>`_ 1.2. The JSON produced by
this module's default settings (in particular, the default *separators*
value) is also a subset of YAML 1.0 and 1.1. This module can thus also be
used as a YAML serializer.
Basic Usage
@ -193,7 +197,8 @@ Basic Usage
*object_hook* is an optional function that will be called with the result of
any object literal decoded (a :class:`dict`). The return value of
*object_hook* will be used instead of the :class:`dict`. This feature can be used
to implement custom decoders (e.g. JSON-RPC class hinting).
to implement custom decoders (e.g. `JSON-RPC <http://www.jsonrpc.org>`_
class hinting).
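As a minimal sketch of the class-hinting idea, an *object_hook* can look for a marker key and build a richer Python object from it. The ``__complex__`` marker and the ``as_complex`` helper are illustrative assumptions, not something the JSON-RPC specification defines.

```python
import json


def as_complex(dct):
    # Hypothetical hint: an object carrying "__complex__" is turned into a
    # Python complex number; every other object is returned unchanged.
    if "__complex__" in dct:
        return complex(dct["real"], dct["imag"])
    return dct


hinted = json.loads('{"__complex__": true, "real": 1, "imag": 2}',
                    object_hook=as_complex)
plain = json.loads('{"x": 1}', object_hook=as_complex)
```

The hook is called for every decoded object, including nested ones, so hinted values can appear at any depth.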
*object_pairs_hook* is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
@ -242,7 +247,7 @@ Basic Usage
The other arguments have the same meaning as in :func:`load`.
Encoders and decoders
Encoders and Decoders
---------------------
.. class:: JSONDecoder([encoding[, object_hook[, parse_float[, parse_int[, parse_constant[, strict[, object_pairs_hook]]]]]]])
@ -438,3 +443,108 @@ Encoders and decoders
for chunk in JSONEncoder().iterencode(bigobject):
    mysocket.write(chunk)
Standard Compliance
-------------------
The JSON format is specified by :rfc:`4627`. This section details this
module's level of compliance with the RFC. For simplicity,
:class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and parameters other
than those explicitly mentioned, are not considered.
This module does not comply with the RFC in a strict fashion, implementing some
extensions that are valid JavaScript but not valid JSON. In particular:
- Top-level non-object, non-array values are accepted and output;
- Infinite and NaN number values are accepted and output;
- Repeated names within an object are accepted, and only the value of the last
name-value pair is used.
Since the RFC permits RFC-compliant parsers to accept input texts that are not
RFC-compliant, this module's deserializer is technically RFC-compliant under
default settings.
Character Encodings
^^^^^^^^^^^^^^^^^^^
The RFC recommends that JSON be represented using UTF-8, UTF-16, or UTF-32,
with UTF-8 being the default. Accordingly, this module uses UTF-8 as the
default for its *encoding* parameter.
This module's deserializer only directly works with ASCII-compatible encodings;
UTF-16, UTF-32, and other ASCII-incompatible encodings require the use of
workarounds described in the documentation for the deserializer's *encoding*
parameter.
The RFC also non-normatively describes a limited encoding detection technique
for JSON texts; this module's deserializer does not implement this or any other
kind of encoding detection.
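Callers who need the detection technique can apply it themselves before handing text to the deserializer. The following is a rough sketch of the null-byte heuristic from section 3 of the RFC; ``detect_json_encoding`` and ``loads_detecting`` are hypothetical helper names, and the sketch assumes the input has no byte order mark and starts with two ASCII characters, as the RFC expects.

```python
import json


def detect_json_encoding(data):
    """Guess the encoding from the null-byte pattern of the first four octets."""
    head = bytearray(data[:4])
    if len(head) < 4:
        return "utf-8"  # too short for the heuristic; assume the default
    nulls = tuple(octet == 0 for octet in head)
    if nulls == (True, True, True, False):
        return "utf-32-be"   # 00 00 00 xx
    if nulls == (True, False, True, False):
        return "utf-16-be"   # 00 xx 00 xx
    if nulls == (False, True, True, True):
        return "utf-32-le"   # xx 00 00 00
    if nulls == (False, True, False, True):
        return "utf-16-le"   # xx 00 xx 00
    return "utf-8"           # xx xx xx xx


def loads_detecting(data):
    # Decode to text explicitly, then let the json module parse the text.
    return json.loads(data.decode(detect_json_encoding(data)))
```

Decoding to text first sidesteps the deserializer's restriction to ASCII-compatible encodings, because it only ever sees an already-decoded string.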
As permitted, though not required, by the RFC, this module's serializer sets
*ensure_ascii=True* by default, thus escaping the output so that the resulting
strings only contain ASCII characters.
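A short illustration of the difference, using an arbitrary sample string; the variable names are only for this sketch.

```python
import json

text = u"caf\u00e9"

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(text)

# With ensure_ascii=False the character is emitted as-is.
unescaped = json.dumps(text, ensure_ascii=False)

# Both forms decode back to the same Python string.
round_trip_equal = json.loads(escaped) == json.loads(unescaped) == text
```

Either form is valid JSON; the escaped form is simply safe to send over channels that cannot carry non-ASCII bytes.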
Top-level Non-Object, Non-Array Values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The RFC specifies that the top-level value of a JSON text must be either a
JSON object or array (Python :class:`dict` or :class:`list`). This module's
deserializer also accepts input texts consisting solely of a
JSON null, boolean, number, or string value::
>>> just_a_json_string = '"spam and eggs"' # Not by itself a valid JSON text
>>> json.loads(just_a_json_string)
u'spam and eggs'
This module itself does not include a way to request that such input texts be
regarded as illegal. Likewise, this module's serializer accepts single
Python :data:`None`, :class:`bool`, numeric, and :class:`str`
values as input and generates output texts consisting solely of a top-level
JSON null, boolean, number, or string value without raising an exception::
>>> neither_a_list_nor_a_dict = u"spam and eggs"
>>> json.dumps(neither_a_list_nor_a_dict) # The result is not a valid JSON text
'"spam and eggs"'
This module's serializer does not itself include a way to enforce the
aforementioned constraint.
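Code that must enforce the constraint can wrap the module's functions. This is only a sketch: ``loads_toplevel_strict`` and ``dumps_toplevel_strict`` are hypothetical helpers, not part of the module.

```python
import json


def loads_toplevel_strict(text):
    """Parse text, rejecting a top-level value that is not an object or array."""
    value = json.loads(text)
    if not isinstance(value, (dict, list)):
        raise ValueError("top-level JSON value must be an object or array")
    return value


def dumps_toplevel_strict(obj):
    """Serialize obj only if it maps to a JSON object or array."""
    # Tuples are accepted because the serializer writes them as JSON arrays.
    if not isinstance(obj, (dict, list, tuple)):
        raise ValueError("top-level value must serialize to an object or array")
    return json.dumps(obj)
```

The check happens after parsing on the way in and before encoding on the way out, so the wrappers do not change how nested values are handled.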
Infinite and NaN Number Values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The RFC does not permit the representation of infinite or NaN number values.
Despite that, by default, this module accepts and outputs ``Infinity``,
``-Infinity``, and ``NaN`` as if they were valid JSON number literal values::
>>> # Neither of these calls raises an exception, but the results are not valid JSON
>>> json.dumps(float('-inf'))
'-Infinity'
>>> json.dumps(float('nan'))
'NaN'
>>> # Same when deserializing
>>> json.loads('-Infinity')
-inf
>>> json.loads('NaN')
nan
In the serializer, passing *allow_nan=False* causes a :exc:`ValueError` to be
raised for such values. In the deserializer, the *parse_constant* parameter can
be used to intercept these constants, for example by raising an exception.
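A minimal sketch of using both parameters to reject these values; ``reject_constant``, ``dumps_no_nan`` and ``loads_no_nan`` are hypothetical names chosen for this example.

```python
import json


def reject_constant(name):
    # parse_constant hooks receive the literal: 'NaN', 'Infinity' or '-Infinity'.
    raise ValueError("invalid JSON constant: " + name)


def dumps_no_nan(obj):
    # allow_nan=False makes the serializer raise ValueError for inf and nan.
    return json.dumps(obj, allow_nan=False)


def loads_no_nan(text):
    return json.loads(text, parse_constant=reject_constant)
```

Ordinary finite numbers are unaffected by either setting, so these wrappers only change the handling of the three non-standard literals.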
Repeated Names Within an Object
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The RFC specifies that the names within a JSON object should be unique, but
does not specify how repeated names in JSON objects should be handled. By
default, this module does not raise an exception; instead, it ignores all but
the last name-value pair for a given name::
>>> weird_json = '{"x": 1, "x": 2, "x": 3}'
>>> json.loads(weird_json)
{u'x': 3}
The *object_pairs_hook* parameter can be used to alter this behavior, because
the hook receives every name-value pair in document order, including repeated
names.
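For instance, a hook can refuse repeated names outright. The following sketch assumes that raising :exc:`ValueError` is the desired reaction; ``reject_duplicate_names`` is a hypothetical helper.

```python
import json


def reject_duplicate_names(pairs):
    """Build a dict from (name, value) pairs, failing on a repeated name."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError("duplicate name in JSON object: %r" % (name,))
        result[name] = value
    return result


unique = json.loads('{"x": 1, "y": 2}', object_pairs_hook=reject_duplicate_names)
```

A hook could just as well keep the first value or collect all values in a list; the module does not prescribe a policy.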


@ -373,6 +373,9 @@ Build
Documentation
-------------
- Issue #14674: Add a discussion of the json module's standard compliance.
Patch by Chris Rebert.
- Issue #15630: Add an example for "continue" stmt in the tutorial. Patch by
Daniel Ellis.