Issue #14674: Add a discussion of the json module's standard compliance.
Patch by Chris Rebert.
parent e991236b4d
commit f3e0a69d88
@ -7,8 +7,10 @@
|
|||
.. sectionauthor:: Bob Ippolito <bob@redivi.com>
|
||||
.. versionadded:: 2.6
|
||||
|
||||
`JSON (JavaScript Object Notation) <http://json.org>`_ is a subset of JavaScript
|
||||
syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.
|
||||
`JSON (JavaScript Object Notation) <http://json.org>`_, specified by
|
||||
:rfc:`4627`, is a lightweight data interchange format based on a subset of
|
||||
`JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ syntax (`ECMA-262 3rd
|
||||
edition <http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf>`_).
|
||||
|
||||
:mod:`json` exposes an API familiar to users of the standard library
|
||||
:mod:`marshal` and :mod:`pickle` modules.
|
||||
|
@ -106,8 +108,10 @@ Using json.tool from the shell to validate and pretty-print::
|
|||
|
||||
.. note::
|
||||
|
||||
The JSON produced by this module's default settings is a subset of
|
||||
YAML, so it may be used as a serializer for that as well.
|
||||
JSON is a subset of `YAML <http://yaml.org/>`_ 1.2. The JSON produced by
|
||||
this module's default settings (in particular, the default *separators*
|
||||
value) is also a subset of YAML 1.0 and 1.1. This module can thus also be
|
||||
used as a YAML serializer.
|
||||
|
||||
|
||||
Basic Usage
|
||||
|
@ -193,7 +197,8 @@ Basic Usage
|
|||
*object_hook* is an optional function that will be called with the result of
|
||||
any object literal decoded (a :class:`dict`). The return value of
|
||||
*object_hook* will be used instead of the :class:`dict`. This feature can be used
|
||||
to implement custom decoders (e.g. JSON-RPC class hinting).
|
||||
to implement custom decoders (e.g. `JSON-RPC <http://www.jsonrpc.org>`_
|
||||
class hinting).
|
||||
|
||||
*object_pairs_hook* is an optional function that will be called with the
|
||||
result of any object literal decoded with an ordered list of pairs. The
|
||||
|
@ -242,7 +247,7 @@ Basic Usage
|
|||
The other arguments have the same meaning as in :func:`load`.
|
||||
|
||||
|
||||
Encoders and decoders
|
||||
Encoders and Decoders
|
||||
---------------------
|
||||
|
||||
.. class:: JSONDecoder([encoding[, object_hook[, parse_float[, parse_int[, parse_constant[, strict[, object_pairs_hook]]]]]]])
|
||||
|
@ -438,3 +443,108 @@ Encoders and decoders

    for chunk in JSONEncoder().iterencode(bigobject):
        mysocket.write(chunk)
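
As a rough, self-contained illustration of the chunked loop above, the sketch below writes the chunks into an in-memory buffer rather than a socket. The ``io.StringIO`` buffer standing in for ``mysocket`` and the sample value of ``bigobject`` are assumptions made for this example only, and the code is written for Python 3, where each chunk is a ``str``.

```python
import io
import json

# Hypothetical sample data standing in for ``bigobject``.
bigobject = {"items": list(range(5)), "name": "spam"}

# An in-memory text buffer plays the role of ``mysocket`` in this sketch.
buffer = io.StringIO()
for chunk in json.JSONEncoder().iterencode(bigobject):
    # Each chunk is a fragment of the final JSON text.
    buffer.write(chunk)

# Joining the fragments yields a complete JSON document.
streamed = buffer.getvalue()
```

Parsing ``streamed`` with ``json.loads`` should give back a value equal to ``bigobject``.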


Standard Compliance
-------------------

The JSON format is specified by :rfc:`4627`. This section details this
module's level of compliance with the RFC. For simplicity,
:class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and parameters other
than those explicitly mentioned, are not considered.

This module does not comply with the RFC in a strict fashion, implementing some
extensions that are valid JavaScript but not valid JSON. In particular:

- Top-level non-object, non-array values are accepted and output;
- Infinite and NaN number values are accepted and output;
- Repeated names within an object are accepted, and only the value of the last
  name-value pair is used.

Since the RFC permits RFC-compliant parsers to accept input texts that are not
RFC-compliant, this module's deserializer is technically RFC-compliant under
default settings.

Character Encodings
^^^^^^^^^^^^^^^^^^^

The RFC recommends that JSON be represented using UTF-8, UTF-16, or
UTF-32, with UTF-8 being the default. Accordingly, this module uses UTF-8 as
the default for its *encoding* parameter.

This module's deserializer works directly only with ASCII-compatible encodings;
UTF-16, UTF-32, and other ASCII-incompatible encodings require the use of
workarounds described in the documentation for the deserializer's *encoding*
parameter.

The RFC also non-normatively describes a limited encoding detection technique
for JSON texts; this module's deserializer does not implement this or any other
kind of encoding detection.
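
For readers who need such detection, the following is a minimal sketch of the heuristic outlined in section 3 of the RFC, which inspects the pattern of zero octets in the first four bytes of the input. The helper names ``detect_json_encoding`` and ``loads_bytes`` are hypothetical and not part of this module. The sketch ignores byte order marks and assumes the input is an RFC-compliant text, so its first two characters are ASCII.

```python
import json


def detect_json_encoding(data):
    """Guess the Unicode encoding of a JSON byte string (hypothetical helper).

    Because the first two characters of an RFC-compliant JSON text are
    always ASCII, the pattern of zero octets in the first four bytes
    hints at the encoding.
    """
    head = bytearray(data[:4])
    if len(head) < 4:
        # Too short for the four-octet test; assume the default encoding.
        return "utf-8"
    zero = [octet == 0 for octet in head]
    if zero == [True, True, True, False]:
        return "utf-32-be"   # 00 00 00 xx
    if zero == [False, True, True, True]:
        return "utf-32-le"   # xx 00 00 00
    if zero == [True, False, True, False]:
        return "utf-16-be"   # 00 xx 00 xx
    if zero == [False, True, False, True]:
        return "utf-16-le"   # xx 00 xx 00
    return "utf-8"           # xx xx xx xx


def loads_bytes(data):
    """Decode *data* using the guessed encoding, then parse the text."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

Decoding to text before calling ``json.loads`` keeps the sketch independent of the deserializer's *encoding* parameter.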

As permitted, though not required, by the RFC, this module's serializer sets
*ensure_ascii=True* by default, thus escaping the output so that the resulting
strings only contain ASCII characters.
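
A short sketch of the effect of *ensure_ascii*, using a made-up sample string and written for Python 3:

```python
import json

# Sample text containing one non-ASCII character (e with acute accent).
text = u"caf\u00e9"

# Default behaviour (ensure_ascii=True): the character is escaped.
escaped = json.dumps(text)            # '"caf\\u00e9"'

# With ensure_ascii=False the character is emitted as-is.
unescaped = json.dumps(text, ensure_ascii=False)

# Both forms decode back to the same string.
round_trip_ok = json.loads(escaped) == json.loads(unescaped) == text
```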


Top-level Non-Object, Non-Array Values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The RFC specifies that the top-level value of a JSON text must be either a
JSON object or array (Python :class:`dict` or :class:`list`). This module's
deserializer also accepts input texts consisting solely of a
JSON null, boolean, number, or string value::

   >>> just_a_json_string = '"spam and eggs"'  # Not by itself a valid JSON text
   >>> json.loads(just_a_json_string)
   u'spam and eggs'

This module itself does not include a way to request that such input texts be
regarded as illegal. Likewise, this module's serializer also accepts single
Python :data:`None`, :class:`bool`, numeric, and :class:`str`
values as input and will generate output texts consisting solely of a top-level
JSON null, boolean, number, or string value without raising an exception::

   >>> neither_a_list_nor_a_dict = u"spam and eggs"
   >>> json.dumps(neither_a_list_nor_a_dict)  # The result is not a valid JSON text
   '"spam and eggs"'

This module's serializer does not itself include a way to enforce the
aforementioned constraint.
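
Callers who need the RFC's constraint can check the value's type themselves. The wrappers below are one possible sketch. The names ``strict_loads`` and ``strict_dumps`` are hypothetical, and the sketch assumes no *object_hook* or similar parameter is in use that would change the decoded types.

```python
import json


def strict_loads(text):
    """Parse *text*, rejecting top-level values other than objects or arrays."""
    value = json.loads(text)
    if not isinstance(value, (dict, list)):
        raise ValueError("top-level JSON value must be an object or array")
    return value


def strict_dumps(obj):
    """Serialize *obj* only if it maps to a JSON object or array."""
    # Tuples are included because the serializer writes them as JSON arrays.
    if not isinstance(obj, (dict, list, tuple)):
        raise ValueError("top-level value must be a dict, list, or tuple")
    return json.dumps(obj)
```

With these wrappers, ``strict_loads('"spam and eggs"')`` and ``strict_dumps(u"spam and eggs")`` raise :exc:`ValueError` instead of succeeding.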


Infinite and NaN Number Values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The RFC does not permit the representation of infinite or NaN number values.
Despite that, by default, this module accepts and outputs ``Infinity``,
``-Infinity``, and ``NaN`` as if they were valid JSON number literal values::

   >>> # Neither of these calls raises an exception, but the results are not valid JSON
   >>> json.dumps(float('-inf'))
   '-Infinity'
   >>> json.dumps(float('nan'))
   'NaN'
   >>> # Same when deserializing
   >>> json.loads('-Infinity')
   -inf
   >>> json.loads('NaN')
   nan

The serializer's *allow_nan* parameter and the deserializer's *parse_constant*
parameter can be used to alter this behavior.
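
As a sketch of how these two parameters can make both directions strict, the example below passes ``allow_nan=False`` to the serializer and a rejecting callback as *parse_constant* to the deserializer. The callback name ``reject_constant`` is hypothetical.

```python
import json


def reject_constant(name):
    """Hypothetical callback: refuse 'NaN', 'Infinity', and '-Infinity'."""
    raise ValueError("invalid JSON constant: %s" % name)


# Serializer: allow_nan=False makes out-of-range floats an error.
try:
    json.dumps(float("nan"), allow_nan=False)
    dump_rejected = False
except ValueError:
    dump_rejected = True

# Deserializer: parse_constant is called for each of the three constants.
try:
    json.loads("[NaN]", parse_constant=reject_constant)
    load_rejected = False
except ValueError:
    load_rejected = True
```

Ordinary numbers are unaffected, because *parse_constant* is only consulted for the three non-standard constants.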


Repeated Names Within an Object
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The RFC specifies that the names within a JSON object should be unique, but
does not specify how repeated names in JSON objects should be handled. By
default, this module does not raise an exception; instead, it ignores all but
the last name-value pair for a given name::

   >>> weird_json = '{"x": 1, "x": 2, "x": 3}'
   >>> json.loads(weird_json)
   {u'x': 3}

The *object_pairs_hook* parameter can be used to alter this behavior.
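
For instance, a hook along the following lines raises an error on a repeated name instead of silently keeping the last value. This is only a sketch, and the function name ``reject_duplicates`` is hypothetical.

```python
import json


def reject_duplicates(pairs):
    """Hypothetical hook: build a dict, raising on a repeated name."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError("duplicate name in JSON object: %r" % (name,))
        result[name] = value
    return result


# Objects with unique names decode as usual.
unique = json.loads('{"x": 1, "y": 2}', object_pairs_hook=reject_duplicates)

# Objects with repeated names are rejected.
try:
    json.loads('{"x": 1, "x": 2, "x": 3}', object_pairs_hook=reject_duplicates)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The hook is called for every decoded object, including nested ones, so repeated names are detected at any depth.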