:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure.  "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy.  Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module.  As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`.  However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes.  Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary.  In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects.  :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing.  Recursive
  objects are objects that contain references to themselves.  These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter.  Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized.  :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy.  Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances.  :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.  Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

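The object-sharing and recursion behaviour described in the first point can be
checked with a short round trip.  This is only a minimal sketch: the variable
names are illustrative, and protocol 2 is an arbitrary choice.

```python
import pickle

# Two dictionary values refer to the very same list object.
shared = [1, 2, 3]
container = {'first': shared, 'second': shared}

restored = pickle.loads(pickle.dumps(container, 2))

# The list was stored once, so both entries still refer to one object.
same_object = restored['first'] is restored['second']

# A recursive object: a list that contains itself.
recursive = [1, 2]
recursive.append(recursive)

copy = pickle.loads(pickle.dumps(recursive, 2))

# The self-reference survives the round trip.
still_recursive = copy[2] is copy
```

Because the shared list comes back as a single object, appending to
``restored['first']`` is also visible through ``restored['second']``.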
.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data.  Never unpickle data received from an untrusted or
   unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects.  The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure.  Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database.  The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific.  This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation.  The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3.  It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used.  If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.


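The effect of the *protocol* argument can be seen by pickling the same object
several ways.  The sketch below always passes the protocol explicitly, so it
does not depend on the default; the sample data is purely illustrative.

```python
import pickle

data = {'numbers': [1, 2, 3], 'name': 'example'}

# Protocol 0: the original printable-ASCII format.
ascii_stream = pickle.dumps(data, 0)

# Protocol 2: a binary format.
binary_stream = pickle.dumps(data, 2)

# Every byte of the protocol 0 stream is in the ASCII range.
is_ascii = all(byte < 128 for byte in bytearray(ascii_stream))

# A negative value selects the same protocol as HIGHEST_PROTOCOL.
newest_stream = pickle.dumps(data, -1)
explicit_newest = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# All streams unpickle to equal objects; the reader detects the format itself.
results = [pickle.loads(s) for s in (ascii_stream, binary_stream, newest_stream)]
```

Printing ``ascii_stream`` is a quick way to inspect what a protocol 0 pickle
looks like in a text editor.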
Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method.  To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method.  The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available.  This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*.  This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used.  If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy.  This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments.  Both
   methods should return a string.  Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used.  If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string.  Characters in the string past
   the pickled object's representation are ignored.

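The four convenience functions can be exercised together in a few lines.  The
following is a sketch under one assumption: ``io.BytesIO`` stands in for a file
opened in binary mode so that the example needs no file on disk.  The sample
record is illustrative.

```python
import io
import pickle

record = {'id': 7, 'tags': ['a', 'b']}

# dump()/load() accept any object providing write(), or read() and readline().
buf = io.BytesIO()
pickle.dump(record, buf, 2)
buf.seek(0)                      # rewind before reading the stream back
from_file = pickle.load(buf)

# dumps()/loads() do the same without a file object.
stream = pickle.dumps(record, 2)
from_string = pickle.loads(stream)

# Data that follows the pickled representation is ignored by loads().
with_trailer = pickle.loads(stream + b'trailing data')
```

The in-memory buffer keeps the sketch self-contained; a real file opened with
``'wb'`` and ``'rb'`` is used the same way.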
The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below.  This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object.  Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

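Since unpickling can fail with several unrelated exception types, callers often
catch them as a group.  The helper below, ``try_load``, is a hypothetical name
used only for this sketch; the exception list mirrors the one given above and
is not exhaustive.

```python
import pickle

def try_load(data):
    """Return (obj, None) on success or (None, exception) on failure."""
    try:
        return pickle.loads(data), None
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError) as exc:
        return None, exc

# A well-formed stream loads normally.
good_obj, good_err = try_load(pickle.dumps([1, 2, 3], 2))

# An empty stream ends before any object is read, which raises EOFError.
empty_obj, empty_err = try_load(b'')
```

Catching these exceptions guards against damaged data only; it does not make
it safe to unpickle data from an untrusted source.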
The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used.  If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

   :class:`Pickler` objects define one (or two) public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in the
      constructor.  Either the binary or ASCII format will be used, depending on the
      value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo".  The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive objects
      are pickled by reference and not by value.  This method is useful when re-using
      picklers.

      .. note::

         Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
         created by :mod:`cPickle`.  In the :mod:`pickle` module, picklers have an
         instance variable called :attr:`memo` which is a Python dictionary.  So to clear
         the memo for a :mod:`pickle` module pickler, you could do the following::

            mypickler.memo.clear()

         Code that does not need to support older versions of Python should simply use
         :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance.  These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance.  If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_

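The pairing of several :meth:`dump` calls with the same number of :meth:`load`
calls can be sketched as follows.  ``io.BytesIO`` is assumed here as an
in-memory replacement for a binary file, and the object names are illustrative.

```python
import io
import pickle

shared = ['shared', 'list']

buf = io.BytesIO()
pickler = pickle.Pickler(buf, 2)
pickler.dump(shared)               # first dump stores the list itself
pickler.dump({'again': shared})    # second dump refers back to it via the memo

buf.seek(0)
unpickler = pickle.Unpickler(buf)
first = unpickler.load()           # one load() per dump(), in the same order
second = unpickler.load()

# Both loads yield references to the same reconstructed list.
same_list = second['again'] is first
```

Calling ``pickler.clear_memo()`` between the two dumps would instead make the
second pickle self-contained, at the cost of storing the list twice.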
:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments.  Both
   methods should return a string.  Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   :class:`Unpickler` objects have one (or two) public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects.  This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream.  See section
      :ref:`pickle-protocol` below for more details.

      **Note:** the :meth:`noload` method is currently only available on
      :class:`Unpickler` objects created with the :mod:`cPickle` module.
      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
      method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol` for
  details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file.  Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case.  You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value.  This means that only the function name is
pickled, along with the name of the module the function is defined in.  Neither
the function's code, nor any of its function attributes are pickled.  Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply.  Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

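Pickling by name reference can be observed directly on objects from the
standard library.  In this sketch, ``len`` and ``collections.OrderedDict`` are
arbitrary picks, chosen because they are importable in any environment.

```python
import pickle
from collections import OrderedDict

func_stream = pickle.dumps(len, 2)
class_stream = pickle.dumps(OrderedDict, 2)

# The streams carry the names of the objects, not their code.
names_only = b'len' in func_stream and b'OrderedDict' in class_stream

# Unpickling imports the defining module and looks the name up again,
# so the result is the very same function or class object.
restored_func = pickle.loads(func_stream)
restored_class = pickle.loads(class_stream)
```

If the named module or attribute were missing in the unpickling environment,
the two ``loads`` calls would raise an exception instead.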
Similarly, when class instances are pickled, their class's code and data are not
pickled along with them.  Only the instance data are pickled.  This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class.  If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-protocol:

The pickle protocol
-------------------

.. currentmodule:: None

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized.  This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized.  The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__getinitargs__()

   When a pickled class instance is unpickled, its :meth:`__init__` method is
   normally *not* invoked.  If it is desirable that the :meth:`__init__` method
   be called on unpickling, an old-style class can define a method
   :meth:`__getinitargs__`, which should return a *tuple* containing the
   arguments to be passed to the class constructor (:meth:`__init__` for
   example).  The :meth:`__getinitargs__` method is called at pickle time; the
   tuple it returns is incorporated in the pickle for the instance.

.. method:: object.__getnewargs__()

   New-style types can provide a :meth:`__getnewargs__` method that is used for
   protocol 2.  Implementing this method is needed if the type establishes some
   internal invariants when the instance is created, or if the memory allocation
   is affected by the values passed to the :meth:`__new__` method for the type
   (as it is for tuples and strings).  Instances of a :term:`new-style class`
   ``C`` are created using ::

      obj = C.__new__(C, *args)

   where *args* is the result of calling :meth:`__getnewargs__` on the original
   object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.

.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the return state is
   pickled as the contents for the instance, instead of the contents of the
   instance's dictionary.  If there is no :meth:`__getstate__` method, the
   instance's :attr:`__dict__` is pickled.

.. method:: object.__setstate__(state)

   Upon unpickling, if the class also defines the method :meth:`__setstate__`,
   it is called with the unpickled state. [#]_ If there is no
   :meth:`__setstate__` method, the pickled state must be a dictionary and its
   items are assigned to the new instance's dictionary.  If a class defines both
   :meth:`__getstate__` and :meth:`__setstate__`, the state object needn't be a
   dictionary and these methods can do what they want. [#]_

   .. warning::

      For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
      value, the :meth:`__setstate__` method will not be called.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance.  In case those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.


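A class can combine :meth:`__getstate__` and :meth:`__setstate__` to drop
transient attributes and to carry a version number for later conversions.  The
``Account`` class, its attributes, and the conversion rule below are all
hypothetical, chosen only to illustrate the mechanism.

```python
import pickle

class Account(object):
    """Hypothetical class used only for illustration."""

    STATE_VERSION = 2

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
        self._cache = {}              # transient data, not worth pickling

    def __getstate__(self):
        state = self.__dict__.copy()  # copy, because the dict is modified
        del state['_cache']
        state['version'] = self.STATE_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop('version', 1)
        if version < 2:
            # Hypothetical conversion: assume older pickles lacked a balance.
            state.setdefault('balance', 0)
        self.__dict__.update(state)
        self._cache = {}              # rebuilt here; __init__ is not called

original = Account('alice', 100)
clone = pickle.loads(pickle.dumps(original, 2))
```

The version number travels inside the pickled state, so
:meth:`__setstate__` can decide how to treat data written by an older
definition of the class.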
Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__reduce__()

   When the :class:`Pickler` encounters an object of a type it knows nothing
   about --- such as an extension type --- it looks in two places for a hint of
   how to pickle it.  One alternative is for the object to implement a
   :meth:`__reduce__` method.  If provided, at pickling time :meth:`__reduce__`
   will be called with no arguments, and it must return either a string or a
   tuple.

   If a string is returned, it names a global variable whose contents are
   pickled as normal.  The string returned by :meth:`__reduce__` should be the
   object's local name relative to its module; the pickle module searches the
   module namespace to determine the object's module.

   When a tuple is returned, it must be between two and five elements long.
   Optional elements can either be omitted, or ``None`` can be provided as their
   value.  The contents of this tuple are pickled as normal and used to
   reconstruct the object at unpickling time.  The semantics of each element
   are:

   * A callable object that will be called to create the initial version of the
     object.  The next element of the tuple will provide arguments for this
     callable, and later elements provide additional state information that will
     subsequently be used to fully reconstruct the pickled data.

     In the unpickling environment this object must be either a class, a
     callable registered as a "safe constructor" (see below), or it must have an
     attribute :attr:`__safe_for_unpickling__` with a true value.  Otherwise, an
     :exc:`UnpicklingError` will be raised in the unpickling environment.  Note
     that as usual, the callable itself is pickled by name.

   * A tuple of arguments for the callable object.

     .. versionchanged:: 2.5
        Formerly, this argument could also be ``None``.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as described in section :ref:`pickle-inst`.  If
     the object has no :meth:`__setstate__` method, then, as above, the value
     must be a dictionary and it will be added to the object's :attr:`__dict__`.

   * Optionally, an iterator (and not a sequence) yielding successive list
     items.  These list items will be pickled, and appended to the object using
     either ``obj.append(item)`` or ``obj.extend(list_of_items)``.  This is
     primarily used for list subclasses, but may be used by other classes as
     long as they have :meth:`append` and :meth:`extend` methods with the
     appropriate signature.  (Whether :meth:`append` or :meth:`extend` is used
     depends on which pickle protocol version is used as well as the number of
     items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive dictionary
     items, which should be tuples of the form ``(key, value)``.  These items
     will be pickled and stored to the object using ``obj[key] = value``.  This
     is primarily used for dictionary subclasses, but may be used by other
     classes as long as they implement :meth:`__setitem__`.

.. method:: object.__reduce_ex__(protocol)

   It is sometimes useful to know the protocol version when implementing
   :meth:`__reduce__`.  This can be done by implementing a method named
   :meth:`__reduce_ex__` instead of :meth:`__reduce__`.  :meth:`__reduce_ex__`,
   when it exists, is called in preference over :meth:`__reduce__` (you may
   still provide :meth:`__reduce__` for backwards compatibility).  The
   :meth:`__reduce_ex__` method will be called with a single integer argument,
   the protocol version.

   The :class:`object` class implements both :meth:`__reduce__` and
   :meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__`
   but not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation
   detects this and calls :meth:`__reduce__`.

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copy_reg` module.  This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types.  Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.


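Both routes, a :meth:`__reduce__` method and a reduction function registered
with :mod:`copy_reg`, can be compared side by side.  The classes ``Interval``
and ``Celsius`` and the function ``reduce_celsius`` are hypothetical names for
this sketch.  The fallback import assumes an interpreter where the module is
available under the name ``copyreg`` (its Python 3 name).

```python
import pickle

try:
    import copy_reg                 # module name used in this document
except ImportError:
    import copyreg as copy_reg      # same module under its Python 3 name

class Interval(object):
    """Hypothetical class that is rebuilt through its constructor."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.width = high - low     # derived value, recomputed by __init__

    def __reduce__(self):
        # Two-element form: (callable, tuple of arguments for the callable).
        return (Interval, (self.low, self.high))

class Celsius(object):
    """Hypothetical class pickled via a registered reduction function."""

    def __init__(self, degrees):
        self.degrees = degrees

def reduce_celsius(obj):
    # Same return convention as __reduce__, but the object is an argument.
    return (Celsius, (obj.degrees,))

copy_reg.pickle(Celsius, reduce_celsius)

interval = pickle.loads(pickle.dumps(Interval(2, 5), 2))
temperature = pickle.loads(pickle.dumps(Celsius(21.5), 2))
```

Registration leaves the class itself untouched, which helps when the type
comes from code that cannot be modified.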
Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream.  Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters.  The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`persistent_id` attribute of the pickler object and the
:attr:`persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent id for that object.  When ``None`` is
returned, the pickler simply pickles the object as normal.  When a persistent id
string is returned, the pickler will pickle that string, along with a marker so
that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`persistent_load` function that takes a persistent id string and returns
the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError, 'Invalid persistent id'

   up.persistent_load = persistent_load

   j = up.load()
   print j

In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
can also be set to a Python list, in which case, when the unpickler reaches a
persistent id, the persistent id string will simply be appended to this list.
This functionality exists so that a pickle data stream can be "sniffed" for
object references without actually instantiating all the objects in a pickle.
[#]_ Setting :attr:`persistent_load` to a list is usually used in conjunction
with the :meth:`noload` method on the Unpickler.

.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id.  Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler.  Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream where the
first line will be the name of the module containing the class and the second
line will be the name of the instance's class.  It then looks up the class,
possibly importing the module and digging out the attribute, then it appends
what it finds to the unpickler's stack.  Later on, this class will be assigned
to the :attr:`__class__` attribute of an empty class, as a way of magically
creating an instance without calling its class's :meth:`__init__`.  Your job
(should you choose to accept it), would be to have :meth:`load_global` push onto
the unpickler's stack, a known safe version of any class you deem safe to
unpickle.  It is up to you to produce such a class.  Or you could raise an error
if you want to disallow all unpickling of instances.  If this sounds like a
hack, you're right.  Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much.  To control
what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
to a function or ``None``.  If it is ``None`` then any attempts to unpickle
instances will raise an :exc:`UnpicklingError`.  If it is a function, then it
should accept a module name and a class name, and return the corresponding class
object.  It is responsible for looking up the class and performing any necessary
imports, and it may raise an error to prevent instances of the class from being
unpickled.

The moral of the story is that you should be really careful about the source of
the strings your application unpickles.


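A function with the interface described for :attr:`find_global` (module name
and class name in, class object out) might look like the sketch below.  The
whitelist ``ALLOWED`` and the name ``restricted_find_global`` are hypothetical.
The function is shown on its own; attaching it to an unpickler is left as a
comment because that step applies to :mod:`cPickle` only.

```python
import pickle

# Hypothetical whitelist of (module name, class name) pairs deemed safe.
ALLOWED = set([('collections', 'OrderedDict')])

def restricted_find_global(module_name, class_name):
    """Return the named class if it is whitelisted, else raise an error."""
    if (module_name, class_name) not in ALLOWED:
        raise pickle.UnpicklingError(
            'global %s.%s is forbidden' % (module_name, class_name))
    # Perform the import ourselves, then dig out the attribute.
    module = __import__(module_name, fromlist=[class_name])
    return getattr(module, class_name)

# With a cPickle unpickler, the function would be attached like this:
#     unpickler.find_global = restricted_find_global
```

A whitelist narrows what an unpickler will import, but it is only as safe as
the classes placed on it.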
.. _pickle-example:
|
|
|
|
Example
|
|
-------
|
|
|
|
For the simplest code, use the :func:`dump` and :func:`load` functions. Note
|
|
that a self-referencing list is pickled and restored correctly. ::
|
|
|
|
import pickle
|
|
|
|
data1 = {'a': [1, 2.0, 3, 4+6j],
|
|
'b': ('string', u'Unicode string'),
|
|
'c': None}
|
|
|
|
selfref_list = [1, 2, 3]
|
|
selfref_list.append(selfref_list)
|
|
|
|
output = open('data.pkl', 'wb')
|
|
|
|
# Pickle dictionary using protocol 0.
|
|
pickle.dump(data1, output)
|
|
|
|
# Pickle the list using the highest protocol available.
|
|
pickle.dump(selfref_list, output, -1)
|
|
|
|
output.close()
|
|
|
|
The following example reads the resulting pickled data. When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

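The claim that a self-referencing list survives the round trip can also be
checked in memory. This is a small sketch that uses :func:`dumps` and
:func:`loads` instead of a file; the variable names are illustrative only. ::

   import pickle

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)    # the list now contains itself

   # dumps() and loads() work on in-memory strings instead of files.
   payload = pickle.dumps(selfref_list, pickle.HIGHEST_PROTOCOL)
   restored = pickle.loads(payload)

   # The restored list refers to itself, not to the original list.
   cycle_preserved = restored[3] is restored
   is_new_object = restored is not selfref_list

Both flags should be true: the cycle is rebuilt inside the new object rather
than pointing back at the list that was pickled.
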
Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, dict):
           fh = open(dict['file'])      # reopen file
           count = dict['lineno']       # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(dict)   # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'


.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module. There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C. Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` use the
same format, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

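A quick way to see this interchangeability is to pickle with one module and
unpickle with the other. The sketch below assumes that falling back to
:mod:`pickle` is acceptable when :mod:`cPickle` cannot be imported; the alias
``fast_pickle`` and the sample ``record`` are hypothetical. ::

   import pickle

   try:
       import cPickle as fast_pickle    # C implementation, where available
   except ImportError:
       fast_pickle = pickle             # assumption: fall back to pickle

   record = {'a': [1, 2.0, 3], 'b': ('string', None)}

   # Pickle with one module and unpickle with the other, in both directions.
   via_fast = pickle.loads(fast_pickle.dumps(record))
   via_slow = fast_pickle.loads(pickle.dumps(record))

Both results should compare equal to ``record``, whichever module wrote the
data.
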
There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable. More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior. However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed. One common reason
   to subclass is to control what objects can actually be unpickled. See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again --- a reference to it is pickled and the :class:`Unpickler` will
   return the old value, not the modified one. There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user-defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`. The description given here
   works the same for both implementations. Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python. We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects. However it is guaranteed that they will always be able to read each
   other's data streams.