pickle.py
The code implicitly assumed that all ints fit in 4 bytes, causing all
sorts of mischief (from nonsense results to corrupted pickles).
Repaired that.
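
A round-trip check of the kind this fix is meant to satisfy can be sketched
as follows. It uses the present-day pickle API (Python 3, numbered
protocols), which is an assumption and not the code that was patched; the
names SAMPLES and roundtrips are made up for illustration.

```python
import pickle

# Values on both sides of the signed 32-bit boundary, plus much larger ones.
SAMPLES = [0, 2**31 - 1, 2**31, -2**31, -2**31 - 1, 2**63, -(2**100)]

def roundtrips(value, protocol):
    """Return True if value survives a dumps/loads cycle unchanged."""
    return pickle.loads(pickle.dumps(value, protocol)) == value

# Check every sample under the text protocol and two binary protocols.
all_ok = all(roundtrips(v, p) for v in SAMPLES for p in (0, 1, 2))
```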
marshal.c
The int marshaling code assumed that right shifts of signed longs
sign-extend. Repaired that.
bugs #126161 and 123634).
The solution doesn't use the unicode-escape encoding; that has other
problems (it seems not 100% reversible). Rather, it transforms the
input Unicode object slightly before encoding it using
raw-unicode-escape, so that the decoding will reconstruct the original
string: backslash and newline characters are translated into their
\uXXXX counterparts.
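
The transformation described above can be sketched in present-day Python
as below. The function names are hypothetical, and the sketch only covers
the string transformation, not the surrounding pickle opcodes.

```python
def encode_unicode_line(text):
    """Encode text as bytes that contain no raw backslash-ambiguity or newline."""
    # Backslashes must be rewritten first, so that the backslashes
    # introduced for the newline escape are not rewritten again.
    text = text.replace("\\", "\\u005c")
    text = text.replace("\n", "\\u000a")
    return text.encode("raw-unicode-escape")

def decode_unicode_line(data):
    """Reverse encode_unicode_line()."""
    return data.decode("raw-unicode-escape")
```

Because every backslash in the encoded form starts a \uXXXX escape, the
decoder cannot mistake user data such as a literal "\u0041" for an escape.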
This is backwards incompatible for strings containing backslashes, but
for some of those strings, the pickling was already broken.
Strings are unpickled by calling eval on the string's repr. This
change makes pickle work like cPickle; it checks if the pickled
string is safe to eval and raises ValueError if it is not.
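
For illustration only, a modern stand-in for this kind of check is sketched
below. It uses ast.literal_eval rather than eval, so it is not the check
added in this change; the function name is hypothetical.

```python
import ast

def load_quoted_string(text):
    """Return the str denoted by a quoted literal, or raise ValueError."""
    text = text.strip()
    # The payload must start and end with the same quote character.
    if len(text) < 2 or text[0] not in "'\"" or text[-1] != text[0]:
        raise ValueError("insecure string pickle")
    try:
        # literal_eval accepts literals only; it never calls functions.
        value = ast.literal_eval(text)
    except (SyntaxError, ValueError):
        raise ValueError("insecure string pickle") from None
    if not isinstance(value, str):
        raise ValueError("insecure string pickle")
    return value
```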
test suite modifications:
Verify that pickle catches a variety of insecure string pickles
Make test_pickle and test_cpickle use exactly the same test suite
Add test for pickling recursive object
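
A recursive-object test of this kind might look like the following sketch,
written against the present-day pickle module rather than the test suite
of the time.

```python
import pickle

# A list that contains itself.
cycle = []
cycle.append(cycle)

restored = pickle.loads(pickle.dumps(cycle))

# The restored list should point back at itself, not at the original.
is_recursive = restored[0] is restored
```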
who writes:
Here is batch 2, as a big collection of CVS context diffs.
Along with moving comments into docstrings, i've added a
couple of missing docstrings and attempted to make sure more
module docstrings begin with a one-line summary.
I did not add docstrings to the methods in profile.py for
fear of upsetting any careful optimizations there, though
i did move class documentation into class docstrings.
The convention i'm using is to leave credits/version/copyright
type of stuff in # comments, and move the rest of the descriptive
stuff about module usage into module docstrings. Hope this is
okay.
I found the following patch helpful in tracking down a bug in some
code. I had appended time, the module, instead of time.time(). Not
sure if it is generally true that printing the repr of the object is
good, but I expect that most unpicklable things will have fairly
informative and concise reprs (like files or sockets or modules).
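
The idea can be sketched with a small wrapper (hypothetical name
dumps_with_repr, present-day Python) that puts the repr of the offending
object into the error. This is not the patch itself.

```python
import pickle
import time

def dumps_with_repr(obj):
    """Pickle obj; on failure, mention repr(obj) in the error message."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        raise pickle.PicklingError("can't pickle %r: %s" % (obj, exc)) from exc

message = None
try:
    # The mistake from the report: the module `time`, not time.time().
    dumps_with_repr([time.time(), time])
except pickle.PicklingError as exc:
    message = str(exc)
```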
"""
I've attached a long overdue patch to pickle.py to bring it to format
1.3, which is the same as 1.2 except that the binary float format
is supported. This is done using the new platform-independent format
features of struct.
This patch also gets rid of the undocumented obsolete Pickler
dump_special method.
"""
there's an __getinitargs__() method), if a TypeError occurs, catch and
reraise it but add info to the error about the class name being
instantiated. This makes debugging a lot easier if __getinitargs__()
returns something bogus (e.g. a string instead of a singleton tuple).
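
The catch-and-reraise pattern can be sketched like this. The helper name
and the Point class are hypothetical and written for present-day Python.

```python
def instantiate(klass, args):
    """Call klass(*args), naming the class if the call signature is wrong."""
    try:
        return klass(*args)
    except TypeError as err:
        raise TypeError(
            "in constructor for %s: %s" % (klass.__name__, err)
        ) from err

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

error_text = None
try:
    # A bogus argument sequence: a 3-character string unpacks to 3 arguments.
    instantiate(Point, "abc")
except TypeError as err:
    error_text = str(err)
```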
Fixed problems when unpickling in restricted execution environments.
These methods try to assign to an instance's __class__ attribute, or
access the instance's __dict__, which are prohibited in REE. For the
first two methods, I re-implemented the old behavior when assignment
to value.__class__ fails.
For load_build(), I also re-implemented the old behavior when
inst.__dict__.update() fails but this means that unpickling in REE is
semantically different than unpickling in unrestricted mode.
The attached patch adds the following behavior to the handling
of REDUCE codes:
- A user-defined type may have a __reduce__ method that returns
a string rather than a tuple, in which case the object is
saved as a global object with a name given by the string returned
by reduce.
This was a feature added to cPickle a long time ago.
- User-defined types can now support unpickling without
executing a constructor.
The second value returned from '__reduce__' can now be None,
rather than an argument tuple. On unpickling, if the
second value returned from '__reduce__' during pickling was
None, then rather than calling the first value returned from
'__reduce__' directly, the '__basicnew__' method of the
first value returned from '__reduce__' is called without
arguments.
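
The first behavior (a __reduce__ that returns a string) still exists in
present-day pickle and can be sketched as below; the '__basicnew__' part is
not shown. The names _Sentinel and MISSING are hypothetical, and the sketch
assumes it runs as a script or importable module so that the global name
can be looked up again.

```python
import pickle

class _Sentinel:
    def __reduce__(self):
        # Returning a string tells pickle to save the object by reference
        # to the module-level global with that name.
        return "MISSING"

MISSING = _Sentinel()

# Unpickling looks the name up again instead of building a new object.
restored = pickle.loads(pickle.dumps(MISSING))
```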
I also got rid of a few of Chris' extra ()s, which he used
to make python ifs look like C ifs.
mode. The pickler always uses base 10 so the default base should be
fine. (The base gets us in trouble when there's no strop module, as
the atoi() in string.py only supports base 10. This is for JPython.)
not define __getinitargs__, bypass the __init__ constructor
completely. This uses the trick of instantiating an empty dummy class
and then changing inst.__class__ to the real class. This is done in
two places: once for the INST and once for the OBJ format code.
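
The dummy-class trick can be sketched in present-day Python as follows.
The names _Empty, make_without_init and Loud are hypothetical; today the
same effect is usually obtained with klass.__new__(klass).

```python
class _Empty:
    """Placeholder class with no __init__ of its own."""

def make_without_init(klass):
    """Create an instance of klass without running klass.__init__."""
    inst = _Empty()
    inst.__class__ = klass  # re-label the empty instance as the real class
    return inst

class Loud:
    def __init__(self):
        raise RuntimeError("constructor should not run during unpickling")

obj = make_without_init(Loud)
```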
Also replaced the much outdated long doc string with a short summary
of the module; the information of that doc string is already
incorporated in the library reference manual.
- Don't use "from copy_reg import *".
- Use cls.__module__ instead of calling whichobject(cls, cls.__name__);
also try __module__ in whichmodule(), just in case.
- After calling save_reduce(), add the object to the memo.
instance, use inst.__dict__.update(value) instead of a for loop with
setattr() over the value.keys(). This is more consistent (the
pickling doesn't use getattr() either but pickles inst.__dict__) and
avoids problems with instances that have a __setattr__ hook.
But it *is* a semantic change (because the setattr hook is no longer
used). So beware!
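
The semantic difference can be seen in a small sketch; the Audited class is
hypothetical and exists only to make the __setattr__ hook observable.

```python
class Audited:
    """Records the name of every attribute set through __setattr__."""

    def __init__(self):
        # Write through __dict__ so the hook does not see its own log.
        self.__dict__["log"] = []

    def __setattr__(self, name, value):
        self.log.append(name)
        self.__dict__[name] = value

state = {"x": 1}

# New behavior: update __dict__ directly; the hook is bypassed.
via_update = Audited()
via_update.__dict__.update(state)

# Old behavior: setattr() for each key; the hook runs.
via_setattr = Audited()
for key, value in state.items():
    setattr(via_setattr, key, value)
```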
The optimizations consist mostly of using local variables to cache
methods or instance variables used a lot (e.g. "self.write").
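
The caching pattern looks roughly like this sketch; write_all is a
hypothetical helper, not a function from pickle.py.

```python
import io

def write_all(out, chunks):
    """Write every chunk to out, looking up out.write only once."""
    write = out.write  # cache the bound method in a local variable
    for chunk in chunks:
        write(chunk)

buf = io.BytesIO()
write_all(buf, [b"ab", b"cd", b"ef"])
```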
The loophole allows marshalling extension types as long as they have
a __class__ attribute (in which case they may support the rest of the
class pickling protocol as well). This allows pickling MESS extension
types.
pickle.py: new low-level persistency module (used to be called flatten)
dbmac.py: stupid dbm clone for the Mac
anydbm.py: generic dbm interface (should be extended to support gdbm)