of 2-space and 4-space indents. Whatever, when I saw the checkin diff it
was clear that what my editor thinks a tab means didn't match this module's
belief. Removed all the tabs from the lines I added and changed, left
everything else alone.
pickled into the signed(!) 4-byte BININT format, so were getting unpickled
again as negative ints. Repaired that.
Added some minimal docs at the top about what I've learned about the pickle
format codes (little of which was obvious from staring at the code,
although that's partly because all the size-related bugs greatly obscured
the true intent of the code).
Happy side effect: because save_int() needed to grow a *proper* range
check in order to fix this bug, it can now use the more-efficient BININT1,
BININT2 and BININT formats when the long's value is small enough to fit
in a signed 4-byte int (before this, on a sizeof(long)==8 box it always
used the general INT format for negative ints).
test_cpickle works again on sizeof(long)==8 machines. test_pickle is
still busted big-time.
binary pickle, and the latter contains a pickle of a negative Python
int i written on a sizeof(long)==4 box (and whether by cPickle or
pickle.py), it's read incorrectly as i + 2**32. The patch repairs that,
and allows test_cpickle.py (to which I added a relevant test case earlier
today) to work again on sizeof(long)==8 boxes.
There's another (at least one) sizeof(long)==8 binary pickle bug, but in
pickle.py instead. That bug is still there, and test_pickle.py doesn't
catch it yet (try pickling and unpickling, e.g., 1 << 46).
bugs #126161 and 123634).
The solution doesn't use the unicode-escape encoding; that has other
problems (it seems not 100% reversible). Rather, it transforms the
input Unicode object slightly before encoding it using
raw-unicode-escape, so that the decoding will reconstruct the original
string: backslash and newline characters are translated into their
\uXXXX counterparts.
This is backwards incompatible for strings containing backslashes, but
for some of those strings, the pickling was already broken.
Note that SF bug #123634 complains specifically that cPickle fails to
unpickle the pickle for u'' (the empty Unicode string) correctly.
This was an off-by-one error in load_unicode().
XXX Ugliness: in order to do the modified raw-unicode-escape, I've
cut-and-pasted a copy of PyUnicode_EncodeRawUnicodeEscape() into this
file that also encodes '\\' and '\n'. It might be nice to migrate
this into the Unicode implementation and give this encoding a new name
('half-raw-unicode-escape'? 'pickle-unicode-escape'?); that would help
pickle.py too. But right now I can't be bothered with the necessary
infrastructural changes.
and fwrite return size_t, so it is safer to cast up to the largest type for the
comparison. I believe the cast is required at all to remove compiler warnings.
This patch fixes cPickle.c for 64-bit platforms.
- The false assumption sizeof(long) == size(void*) exists where
PyInt_FromLong is used to represent a pointer. The safe Python call
for this is PyLong_FromVoidPtr. (On platforms where the above
assumption *is* true a PyInt is returned as before so there is no
effective change.)
- use size_t instead of int for some variables
into. Jim writes:
The core dump was due to a C decrement operation
in a macro invocation in load_pop. (BAD)
I fixed this by moving the decrement outside
the macro call.
I added a comment to load_pop and load_mark
to document the fact that cPickle separates the
unpickling stack into two separate stacks, one for
objects and one for marks.
I also moved some increments out of some macro
calls (PyTuple_SET_ITEM and PyList_SET_ITEM).
This wasn't necessary, but made me feel better. :)
I tested these changes in *my* cPickle, which
doesn't have the new Unicode stuff.
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.
(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode. I'm also holding back on his
change to main.c, which seems unnecessary to me.)
"""
Problem description:
Run the following script:
import test.test_cpickle
for x in xrange(1000000):
reload(test.test_cpickle)
Watch Python's memory use go up up and away!
In the course of debugging this I also saw that cPickle is
inconsistent with pickle - if you attempt a pickle.load or pickle.dump
on a closed file, you get a ValueError, whereas the corresponding
cPickle operations give an IOError. Since cPickle is advertised as
being compatible with pickle, I changed these exceptions to match.
"""
"""
It fixes a memory corruption error resulting from BadPickleGet
exceptions in load_get, load_binget and load_long_binget. This was
initially reported on c.l.py as a problem with Cookie.py; see the thread
titled "python core dump (SIGBUS) on Solaris" for more details.
If PyDict_GetItem(self->memo, py_key) call failed, then py_key was being
Py_DECREF'd out of existence before call was made to
PyErr_SetObject(BadPickleGet, py_key).
The bug can be duplicated as follows:
import cPickle
cPickle.loads('garyp')
This raises a BadPickleGet exception whose value is a freed object. A
core dump will soon follow.
"""
Jim Fulton approves of the patch.
- Don't call Py_FatalError() when initialization fails.
- Fix bogus use of return value from PyRun_String().
- Fix misc. compiler errors on some platforms.
I've updated cPickle.c to use class exceptions:
Changed pickle error types to classes:
PickleError
PicklingError
UnpickleableError
UnpicklingError
And change the handling of unpickleable objects so that an UnpickleableError
is raised with the unpickleable object as the argument. UnpickleableError
has a reasonable string representation and provides access to the problem
object, which is useful during debugging.
[I'm still waiting for patches to do the same to pickle.py.]
I have attached a new cPickle that adds a new control attribute
to unpicklers:
Added new Unpickler attribute, find_global. If set to None, then
global and instance pickles are disabled. Otherwise, it should be set to
a callable object that takes two arguments, a module name and an
object name, and returns an object. If the attribute is unset, then
the default mechanism is used.
This feature provides an additional mechanism for controlling which
classes can be used for unpickling.
- New copyright. (Open source)
- Added new protocol for binary string pickles that
takes out unneeded puts:
p=Pickler()
p.dump(x)
p.dump(y)
thePickle=p.getvalue()
This has little or no impact on pickling time, but
often reduces unpickling time and pickle size, sometimes
significantly.
- Changed unpickler to use internal data structure instead
of list to reduce unpickling times by about a third.
- Many cleanups to get rid of obfuscated error handling
involving 'goto finally' and status variables.
- Extensive reGuidofication. (formatting :)
- Fixed binary floating-point pickling bug. 0.0 was not
pickled correctly.
- Now use binary floating point format when saving
floats in binary mode.
- Fixed some error message spelling error.
I had to make a slight diddle to work with Python 1.4, which
we and some of our customers are still using. :(
I've also made a few minor enhancements:
- You can now both get and set the memo using a 'memo'
attribute. This is handy for certain advanced applications
that we have.
- Added a 'binary' attribute to get and set the binary
mode for a pickler.
- Added a somewhat experimental 'fast' attribute. When this
is set, objects are not placed in the memo during pickling.
This should lead to faster pickling and smaller pickles in
cases where:
o you *know* there are no circular references, and
o either you've:
- preloaded the memo with class information
by pickling classes in non-fast mode or by
manipilating the memo directly, or
- aren't pickling instances.
1. Only DECREF the class's module when the module is retrieved via
PyImport_Import. If it is retrieved from the modules dictionary with
PyDict_GetItem, it is using a borrowed reference.
2. If the module doesn't define the desired class, raise the same
SystemError that pickle.py does instead of returning an AttributeError
(which is cryptic at best).
Also, fix the PyArg_ParseTuple in cpm_loads (the externally visible
loads) function: Use "S" instead of "O" because cStringIO will croak
with a "bad arguments to internal function" if passed anything other
than a string.
- Loading non-binary string pickles checks for insecure
strings. This is needed because cPickle (still)
uses a restricted eval to parse non-binary string pickles.
This change is needed to prevent untrusted
pickles like::
"S'hello world'*2000000\012p0\012."
from hosing an application.
- User-defined types can now support unpickling without
executing a constructor.
The second value returned from __reduce__ can now be None,
rather than an argument tuple. On unpickling, if the second
value returned from __reduce__ during pickling was None, then
rather than calling the first value returned from __reduce__,
directly, the __basicnew__ method of the first value returned
from __reduce__ is called without arguments.