this clarifies that they are part of an internal API (albeit shared
between pickle.py, copy_reg.py and cPickle.c).
I'd like to do the same for copy_reg.dispatch_table, but worry that it
might be used by existing code. This risk doesn't exist for the
extension registry.
outcome as __slotnames__ on the class. (Like __slots__, it's not safe
to ask for this as an attribute -- you must look for it in the
specific class's __dict__. But it must be set using attribute
notation, because __dict__ is a read-only proxy.)
Assorted code cleanups; e.g., sizeof(char) is 1 by definition, so there's
no need to do things like multiply by sizeof(char) in hairy malloc
arguments. Fixed an undetected-overflow bug in readline_file().
longobject.c: Fixed a really stupid bug in the new _PyLong_NumBits.
pickle.py: Fixed stupid bug in save_long(): When proto is 2, it
wrote LONG1 or LONG4, but forgot to return then -- it went on to
append the proto 1 LONG opcode too.
Fixed equally stupid cancelling bugs in load_long1() and
load_long4(): they *returned* the unpickled long instead of pushing
it on the stack. The return values were ignored. Tests passed
before only because save_long() pickled the long twice.
Fixed bugs in encode_long().
Noted that decode_long() is quadratic-time despite our hopes,
because long(string, 16) is still quadratic-time in len(string).
It's hex() that's linear-time. I don't know a way to make decode_long()
linear-time in Python, short of maybe transforming the 256's-complement
bytes into marshal's funky internal format, and letting marshal decode
that. It would be more valuable to make long(string, 16) linear time.
pickletester.py: Added a global "protocols" vector so tests can try
all the protocols in a sane way. Changed test_ints() and test_unicode()
to do so. Added a new test_long(), but the tail end of it is disabled
because it "takes forever" under pickle.py (but runs very quickly under
cPickle: cPickle proto 2 for longs is linear-time).
The 4th item can be None or an iterator yielding list items, which are
used to append() or extend() the object. The 5th item can be None or
an iterator yielding a dict's (key, value) pairs, which are stuffed
into the object using __setitem__.
Also (as a separate, though related, feature) add "batching" for list
and dict items. If you pickled a dict or list with a million items in
the past, it would push a million items onto the stack. It now pushes
only 1000 items at a time on the stack, using repeated APPENDS or
SETITEMS opcodes. (For lists, I hope that using many short extend()
calls doesn't exhibit quadratic behavior.)
__module__ is the string name of the module the function was defined
in, just like __module__ of classes. In some cases, particularly for
C functions, the __module__ may be None.
Change PyCFunction_New() from a function to a macro, but keep an
unused copy of the function around so that we don't change the binary
API.
Change pickle's save_global() to use whichmodule() if __module__ is
None, but add the __module__ logic to whichmodule() since it might be
used outside of pickle.
on the type instead of self.save(t). This defeated the purpose of
NEWOBJ, because it didn't generate a BINGET opcode when t was already
memoized; but moreover, it would generate multiple BINPUT opcodes for
the same type! pickletools.dis() doesn't like this.
How I found this? I was playing with picklesize.py in the datetime
sandbox, and noticed that protocol 2 pickles for multiple objects were
in fact larger than protocol 1 pickles! That was suspicious, so I
decided to disassemble one of the pickles.
This really needs a unit test, but I'm exhausted. I'll be late for
work as it is. :-(
the same function, don't save the state or write a BUILD opcode. This
is so that a type (e.g. datetime :-) can support protocol 2 using
__getnewargs__ while also supporting protocol 0 and 1 using
__getstate__. (Without this, the state would be pickled twice with
protocol 2, unless __getstate__ is defined to return None, which
breaks protocol 0 and 1.)
types. The special handling for these can now be removed from save_newobj().
Add some testing for this.
Also add support for setting the 'fast' flag on the Python Pickler class,
which suppresses use of the memo.
object.__reduce__, do a getattr() on the class so we can explicitly
test for it. The reduce()-calling code becomes a bit more regular as
a result.
Also add support slots: if an object has slots, the default state is
(dict, slots) where dict is the __dict__ or None, and slots is a dict
mapping slot names to slot values. We do a best-effort approach to
find slot names, assuming the __slots__ fields of classes aren't
modified after class definition time to misrepresent the actual list
of slots defined by a class.
be one of 0, 1 or 2).
I should note that the previous checkin also added NEWOBJ support to
the unpickler -- but there's nothing yet that generates this.
some notion of low-level efficiency. Undid that, but left one routine
alone: save_inst() claims it has a reason for not using memoize().
I don't understand that comment, so added an XXX comment there.
then the embedded argument consumes at least 256 bytes. The difference
between a 3-byte prefix (LONG2 + 2 bytes) and a 5-byte prefix (LONG4 +
4 bytes) is at worst less than 1%. Note that binary strings and binary
Unicode strings also have only "size is 1 byte, or size is 4 bytes?"
flavors, and I expect for the same reason. The only place a 2-byte
thingie was used was in BININT2, where the 2 bytes make up the *entire*
embedded argument (and now EXT2 also does this); that's a large savings
over 4 bytes, because the total opcode+argument size is so small in
the BININT2/EXT2 case.
Removed the TAKEN_FROM_ARGUMENT "number of bytes" code, and bifurcated it
into TAKEN_FROM_ARGUMENT1 and TAKEN_FROM_ARGUMENT4. Now there's enough
info in ArgumentDescriptor objects to deduce the # of bytes consumed by
each opcode.
Rearranged the order in which proto2 opcodes are listed in pickle.py.
add memoize() helper function to update the memo.
The first element of the tuple returned by __reduce__() must be a
callable. If it isn't the Unpickler will raise an error. Catch this
error in the pickler and raise the error there.
The memoize() helper also has a comment explaining how the memo
works. So methods can't use memoize() because the write funny codes.
This fixes the charming, but unhelpful error message for
>>> pickle.dumps(type.__new__)
Can't pickle <built-in method __new__ of type object at 0x812a440>: it's not the same object as datetime.math.__new__
Bugfix candidate.
Change pickling format for bools to use a backwards compatible
encoding. This means you can pickle True or False on Python 2.3
and Python 2.2 or before will read it back as 1 or 0. The code
used for pickling bools before would create pickles that could
not be read in previous Python versions.
PEP 285. Everything described in the PEP is here, and there is even
some documentation. I had to fix 12 unit tests; all but one of these
were printing Boolean outcomes that changed from 0/1 to False/True.
(The exception is test_unicode.py, which did a type(x) == type(y)
style comparison. I could've fixed that with a single line using
issubtype(x, type(y)), but instead chose to be explicit about those
places where a bool is expected.
Still to do: perhaps more documentation; change standard library
modules to return False/True from predicates.
metaclass, reported by Dan Parisien.
Objects that are instances of custom metaclasses, i.e. whose class is
a subclass of 'type', should be pickled the same as new-style classes
(objects whose class is 'type'). This can't be done through a
dispatch table entry, and the __reduce__ trick doesn't work for these,
since it finds the unbound __reduce__ for instances of the class
(inherited from 'object'). So check explicitly using issubclass().
load_inst(): Implement the security hook that cPickle already had.
When unpickling callables which are not classes, we look to see if the
object has an attribute __safe_for_unpickling__. If this exists and
has a true value, then we can call it to create the unpickled object.
Otherwise we raise an UnpicklingError.
find_class(): We no longer mask ImportError, KeyError, and
AttributeError by transforming them into SystemError. The latter is
definitely not the right thing to do, so we let the former three
exceptions simply propagate up if they occur, i.e. we remove the
try/except!
64-bit INTs on 32-bit boxes (where they become longs). Also exploit that
int(str) and long(str) will ignore a trailing newline (saves creating a
new string at the Python level).
pickletester.py: Simulate reading a pickle produced by a 64-bit box.
is pickled as a global must now exist by the name under which it is
pickled, otherwise the pickling fails. Previously, such things would
fail on unpickling, or unpickle as the wrong global object. I'm
hoping that this won't break existing code that is playing tricks with
this.
I need a volunteer to do this for cPickle too.
pickle.py
The code implicitly assumed that all ints fit in 4 bytes, causing all
sorts of mischief (from nonsense results to corrupted pickles).
Repaired that.
marshal.c
The int marshaling code assumed that right shifts of signed longs
sign-extend. Repaired that.