Commit Graph

139 Commits

Author SHA1 Message Date
Tim Peters bf2674be0e long(string, base) now takes time linear in len(string) when base is a
power of 2.  Enabled the tail end of test_long() in pickletester.py
because it no longer takes forever when run from test_pickle.py.
2003-02-02 07:51:32 +00:00
Tim Peters ee1a53cbb1 cPickle.c: Full support for the new LONG1 and LONG4. Added comments.
Assorted code cleanups; e.g., sizeof(char) is 1 by definition, so there's
no need to do things like multiply by sizeof(char) in hairy malloc
arguments.  Fixed an undetected-overflow bug in readline_file().

longobject.c:  Fixed a really stupid bug in the new _PyLong_NumBits.

pickle.py:  Fixed stupid bug in save_long():  When proto is 2, it
wrote LONG1 or LONG4, but forgot to return then -- it went on to
append the proto 1 LONG opcode too.
Fixed equally stupid cancelling bugs in load_long1() and
load_long4():  they *returned* the unpickled long instead of pushing
it on the stack.  The return values were ignored.  Tests passed
before only because save_long() pickled the long twice.

Fixed bugs in encode_long().

Noted that decode_long() is quadratic-time despite our hopes,
because long(string, 16) is still quadratic-time in len(string).
It's hex() that's linear-time.  I don't know a way to make decode_long()
linear-time in Python, short of maybe transforming the 256's-complement
bytes into marshal's funky internal format, and letting marshal decode
that.  It would be more valuable to make long(string, 16) linear time.

pickletester.py:  Added a global "protocols" vector so tests can try
all the protocols in a sane way.  Changed test_ints() and test_unicode()
to do so.  Added a new test_long(), but the tail end of it is disabled
because it "takes forever" under pickle.py (but runs very quickly under
cPickle:  cPickle proto 2 for longs is linear-time).
2003-02-02 02:57:53 +00:00
Tim Peters 5bd2a79b22 The C pickle now knows how to deal with a proto= argument. Assorted
code cleanups, and purged more references to text-vs-binary modes.
2003-02-01 16:45:06 +00:00
Guido van Rossum 7eff63abce Change the default protocol back to 0.
Add a feature suggested by Tim: a negative protocol value means to use
the largest protocol value supported.
2003-01-31 19:42:31 +00:00
Guido van Rossum 25cb7dfb0f Another extension to reduce(). It can return a 4- or 5-tuple now.
The 4th item can be None or an iterator yielding list items, which are
used to append() or extend() the object.  The 5th item can be None or
an iterator yielding a dict's (key, value) pairs, which are stuffed
into the object using __setitem__.

Also (as a separate, though related, feature) add "batching" for list
and dict items.  If you pickled a dict or list with a million items in
the past, it would push a million items onto the stack.  It now pushes
only 1000 items at a time on the stack, using repeated APPENDS or
SETITEMS opcodes.  (For lists, I hope that using many short extend()
calls doesn't exhibit quadratic behavior.)
2003-01-31 18:53:21 +00:00
Jeremy Hylton 4f0dcc9a9a Provide __module__ attributes for functions defined in C and Python.
__module__ is the string name of the module the function was defined
in, just like __module__ of classes.  In some cases, particularly for
C functions, the __module__ may be None.

Change PyCFunction_New() from a function to a macro, but keep an
unused copy of the function around so that we don't change the binary
API.

Change pickle's save_global() to use whichmodule() if __module__ is
None, but add the __module__ logic to whichmodule() since it might be
used outside of pickle.
2003-01-31 18:33:18 +00:00
Guido van Rossum f7f4517fae Pass the object to save_reduce(), so the memoize() call can go into
save_reduce(), before the state is pickled.  This makes it possible
for an object to be referenced from its own (mutable) state.
2003-01-31 17:17:49 +00:00
Guido van Rossum d053b4b416 Add a magical feature to save_reduce so that __reduce__ can cause
NEWOBJ to be generated.
2003-01-31 16:51:45 +00:00
Tim Peters 4b23f2b44b It's Official: for LONG1/LONG4, a "byte count" of 0 is taken as a
shortcut meaning 0L.  This allows LONG1 to encode 0L in two bytes
total.
2003-01-31 16:43:39 +00:00
Neal Norwitz d17406830c Fix typo 2003-01-31 04:04:23 +00:00
Tim Peters 91149821d3 Linear-time implementations of {encode,decode}_long. 2003-01-31 03:43:58 +00:00
Tim Peters d01c1e91c4 load_inst(), load_obj(): Put the bulk of these into a common new
_instantiate() method.
2003-01-30 15:41:46 +00:00
Guido van Rossum 9b40e804c7 There was a subtle big in save_newobj(): it used self.save_global(t)
on the type instead of self.save(t).  This defeated the purpose of
NEWOBJ, because it didn't generate a BINGET opcode when t was already
memoized; but moreover, it would generate multiple BINPUT opcodes for
the same type!  pickletools.dis() doesn't like this.

How I found this?  I was playing with picklesize.py in the datetime
sandbox, and noticed that protocol 2 pickles for multiple objects were
in fact larger than protocol 1 pickles!  That was suspicious, so I
decided to disassemble one of the pickles.

This really needs a unit test, but I'm exhausted.  I'll be late for
work as it is. :-(
2003-01-30 06:37:41 +00:00
Guido van Rossum 4fba220f4a Slight code rearrangement to avoid testing getstate twice. 2003-01-30 05:41:19 +00:00
Guido van Rossum 45486176ea In save_newobj(), if an object's __getnewargs__ and __getstate__ are
the same function, don't save the state or write a BUILD opcode.  This
is so that a type (e.g. datetime :-) can support protocol 2 using
__getnewargs__ while also supporting protocol 0 and 1 using
__getstate__.  (Without this, the state would be pickled twice with
protocol 2, unless __getstate__ is defined to return None, which
breaks protocol 0 and 1.)
2003-01-30 05:39:04 +00:00
Guido van Rossum ba884f3d22 Use %c rather than chr() to turn some ints into chars. 2003-01-29 20:14:23 +00:00
Guido van Rossum 5d9113d8be Implement appropriate __getnewargs__ for all immutable subclassable builtin
types.  The special handling for these can now be removed from save_newobj().
Add some testing for this.

Also add support for setting the 'fast' flag on the Python Pickler class,
which suppresses use of the memo.
2003-01-29 17:58:45 +00:00
Guido van Rossum 586c9e813c Declare Protocol 2 as implemented. 2003-01-29 06:16:12 +00:00
Guido van Rossum 255f3ee0a5 Support for extension codes. (By accident I checked in the tests first.) 2003-01-29 06:14:11 +00:00
Tim Peters c0c12b5707 pickle: Comment repair.
pickletools:  Import decode_long from pickle instead of duplicating it.
2003-01-29 00:56:17 +00:00
Guido van Rossum 4e2491dbb1 Add a comment about how some built-in types should grow a
__getnewargs__ method.
2003-01-28 22:31:25 +00:00
Guido van Rossum b26a97aa50 Get rid of __safe_for_unpickling__ and safe_constructors.
Also tidied up a few lines, got rid of apply(), added a comment.
2003-01-28 22:29:13 +00:00
Guido van Rossum ac5b5d2e8b Instead of bad hacks trying to worm around the inherited
object.__reduce__, do a getattr() on the class so we can explicitly
test for it.  The reduce()-calling code becomes a bit more regular as
a result.

Also add support slots: if an object has slots, the default state is
(dict, slots) where dict is the __dict__ or None, and slots is a dict
mapping slot names to slot values.  We do a best-effort approach to
find slot names, assuming the __slots__ fields of classes aren't
modified after class definition time to misrepresent the actual list
of slots defined by a class.
2003-01-28 22:01:16 +00:00
Guido van Rossum 3d8c01b31c The default __reduce__ on the base object type obscured any
possibility of calling save_reduce().  Add a special hack for this.
The tests for this are much simpler now (no __getstate__ or
__getnewargs__ needed).
2003-01-28 19:48:18 +00:00
Guido van Rossum 54fb192508 Move the NEWOBJ-generating code to a separate function, and invoke it
after checking for __reduce__.
2003-01-28 18:22:35 +00:00
Guido van Rossum 533dbcf250 Some experimental support for generating NEWOBJ with proto=2, and
fixed a bug in load_newobj().
2003-01-28 17:55:05 +00:00
Tim Peters a6ae9a2128 save_empty_tuple(): Comment on why we can't get rid of this. 2003-01-28 16:58:41 +00:00
Tim Peters 82ca59e002 save_dict(): Added a comment about the control flow NealN missed. 2003-01-28 16:47:59 +00:00
Tim Peters 13a25fb8e6 _is_string_secure(): This method is no longer used; removed it. (It
was used before string-escape codecs were added to the core.)
2003-01-28 16:42:22 +00:00
Guido van Rossum bc64e22ed6 Made save() fit on a page, while adding comments. (I moved some type
checks to save_reduce(), which can also be called from a subclass.)

Also tweaked some more comments.
2003-01-28 16:34:19 +00:00
Tim Peters ad5a771fae Got rid of the _quotes global. Used only once, and is trivial. 2003-01-28 16:23:33 +00:00
Guido van Rossum 1be3175992 Add a few comments. Change the way the protocol is checked (it must
be one of 0, 1 or 2).

I should note that the previous checkin also added NEWOBJ support to
the unpickler -- but there's nothing yet that generates this.
2003-01-28 15:19:53 +00:00
Guido van Rossum 3a41c61dd4 Rename all variables 'object' to 'obj' to avoid conflicts with the
type 'object'.  Also minor docstring tweakage, and rearranged a few
lines in save().
2003-01-28 15:10:22 +00:00
Guido van Rossum cbe2dbddda Don't memoize the empty tuple in protocol 0. 2003-01-28 14:40:16 +00:00
Tim Peters d97da80dd5 save_tuple(): So long as the charter is rewriting for clarity, the snaky
control flow had to be simplified.
2003-01-28 05:48:29 +00:00
Tim Peters ff57bff16e save_tuple(): I believe the new code for TUPLE{1,2,3} in proto 2 was
incorrect for recursive tuples.  Tried to repair; seems to work OK, but
there are no checked-in tests for this yet.
2003-01-28 05:34:53 +00:00
Guido van Rossum 7d97d31a1b OK, this is really the last one tonight!
NEWFALSE and NEWTRUE.
2003-01-28 04:25:27 +00:00
Guido van Rossum 44f0ea5f73 More protocol 2: TUPLE1, TUPLE2, TUPLE3.
Also moved the special case for empty tuples from save() to save_tuple().
2003-01-28 04:14:51 +00:00
Tim Peters 3b769835ca save_inst(): Rewrote to have only one branch on self.bin. Also got rid
of my recent XXX comment, taking a (what appears to be vanishingly small)
chance and calling self.memoize() instead.
2003-01-28 03:51:36 +00:00
Guido van Rossum d6c9e63af9 First baby steps towards implementing protocol 2: PROTO, LONG1 and LONG4. 2003-01-28 03:49:52 +00:00
Tim Peters d95c2df3a9 Fixed odd whitespace after "if", which I believe I introduced long ago. 2003-01-28 03:41:54 +00:00
Tim Peters 8fda7bc48d save_int(): Fixed two new off-by-1 glitches. 2003-01-28 03:40:52 +00:00
Guido van Rossum e0b904232f Add a comment explaining that struct.pack() beats marshal.dumps(), but
marshal.loads() beats struct.unpack()!  Possibly because the latter
creates a one-tuple. :-(
2003-01-28 03:17:21 +00:00
Guido van Rossum 5c938d00a1 Got rid of mdumps; I timed it, and struct.pack("<i", x) is more than
40% faster than marshal.dumps(x)[1:]!  (That's not counting the
module attribute lookups, which can be avoided in either case.)
2003-01-28 03:03:08 +00:00
Tim Peters f558da0f90 save_tuple(): Minor rewriting, and added a comment about the subtlety
created by recursive tuples.
2003-01-28 02:09:55 +00:00
Tim Peters 209ad95b00 load_appends(): replaced .append() loop with an .extend(). 2003-01-28 01:44:45 +00:00
Tim Peters c23d18a955 Comments. 2003-01-28 01:41:51 +00:00
Tim Peters 064567e41a save_dict(): Untangled most of the bin-vs-not-bin logic. Also used
iteritems() instead of materializing a (possibly giant) list of the
items.
2003-01-28 01:34:43 +00:00
Tim Peters 21c18f0bf5 save_list(): Rewrote, to untangle the proto 0 from the proto 1 cases.
The code is much easier to follow now, and I bet it's faster too.
2003-01-28 01:15:46 +00:00
Tim Peters 22dc6f4f7a save_list(): removed unused local "d". 2003-01-28 01:07:48 +00:00