From d05c9ff84501d93b13de40a9c7b0360c7d2ebada Mon Sep 17 00:00:00 2001 From: Alexandre Vassalotti Date: Sat, 7 Dec 2013 01:09:27 -0800 Subject: [PATCH] Issue #6784: Strings from Python 2 can now be unpickled as bytes objects. Initial patch by Merlijn van Deen. I've added a few unrelated docstring fixes in the patch while I was at it, which makes the documentation for pickle a bit more consistent. --- Doc/library/pickle.rst | 88 ++++---- Lib/pickle.py | 71 ++++--- Lib/pickletools.py | 183 +++++++++-------- Lib/test/pickletester.py | 30 ++- Lib/test/test_pickle.py | 4 + Misc/ACKS | 1 + Misc/NEWS | 4 + Modules/_pickle.c | 432 +++++++++++++++++++++------------------ 8 files changed, 447 insertions(+), 366 deletions(-) diff --git a/Doc/library/pickle.rst b/Doc/library/pickle.rst index 1f35b606c19..897621147c0 100644 --- a/Doc/library/pickle.rst +++ b/Doc/library/pickle.rst @@ -173,7 +173,7 @@ The :mod:`pickle` module provides the following constants: An integer, the default :ref:`protocol version ` used for pickling. May be less than :data:`HIGHEST_PROTOCOL`. Currently the - default protocol is 3, a new protocol designed for Python 3.0. + default protocol is 3, a new protocol designed for Python 3. The :mod:`pickle` module provides the following functions to make the pickling @@ -184,9 +184,9 @@ process more convenient: Write a pickled representation of *obj* to the open :term:`file object` *file*. This is equivalent to ``Pickler(file, protocol).dump(obj)``. - The optional *protocol* argument tells the pickler to use the given protocol; - supported protocols are 0, 1, 2, 3. The default protocol is 3; a - backward-incompatible protocol designed for Python 3.0. + The optional *protocol* argument tells the pickler to use the given + protocol; supported protocols are 0, 1, 2, 3. The default protocol is 3; a + backward-incompatible protocol designed for Python 3. Specifying a negative protocol version selects the highest protocol version supported. 
The higher the protocol used, the more recent the version of @@ -198,64 +198,66 @@ process more convenient: interface. If *fix_imports* is true and *protocol* is less than 3, pickle will try to - map the new Python 3.x names to the old module names used in Python 2.x, - so that the pickle data stream is readable with Python 2.x. + map the new Python 3 names to the old module names used in Python 2, so + that the pickle data stream is readable with Python 2. .. function:: dumps(obj, protocol=None, \*, fix_imports=True) - Return the pickled representation of the object as a :class:`bytes` - object, instead of writing it to a file. + Return the pickled representation of the object as a :class:`bytes` object, + instead of writing it to a file. - The optional *protocol* argument tells the pickler to use the given protocol; - supported protocols are 0, 1, 2, 3. The default protocol is 3; a - backward-incompatible protocol designed for Python 3.0. + The optional *protocol* argument tells the pickler to use the given + protocol; supported protocols are 0, 1, 2, 3 and 4. The default protocol + is 3; a backward-incompatible protocol designed for Python 3. Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced. If *fix_imports* is true and *protocol* is less than 3, pickle will try to - map the new Python 3.x names to the old module names used in Python 2.x, - so that the pickle data stream is readable with Python 2.x. + map the new Python 3 names to the old module names used in Python 2, so + that the pickle data stream is readable with Python 2. .. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict") - Read a pickled object representation from the open :term:`file object` *file* - and return the reconstituted object hierarchy specified therein. This is - equivalent to ``Unpickler(file).load()``. 
+ Read a pickled object representation from the open :term:`file object` + *file* and return the reconstituted object hierarchy specified therein. + This is equivalent to ``Unpickler(file).load()``. - The protocol version of the pickle is detected automatically, so no protocol - argument is needed. Bytes past the pickled object's representation are - ignored. + The protocol version of the pickle is detected automatically, so no + protocol argument is needed. Bytes past the pickled object's + representation are ignored. The argument *file* must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both - methods should return bytes. Thus *file* can be an on-disk file opened - for binary reading, a :class:`io.BytesIO` object, or any other custom object + methods should return bytes. Thus *file* can be an on-disk file opened for + binary reading, a :class:`io.BytesIO` object, or any other custom object that meets this interface. Optional keyword arguments are *fix_imports*, *encoding* and *errors*, which are used to control compatibility support for pickle stream generated - by Python 2.x. If *fix_imports* is true, pickle will try to map the old - Python 2.x names to the new names used in Python 3.x. The *encoding* and + by Python 2. If *fix_imports* is true, pickle will try to map the old + Python 2 names to the new names used in Python 3. The *encoding* and *errors* tell pickle how to decode 8-bit string instances pickled by Python - 2.x; these default to 'ASCII' and 'strict', respectively. + 2; these default to 'ASCII' and 'strict', respectively. The *encoding* can + be 'bytes' to read these 8-bit string instances as bytes objects. .. 
function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict") Read a pickled object hierarchy from a :class:`bytes` object and return the reconstituted object hierarchy specified therein - The protocol version of the pickle is detected automatically, so no protocol - argument is needed. Bytes past the pickled object's representation are - ignored. + The protocol version of the pickle is detected automatically, so no + protocol argument is needed. Bytes past the pickled object's + representation are ignored. Optional keyword arguments are *fix_imports*, *encoding* and *errors*, which are used to control compatibility support for pickle stream generated - by Python 2.x. If *fix_imports* is true, pickle will try to map the old - Python 2.x names to the new names used in Python 3.x. The *encoding* and + by Python 2. If *fix_imports* is true, pickle will try to map the old + Python 2 names to the new names used in Python 3. The *encoding* and *errors* tell pickle how to decode 8-bit string instances pickled by Python - 2.x; these default to 'ASCII' and 'strict', respectively. + 2; these default to 'ASCII' and 'strict', respectively. The *encoding* can + be 'bytes' to read these 8-bit string instances as bytes objects. The :mod:`pickle` module defines three exceptions: @@ -290,9 +292,9 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and This takes a binary file for writing a pickle data stream. - The optional *protocol* argument tells the pickler to use the given protocol; - supported protocols are 0, 1, 2, 3. The default protocol is 3; a - backward-incompatible protocol designed for Python 3.0. + The optional *protocol* argument tells the pickler to use the given + protocol; supported protocols are 0, 1, 2, 3 and 4. The default protocol + is 3; a backward-incompatible protocol designed for Python 3. Specifying a negative protocol version selects the highest protocol version supported. 
The higher the protocol used, the more recent the version of @@ -300,11 +302,12 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and The *file* argument must have a write() method that accepts a single bytes argument. It can thus be an on-disk file opened for binary writing, a - :class:`io.BytesIO` instance, or any other custom object that meets this interface. + :class:`io.BytesIO` instance, or any other custom object that meets this + interface. If *fix_imports* is true and *protocol* is less than 3, pickle will try to - map the new Python 3.x names to the old module names used in Python 2.x, - so that the pickle data stream is readable with Python 2.x. + map the new Python 3 names to the old module names used in Python 2, so + that the pickle data stream is readable with Python 2. .. method:: dump(obj) @@ -366,16 +369,17 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and The argument *file* must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both - methods should return bytes. Thus *file* can be an on-disk file object opened - for binary reading, a :class:`io.BytesIO` object, or any other custom object - that meets this interface. + methods should return bytes. Thus *file* can be an on-disk file object + opened for binary reading, a :class:`io.BytesIO` object, or any other + custom object that meets this interface. Optional keyword arguments are *fix_imports*, *encoding* and *errors*, which are used to control compatibility support for pickle stream generated - by Python 2.x. If *fix_imports* is true, pickle will try to map the old - Python 2.x names to the new names used in Python 3.x. The *encoding* and + by Python 2. If *fix_imports* is true, pickle will try to map the old + Python 2 names to the new names used in Python 3. 
The *encoding* and *errors* tell pickle how to decode 8-bit string instances pickled by Python - 2.x; these default to 'ASCII' and 'strict', respectively. + 2; these default to 'ASCII' and 'strict', respectively. The *encoding* can + be 'bytes' to read these 8-bit string instances as bytes objects. .. method:: load() diff --git a/Lib/pickle.py b/Lib/pickle.py index c57149a3935..9cd0132a188 100644 --- a/Lib/pickle.py +++ b/Lib/pickle.py @@ -348,24 +348,25 @@ class _Pickler: def __init__(self, file, protocol=None, *, fix_imports=True): """This takes a binary file for writing a pickle data stream. - The optional protocol argument tells the pickler to use the + The optional *protocol* argument tells the pickler to use the given protocol; supported protocols are 0, 1, 2, 3 and 4. The - default protocol is 3; a backward-incompatible protocol designed for - Python 3. + default protocol is 3; a backward-incompatible protocol designed + for Python 3. Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced. - The file argument must have a write() method that accepts a single - bytes argument. It can thus be a file object opened for binary - writing, a io.BytesIO instance, or any other custom object that - meets this interface. + The *file* argument must have a write() method that accepts a + single bytes argument. It can thus be a file object opened for + binary writing, a io.BytesIO instance, or any other custom + object that meets this interface. - If fix_imports is True and protocol is less than 3, pickle will try to - map the new Python 3 names to the old module names used in Python 2, - so that the pickle data stream is readable with Python 2.
+ If *fix_imports* is True and *protocol* is less than 3, pickle + will try to map the new Python 3 names to the old module names + used in Python 2, so that the pickle data stream is readable + with Python 2. """ if protocol is None: protocol = DEFAULT_PROTOCOL @@ -389,10 +390,9 @@ class _Pickler: """Clears the pickler's "memo". The memo is the data structure that remembers which objects the - pickler has already seen, so that shared or recursive objects are - pickled by reference and not by value. This method is useful when - re-using picklers. - + pickler has already seen, so that shared or recursive objects + are pickled by reference and not by value. This method is + useful when re-using picklers. """ self.memo.clear() @@ -975,8 +975,14 @@ class _Unpickler: encoding="ASCII", errors="strict"): """This takes a binary file for reading a pickle data stream. - The protocol version of the pickle is detected automatically, so no - proto argument is needed. + The protocol version of the pickle is detected automatically, so + no proto argument is needed. + + The argument *file* must have two methods, a read() method that + takes an integer argument, and a readline() method that requires + no arguments. Both methods should return bytes. Thus *file* + can be a binary file object opened for reading, a io.BytesIO + object, or any other custom object that meets this interface. The file-like object must have two methods, a read() method that takes an integer argument, and a readline() method that @@ -985,13 +991,14 @@ class _Unpickler: reading, a BytesIO object, or any other custom object that meets this interface. - Optional keyword arguments are *fix_imports*, *encoding* and *errors*, - which are used to control compatiblity support for pickle stream - generated by Python 2.x. If *fix_imports* is True, pickle will try to - map the old Python 2.x names to the new names used in Python 3.x. 
The - *encoding* and *errors* tell pickle how to decode 8-bit string - instances pickled by Python 2.x; these default to 'ASCII' and - 'strict', respectively. + Optional keyword arguments are *fix_imports*, *encoding* and + *errors*, which are used to control compatiblity support for + pickle stream generated by Python 2. If *fix_imports* is True, + pickle will try to map the old Python 2 names to the new names + used in Python 3. The *encoding* and *errors* tell pickle how + to decode 8-bit string instances pickled by Python 2; these + default to 'ASCII' and 'strict', respectively. *encoding* can be + 'bytes' to read theses 8-bit string instances as bytes objects. """ self._file_readline = file.readline self._file_read = file.read @@ -1139,6 +1146,15 @@ class _Unpickler: self.append(unpack('>d', self.read(8))[0]) dispatch[BINFLOAT[0]] = load_binfloat + def _decode_string(self, value): + # Used to allow strings from Python 2 to be decoded either as + # bytes or Unicode strings. This should be used only with the + # STRING, BINSTRING and SHORT_BINSTRING opcodes. 
+ if self.encoding == "bytes": + return value + else: + return value.decode(self.encoding, self.errors) + def load_string(self): data = self.readline()[:-1] # Strip outermost quotes @@ -1146,8 +1162,7 @@ class _Unpickler: data = data[1:-1] else: raise UnpicklingError("the STRING opcode argument must be quoted") - self.append(codecs.escape_decode(data)[0] - .decode(self.encoding, self.errors)) + self.append(self._decode_string(codecs.escape_decode(data)[0])) dispatch[STRING[0]] = load_string def load_binstring(self): @@ -1156,8 +1171,7 @@ class _Unpickler: if len < 0: raise UnpicklingError("BINSTRING pickle has negative byte count") data = self.read(len) - value = str(data, self.encoding, self.errors) - self.append(value) + self.append(self._decode_string(data)) dispatch[BINSTRING[0]] = load_binstring def load_binbytes(self): @@ -1191,8 +1205,7 @@ class _Unpickler: def load_short_binstring(self): len = self.read(1)[0] data = self.read(len) - value = str(data, self.encoding, self.errors) - self.append(value) + self.append(self._decode_string(data)) dispatch[SHORT_BINSTRING[0]] = load_short_binstring def load_short_binbytes(self): diff --git a/Lib/pickletools.py b/Lib/pickletools.py index a2480f6510a..71c2aa1c79e 100644 --- a/Lib/pickletools.py +++ b/Lib/pickletools.py @@ -969,113 +969,107 @@ class StackObject(object): return self.name -pyint = StackObject( - name='int', - obtype=int, - doc="A short (as opposed to long) Python integer object.") - -pylong = StackObject( - name='long', - obtype=int, - doc="A long (as opposed to short) Python integer object.") +pyint = pylong = StackObject( + name='int', + obtype=int, + doc="A Python integer object.") pyinteger_or_bool = StackObject( - name='int_or_bool', - obtype=(int, bool), - doc="A Python integer object (short or long), or " - "a Python bool.") + name='int_or_bool', + obtype=(int, bool), + doc="A Python integer or boolean object.") pybool = StackObject( - name='bool', - obtype=(bool,), - doc="A Python bool object.") 
+ name='bool', + obtype=bool, + doc="A Python boolean object.") pyfloat = StackObject( - name='float', - obtype=float, - doc="A Python float object.") + name='float', + obtype=float, + doc="A Python float object.") -pystring = StackObject( - name='string', - obtype=bytes, - doc="A Python (8-bit) string object.") +pybytes_or_str = pystring = StackObject( + name='bytes_or_str', + obtype=(bytes, str), + doc="A Python bytes or (Unicode) string object.") pybytes = StackObject( - name='bytes', - obtype=bytes, - doc="A Python bytes object.") + name='bytes', + obtype=bytes, + doc="A Python bytes object.") pyunicode = StackObject( - name='str', - obtype=str, - doc="A Python (Unicode) string object.") + name='str', + obtype=str, + doc="A Python (Unicode) string object.") pynone = StackObject( - name="None", - obtype=type(None), - doc="The Python None object.") + name="None", + obtype=type(None), + doc="The Python None object.") pytuple = StackObject( - name="tuple", - obtype=tuple, - doc="A Python tuple object.") + name="tuple", + obtype=tuple, + doc="A Python tuple object.") pylist = StackObject( - name="list", - obtype=list, - doc="A Python list object.") + name="list", + obtype=list, + doc="A Python list object.") pydict = StackObject( - name="dict", - obtype=dict, - doc="A Python dict object.") + name="dict", + obtype=dict, + doc="A Python dict object.") pyset = StackObject( - name="set", - obtype=set, - doc="A Python set object.") + name="set", + obtype=set, + doc="A Python set object.") pyfrozenset = StackObject( - name="frozenset", - obtype=set, - doc="A Python frozenset object.") + name="frozenset", + obtype=set, + doc="A Python frozenset object.") anyobject = StackObject( - name='any', - obtype=object, - doc="Any kind of object whatsoever.") + name='any', + obtype=object, + doc="Any kind of object whatsoever.") markobject = StackObject( - name="mark", - obtype=StackObject, - doc="""'The mark' is a unique object. 
+ name="mark", + obtype=StackObject, + doc="""'The mark' is a unique object. - Opcodes that operate on a variable number of objects - generally don't embed the count of objects in the opcode, - or pull it off the stack. Instead the MARK opcode is used - to push a special marker object on the stack, and then - some other opcodes grab all the objects from the top of - the stack down to (but not including) the topmost marker - object. - """) +Opcodes that operate on a variable number of objects +generally don't embed the count of objects in the opcode, +or pull it off the stack. Instead the MARK opcode is used +to push a special marker object on the stack, and then +some other opcodes grab all the objects from the top of +the stack down to (but not including) the topmost marker +object. +""") stackslice = StackObject( - name="stackslice", - obtype=StackObject, - doc="""An object representing a contiguous slice of the stack. + name="stackslice", + obtype=StackObject, + doc="""An object representing a contiguous slice of the stack. - This is used in conjunction with markobject, to represent all - of the stack following the topmost markobject. For example, - the POP_MARK opcode changes the stack from +This is used in conjunction with markobject, to represent all +of the stack following the topmost markobject. For example, +the POP_MARK opcode changes the stack from - [..., markobject, stackslice] - to - [...] + [..., markobject, stackslice] +to + [...] - No matter how many object are on the stack after the topmost - markobject, POP_MARK gets rid of all of them (including the - topmost markobject too). - """) +No matter how many object are on the stack after the topmost +markobject, POP_MARK gets rid of all of them (including the +topmost markobject too). +""") ############################################################################## # Descriptors for pickle opcodes. 
@@ -1212,7 +1206,7 @@ opcodes = [ code='L', arg=decimalnl_long, stack_before=[], - stack_after=[pylong], + stack_after=[pyint], proto=0, doc="""Push a long integer. @@ -1230,7 +1224,7 @@ opcodes = [ code='\x8a', arg=long1, stack_before=[], - stack_after=[pylong], + stack_after=[pyint], proto=2, doc="""Long integer using one-byte length. @@ -1241,7 +1235,7 @@ opcodes = [ code='\x8b', arg=long4, stack_before=[], - stack_after=[pylong], + stack_after=[pyint], proto=2, doc="""Long integer using found-byte length. @@ -1254,45 +1248,50 @@ opcodes = [ code='S', arg=stringnl, stack_before=[], - stack_after=[pystring], + stack_after=[pybytes_or_str], proto=0, doc="""Push a Python string object. The argument is a repr-style string, with bracketing quote characters, and perhaps embedded escapes. The argument extends until the next - newline character. (Actually, they are decoded into a str instance + newline character. These are usually decoded into a str instance using the encoding given to the Unpickler constructor. or the default, - 'ASCII'.) + 'ASCII'. If the encoding given was 'bytes' however, they will be + decoded as bytes object instead. """), I(name='BINSTRING', code='T', arg=string4, stack_before=[], - stack_after=[pystring], + stack_after=[pybytes_or_str], proto=1, doc="""Push a Python string object. - There are two arguments: the first is a 4-byte little-endian signed int - giving the number of bytes in the string, and the second is that many - bytes, which are taken literally as the string content. (Actually, - they are decoded into a str instance using the encoding given to the - Unpickler constructor. or the default, 'ASCII'.) + There are two arguments: the first is a 4-byte little-endian + signed int giving the number of bytes in the string, and the + second is that many bytes, which are taken literally as the string + content. These are usually decoded into a str instance using the + encoding given to the Unpickler constructor. or the default, + 'ASCII'. 
If the encoding given was 'bytes' however, they will be + decoded as bytes object instead. """), I(name='SHORT_BINSTRING', code='U', arg=string1, stack_before=[], - stack_after=[pystring], + stack_after=[pybytes_or_str], proto=1, doc="""Push a Python string object. - There are two arguments: the first is a 1-byte unsigned int giving - the number of bytes in the string, and the second is that many bytes, - which are taken literally as the string content. (Actually, they - are decoded into a str instance using the encoding given to the - Unpickler constructor. or the default, 'ASCII'.) + There are two arguments: the first is a 1-byte unsigned int giving + the number of bytes in the string, and the second is that many + bytes, which are taken literally as the string content. These are + usually decoded into a str instance using the encoding given to + the Unpickler constructor. or the default, 'ASCII'. If the + encoding given was 'bytes' however, they will be decoded as bytes + object instead. 
"""), # Bytes (protocol 3 only; older protocols don't support bytes at all) diff --git a/Lib/test/pickletester.py b/Lib/test/pickletester.py index 040c26f2577..05befbf4254 100644 --- a/Lib/test/pickletester.py +++ b/Lib/test/pickletester.py @@ -1305,6 +1305,35 @@ class AbstractPickleTests(unittest.TestCase): dumped = self.dumps(set([3]), 2) self.assertEqual(dumped, DATA6) + def test_load_python2_str_as_bytes(self): + # From Python 2: pickle.dumps('a\x00\xa0', protocol=0) + self.assertEqual(self.loads(b"S'a\\x00\\xa0'\n.", + encoding="bytes"), b'a\x00\xa0') + # From Python 2: pickle.dumps('a\x00\xa0', protocol=1) + self.assertEqual(self.loads(b'U\x03a\x00\xa0.', + encoding="bytes"), b'a\x00\xa0') + # From Python 2: pickle.dumps('a\x00\xa0', protocol=2) + self.assertEqual(self.loads(b'\x80\x02U\x03a\x00\xa0.', + encoding="bytes"), b'a\x00\xa0') + + def test_load_python2_unicode_as_str(self): + # From Python 2: pickle.dumps(u'π', protocol=0) + self.assertEqual(self.loads(b'V\\u03c0\n.', + encoding='bytes'), 'π') + # From Python 2: pickle.dumps(u'π', protocol=1) + self.assertEqual(self.loads(b'X\x02\x00\x00\x00\xcf\x80.', + encoding="bytes"), 'π') + # From Python 2: pickle.dumps(u'π', protocol=2) + self.assertEqual(self.loads(b'\x80\x02X\x02\x00\x00\x00\xcf\x80.', + encoding="bytes"), 'π') + + def test_load_long_python2_str_as_bytes(self): + # From Python 2: pickle.dumps('x' * 300, protocol=1) + self.assertEqual(self.loads(pickle.BINSTRING + + struct.pack("encoding, self->errors); - Py_DECREF(bytes); - if (str == NULL) + + /* Leave the Python 2.x strings as bytes if the *encoding* given to the + Unpickler was 'bytes'. Otherwise, convert them to unicode. 
*/ + if (strcmp(self->encoding, "bytes") == 0) { + obj = bytes; + } + else { + obj = PyUnicode_FromEncodedObject(bytes, self->encoding, self->errors); + Py_DECREF(bytes); + if (obj == NULL) { + return -1; + } + } + + PDATA_PUSH(self->stack, obj, -1); + return 0; +} + +static int +load_counted_binstring(UnpicklerObject *self, int nbytes) +{ + PyObject *obj; + Py_ssize_t size; + char *s; + + if (_Unpickler_Read(self, &s, nbytes) < 0) return -1; - PDATA_PUSH(self->stack, str, -1); + size = calc_binsize(s, nbytes); + if (size < 0) { + PickleState *st = _Pickle_GetGlobalState(); + PyErr_Format(st->UnpicklingError, + "BINSTRING exceeds system's maximum size of %zd bytes", + PY_SSIZE_T_MAX); + return -1; + } + + if (_Unpickler_Read(self, &s, size) < 0) + return -1; + + /* Convert Python 2.x strings to bytes if the *encoding* given to the + Unpickler was 'bytes'. Otherwise, convert them to unicode. */ + if (strcmp(self->encoding, "bytes") == 0) { + obj = PyBytes_FromStringAndSize(s, size); + } + else { + obj = PyUnicode_Decode(s, size, self->encoding, self->errors); + } + if (obj == NULL) { + return -1; + } + + PDATA_PUSH(self->stack, obj, -1); return 0; } @@ -4895,36 +4938,6 @@ load_counted_binbytes(UnpicklerObject *self, int nbytes) return 0; } -static int -load_counted_binstring(UnpicklerObject *self, int nbytes) -{ - PyObject *str; - Py_ssize_t size; - char *s; - - if (_Unpickler_Read(self, &s, nbytes) < 0) - return -1; - - size = calc_binsize(s, nbytes); - if (size < 0) { - PickleState *st = _Pickle_GetGlobalState(); - PyErr_Format(st->UnpicklingError, - "BINSTRING exceeds system's maximum size of %zd bytes", - PY_SSIZE_T_MAX); - return -1; - } - - if (_Unpickler_Read(self, &s, size) < 0) - return -1; - /* Convert Python 2.x strings to unicode. 
*/ - str = PyUnicode_Decode(s, size, self->encoding, self->errors); - if (str == NULL) - return -1; - - PDATA_PUSH(self->stack, str, -1); - return 0; -} - static int load_unicode(UnpicklerObject *self) { @@ -6258,25 +6271,25 @@ _pickle.Unpickler.load Load a pickle. -Read a pickled object representation from the open file object given in -the constructor, and return the reconstituted object hierarchy specified -therein. +Read a pickled object representation from the open file object given +in the constructor, and return the reconstituted object hierarchy +specified therein. [clinic]*/ PyDoc_STRVAR(_pickle_Unpickler_load__doc__, "load()\n" "Load a pickle.\n" "\n" -"Read a pickled object representation from the open file object given in\n" -"the constructor, and return the reconstituted object hierarchy specified\n" -"therein."); +"Read a pickled object representation from the open file object given\n" +"in the constructor, and return the reconstituted object hierarchy\n" +"specified therein."); #define _PICKLE_UNPICKLER_LOAD_METHODDEF \ {"load", (PyCFunction)_pickle_Unpickler_load, METH_NOARGS, _pickle_Unpickler_load__doc__}, static PyObject * _pickle_Unpickler_load(PyObject *self) -/*[clinic checksum: 9a30ba4e4d9221d4dcd705e1471ab11b2c9e3ac6]*/ +/*[clinic checksum: c2ae1263f0dd000f34ccf0fe59d7c544464babc4]*/ { UnpicklerObject *unpickler = (UnpicklerObject*)self; @@ -6310,8 +6323,9 @@ _pickle.Unpickler.find_class Return an object from a specified module. -If necessary, the module will be imported. Subclasses may override this -method (e.g. to restrict unpickling of arbitrary classes and functions). +If necessary, the module will be imported. Subclasses may override +this method (e.g. to restrict unpickling of arbitrary classes and +functions). This method is called whenever a class or a function object is needed. Both arguments passed are str objects. 
@@ -6321,8 +6335,9 @@ PyDoc_STRVAR(_pickle_Unpickler_find_class__doc__, "find_class(module_name, global_name)\n" "Return an object from a specified module.\n" "\n" -"If necessary, the module will be imported. Subclasses may override this\n" -"method (e.g. to restrict unpickling of arbitrary classes and functions).\n" +"If necessary, the module will be imported. Subclasses may override\n" +"this method (e.g. to restrict unpickling of arbitrary classes and\n" +"functions).\n" "\n" "This method is called whenever a class or a function object is\n" "needed. Both arguments passed are str objects."); @@ -6352,7 +6367,7 @@ exit: static PyObject * _pickle_Unpickler_find_class_impl(UnpicklerObject *self, PyObject *module_name, PyObject *global_name) -/*[clinic checksum: b7d05d4dd8adc698e5780c1ac2be0f5062d33915]*/ +/*[clinic checksum: 1f353d13a32c9d94feb1466b3c2d0529a7e5650e]*/ { PyObject *global; PyObject *modules_dict; @@ -6515,23 +6530,23 @@ _pickle.Unpickler.__init__ This takes a binary file for reading a pickle data stream. The protocol version of the pickle is detected automatically, so no -proto argument is needed. +protocol argument is needed. Bytes past the pickled object's +representation are ignored. -The file-like object must have two methods, a read() method -that takes an integer argument, and a readline() method that -requires no arguments. Both methods should return bytes. -Thus file-like object can be a binary file object opened for -reading, a BytesIO object, or any other custom object that -meets this interface. +The argument *file* must have two methods, a read() method that takes +an integer argument, and a readline() method that requires no +arguments. Both methods should return bytes. Thus *file* can be a +binary file object opened for reading, a io.BytesIO object, or any +other custom object that meets this interface. 
Optional keyword arguments are *fix_imports*, *encoding* and *errors*, which are used to control compatiblity support for pickle stream -generated by Python 2.x. If *fix_imports* is True, pickle will try to -map the old Python 2.x names to the new names used in Python 3.x. The +generated by Python 2. If *fix_imports* is True, pickle will try to +map the old Python 2 names to the new names used in Python 3. The *encoding* and *errors* tell pickle how to decode 8-bit string -instances pickled by Python 2.x; these default to 'ASCII' and -'strict', respectively. - +instances pickled by Python 2; these default to 'ASCII' and 'strict', +respectively. The *encoding* can be 'bytes' to read these 8-bit +string instances as bytes objects. [clinic]*/ PyDoc_STRVAR(_pickle_Unpickler___init____doc__, @@ -6539,22 +6554,23 @@ PyDoc_STRVAR(_pickle_Unpickler___init____doc__, "This takes a binary file for reading a pickle data stream.\n" "\n" "The protocol version of the pickle is detected automatically, so no\n" -"proto argument is needed.\n" +"protocol argument is needed. Bytes past the pickled object\'s\n" +"representation are ignored.\n" "\n" -"The file-like object must have two methods, a read() method\n" -"that takes an integer argument, and a readline() method that\n" -"requires no arguments. Both methods should return bytes.\n" -"Thus file-like object can be a binary file object opened for\n" -"reading, a BytesIO object, or any other custom object that\n" -"meets this interface.\n" +"The argument *file* must have two methods, a read() method that takes\n" +"an integer argument, and a readline() method that requires no\n" +"arguments. Both methods should return bytes. 
Thus *file* can be a\n" +"binary file object opened for reading, a io.BytesIO object, or any\n" +"other custom object that meets this interface.\n" "\n" "Optional keyword arguments are *fix_imports*, *encoding* and *errors*,\n" "which are used to control compatiblity support for pickle stream\n" -"generated by Python 2.x. If *fix_imports* is True, pickle will try to\n" -"map the old Python 2.x names to the new names used in Python 3.x. The\n" +"generated by Python 2. If *fix_imports* is True, pickle will try to\n" +"map the old Python 2 names to the new names used in Python 3. The\n" "*encoding* and *errors* tell pickle how to decode 8-bit string\n" -"instances pickled by Python 2.x; these default to \'ASCII\' and\n" -"\'strict\', respectively."); +"instances pickled by Python 2; these default to \'ASCII\' and \'strict\',\n" +"respectively. The *encoding* can be \'bytes\' to read these 8-bit\n" +"string instances as bytes objects."); #define _PICKLE_UNPICKLER___INIT___METHODDEF \ {"__init__", (PyCFunction)_pickle_Unpickler___init__, METH_VARARGS|METH_KEYWORDS, _pickle_Unpickler___init____doc__}, @@ -6584,7 +6600,7 @@ exit: static PyObject * _pickle_Unpickler___init___impl(UnpicklerObject *self, PyObject *file, int fix_imports, const char *encoding, const char *errors) -/*[clinic checksum: bed0d8bbe1c647960ccc6f997b33bf33935fa56f]*/ +/*[clinic checksum: 9ce6783224e220573d42a94fe1bb7199d6f1c5a6]*/ { _Py_IDENTIFIER(persistent_load); @@ -7033,48 +7049,50 @@ _pickle.dump Write a pickled representation of obj to the open file object file. -This is equivalent to ``Pickler(file, protocol).dump(obj)``, but may be more -efficient. +This is equivalent to ``Pickler(file, protocol).dump(obj)``, but may +be more efficient. -The optional protocol argument tells the pickler to use the given protocol -supported protocols are 0, 1, 2, 3. The default protocol is 3; a -backward-incompatible protocol designed for Python 3.0. 
+The optional *protocol* argument tells the pickler to use the given +protocol supported protocols are 0, 1, 2, 3 and 4. The default +protocol is 3; a backward-incompatible protocol designed for Python 3. -Specifying a negative protocol version selects the highest protocol version -supported. The higher the protocol used, the more recent the version of -Python needed to read the pickle produced. +Specifying a negative protocol version selects the highest protocol +version supported. The higher the protocol used, the more recent the +version of Python needed to read the pickle produced. -The file argument must have a write() method that accepts a single bytes -argument. It can thus be a file object opened for binary writing, a -io.BytesIO instance, or any other custom object that meets this interface. +The *file* argument must have a write() method that accepts a single +bytes argument. It can thus be a file object opened for binary +writing, a io.BytesIO instance, or any other custom object that meets +this interface. -If fix_imports is True and protocol is less than 3, pickle will try to -map the new Python 3.x names to the old module names used in Python 2.x, -so that the pickle data stream is readable with Python 2.x. +If *fix_imports* is True and protocol is less than 3, pickle will try +to map the new Python 3 names to the old module names used in Python +2, so that the pickle data stream is readable with Python 2. [clinic]*/ PyDoc_STRVAR(_pickle_dump__doc__, "dump(obj, file, protocol=None, *, fix_imports=True)\n" "Write a pickled representation of obj to the open file object file.\n" "\n" -"This is equivalent to ``Pickler(file, protocol).dump(obj)``, but may be more\n" -"efficient.\n" +"This is equivalent to ``Pickler(file, protocol).dump(obj)``, but may\n" +"be more efficient.\n" "\n" -"The optional protocol argument tells the pickler to use the given protocol\n" -"supported protocols are 0, 1, 2, 3. 
The default protocol is 3; a\n" -"backward-incompatible protocol designed for Python 3.0.\n" +"The optional *protocol* argument tells the pickler to use the given\n" +"protocol supported protocols are 0, 1, 2, 3 and 4. The default\n" +"protocol is 3; a backward-incompatible protocol designed for Python 3.\n" "\n" -"Specifying a negative protocol version selects the highest protocol version\n" -"supported. The higher the protocol used, the more recent the version of\n" -"Python needed to read the pickle produced.\n" +"Specifying a negative protocol version selects the highest protocol\n" +"version supported. The higher the protocol used, the more recent the\n" +"version of Python needed to read the pickle produced.\n" "\n" -"The file argument must have a write() method that accepts a single bytes\n" -"argument. It can thus be a file object opened for binary writing, a\n" -"io.BytesIO instance, or any other custom object that meets this interface.\n" +"The *file* argument must have a write() method that accepts a single\n" +"bytes argument. 
It can thus be a file object opened for binary\n" +"writing, a io.BytesIO instance, or any other custom object that meets\n" +"this interface.\n" "\n" -"If fix_imports is True and protocol is less than 3, pickle will try to\n" -"map the new Python 3.x names to the old module names used in Python 2.x,\n" -"so that the pickle data stream is readable with Python 2.x."); +"If *fix_imports* is True and protocol is less than 3, pickle will try\n" +"to map the new Python 3 names to the old module names used in Python\n" +"2, so that the pickle data stream is readable with Python 2."); #define _PICKLE_DUMP_METHODDEF \ {"dump", (PyCFunction)_pickle_dump, METH_VARARGS|METH_KEYWORDS, _pickle_dump__doc__}, @@ -7104,7 +7122,7 @@ exit: static PyObject * _pickle_dump_impl(PyModuleDef *module, PyObject *obj, PyObject *file, PyObject *protocol, int fix_imports) -/*[clinic checksum: e442721b16052d921b5e3fbd146d0a62e94a459e]*/ +/*[clinic checksum: eb5c23e64da34477178230b704d2cc9c6b6650ea]*/ { PicklerObject *pickler = _Pickler_New(); @@ -7142,34 +7160,34 @@ _pickle.dumps Return the pickled representation of the object as a bytes object. -The optional protocol argument tells the pickler to use the given protocol; -supported protocols are 0, 1, 2, 3. The default protocol is 3; a -backward-incompatible protocol designed for Python 3.0. +The optional *protocol* argument tells the pickler to use the given +protocol; supported protocols are 0, 1, 2, 3 and 4. The default +protocol is 3; a backward-incompatible protocol designed for Python 3. -Specifying a negative protocol version selects the highest protocol version -supported. The higher the protocol used, the more recent the version of -Python needed to read the pickle produced. +Specifying a negative protocol version selects the highest protocol +version supported. The higher the protocol used, the more recent the +version of Python needed to read the pickle produced. 
-If fix_imports is True and *protocol* is less than 3, pickle will try to -map the new Python 3.x names to the old module names used in Python 2.x, -so that the pickle data stream is readable with Python 2.x. +If *fix_imports* is True and *protocol* is less than 3, pickle will +try to map the new Python 3 names to the old module names used in +Python 2, so that the pickle data stream is readable with Python 2. [clinic]*/ PyDoc_STRVAR(_pickle_dumps__doc__, "dumps(obj, protocol=None, *, fix_imports=True)\n" "Return the pickled representation of the object as a bytes object.\n" "\n" -"The optional protocol argument tells the pickler to use the given protocol;\n" -"supported protocols are 0, 1, 2, 3. The default protocol is 3; a\n" -"backward-incompatible protocol designed for Python 3.0.\n" +"The optional *protocol* argument tells the pickler to use the given\n" +"protocol; supported protocols are 0, 1, 2, 3 and 4. The default\n" +"protocol is 3; a backward-incompatible protocol designed for Python 3.\n" "\n" -"Specifying a negative protocol version selects the highest protocol version\n" -"supported. The higher the protocol used, the more recent the version of\n" -"Python needed to read the pickle produced.\n" +"Specifying a negative protocol version selects the highest protocol\n" +"version supported. 
The higher the protocol used, the more recent the\n" +"version of Python needed to read the pickle produced.\n" "\n" -"If fix_imports is True and *protocol* is less than 3, pickle will try to\n" -"map the new Python 3.x names to the old module names used in Python 2.x,\n" -"so that the pickle data stream is readable with Python 2.x."); +"If *fix_imports* is True and *protocol* is less than 3, pickle will\n" +"try to map the new Python 3 names to the old module names used in\n" +"Python 2, so that the pickle data stream is readable with Python 2."); #define _PICKLE_DUMPS_METHODDEF \ {"dumps", (PyCFunction)_pickle_dumps, METH_VARARGS|METH_KEYWORDS, _pickle_dumps__doc__}, @@ -7198,7 +7216,7 @@ exit: static PyObject * _pickle_dumps_impl(PyModuleDef *module, PyObject *obj, PyObject *protocol, int fix_imports) -/*[clinic checksum: df6262c4c487f537f47aec8a1709318204c1e174]*/ +/*[clinic checksum: e9b915d61202a9692cb6c6718db74fe54fc9c4d1]*/ { PyObject *result; PicklerObject *pickler = _Pickler_New(); @@ -7231,50 +7249,56 @@ _pickle.load encoding: str = 'ASCII' errors: str = 'strict' -Return a reconstituted object from the pickle data stored in a file. +Read and return an object from the pickle data stored in a file. -This is equivalent to ``Unpickler(file).load()``, but may be more efficient. +This is equivalent to ``Unpickler(file).load()``, but may be more +efficient. -The protocol version of the pickle is detected automatically, so no protocol -argument is needed. Bytes past the pickled object's representation are -ignored. +The protocol version of the pickle is detected automatically, so no +protocol argument is needed. Bytes past the pickled object's +representation are ignored. -The argument file must have two methods, a read() method that takes an -integer argument, and a readline() method that requires no arguments. Both -methods should return bytes. 
Thus *file* can be a binary file object opened -for reading, a BytesIO object, or any other custom object that meets this -interface. +The argument *file* must have two methods, a read() method that takes +an integer argument, and a readline() method that requires no +arguments. Both methods should return bytes. Thus *file* can be a +binary file object opened for reading, a io.BytesIO object, or any +other custom object that meets this interface. -Optional keyword arguments are fix_imports, encoding and errors, -which are used to control compatiblity support for pickle stream generated -by Python 2.x. If fix_imports is True, pickle will try to map the old -Python 2.x names to the new names used in Python 3.x. The encoding and -errors tell pickle how to decode 8-bit string instances pickled by Python -2.x; these default to 'ASCII' and 'strict', respectively. +Optional keyword arguments are *fix_imports*, *encoding* and *errors*, +which are used to control compatiblity support for pickle stream +generated by Python 2. If *fix_imports* is True, pickle will try to +map the old Python 2 names to the new names used in Python 3. The +*encoding* and *errors* tell pickle how to decode 8-bit string +instances pickled by Python 2; these default to 'ASCII' and 'strict', +respectively. The *encoding* can be 'bytes' to read these 8-bit +string instances as bytes objects. [clinic]*/ PyDoc_STRVAR(_pickle_load__doc__, "load(file, *, fix_imports=True, encoding=\'ASCII\', errors=\'strict\')\n" -"Return a reconstituted object from the pickle data stored in a file.\n" +"Read and return an object from the pickle data stored in a file.\n" "\n" -"This is equivalent to ``Unpickler(file).load()``, but may be more efficient.\n" +"This is equivalent to ``Unpickler(file).load()``, but may be more\n" +"efficient.\n" "\n" -"The protocol version of the pickle is detected automatically, so no protocol\n" -"argument is needed. 
Bytes past the pickled object\'s representation are\n" -"ignored.\n" +"The protocol version of the pickle is detected automatically, so no\n" +"protocol argument is needed. Bytes past the pickled object\'s\n" +"representation are ignored.\n" "\n" -"The argument file must have two methods, a read() method that takes an\n" -"integer argument, and a readline() method that requires no arguments. Both\n" -"methods should return bytes. Thus *file* can be a binary file object opened\n" -"for reading, a BytesIO object, or any other custom object that meets this\n" -"interface.\n" +"The argument *file* must have two methods, a read() method that takes\n" +"an integer argument, and a readline() method that requires no\n" +"arguments. Both methods should return bytes. Thus *file* can be a\n" +"binary file object opened for reading, a io.BytesIO object, or any\n" +"other custom object that meets this interface.\n" "\n" -"Optional keyword arguments are fix_imports, encoding and errors,\n" -"which are used to control compatiblity support for pickle stream generated\n" -"by Python 2.x. If fix_imports is True, pickle will try to map the old\n" -"Python 2.x names to the new names used in Python 3.x. The encoding and\n" -"errors tell pickle how to decode 8-bit string instances pickled by Python\n" -"2.x; these default to \'ASCII\' and \'strict\', respectively."); +"Optional keyword arguments are *fix_imports*, *encoding* and *errors*,\n" +"which are used to control compatiblity support for pickle stream\n" +"generated by Python 2. If *fix_imports* is True, pickle will try to\n" +"map the old Python 2 names to the new names used in Python 3. The\n" +"*encoding* and *errors* tell pickle how to decode 8-bit string\n" +"instances pickled by Python 2; these default to \'ASCII\' and \'strict\',\n" +"respectively. 
The *encoding* can be \'bytes\' to read these 8-bit\n" +"string instances as bytes objects."); #define _PICKLE_LOAD_METHODDEF \ {"load", (PyCFunction)_pickle_load, METH_VARARGS|METH_KEYWORDS, _pickle_load__doc__}, @@ -7304,7 +7328,7 @@ exit: static PyObject * _pickle_load_impl(PyModuleDef *module, PyObject *file, int fix_imports, const char *encoding, const char *errors) -/*[clinic checksum: e10796f6765b22ce48dca6940f11b3933853ca35]*/ +/*[clinic checksum: b41f06970e57acf2fd602e4b7f88e3f3e1e53087]*/ { PyObject *result; UnpicklerObject *unpickler = _Unpickler_New(); @@ -7339,34 +7363,38 @@ _pickle.loads encoding: str = 'ASCII' errors: str = 'strict' -Return a reconstituted object from the given pickle data. +Read and return an object from the given pickle data. -The protocol version of the pickle is detected automatically, so no protocol -argument is needed. Bytes past the pickled object's representation are -ignored. +The protocol version of the pickle is detected automatically, so no +protocol argument is needed. Bytes past the pickled object's +representation are ignored. -Optional keyword arguments are fix_imports, encoding and errors, which -are used to control compatiblity support for pickle stream generated -by Python 2.x. If fix_imports is True, pickle will try to map the old -Python 2.x names to the new names used in Python 3.x. The encoding and -errors tell pickle how to decode 8-bit string instances pickled by Python -2.x; these default to 'ASCII' and 'strict', respectively. +Optional keyword arguments are *fix_imports*, *encoding* and *errors*, +which are used to control compatiblity support for pickle stream +generated by Python 2. If *fix_imports* is True, pickle will try to +map the old Python 2 names to the new names used in Python 3. The +*encoding* and *errors* tell pickle how to decode 8-bit string +instances pickled by Python 2; these default to 'ASCII' and 'strict', +respectively. 
The *encoding* can be 'bytes' to read these 8-bit +string instances as bytes objects. [clinic]*/ PyDoc_STRVAR(_pickle_loads__doc__, "loads(data, *, fix_imports=True, encoding=\'ASCII\', errors=\'strict\')\n" -"Return a reconstituted object from the given pickle data.\n" +"Read and return an object from the given pickle data.\n" "\n" -"The protocol version of the pickle is detected automatically, so no protocol\n" -"argument is needed. Bytes past the pickled object\'s representation are\n" -"ignored.\n" +"The protocol version of the pickle is detected automatically, so no\n" +"protocol argument is needed. Bytes past the pickled object\'s\n" +"representation are ignored.\n" "\n" -"Optional keyword arguments are fix_imports, encoding and errors, which\n" -"are used to control compatiblity support for pickle stream generated\n" -"by Python 2.x. If fix_imports is True, pickle will try to map the old\n" -"Python 2.x names to the new names used in Python 3.x. The encoding and\n" -"errors tell pickle how to decode 8-bit string instances pickled by Python\n" -"2.x; these default to \'ASCII\' and \'strict\', respectively."); +"Optional keyword arguments are *fix_imports*, *encoding* and *errors*,\n" +"which are used to control compatiblity support for pickle stream\n" +"generated by Python 2. If *fix_imports* is True, pickle will try to\n" +"map the old Python 2 names to the new names used in Python 3. The\n" +"*encoding* and *errors* tell pickle how to decode 8-bit string\n" +"instances pickled by Python 2; these default to \'ASCII\' and \'strict\',\n" +"respectively. 
The *encoding* can be \'bytes\' to read these 8-bit\n" +"string instances as bytes objects."); #define _PICKLE_LOADS_METHODDEF \ {"loads", (PyCFunction)_pickle_loads, METH_VARARGS|METH_KEYWORDS, _pickle_loads__doc__}, @@ -7396,7 +7424,7 @@ exit: static PyObject * _pickle_loads_impl(PyModuleDef *module, PyObject *data, int fix_imports, const char *encoding, const char *errors) -/*[clinic checksum: 29ee725efcbf51a3533c19cb8261a8e267b7080a]*/ +/*[clinic checksum: 0663de43aca6c21508a777e29d98c9c3a6e7f72d]*/ { PyObject *result; UnpicklerObject *unpickler = _Unpickler_New();