:mod:`bz2` --- Compression compatible with :program:`bzip2`
===========================================================

.. module:: bz2
   :synopsis: Interface to compression and decompression routines
              compatible with bzip2.
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>


This module provides a comprehensive interface for the bz2 compression library.
It implements a complete file interface, one-shot (de)compression functions, and
types for sequential (de)compression.

Here is a summary of the features offered by the bz2 module:


* :class:`BZ2File` class implements a complete file interface, including
  :meth:`~BZ2File.readline`, :meth:`~BZ2File.readlines`,
  :meth:`~BZ2File.writelines`, :meth:`~BZ2File.seek`, etc.;

* :class:`BZ2File` class implements emulated :meth:`~BZ2File.seek` support;

* :class:`BZ2File` class implements universal newline support;

* :class:`BZ2File` class offers an optimized line iteration using a readahead
  algorithm;

* Sequential (de)compression supported by :class:`BZ2Compressor` and
  :class:`BZ2Decompressor` classes;

* One-shot (de)compression supported by :func:`compress` and :func:`decompress`
  functions;

* Thread safety is implemented using an individual locking mechanism.


(De)compression of files
------------------------

Handling of compressed files is offered by the :class:`BZ2File` class.

.. index::
   single: universal newlines; bz2.BZ2File class


.. class:: BZ2File(filename, mode='r', buffering=0, compresslevel=9)

   Open a bz2 file. Mode can be either ``'r'`` or ``'w'``, for reading (default)
   or writing. When opened for writing, the file will be created if it doesn't
   exist, and truncated otherwise. If *buffering* is given, ``0`` means
   unbuffered, and larger numbers specify the buffer size; the default is
   ``0``. If *compresslevel* is given, it must be a number between ``1`` and
   ``9``; the default is ``9``. Add a ``'U'`` to mode to open the file for input
   in :term:`universal newlines` mode. Any line ending in the input file will be
   seen as a ``'\n'`` in Python. Also, a file so opened gains the attribute
   :attr:`newlines`; the value for this attribute is one of ``None`` (no newline
   read yet), ``'\r'``, ``'\n'``, ``'\r\n'`` or a tuple containing all the
   newline types seen. Universal newlines are available only when
   reading. Instances support iteration in the same way as normal :class:`file`
   instances.


   :class:`BZ2File` supports the :keyword:`with` statement.

   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added.


   .. note::

      This class does not support input files containing multiple streams (such
      as those produced by the :program:`pbzip2` tool). When reading such an
      input file, only the first stream will be accessible. If you require
      support for multi-stream files, consider using the third-party
      :mod:`bz2file` module (available from
      `PyPI <http://pypi.python.org/pypi/bz2file>`_). This module provides a
      backport of Python 3.3's :class:`BZ2File` class, which does support
      multi-stream files.


   .. method:: close()

      Close the file. Sets data attribute :attr:`closed` to true. A closed file
      cannot be used for further I/O operations. :meth:`close` may be called
      more than once without error.


   .. method:: read([size])

      Read at most *size* uncompressed bytes, returned as a byte string. If the
      *size* argument is negative or omitted, read until EOF is reached.


   .. method:: readline([size])

      Return the next line from the file, as a byte string, retaining the
      newline. A non-negative *size* argument limits the maximum number of
      bytes to return (an incomplete line may be returned in that case).
      Return an empty byte string at EOF.


   .. method:: readlines([size])

      Return a list of lines read. The optional *size* argument, if given, is an
      approximate bound on the total number of bytes in the lines returned.


   .. method:: seek(offset[, whence])

      Move to a new file position. Argument *offset* is a byte count. Optional
      argument *whence* defaults to ``os.SEEK_SET`` or ``0`` (offset from start
      of file; offset should be ``>= 0``); other values are ``os.SEEK_CUR`` or
      ``1`` (move relative to current position; offset can be positive or
      negative), and ``os.SEEK_END`` or ``2`` (move relative to end of file;
      offset is usually negative, although many platforms allow seeking beyond
      the end of a file).

      Note that seeking of bz2 files is emulated, and depending on the
      parameters the operation may be extremely slow.


   .. method:: tell()

      Return the current file position, an integer.


   .. method:: write(data)

      Write the byte string *data* to the file. Note that due to buffering,
      :meth:`close` may be needed before the file on disk reflects the data
      written.


   .. method:: writelines(sequence_of_byte_strings)

      Write the sequence of byte strings to the file. Note that newlines are not
      added. The sequence can be any iterable object producing byte strings.
      This is equivalent to calling :meth:`write` for each byte string.

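
The file interface described above can be exercised with a short round trip.
The following is only a minimal sketch: the helper name ``roundtrip_lines`` and
the temporary file are illustrative assumptions, and the optional *buffering*
argument and the ``'U'`` mode are deliberately left out.

```python
import bz2
import os
import tempfile


def roundtrip_lines(lines):
    """Write byte-string lines to a temporary .bz2 file and read them back.

    Hypothetical helper for illustration; it is not part of the bz2 module.
    """
    fd, path = tempfile.mkstemp(suffix=".bz2")
    os.close(fd)
    try:
        # writelines() does not add newlines, so the caller supplies them.
        with bz2.BZ2File(path, "w") as f:
            f.writelines(lines)

        with bz2.BZ2File(path, "r") as f:
            first = f.readline()   # first line, newline retained
            position = f.tell()    # position in the uncompressed data
            f.seek(0)              # emulated seek back to the start
            everything = f.read()  # read until EOF
        return first, position, everything
    finally:
        os.remove(path)


first, position, everything = roundtrip_lines([b"spam\n", b"eggs\n"])
```

Using the :keyword:`with` statement ensures that ``close()`` is called, which
matters when writing because buffered data may not reach the disk earlier.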

Sequential (de)compression
--------------------------

Sequential compression and decompression is done using the classes
:class:`BZ2Compressor` and :class:`BZ2Decompressor`.


.. class:: BZ2Compressor(compresslevel=9)

   Create a new compressor object. This object may be used to compress data
   sequentially. If you want to compress data in one shot, use the
   :func:`compress` function instead. The *compresslevel* parameter, if given,
   must be a number between ``1`` and ``9``; the default is ``9``.


   .. method:: compress(data)

      Provide more data to the compressor object. It will return chunks of
      compressed data whenever possible. When you've finished providing data to
      compress, call the :meth:`flush` method to finish the compression process,
      and return what is left in internal buffers.


   .. method:: flush()

      Finish the compression process and return what is left in internal
      buffers. You must not use the compressor object after calling this method.

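
The compress-then-flush protocol can be sketched as follows. The helper name
``compress_chunks`` and the sample data are assumptions made for illustration
only.

```python
import bz2


def compress_chunks(chunks, compresslevel=9):
    """Compress an iterable of byte strings into a single bz2 stream.

    Hypothetical helper for illustration; it is not part of the bz2 module.
    """
    compressor = bz2.BZ2Compressor(compresslevel)
    pieces = []
    for chunk in chunks:
        # compress() may return an empty byte string while data is buffered.
        pieces.append(compressor.compress(chunk))
    # flush() returns whatever is left; the compressor is unusable afterwards.
    pieces.append(compressor.flush())
    return b"".join(pieces)


chunks = [b"spam " * 1000, b"eggs " * 1000]
compressed = compress_chunks(chunks)
```

The pieces must be concatenated in order, including the final piece returned by
``flush()``; without it the stream is incomplete.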

.. class:: BZ2Decompressor()

   Create a new decompressor object. This object may be used to decompress data
   sequentially. If you want to decompress data in one shot, use the
   :func:`decompress` function instead.


   .. method:: decompress(data)

      Provide more data to the decompressor object. It will return chunks of
      decompressed data whenever possible. If you try to decompress data after
      the end of stream is found, :exc:`EOFError` will be raised. If any data
      was found after the end of stream, it is ignored and saved in the
      :attr:`unused_data` attribute.

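
One use of ``unused_data`` is handling input that contains several
concatenated streams: each stream gets a fresh decompressor, which is fed
whatever the previous one left over. This is a minimal sketch under that
assumption; the helper names ``decompress_all_streams`` and
``raises_after_eof`` are hypothetical.

```python
import bz2


def decompress_all_streams(data):
    """Return a list with the decompressed contents of each stream in data.

    Hypothetical helper for illustration; it is not part of the bz2 module.
    """
    results = []
    while data:
        # A decompressor handles exactly one stream, so create one per stream.
        decompressor = bz2.BZ2Decompressor()
        results.append(decompressor.decompress(data))
        # Bytes found after the end of the stream are kept in unused_data.
        data = decompressor.unused_data
    return results


def raises_after_eof():
    """Show that a decompressor cannot be reused once its stream has ended."""
    decompressor = bz2.BZ2Decompressor()
    decompressor.decompress(bz2.compress(b"payload"))
    try:
        decompressor.decompress(b"more")
    except EOFError:
        return True
    return False


two_streams = bz2.compress(b"first") + bz2.compress(b"second")
parts = decompress_all_streams(two_streams)
```

Here the whole input is passed in a single call for brevity; with input
arriving in pieces, each piece would be passed to ``decompress()`` in turn.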

One-shot (de)compression
------------------------

One-shot compression and decompression is provided through the :func:`compress`
and :func:`decompress` functions.


.. function:: compress(data, compresslevel=9)

   Compress *data* in one shot. If you want to compress data sequentially, use
   an instance of :class:`BZ2Compressor` instead. The *compresslevel* parameter,
   if given, must be a number between ``1`` and ``9``; the default is ``9``.


.. function:: decompress(data)

   Decompress *data* in one shot. If you want to decompress data sequentially,
   use an instance of :class:`BZ2Decompressor` instead.

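
A one-shot round trip might look like the sketch below. The sample payload and
variable names are illustrative assumptions.

```python
import bz2

# Repetitive sample data, chosen only because it compresses well.
original = b"An example payload. " * 200

# Default-style call with the highest compression level (9).
compressed = bz2.compress(original, 9)
restored = bz2.decompress(compressed)

# The lowest level (1) is also valid and yields a decodable stream.
fast = bz2.compress(original, 1)
restored_fast = bz2.decompress(fast)
```

Both functions take and return byte strings, so text must be encoded before
compression and decoded after decompression.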