Issue #6715: Add module for compression using the LZMA algorithm.

Nadeem Vawda 2011-11-30 00:25:06 +02:00
parent 551ac95733
commit 3ff069ebc6
16 changed files with 3773 additions and 6 deletions


@ -5,7 +5,8 @@ Data Compression and Archiving
******************************
The modules described in this chapter support data compression with the zlib,
gzip, and bzip2 algorithms, and the creation of ZIP- and tar-format archives.
gzip, bzip2 and lzma algorithms, and the creation of ZIP- and tar-format
archives.
.. toctree::
@ -13,5 +14,6 @@ gzip, and bzip2 algorithms, and the creation of ZIP- and tar-format archives.
zlib.rst
gzip.rst
bz2.rst
lzma.rst
zipfile.rst
tarfile.rst


@ -12,7 +12,7 @@
This module provides a comprehensive interface for compressing and
decompressing data using the bzip2 compression algorithm.
For related file formats, see the :mod:`gzip`, :mod:`zipfile`, and
For related file formats, see the :mod:`gzip`, :mod:`lzma`, :mod:`zipfile`, and
:mod:`tarfile` modules.
The :mod:`bz2` module contains:


@ -21,7 +21,7 @@ Note that additional file formats which can be decompressed by the
:program:`gzip` and :program:`gunzip` programs, such as those produced by
:program:`compress` and :program:`pack`, are not supported by this module.
For other archive formats, see the :mod:`bz2`, :mod:`zipfile`, and
For related file formats, see the :mod:`bz2`, :mod:`lzma`, :mod:`zipfile`, and
:mod:`tarfile` modules.
The module defines the following items:

344
Doc/library/lzma.rst Normal file

@ -0,0 +1,344 @@
:mod:`lzma` --- Compression using the LZMA algorithm
====================================================
.. module:: lzma
:synopsis: A Python wrapper for the liblzma compression library.
.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. versionadded:: 3.3
This module provides classes and convenience functions for compressing and
decompressing data using the LZMA compression algorithm. Also included is a file
interface supporting the ``.xz`` and legacy ``.lzma`` file formats used by the
:program:`xz` utility, as well as raw compressed streams.
For related file formats, see the :mod:`bz2`, :mod:`gzip`, :mod:`zipfile`, and
:mod:`tarfile` modules.
The interface provided by this module is very similar to that of the :mod:`bz2`
module. However, note that :class:`LZMAFile` is *not* thread-safe, unlike
:class:`bz2.BZ2File`, so if you need to use a single :class:`LZMAFile` instance
from multiple threads, it is necessary to protect it with a lock.
.. exception:: LZMAError
This exception is raised when an error occurs during compression or
decompression, or while initializing the compressor/decompressor state.
Reading and writing compressed files
------------------------------------
.. class:: LZMAFile(filename=None, mode="r", fileobj=None, format=None, check=-1, preset=None, filters=None)
Open an LZMA-compressed file.
An :class:`LZMAFile` can wrap an existing :term:`file object` (given by
*fileobj*), or operate directly on a named file (named by *filename*).
Exactly one of these two parameters should be provided. If *fileobj* is
provided, it is not closed when the :class:`LZMAFile` is closed.
The *mode* argument can be either ``"r"`` for reading (default), ``"w"`` for
overwriting, or ``"a"`` for appending. If *fileobj* is provided, a mode of
``"w"`` does not truncate the file, and is instead equivalent to ``"a"``.
When opening a file for reading, the input file may be the concatenation of
multiple separate compressed streams. These are transparently decoded as a
single logical stream.
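As a rough sketch of how the ``"a"`` mode and multi-stream reading interact, the snippet below writes one stream, appends a second, and reads both back as a single logical stream. The scratch path is hypothetical, and the file is opened by name so that the call does not depend on how a file object is passed in.

```python
import lzma
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "log.xz")

# "w" creates the file and writes the first compressed stream.
with lzma.LZMAFile(path, "w") as f:
    f.write(b"first stream\n")

# "a" adds a second, separate compressed stream after the first one.
with lzma.LZMAFile(path, "a") as f:
    f.write(b"second stream\n")

# Reading decodes both streams transparently, in order.
with lzma.LZMAFile(path) as f:
    combined = f.read()
```

The file on disk now holds two independent ``.xz`` streams, yet ``combined`` contains the data from both.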
When opening a file for reading, the *format* and *filters* arguments have
the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
and *preset* arguments should not be used.
When opening a file for writing, the *format*, *check*, *preset* and
*filters* arguments have the same meanings as for :class:`LZMACompressor`.
:class:`LZMAFile` supports all the members specified by
:class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`.
Iteration and the :keyword:`with` statement are supported.
The following method is also provided:
.. method:: peek(size=-1)
Return buffered data without advancing the file position. At least one
byte of data will be returned, unless EOF has been reached. The exact
number of bytes returned is unspecified (the *size* argument is ignored).
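A small illustration of :meth:`peek`, assuming a hypothetical scratch file: the call returns some prefix of the remaining data and leaves the file position where it was. How many bytes come back is not guaranteed, so the sketch only relies on getting at least one.

```python
import lzma
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "peek-demo.xz")
payload = b"header:" + b"x" * 100

with lzma.LZMAFile(path, "w") as f:
    f.write(payload)

with lzma.LZMAFile(path) as f:
    head = f.peek()              # some non-empty prefix of the data
    pos_after_peek = f.tell()    # peeking does not move the position
    body = f.read()              # so a full read still starts at byte 0
```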
Compressing and decompressing data in memory
--------------------------------------------
.. class:: LZMACompressor(format=FORMAT_XZ, check=-1, preset=None, filters=None)
Create a compressor object, which can be used to compress data incrementally.
For a more convenient way of compressing a single chunk of data, see
:func:`compress`.
The *format* argument specifies what container format should be used.
Possible values are:
* :const:`FORMAT_XZ`: The ``.xz`` container format.
This is the default format.
* :const:`FORMAT_ALONE`: The legacy ``.lzma`` container format.
This format is more limited than ``.xz`` -- it does not support integrity
checks or multiple filters.
* :const:`FORMAT_RAW`: A raw data stream, not using any container format.
This format specifier does not support integrity checks, and requires that
you always specify a custom filter chain (for both compression and
decompression). Additionally, data compressed in this manner cannot be
decompressed using :const:`FORMAT_AUTO` (see :class:`LZMADecompressor`).
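To make the :const:`FORMAT_RAW` requirement concrete, here is a minimal in-memory round trip. The filter chain (a single LZMA2 filter at preset 6) is an arbitrary choice for the example; the point is that the same chain has to be supplied on both sides, because a raw stream carries no header describing it.

```python
import lzma

# An illustrative filter chain; raw streams need one for both directions.
raw_filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

data = b"raw stream payload " * 50
packed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=raw_filters)

# The decompressor must be told the format and the exact same chain.
unpacked = lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=raw_filters)

# A raw stream has no container header, so it lacks the .xz magic bytes.
has_xz_magic = packed.startswith(b"\xfd7zXZ\x00")
```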
The *check* argument specifies the type of integrity check to include in the
compressed data. This check is used when decompressing, to ensure that the
data has not been corrupted. Possible values are:
* :const:`CHECK_NONE`: No integrity check.
This is the default (and the only acceptable value) for
:const:`FORMAT_ALONE` and :const:`FORMAT_RAW`.
* :const:`CHECK_CRC32`: 32-bit Cyclic Redundancy Check.
* :const:`CHECK_CRC64`: 64-bit Cyclic Redundancy Check.
This is the default for :const:`FORMAT_XZ`.
* :const:`CHECK_SHA256`: 256-bit Secure Hash Algorithm.
If the specified check is not supported, an :class:`LZMAError` is raised.
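The following sketch shows one way to select an integrity check when compressing and to read it back on the decompressing side. It uses :const:`CHECK_CRC32`, which is described as always supported; the payload is made up for the example.

```python
import lzma

data = b"integrity-checked payload"
packed = lzma.compress(data, check=lzma.CHECK_CRC32)

decomp = lzma.LZMADecompressor()
check_before = decomp.check          # nothing decoded yet
restored = decomp.decompress(packed)
check_after = decomp.check           # known once the header is decoded
```

Before any input is fed in, the decompressor cannot know which check the stream uses, which is why the attribute starts out as :const:`CHECK_UNKNOWN`.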
The compression settings can be specified either as a preset compression
level (with the *preset* argument), or in detail as a custom filter chain
(with the *filters* argument).
The *preset* argument (if provided) should be an integer between ``0`` and
``9`` (inclusive), optionally OR-ed with the constant
:const:`PRESET_EXTREME`. If neither *preset* nor *filters* are given, the
default behavior is to use :const:`PRESET_DEFAULT` (preset level ``6``).
Higher presets produce smaller output, but make compression more CPU- and
memory-intensive, and also increase the memory required for decompression.
The *filters* argument (if provided) should be a filter chain specifier.
See :ref:`filter-chain-specs` for details.
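As an illustration of the *preset* argument, the snippet below compresses the same made-up input at the lowest preset, the default, and the highest preset OR-ed with :const:`PRESET_EXTREME`. The resulting sizes depend on the input and on the liblzma version, so none are quoted here.

```python
import lzma

data = b"The quick brown fox jumps over the lazy dog.\n" * 200

fast = lzma.compress(data, preset=0)                          # least effort
default = lzma.compress(data)                                 # PRESET_DEFAULT
small = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)   # most effort

# All three are ordinary .xz streams and decompress the same way.
sizes = {"fast": len(fast), "default": len(default), "small": len(small)}
```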
.. method:: compress(data)
Compress *data* (a :class:`bytes` object), returning a :class:`bytes`
object containing compressed data for at least part of the input. Some of
*data* may be buffered internally, for use in later calls to
:meth:`compress` and :meth:`flush`. The returned data should be
concatenated with the output of any previous calls to :meth:`compress`.
.. method:: flush()
Finish the compression process, returning a :class:`bytes` object
containing any data stored in the compressor's internal buffers.
The compressor cannot be used after this method has been called.
.. class:: LZMADecompressor(format=FORMAT_AUTO, memlimit=None, filters=None)
Create a decompressor object, which can be used to decompress data
incrementally.
For a more convenient way of decompressing an entire compressed stream at
once, see :func:`decompress`.
The *format* argument specifies the container format that should be used. The
default is :const:`FORMAT_AUTO`, which can decompress both ``.xz`` and
``.lzma`` files. Other possible values are :const:`FORMAT_XZ`,
:const:`FORMAT_ALONE`, and :const:`FORMAT_RAW`.
The *memlimit* argument specifies a limit (in bytes) on the amount of memory
that the decompressor can use. When this argument is used, decompression will
fail with an :class:`LZMAError` if it is not possible to decompress the input
within the given memory limit.
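A sketch of *memlimit* in use. It assumes that a stream produced with the default preset needs far more than 1 KiB to decode, which holds for the multi-megabyte dictionary that preset normally selects; the 64 MiB figure is simply a comfortable upper bound chosen for the example.

```python
import lzma

payload = b"some payload " * 100
packed = lzma.compress(payload)

# A limit that is far too small should make decompression fail.
try:
    lzma.decompress(packed, memlimit=1024)
    limit_hit = False
except lzma.LZMAError:
    limit_hit = True

# A generous limit lets the same data through.
roomy = lzma.decompress(packed, memlimit=64 * 1024 * 1024)
```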
The *filters* argument specifies the filter chain that was used to create
the stream being decompressed. This argument is required if *format* is
:const:`FORMAT_RAW`, but should not be used for other formats.
See :ref:`filter-chain-specs` for more information about filter chains.
.. note::
This class does not transparently handle inputs containing multiple
compressed streams, unlike :func:`decompress` and :class:`LZMAFile`. To
decompress a multi-stream input with :class:`LZMADecompressor`, you must
create a new decompressor for each stream.
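One possible way to follow this note is sketched below. The helper name ``decompress_streams`` is hypothetical; it creates a fresh decompressor for each stream and hands over whatever the previous one left in :attr:`unused_data`.

```python
import lzma


def decompress_streams(data):
    """Decompress concatenated streams, one decompressor per stream.

    Hypothetical helper for illustration; returns one bytes object
    per stream found in *data*.
    """
    results = []
    while data:
        decomp = lzma.LZMADecompressor()
        results.append(decomp.decompress(data))
        if not decomp.eof:
            raise lzma.LZMAError("Compressed data ended before the "
                                 "end-of-stream marker was reached")
        # Anything after the end-of-stream marker belongs to the next stream.
        data = decomp.unused_data
    return results


two_streams = lzma.compress(b"alpha") + lzma.compress(b"beta")
parts = decompress_streams(two_streams)
```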
.. method:: decompress(data)
Decompress *data* (a :class:`bytes` object), returning a :class:`bytes`
object containing the decompressed data for at least part of the input.
Some of *data* may be buffered internally, for use in later calls to
:meth:`decompress`. The returned data should be concatenated with the
output of any previous calls to :meth:`decompress`.
.. attribute:: check
The ID of the integrity check used by the input stream. This may be
:const:`CHECK_UNKNOWN` until enough of the input has been decoded to
determine what integrity check it uses.
.. attribute:: eof
True if the end-of-stream marker has been reached.
.. attribute:: unused_data
Data found after the end of the compressed stream.
Before the end of the stream is reached, this will be ``b""``.
.. function:: compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None)
Compress *data* (a :class:`bytes` object), returning the compressed data as a
:class:`bytes` object.
See :class:`LZMACompressor` above for a description of the *format*, *check*,
*preset* and *filters* arguments.
.. function:: decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None)
Decompress *data* (a :class:`bytes` object), returning the uncompressed data
as a :class:`bytes` object.
If *data* is the concatenation of multiple distinct compressed streams,
decompress all of these streams, and return the concatenation of the results.
See :class:`LZMADecompressor` above for a description of the *format*,
*memlimit* and *filters* arguments.
Miscellaneous
-------------
.. function:: check_is_supported(check)
Returns true if the given integrity check is supported on this system.
:const:`CHECK_NONE` and :const:`CHECK_CRC32` are always supported.
:const:`CHECK_CRC64` and :const:`CHECK_SHA256` may be unavailable if you are
using a version of :program:`liblzma` that was compiled with a limited
feature set.
.. _filter-chain-specs:
Specifying custom filter chains
-------------------------------
A filter chain specifier is a sequence of dictionaries, where each dictionary
contains the ID and options for a single filter. Each dictionary must contain
the key ``"id"``, and may contain additional keys to specify filter-dependent
options. Valid filter IDs are as follows:
* Compression filters:
* :const:`FILTER_LZMA1` (for use with :const:`FORMAT_ALONE`)
* :const:`FILTER_LZMA2` (for use with :const:`FORMAT_XZ` and :const:`FORMAT_RAW`)
* Delta filter:
* :const:`FILTER_DELTA`
* Branch-Call-Jump (BCJ) filters:
* :const:`FILTER_X86`
* :const:`FILTER_IA64`
* :const:`FILTER_ARM`
* :const:`FILTER_ARMTHUMB`
* :const:`FILTER_POWERPC`
* :const:`FILTER_SPARC`
A filter chain can consist of up to 4 filters, and cannot be empty. The last
filter in the chain must be a compression filter, and any other filters must be
delta or BCJ filters.
Compression filters support the following options (specified as additional
entries in the dictionary representing the filter):
* ``preset``: A compression preset to use as a source of default values for
options that are not specified explicitly.
* ``dict_size``: Dictionary size in bytes. This should be between 4KiB and
1.5GiB (inclusive).
* ``lc``: Number of literal context bits.
* ``lp``: Number of literal position bits. The sum ``lc + lp`` must be at
most 4.
* ``pb``: Number of position bits; must be at most 4.
* ``mode``: :const:`MODE_FAST` or :const:`MODE_NORMAL`.
* ``nice_len``: What should be considered a "nice length" for a match.
This should be 273 or less.
* ``mf``: What match finder to use -- :const:`MF_HC3`, :const:`MF_HC4`,
:const:`MF_BT2`, :const:`MF_BT3`, or :const:`MF_BT4`.
* ``depth``: Maximum search depth used by match finder. 0 (default) means to
select automatically based on other filter options.
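The options above might be combined as in the following sketch. The particular values are arbitrary picks within the documented ranges, not recommendations; ``preset`` supplies defaults for anything left unspecified.

```python
import lzma

# Illustrative values only; each stays within the limits listed above.
filters = [{
    "id": lzma.FILTER_LZMA2,
    "preset": 6,             # source of defaults for unspecified options
    "dict_size": 1 << 20,    # 1 MiB dictionary
    "lc": 3, "lp": 0,        # lc + lp must be at most 4
    "pb": 2,
    "mode": lzma.MODE_NORMAL,
    "mf": lzma.MF_BT4,
    "nice_len": 128,
}]

data = bytes(range(256)) * 64
packed = lzma.compress(data, filters=filters)

# The .xz container records the filter chain, so none is needed here.
restored = lzma.decompress(packed)
```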
The delta filter stores the differences between bytes, producing more repetitive
input for the compressor in certain circumstances. It supports only one option,
``dist``. This indicates the distance
between bytes to be subtracted. The default is 1, i.e. take the differences
between adjacent bytes.
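To show what the delta filter does to its input, the sketch below pairs a small pure-Python re-implementation (``delta_encode``, a hypothetical helper written for this example, not liblzma's code) with a real filter chain that uses :const:`FILTER_DELTA`. Bytes before the start of the input are treated as zero, and the subtraction wraps modulo 256.

```python
import lzma


def delta_encode(data, dist=1):
    """Illustrative only: subtract the byte *dist* positions earlier."""
    return bytes(
        (data[i] - (data[i - dist] if i >= dist else 0)) % 256
        for i in range(len(data))
    )


# A steadily increasing ramp turns into a run of identical bytes.
ramp = bytes(i % 256 for i in range(1024))
encoded = delta_encode(ramp)

# The same idea applied through the module's own filter chain.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 1},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
packed = lzma.compress(ramp, filters=filters)
restored = lzma.decompress(packed)
```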
The BCJ filters are intended to be applied to machine code. They convert
relative branches, calls and jumps in the code to use absolute addressing, with
the aim of increasing the redundancy that can be exploited by the compressor.
These filters support one option, ``start_offset``. This specifies the address
that should be mapped to the beginning of the input data. The default is 0.
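A BCJ filter is placed ahead of the compression filter in the chain, as in this sketch. The input is a stand-in byte pattern rather than real machine code, so the example only demonstrates that the chain is accepted and that the round trip is lossless, not any gain in compression.

```python
import lzma

filters = [
    {"id": lzma.FILTER_X86, "start_offset": 0},   # BCJ filter first
    {"id": lzma.FILTER_LZMA2, "preset": 6},       # compression filter last
]

# Stand-in bytes; real input would be x86 machine code.
code_like = b"\xe8\x10\x00\x00\x00\x90\x90\xc3" * 256

packed = lzma.compress(code_like, filters=filters)
restored = lzma.decompress(packed)
```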
Examples
--------
Reading in a compressed file::
import lzma
with lzma.LZMAFile("file.xz") as f:
file_content = f.read()
Creating a compressed file::
import lzma
data = b"Insert Data Here"
with lzma.LZMAFile("file.xz", "w") as f:
f.write(data)
Compressing data in memory::
import lzma
data_in = b"Insert Data Here"
data_out = lzma.compress(data_in)
Incremental compression::
import lzma
lzc = lzma.LZMACompressor()
out1 = lzc.compress(b"Some data\n")
out2 = lzc.compress(b"Another piece of data\n")
out3 = lzc.compress(b"Even more data\n")
out4 = lzc.flush()
# Concatenate all the partial results:
result = b"".join([out1, out2, out3, out4])
Writing compressed data to an already-open file::
import lzma
with open("file.xz", "wb") as f:
f.write(b"This data will not be compressed\n")
with lzma.LZMAFile(fileobj=f, mode="w") as lzf:
lzf.write(b"This *will* be compressed\n")
f.write(b"Not compressed\n")
Creating a compressed file using a custom filter chain::
import lzma
my_filters = [
{"id": lzma.FILTER_DELTA, "dist": 5},
{"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME},
]
with lzma.LZMAFile("file.xz", "w", filters=my_filters) as f:
f.write(b"blah blah blah")


@ -23,7 +23,7 @@ decryption of encrypted files in ZIP archives, but it currently cannot
create an encrypted file. Decryption is extremely slow as it is
implemented in native Python rather than C.
For other archive formats, see the :mod:`bz2`, :mod:`gzip`, and
For related file formats, see the :mod:`bz2`, :mod:`gzip`, :mod:`lzma`, and
:mod:`tarfile` modules.
The module defines the following items:


@ -18,8 +18,8 @@ order. This documentation doesn't attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.
For reading and writing ``.gz`` files see the :mod:`gzip` module. For
other related file formats, see the :mod:`bz2`, :mod:`zipfile`, and
For reading and writing ``.gz`` files see the :mod:`gzip` module. For other
related file formats, see the :mod:`bz2`, :mod:`lzma`, :mod:`zipfile`, and
:mod:`tarfile` modules.
The available exception and functions in this module are:

398
Lib/lzma.py Normal file

@ -0,0 +1,398 @@
"""Interface to the liblzma compression library.
This module provides a class for reading and writing compressed files,
classes for incremental (de)compression, and convenience functions for
one-shot (de)compression.
These classes and functions support both the XZ and legacy LZMA
container formats, as well as raw compressed data streams.
"""
__all__ = [
"CHECK_NONE", "CHECK_CRC32", "CHECK_CRC64", "CHECK_SHA256",
"CHECK_ID_MAX", "CHECK_UNKNOWN",
"FILTER_LZMA1", "FILTER_LZMA2", "FILTER_DELTA", "FILTER_X86", "FILTER_IA64",
"FILTER_ARM", "FILTER_ARMTHUMB", "FILTER_POWERPC", "FILTER_SPARC",
"FORMAT_AUTO", "FORMAT_XZ", "FORMAT_ALONE", "FORMAT_RAW",
"MF_HC3", "MF_HC4", "MF_BT2", "MF_BT3", "MF_BT4",
"MODE_FAST", "MODE_NORMAL", "PRESET_DEFAULT", "PRESET_EXTREME",
"LZMACompressor", "LZMADecompressor", "LZMAFile", "LZMAError",
"compress", "decompress", "check_is_supported",
]
import io
from _lzma import *
_MODE_CLOSED = 0
_MODE_READ = 1
_MODE_READ_EOF = 2
_MODE_WRITE = 3
_BUFFER_SIZE = 8192
class LZMAFile(io.BufferedIOBase):
"""A file object providing transparent LZMA (de)compression.
An LZMAFile can act as a wrapper for an existing file object, or
refer directly to a named file on disk.
Note that LZMAFile provides a *binary* file interface - data read
is returned as bytes, and data to be written must be given as bytes.
"""
def __init__(self, filename=None, mode="r", *,
fileobj=None, format=None, check=-1,
preset=None, filters=None):
"""Open an LZMA-compressed file.
If filename is given, open the named file. Otherwise, operate on
the file object given by fileobj. Exactly one of these two
parameters should be provided.
mode can be "r" for reading (default), "w" for (over)writing, or
"a" for appending.
format specifies the container format to use for the file.
If mode is "r", this defaults to FORMAT_AUTO. Otherwise, the
default is FORMAT_XZ.
check specifies the integrity check to use. This argument can
only be used when opening a file for writing. For FORMAT_XZ,
the default is CHECK_CRC64. FORMAT_ALONE and FORMAT_RAW do not
support integrity checks - for these formats, check must be
omitted, or be CHECK_NONE.
When opening a file for reading, the *preset* argument is not
meaningful, and should be omitted. The *filters* argument should
also be omitted, except when format is FORMAT_RAW (in which case
it is required).
When opening a file for writing, the settings used by the
compressor can be specified either as a preset compression
level (with the *preset* argument), or in detail as a custom
filter chain (with the *filters* argument). For FORMAT_XZ and
FORMAT_ALONE, the default is to use the PRESET_DEFAULT preset
level. For FORMAT_RAW, the caller must always specify a filter
chain; the raw compressor does not support preset compression
levels.
preset (if provided) should be an integer in the range 0-9,
optionally OR-ed with the constant PRESET_EXTREME.
filters (if provided) should be a sequence of dicts. Each dict
should have an entry for "id" indicating ID of the filter, plus
additional entries for options to the filter.
"""
self._fp = None
self._closefp = False
self._mode = _MODE_CLOSED
self._pos = 0
self._size = -1
if mode == "r":
if check != -1:
raise ValueError("Cannot specify an integrity check "
"when opening a file for reading")
if preset is not None:
raise ValueError("Cannot specify a preset compression "
"level when opening a file for reading")
if format is None:
format = FORMAT_AUTO
mode_code = _MODE_READ
# Save the args to pass to the LZMADecompressor initializer.
# If the file contains multiple compressed streams, each
# stream will need a separate decompressor object.
self._init_args = {"format":format, "filters":filters}
self._decompressor = LZMADecompressor(**self._init_args)
self._buffer = None
elif mode in ("w", "a"):
if format is None:
format = FORMAT_XZ
mode_code = _MODE_WRITE
self._compressor = LZMACompressor(format=format, check=check,
preset=preset, filters=filters)
else:
raise ValueError("Invalid mode: {!r}".format(mode))
if filename is not None and fileobj is None:
mode += "b"
self._fp = open(filename, mode)
self._closefp = True
self._mode = mode_code
elif fileobj is not None and filename is None:
self._fp = fileobj
self._mode = mode_code
else:
raise ValueError("Must give exactly one of filename and fileobj")
def close(self):
"""Flush and close the file.
May be called more than once without error. Once the file is
closed, any other operation on it will raise a ValueError.
"""
if self._mode == _MODE_CLOSED:
return
try:
if self._mode in (_MODE_READ, _MODE_READ_EOF):
self._decompressor = None
self._buffer = None
elif self._mode == _MODE_WRITE:
self._fp.write(self._compressor.flush())
self._compressor = None
finally:
try:
if self._closefp:
self._fp.close()
finally:
self._fp = None
self._closefp = False
self._mode = _MODE_CLOSED
@property
def closed(self):
"""True if this file is closed."""
return self._mode == _MODE_CLOSED
def fileno(self):
"""Return the file descriptor for the underlying file."""
self._check_not_closed()
return self._fp.fileno()
def seekable(self):
"""Return whether the file supports seeking."""
return self.readable()
def readable(self):
"""Return whether the file was opened for reading."""
self._check_not_closed()
return self._mode in (_MODE_READ, _MODE_READ_EOF)
def writable(self):
"""Return whether the file was opened for writing."""
self._check_not_closed()
return self._mode == _MODE_WRITE
# Mode-checking helper functions.
def _check_not_closed(self):
if self.closed:
raise ValueError("I/O operation on closed file")
def _check_can_read(self):
if not self.readable():
raise io.UnsupportedOperation("File not open for reading")
def _check_can_write(self):
if not self.writable():
raise io.UnsupportedOperation("File not open for writing")
def _check_can_seek(self):
if not self.seekable():
raise io.UnsupportedOperation("Seeking is only supported "
"on files open for reading")
# Fill the readahead buffer if it is empty. Returns False on EOF.
def _fill_buffer(self):
if self._buffer:
return True
if self._decompressor.unused_data:
rawblock = self._decompressor.unused_data
else:
rawblock = self._fp.read(_BUFFER_SIZE)
if not rawblock:
if self._decompressor.eof:
self._mode = _MODE_READ_EOF
self._size = self._pos
return False
else:
raise EOFError("Compressed file ended before the "
"end-of-stream marker was reached")
# Continue to next stream.
if self._decompressor.eof:
self._decompressor = LZMADecompressor(**self._init_args)
self._buffer = self._decompressor.decompress(rawblock)
return True
# Read data until EOF.
# If return_data is false, consume the data without returning it.
def _read_all(self, return_data=True):
blocks = []
while self._fill_buffer():
if return_data:
blocks.append(self._buffer)
self._pos += len(self._buffer)
self._buffer = None
if return_data:
return b"".join(blocks)
# Read a block of up to n bytes.
# If return_data is false, consume the data without returning it.
def _read_block(self, n, return_data=True):
blocks = []
while n > 0 and self._fill_buffer():
if n < len(self._buffer):
data = self._buffer[:n]
self._buffer = self._buffer[n:]
else:
data = self._buffer
self._buffer = None
if return_data:
blocks.append(data)
self._pos += len(data)
n -= len(data)
if return_data:
return b"".join(blocks)
def peek(self, size=-1):
"""Return buffered data without advancing the file position.
Always returns at least one byte of data, unless at EOF.
The exact number of bytes returned is unspecified.
"""
self._check_can_read()
if self._mode == _MODE_READ_EOF or not self._fill_buffer():
return b""
return self._buffer
def read(self, size=-1):
"""Read up to size uncompressed bytes from the file.
If size is negative or omitted, read until EOF is reached.
Returns b"" if the file is already at EOF.
"""
self._check_can_read()
if self._mode == _MODE_READ_EOF or size == 0:
return b""
elif size < 0:
return self._read_all()
else:
return self._read_block(size)
def read1(self, size=-1):
"""Read up to size uncompressed bytes with at most one read
from the underlying stream.
Returns b"" if the file is at EOF.
"""
self._check_can_read()
if (size == 0 or self._mode == _MODE_READ_EOF or
not self._fill_buffer()):
return b""
if 0 < size < len(self._buffer):
data = self._buffer[:size]
self._buffer = self._buffer[size:]
else:
data = self._buffer
self._buffer = None
self._pos += len(data)
return data
def write(self, data):
"""Write a bytes object to the file.
Returns the number of uncompressed bytes written, which is
always len(data). Note that due to buffering, the file on disk
may not reflect the data written until close() is called.
"""
self._check_can_write()
compressed = self._compressor.compress(data)
self._fp.write(compressed)
self._pos += len(data)
return len(data)
# Rewind the file to the beginning of the data stream.
def _rewind(self):
self._fp.seek(0, 0)
self._mode = _MODE_READ
self._pos = 0
self._decompressor = LZMADecompressor(**self._init_args)
self._buffer = None
def seek(self, offset, whence=0):
"""Change the file position.
The new position is specified by offset, relative to the
position indicated by whence. Possible values for whence are:
0: start of stream (default): offset must not be negative
1: current stream position
2: end of stream; offset must not be positive
Returns the new file position.
Note that seeking is emulated, so depending on the parameters,
this operation may be extremely slow.
"""
self._check_can_seek()
# Recalculate offset as an absolute file position.
if whence == 0:
pass
elif whence == 1:
offset = self._pos + offset
elif whence == 2:
# Seeking relative to EOF - we need to know the file's size.
if self._size < 0:
self._read_all(return_data=False)
offset = self._size + offset
else:
raise ValueError("Invalid value for whence: {}".format(whence))
# Make it so that offset is the number of bytes to skip forward.
if offset < self._pos:
self._rewind()
else:
offset -= self._pos
# Read and discard data until we reach the desired position.
if self._mode != _MODE_READ_EOF:
self._read_block(offset, return_data=False)
return self._pos
def tell(self):
"""Return the current file position."""
self._check_not_closed()
return self._pos
def compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None):
"""Compress a block of data.
Refer to LZMACompressor's docstring for a description of the
optional arguments *format*, *check*, *preset* and *filters*.
For incremental compression, use an LZMACompressor object instead.
"""
comp = LZMACompressor(format, check, preset, filters)
return comp.compress(data) + comp.flush()
def decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None):
"""Decompress a block of data.
Refer to LZMADecompressor's docstring for a description of the
optional arguments *format*, *memlimit* and *filters*.
For incremental decompression, use an LZMADecompressor object instead.
"""
results = []
while True:
decomp = LZMADecompressor(format, memlimit, filters)
results.append(decomp.decompress(data))
if not decomp.eof:
raise LZMAError("Compressed data ended before the "
"end-of-stream marker was reached")
if not decomp.unused_data:
return b"".join(results)
# There is unused data left over. Proceed to next stream.
data = decomp.unused_data

1335
Lib/test/test_lzma.py Normal file

File diff suppressed because it is too large.


@ -399,6 +399,8 @@ Core and Builtins
Library
-------
- Issue #6715: Add a module 'lzma' for compression using the LZMA algorithm.
- Issue #13487: Make inspect.getmodule robust against changes done to
sys.modules while it is iterating over it.

1106
Modules/_lzmamodule.c Normal file

File diff suppressed because it is too large.

537
PCbuild/_lzma.vcproj Normal file

@ -0,0 +1,537 @@
<?xml version="1.0" encoding="Windows-1252"?>
<VisualStudioProject
ProjectType="Visual C++"
Version="9,00"
Name="_lzma"
ProjectGUID="{F9D71780-F393-11E0-BE50-0800200C9A66}"
RootNamespace="lzma"
Keyword="Win32Proj"
TargetFrameworkVersion="196613"
>
<Platforms>
<Platform
Name="Win32"
/>
<Platform
Name="x64"
/>
</Platforms>
<ToolFiles>
</ToolFiles>
<Configurations>
<Configuration
Name="Debug|Win32"
ConfigurationType="2"
InheritedPropertySheets=".\pyd_d.vsprops"
CharacterSet="0"
>
<Tool
Name="VCPreBuildEventTool"
/>
<Tool
Name="VCCustomBuildTool"
/>
<Tool
Name="VCXMLDataGeneratorTool"
/>
<Tool
Name="VCWebServiceProxyGeneratorTool"
/>
<Tool
Name="VCMIDLTool"
/>
<Tool
Name="VCCLCompilerTool"
AdditionalIncludeDirectories="$(lzmaDir)\include"
PreprocessorDefinitions="WIN32;_FILE_OFFSET_BITS=64;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;LZMA_API_STATIC"
/>
<Tool
Name="VCManagedResourceCompilerTool"
/>
<Tool
Name="VCResourceCompilerTool"
/>
<Tool
Name="VCPreLinkEventTool"
/>
<Tool
Name="VCLinkerTool"
AdditionalDependencies="$(lzmaDir)\bin_i486\liblzma.a"
/>
<Tool
Name="VCALinkTool"
/>
<Tool
Name="VCManifestTool"
/>
<Tool
Name="VCXDCMakeTool"
/>
<Tool
Name="VCBscMakeTool"
/>
<Tool
Name="VCFxCopTool"
/>
<Tool
Name="VCAppVerifierTool"
/>
<Tool
Name="VCPostBuildEventTool"
/>
</Configuration>
<Configuration
Name="Debug|x64"
ConfigurationType="2"
InheritedPropertySheets=".\pyd_d.vsprops;.\x64.vsprops"
CharacterSet="0"
>
<Tool
Name="VCPreBuildEventTool"
/>
<Tool
Name="VCCustomBuildTool"
/>
<Tool
Name="VCXMLDataGeneratorTool"
/>
<Tool
Name="VCWebServiceProxyGeneratorTool"
/>
<Tool
Name="VCMIDLTool"
TargetEnvironment="3"
/>
<Tool
Name="VCCLCompilerTool"
AdditionalIncludeDirectories="$(lzmaDir)\include"
PreprocessorDefinitions="WIN32;_FILE_OFFSET_BITS=64;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;LZMA_API_STATIC"
/>
<Tool
Name="VCManagedResourceCompilerTool"
/>
<Tool
Name="VCResourceCompilerTool"
/>
<Tool
Name="VCPreLinkEventTool"
/>
<Tool
Name="VCLinkerTool"
AdditionalDependencies="$(lzmaDir)\bin_x86-64\liblzma.a"
/>
<Tool
Name="VCALinkTool"
/>
<Tool
Name="VCManifestTool"
/>
<Tool
Name="VCXDCMakeTool"
/>
<Tool
Name="VCBscMakeTool"
/>
<Tool
Name="VCFxCopTool"
/>
<Tool
Name="VCAppVerifierTool"
/>
<Tool
Name="VCPostBuildEventTool"
/>
</Configuration>
<Configuration
Name="Release|Win32"
ConfigurationType="2"
InheritedPropertySheets=".\pyd.vsprops"
CharacterSet="0"
WholeProgramOptimization="1"
>
<Tool
Name="VCPreBuildEventTool"
/>
<Tool
Name="VCCustomBuildTool"
/>
<Tool
Name="VCXMLDataGeneratorTool"
/>
<Tool
Name="VCWebServiceProxyGeneratorTool"
/>
<Tool
Name="VCMIDLTool"
/>
<Tool
Name="VCCLCompilerTool"
AdditionalIncludeDirectories="$(lzmaDir)\include"
PreprocessorDefinitions="WIN32;_FILE_OFFSET_BITS=64;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;LZMA_API_STATIC"
/>
<Tool
Name="VCManagedResourceCompilerTool"
/>
<Tool
Name="VCResourceCompilerTool"
/>
<Tool
Name="VCPreLinkEventTool"
/>
<Tool
Name="VCLinkerTool"
AdditionalDependencies="$(lzmaDir)\bin_i486\liblzma.a"
/>
<Tool
Name="VCALinkTool"
/>
<Tool
Name="VCManifestTool"
/>
<Tool
Name="VCXDCMakeTool"
/>
<Tool
Name="VCBscMakeTool"
/>
<Tool
Name="VCFxCopTool"
/>
<Tool
Name="VCAppVerifierTool"
/>
<Tool
Name="VCPostBuildEventTool"
/>
</Configuration>
<Configuration
Name="Release|x64"
ConfigurationType="2"
InheritedPropertySheets=".\pyd.vsprops;.\x64.vsprops"
CharacterSet="0"
WholeProgramOptimization="1"
>
<Tool
Name="VCPreBuildEventTool"
/>
<Tool
Name="VCCustomBuildTool"
/>
<Tool
Name="VCXMLDataGeneratorTool"
/>
<Tool
Name="VCWebServiceProxyGeneratorTool"
/>
<Tool
Name="VCMIDLTool"
TargetEnvironment="3"
/>
<Tool
Name="VCCLCompilerTool"
AdditionalIncludeDirectories="$(lzmaDir)\include"
PreprocessorDefinitions="WIN32;_FILE_OFFSET_BITS=64;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;LZMA_API_STATIC"
/>
<Tool
Name="VCManagedResourceCompilerTool"
/>
<Tool
Name="VCResourceCompilerTool"
/>
<Tool
Name="VCPreLinkEventTool"
/>
<Tool
Name="VCLinkerTool"
AdditionalDependencies="$(lzmaDir)\bin_x86-64\liblzma.a"
/>
<Tool
Name="VCALinkTool"
/>
<Tool
Name="VCManifestTool"
/>
<Tool
Name="VCXDCMakeTool"
/>
<Tool
Name="VCBscMakeTool"
/>
<Tool
Name="VCFxCopTool"
/>
<Tool
Name="VCAppVerifierTool"
/>
<Tool
Name="VCPostBuildEventTool"
/>
</Configuration>
<Configuration
Name="PGInstrument|Win32"
ConfigurationType="2"
InheritedPropertySheets=".\pyd.vsprops;.\pginstrument.vsprops"
CharacterSet="0"
WholeProgramOptimization="1"
>
<Tool
Name="VCPreBuildEventTool"
/>
<Tool
Name="VCCustomBuildTool"
/>
<Tool
Name="VCXMLDataGeneratorTool"
/>
<Tool
Name="VCWebServiceProxyGeneratorTool"
/>
<Tool
Name="VCMIDLTool"
/>
<Tool
Name="VCCLCompilerTool"
AdditionalIncludeDirectories="$(lzmaDir)\include"
PreprocessorDefinitions="WIN32;_FILE_OFFSET_BITS=64;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;LZMA_API_STATIC"
/>
<Tool
Name="VCManagedResourceCompilerTool"
/>
<Tool
Name="VCResourceCompilerTool"
/>
<Tool
Name="VCPreLinkEventTool"
/>
<Tool
Name="VCLinkerTool"
AdditionalDependencies="$(lzmaDir)\bin_i486\liblzma.a"
/>
<Tool
Name="VCALinkTool"
/>
<Tool
Name="VCManifestTool"
/>
<Tool
Name="VCXDCMakeTool"
/>
<Tool
Name="VCBscMakeTool"
/>
<Tool
Name="VCFxCopTool"
/>
<Tool
Name="VCAppVerifierTool"
/>
<Tool
Name="VCPostBuildEventTool"
/>
</Configuration>
<Configuration
Name="PGInstrument|x64"
ConfigurationType="2"
InheritedPropertySheets=".\pyd.vsprops;.\x64.vsprops;.\pginstrument.vsprops"
CharacterSet="0"
WholeProgramOptimization="1"
>
<Tool
Name="VCPreBuildEventTool"
/>
<Tool
Name="VCCustomBuildTool"
/>
<Tool
Name="VCXMLDataGeneratorTool"
/>
<Tool
Name="VCWebServiceProxyGeneratorTool"
/>
<Tool
Name="VCMIDLTool"
TargetEnvironment="3"
/>
<Tool
Name="VCCLCompilerTool"
AdditionalIncludeDirectories="$(lzmaDir)\include"
PreprocessorDefinitions="WIN32;_FILE_OFFSET_BITS=64;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;LZMA_API_STATIC"
/>
<Tool
Name="VCManagedResourceCompilerTool"
/>
<Tool
Name="VCResourceCompilerTool"
/>
<Tool
Name="VCPreLinkEventTool"
/>
<Tool
Name="VCLinkerTool"
AdditionalDependencies="$(lzmaDir)\bin_x86-64\liblzma.a"
TargetMachine="17"
/>
<Tool
Name="VCALinkTool"
/>
<Tool
Name="VCManifestTool"
/>
<Tool
Name="VCXDCMakeTool"
/>
<Tool
Name="VCBscMakeTool"
/>
<Tool
Name="VCFxCopTool"
/>
<Tool
Name="VCAppVerifierTool"
/>
<Tool
Name="VCPostBuildEventTool"
/>
</Configuration>
<Configuration
Name="PGUpdate|Win32"
ConfigurationType="2"
InheritedPropertySheets=".\pyd.vsprops;.\pgupdate.vsprops"
CharacterSet="0"
WholeProgramOptimization="1"
>
<Tool
Name="VCPreBuildEventTool"
/>
<Tool
Name="VCCustomBuildTool"
/>
<Tool
Name="VCXMLDataGeneratorTool"
/>
<Tool
Name="VCWebServiceProxyGeneratorTool"
/>
<Tool
Name="VCMIDLTool"
/>
<Tool
Name="VCCLCompilerTool"
AdditionalIncludeDirectories="$(lzmaDir)\include"
PreprocessorDefinitions="WIN32;_FILE_OFFSET_BITS=64;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;LZMA_API_STATIC"
/>
<Tool
Name="VCManagedResourceCompilerTool"
/>
<Tool
Name="VCResourceCompilerTool"
/>
<Tool
Name="VCPreLinkEventTool"
/>
<Tool
Name="VCLinkerTool"
AdditionalDependencies="$(lzmaDir)\bin_i486\liblzma.a"
/>
<Tool
Name="VCALinkTool"
/>
<Tool
Name="VCManifestTool"
/>
<Tool
Name="VCXDCMakeTool"
/>
<Tool
Name="VCBscMakeTool"
/>
<Tool
Name="VCFxCopTool"
/>
<Tool
Name="VCAppVerifierTool"
/>
<Tool
Name="VCPostBuildEventTool"
/>
</Configuration>
<Configuration
Name="PGUpdate|x64"
ConfigurationType="2"
InheritedPropertySheets=".\pyd.vsprops;.\x64.vsprops;.\pgupdate.vsprops"
CharacterSet="0"
WholeProgramOptimization="1"
>
<Tool
Name="VCPreBuildEventTool"
/>
<Tool
Name="VCCustomBuildTool"
/>
<Tool
Name="VCXMLDataGeneratorTool"
/>
<Tool
Name="VCWebServiceProxyGeneratorTool"
/>
<Tool
Name="VCMIDLTool"
TargetEnvironment="3"
/>
<Tool
Name="VCCLCompilerTool"
AdditionalIncludeDirectories="$(lzmaDir)\include"
PreprocessorDefinitions="WIN32;_FILE_OFFSET_BITS=64;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;LZMA_API_STATIC"
/>
<Tool
Name="VCManagedResourceCompilerTool"
/>
<Tool
Name="VCResourceCompilerTool"
/>
<Tool
Name="VCPreLinkEventTool"
/>
<Tool
Name="VCLinkerTool"
AdditionalDependencies="$(lzmaDir)\bin_x86-64\liblzma.a"
TargetMachine="17"
/>
<Tool
Name="VCALinkTool"
/>
<Tool
Name="VCManifestTool"
/>
<Tool
Name="VCXDCMakeTool"
/>
<Tool
Name="VCBscMakeTool"
/>
<Tool
Name="VCFxCopTool"
/>
<Tool
Name="VCAppVerifierTool"
/>
<Tool
Name="VCPostBuildEventTool"
/>
</Configuration>
</Configurations>
<References>
</References>
<Files>
<Filter
Name="Source Files"
>
<File
RelativePath="..\Modules\_lzmamodule.c"
>
</File>
</Filter>
</Files>
<Globals>
</Globals>
</VisualStudioProject>


@ -92,6 +92,11 @@ Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "_bz2", "_bz2.vcproj", "{73F
{CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26} = {CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26}
EndProjectSection
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "_lzma", "_lzma.vcproj", "{F9D71780-F393-11E0-BE50-0800200C9A66}"
ProjectSection(ProjectDependencies) = postProject
{CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26} = {CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26}
EndProjectSection
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "select", "select.vcproj", "{18CAE28C-B454-46C1-87A0-493D91D97F03}"
ProjectSection(ProjectDependencies) = postProject
{CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26} = {CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26}
@ -421,6 +426,22 @@ Global
{73FCD2BD-F133-46B7-8EC1-144CD82A59D5}.Release|Win32.Build.0 = Release|Win32
{73FCD2BD-F133-46B7-8EC1-144CD82A59D5}.Release|x64.ActiveCfg = Release|x64
{73FCD2BD-F133-46B7-8EC1-144CD82A59D5}.Release|x64.Build.0 = Release|x64
{F9D71780-F393-11E0-BE50-0800200C9A66}.Debug|Win32.ActiveCfg = Debug|Win32
{F9D71780-F393-11E0-BE50-0800200C9A66}.Debug|Win32.Build.0 = Debug|Win32
{F9D71780-F393-11E0-BE50-0800200C9A66}.Debug|x64.ActiveCfg = Debug|x64
{F9D71780-F393-11E0-BE50-0800200C9A66}.Debug|x64.Build.0 = Debug|x64
{F9D71780-F393-11E0-BE50-0800200C9A66}.PGInstrument|Win32.ActiveCfg = PGInstrument|Win32
{F9D71780-F393-11E0-BE50-0800200C9A66}.PGInstrument|Win32.Build.0 = PGInstrument|Win32
{F9D71780-F393-11E0-BE50-0800200C9A66}.PGInstrument|x64.ActiveCfg = PGInstrument|x64
{F9D71780-F393-11E0-BE50-0800200C9A66}.PGInstrument|x64.Build.0 = PGInstrument|x64
{F9D71780-F393-11E0-BE50-0800200C9A66}.PGUpdate|Win32.ActiveCfg = PGUpdate|Win32
{F9D71780-F393-11E0-BE50-0800200C9A66}.PGUpdate|Win32.Build.0 = PGUpdate|Win32
{F9D71780-F393-11E0-BE50-0800200C9A66}.PGUpdate|x64.ActiveCfg = PGUpdate|x64
{F9D71780-F393-11E0-BE50-0800200C9A66}.PGUpdate|x64.Build.0 = PGUpdate|x64
{F9D71780-F393-11E0-BE50-0800200C9A66}.Release|Win32.ActiveCfg = Release|Win32
{F9D71780-F393-11E0-BE50-0800200C9A66}.Release|Win32.Build.0 = Release|Win32
{F9D71780-F393-11E0-BE50-0800200C9A66}.Release|x64.ActiveCfg = Release|x64
{F9D71780-F393-11E0-BE50-0800200C9A66}.Release|x64.Build.0 = Release|x64
{18CAE28C-B454-46C1-87A0-493D91D97F03}.Debug|Win32.ActiveCfg = Debug|Win32
{18CAE28C-B454-46C1-87A0-493D91D97F03}.Debug|Win32.Build.0 = Debug|Win32
{18CAE28C-B454-46C1-87A0-493D91D97F03}.Debug|x64.ActiveCfg = Debug|x64


@ -56,6 +56,10 @@
Name="bz2Dir"
Value="$(externalsDir)\bzip2-1.0.5"
/>
<UserMacro
Name="lzmaDir"
Value="$(externalsDir)\xz-5.0.3"
/>
<UserMacro
Name="opensslDir"
Value="$(externalsDir)\openssl-1.0.0a"


@ -133,6 +133,12 @@ _bz2
All of this managed to build libbz2.lib in
bzip2-1.0.5\$platform-$configuration\, which the Python project links in.
_lzma
Python wrapper for the liblzma compression library.
Download the pre-built Windows binaries from http://tukaani.org/xz/, and
extract to ..\xz-5.0.3. If you are using a more recent version of liblzma,
it will be necessary to rename the directory from xz-<VERSION> to xz-5.0.3.
_ssl
Python wrapper for the secure sockets library.


@ -41,3 +41,8 @@ if not exist sqlite-3.7.4 (
rd /s/q sqlite-source-3.6.21
svn export http://svn.python.org/projects/external/sqlite-3.7.4
)
@rem lzma
if not exist xz-5.0.3 (
svn export http://svn.python.org/projects/external/xz-5.0.3
)


@ -1279,6 +1279,13 @@ class PyBuildExt(build_ext):
else:
missing.append('_bz2')
# LZMA compression support.
if self.compiler.find_library_file(lib_dirs, 'lzma'):
exts.append( Extension('_lzma', ['_lzmamodule.c'],
libraries = ['lzma']) )
else:
missing.append('_lzma')
# Interface to the Expat XML parser
#
# Expat was written by James Clark and is now maintained by a group of