\section{\module{tarfile} --- Read and write tar archive files}

\declaremodule{standard}{tarfile}
\modulesynopsis{Read and write tar-format archive files.}
\versionadded{2.3}

\moduleauthor{Lars Gust\"abel}{lars@gustaebel.de}
\sectionauthor{Lars Gust\"abel}{lars@gustaebel.de}

The \module{tarfile} module makes it possible to read and create tar archives.
Some facts and figures:

\begin{itemize}
\item reads and writes \module{gzip} and \module{bz2} compressed archives.
\item read/write support for the \POSIX{}.1-1988 (ustar) format.
\item read/write support for the GNU tar format including \emph{longname} and
\emph{longlink} extensions, read-only support for the \emph{sparse}
extension.
\item read/write support for the \POSIX{}.1-2001 (pax) format.
\versionadded{2.6}
\item handles directories, regular files, hardlinks, symbolic links, fifos,
character devices and block devices and is able to acquire and
restore file information like timestamp, access permissions and owner.
\item can handle tape devices.
\end{itemize}

\begin{funcdesc}{open}{name\optional{, mode\optional{,
fileobj\optional{, bufsize}}}, **kwargs}
Return a \class{TarFile} object for the pathname \var{name}.
For detailed information on \class{TarFile} objects and the keyword
arguments that are allowed, see \citetitle{TarFile Objects}
(section \ref{tarfile-objects}).

\var{mode} has to be a string of the form \code{'filemode[:compression]'}
and defaults to \code{'r'}. Here is a full list of mode combinations:

\begin{tableii}{c|l}{code}{mode}{action}
\lineii{'r' or 'r:*'}{Open for reading with transparent compression (recommended).}
\lineii{'r:'}{Open for reading exclusively without compression.}
\lineii{'r:gz'}{Open for reading with gzip compression.}
\lineii{'r:bz2'}{Open for reading with bzip2 compression.}
\lineii{'a' or 'a:'}{Open for appending with no compression. The file
is created if it does not exist.}
\lineii{'w' or 'w:'}{Open for uncompressed writing.}
\lineii{'w:gz'}{Open for gzip compressed writing.}
\lineii{'w:bz2'}{Open for bzip2 compressed writing.}
\end{tableii}

Note that \code{'a:gz'} or \code{'a:bz2'} is not possible.
If \var{mode} is not suitable to open a certain (compressed) file for
reading, \exception{ReadError} is raised. Use \var{mode} \code{'r'} to
avoid this. If a compression method is not supported,
\exception{CompressionError} is raised.

If \var{fileobj} is specified, it is used as an alternative to a file
object opened for \var{name}. It is supposed to be at position 0.

For special purposes, there is a second format for \var{mode}:
\code{'filemode|[compression]'}. \function{open()} will return a
\class{TarFile} object that processes its data as a stream of
blocks. No random seeking will be done on the file. If given,
\var{fileobj} may be any object that has a \method{read()} or
\method{write()} method (depending on the \var{mode}).
\var{bufsize} specifies the blocksize and defaults to \code{20 * 512}
bytes. Use this variant in combination with
e.g. \code{sys.stdin}, a socket file object or a tape device.
However, such a \class{TarFile} object is limited in that it does
not allow random access, see ``Examples''
(section~\ref{tar-examples}). The currently possible modes:

\begin{tableii}{c|l}{code}{Mode}{Action}
\lineii{'r|*'}{Open a \emph{stream} of tar blocks for reading with transparent compression.}
\lineii{'r|'}{Open a \emph{stream} of uncompressed tar blocks for reading.}
\lineii{'r|gz'}{Open a gzip compressed \emph{stream} for reading.}
\lineii{'r|bz2'}{Open a bzip2 compressed \emph{stream} for reading.}
\lineii{'w|'}{Open an uncompressed \emph{stream} for writing.}
\lineii{'w|gz'}{Open a gzip compressed \emph{stream} for writing.}
\lineii{'w|bz2'}{Open a bzip2 compressed \emph{stream} for writing.}
\end{tableii}
\end{funcdesc}

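The \code{'filemode:compression'} mode strings accepted by \function{open()}
can be tried out with a short round trip. The following is a minimal sketch
rather than an official example: the names \code{hello.txt} and
\code{sample.tar.gz} and the temporary directory are assumptions made for
illustration.

```python
import os
import tempfile
import tarfile

# Hypothetical scratch area and input file, created only for this sketch.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "hello.txt")
with open(source, "w") as f:
    f.write("hello\n")

archive = os.path.join(workdir, "sample.tar.gz")

# 'w:gz' creates a gzip compressed archive.
tar = tarfile.open(archive, "w:gz")
tar.add(source, arcname="hello.txt")
tar.close()

# Plain 'r' (the same as 'r:*') detects the compression transparently.
tar = tarfile.open(archive, "r")
names = tar.getnames()
tar.close()

# 'r:' accepts uncompressed archives only, so the gzip file is rejected.
try:
    tarfile.open(archive, "r:")
    rejected = False
except tarfile.ReadError:
    rejected = True
```

Reading with \code{'r'} is usually the safer choice because the caller does
not need to know how the archive was compressed.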
\begin{classdesc*}{TarFile}
Class for reading and writing tar archives. Do not use this
class directly; use \function{open()} instead.
See ``TarFile Objects'' (section~\ref{tarfile-objects}).
\end{classdesc*}

\begin{funcdesc}{is_tarfile}{name}
Return \constant{True} if \var{name} is a tar archive file that
the \module{tarfile} module can read.
\end{funcdesc}

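As a quick illustration of \function{is_tarfile()}, the sketch below checks
one real archive and one ordinary text file. Both file names and the
temporary directory are hypothetical.

```python
import os
import tempfile
import tarfile

workdir = tempfile.mkdtemp()

# An ordinary text file that is not an archive.
plain = os.path.join(workdir, "notes.txt")
with open(plain, "w") as f:
    f.write("just text\n")

# A small uncompressed archive containing that file.
archive = os.path.join(workdir, "notes.tar")
tar = tarfile.open(archive, "w")
tar.add(plain, arcname="notes.txt")
tar.close()

archive_ok = tarfile.is_tarfile(archive)
plain_ok = tarfile.is_tarfile(plain)
```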
\begin{classdesc}{TarFileCompat}{filename\optional{, mode\optional{,
compression}}}
Class for limited access to tar archives with a
\refmodule{zipfile}-like interface. Please consult the
documentation of the \refmodule{zipfile} module for more details.
\var{compression} must be one of the following constants:
\begin{datadesc}{TAR_PLAIN}
Constant for an uncompressed tar archive.
\end{datadesc}
\begin{datadesc}{TAR_GZIPPED}
Constant for a \refmodule{gzip} compressed tar archive.
\end{datadesc}
\end{classdesc}

\begin{excdesc}{TarError}
Base class for all \module{tarfile} exceptions.
\end{excdesc}

\begin{excdesc}{ReadError}
Is raised when a tar archive is opened that either cannot be handled by
the \module{tarfile} module or is somehow invalid.
\end{excdesc}

\begin{excdesc}{CompressionError}
Is raised when a compression method is not supported or when the data
cannot be decoded properly.
\end{excdesc}

\begin{excdesc}{StreamError}
Is raised for the limitations that are typical for stream-like
\class{TarFile} objects.
\end{excdesc}

\begin{excdesc}{ExtractError}
Is raised for \emph{non-fatal} errors when using \method{extract()}, but
only if \member{TarFile.errorlevel}\code{ == 2}.
\end{excdesc}

\begin{excdesc}{HeaderError}
Is raised by \method{frombuf()} if the buffer it gets is invalid.
\versionadded{2.6}
\end{excdesc}

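Because all of these exceptions derive from \exception{TarError}, a caller
can catch the base class and still inspect which specific error occurred.
The sketch below feeds deliberately invalid data to \function{open()}; the
byte string is made up for illustration.

```python
import io
import tarfile

# Data that is neither a tar archive nor a known compressed stream.
garbage = io.BytesIO(b"this is not a tar archive" * 100)

try:
    tarfile.open(fileobj=garbage)
    outcome = "opened"
except tarfile.TarError as exc:
    # Catching the base class also catches ReadError and the other subclasses.
    outcome = type(exc).__name__
```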
Each of the following constants defines a tar archive format that the
\module{tarfile} module is able to create. See section \ref{tar-formats} for
details.

\begin{datadesc}{USTAR_FORMAT}
\POSIX{}.1-1988 (ustar) format.
\end{datadesc}

\begin{datadesc}{GNU_FORMAT}
GNU tar format.
\end{datadesc}

\begin{datadesc}{PAX_FORMAT}
\POSIX{}.1-2001 (pax) format.
\end{datadesc}

\begin{datadesc}{DEFAULT_FORMAT}
The default format for creating archives. This is currently
\constant{GNU_FORMAT}.
\end{datadesc}

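A format constant is passed through \function{open()} as the \var{format}
keyword argument. The following sketch writes a pax archive into memory and
reads it back; the member name \code{data.bin} and its contents are
assumptions for illustration.

```python
import io
import tarfile

payload = b"x" * 10

# Write a pax-format archive into an in-memory buffer.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT)
info = tarfile.TarInfo("data.bin")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# No format needs to be given when reading the archive back.
buf.seek(0)
tar = tarfile.open(fileobj=buf)
names = tar.getnames()
tar.close()
```

Passing \var{format} explicitly avoids depending on the value of
\constant{DEFAULT_FORMAT}.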
\begin{seealso}
\seemodule{zipfile}{Documentation of the \refmodule{zipfile}
standard module.}

\seetitle[http://www.gnu.org/software/tar/manual/html_node/tar_134.html\#SEC134]
{GNU tar manual, Basic Tar Format}{Documentation for tar archive files,
including GNU tar extensions.}
\end{seealso}

%-----------------
% TarFile Objects
%-----------------

\subsection{TarFile Objects \label{tarfile-objects}}

The \class{TarFile} object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up
of a header block followed by data blocks. It is possible to store a file in a
tar archive several times. Each archive member is represented by a
\class{TarInfo} object, see \citetitle{TarInfo Objects} (section
\ref{tarinfo-objects}) for details.

\begin{classdesc}{TarFile}{name=None, mode='r', fileobj=None,
format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False,
ignore_zeros=False, encoding=None, errors=None, pax_headers=None,
debug=0, errorlevel=0}

All following arguments are optional and can be accessed as instance
attributes as well.

\var{name} is the pathname of the archive. It can be omitted if
\var{fileobj} is given. In this case, the file object's \member{name}
attribute is used if it exists.

\var{mode} is either \code{'r'} to read from an existing archive,
\code{'a'} to append data to an existing file or \code{'w'} to create a new
file overwriting an existing one.

If \var{fileobj} is given, it is used for reading or writing data.
If it can be determined, \var{mode} is overridden by \var{fileobj}'s mode.
\var{fileobj} will be used from position 0.
\begin{notice}
\var{fileobj} is not closed when \class{TarFile} is closed.
\end{notice}

\var{format} controls the archive format. It must be one of the constants
\constant{USTAR_FORMAT}, \constant{GNU_FORMAT} or \constant{PAX_FORMAT}
that are defined at module level.
\versionadded{2.6}

The \var{tarinfo} argument can be used to replace the default
\class{TarInfo} class with a different one.
\versionadded{2.6}

If \var{dereference} is \code{False}, add symbolic and hard links to the
archive. If it is \code{True}, add the content of the target files to the
archive. This has no effect on systems that do not support symbolic links.

If \var{ignore_zeros} is \code{False}, treat an empty block as the end of
the archive. If it is \code{True}, skip empty (and invalid) blocks and try
to get as many members as possible. This is only useful for reading
concatenated or damaged archives.

\var{debug} can be set from \code{0} (no debug messages) up to \code{3}
(all debug messages). The messages are written to \code{sys.stderr}.

If \var{errorlevel} is \code{0}, all errors are ignored when using
\method{extract()}. Nevertheless, they appear as error messages in the
debug output when debugging is enabled. If \code{1}, all \emph{fatal}
errors are raised as \exception{OSError} or \exception{IOError} exceptions.
If \code{2}, all \emph{non-fatal} errors are raised as \exception{TarError}
exceptions as well.

The \var{encoding} and \var{errors} arguments control the way strings are
converted to unicode objects and vice versa. The default settings will work
for most users. See section \ref{tar-unicode} for in-depth information.
\versionadded{2.6}

The \var{pax_headers} argument is an optional dictionary of unicode strings
which will be added as a pax global header if \var{format} is
\constant{PAX_FORMAT}.
\versionadded{2.6}
\end{classdesc}

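The effect of \var{ignore_zeros} on concatenated archives can be seen in a
small experiment. The sketch below is illustrative only: the helper
\code{make_archive()} and the member names are hypothetical and not part of
the module.

```python
import io
import tarfile

def make_archive(member_name):
    """Hypothetical helper: bytes of an uncompressed archive with one empty member."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w")
    tar.addfile(tarfile.TarInfo(member_name))
    tar.close()
    return buf.getvalue()

# Two complete archives glued together, as "cat a.tar b.tar" would produce.
joined = make_archive("first.txt") + make_archive("second.txt")

# By default, reading stops at the empty blocks that end the first archive.
tar = tarfile.open(fileobj=io.BytesIO(joined))
default_names = tar.getnames()
tar.close()

# With ignore_zeros=True the empty blocks are skipped and reading continues.
tar = tarfile.open(fileobj=io.BytesIO(joined), ignore_zeros=True)
all_names = tar.getnames()
tar.close()
```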
\begin{methoddesc}{open}{...}
Alternative constructor. The \function{open()} function on module level is
actually a shortcut to this classmethod. See section~\ref{module-tarfile}
for details.
\end{methoddesc}

\begin{methoddesc}{getmember}{name}
Return a \class{TarInfo} object for member \var{name}. If \var{name} cannot
be found in the archive, \exception{KeyError} is raised.
\begin{notice}
If a member occurs more than once in the archive, its last
occurrence is assumed to be the most up-to-date version.
\end{notice}
\end{methoddesc}

\begin{methoddesc}{getmembers}{}
Return the members of the archive as a list of \class{TarInfo} objects.
The list has the same order as the members in the archive.
\end{methoddesc}

\begin{methoddesc}{getnames}{}
Return the members as a list of their names. It has the same order as
the list returned by \method{getmembers()}.
\end{methoddesc}

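The relationship between \method{getmember()}, \method{getmembers()} and
\method{getnames()} is easiest to see on an archive that stores one name
twice. The following sketch builds such an archive in memory; the helper
\code{add_bytes()} and all member names are assumptions for illustration.

```python
import io
import tarfile

def add_bytes(tar, name, payload):
    """Hypothetical helper (not part of tarfile): store payload under name."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
add_bytes(tar, "notes.txt", b"v1")
add_bytes(tar, "readme.txt", b"hello")
add_bytes(tar, "notes.txt", b"version 2")  # same name stored a second time
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf)
names = tar.getnames()                          # archive order, duplicates kept
sizes = [m.size for m in tar.getmembers()]      # same order as getnames()
latest_size = tar.getmember("notes.txt").size   # last occurrence wins
try:
    tar.getmember("missing.txt")
    missing_found = True
except KeyError:
    missing_found = False
tar.close()
```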
\begin{methoddesc}{list}{verbose=True}
Print a table of contents to \code{sys.stdout}. If \var{verbose} is
\constant{False}, only the names of the members are printed. If it is
\constant{True}, output similar to that of \program{ls -l} is produced.
\end{methoddesc}

\begin{methoddesc}{next}{}
Return the next member of the archive as a \class{TarInfo} object, when
\class{TarFile} is opened for reading. Return \code{None} if no more
members are available.
\end{methoddesc}

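A loop built on \method{next()} visits the members one at a time until
\code{None} signals the end of the archive. The sketch below is a minimal
illustration with made-up member names.

```python
import io
import tarfile

# Build a small in-memory archive with two empty members.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for name in ("one.txt", "two.txt"):
    tar.addfile(tarfile.TarInfo(name))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf)
seen = []
member = tar.next()
while member is not None:      # None means there are no more members
    seen.append(member.name)
    member = tar.next()
tar.close()
```

Iterating over the \class{TarFile} object with a \code{for} loop, as the
examples in section~\ref{tar-examples} do, visits the same members.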
\begin{methoddesc}{extractall}{\optional{path\optional{, members}}}
Extract all members from the archive to the current working directory
or directory \var{path}. If optional \var{members} is given, it must be
a subset of the list returned by \method{getmembers()}.
Directory information like owner, modification time and permissions is
set after all members have been extracted. This is done to work around two
problems: a directory's modification time is reset each time a file is
created in it, and, if a directory's permissions do not allow writing,
extracting files to it will fail.
\versionadded{2.5}
\end{methoddesc}

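The \var{members} argument of \method{extractall()} makes it possible to
extract only part of an archive. The following sketch is illustrative: the
directory layout, the file names and the rule of skipping \code{.log} files
are all assumptions.

```python
import os
import tempfile
import tarfile

workdir = tempfile.mkdtemp()

# Hypothetical source tree with one file to keep and one to skip.
srcdir = os.path.join(workdir, "src")
os.mkdir(srcdir)
for name in ("keep.txt", "skip.log"):
    with open(os.path.join(srcdir, name), "w") as f:
        f.write(name + "\n")

archive = os.path.join(workdir, "sample.tar")
tar = tarfile.open(archive, "w")
tar.add(srcdir, arcname="project")
tar.close()

# Extract everything except the .log files into a separate directory.
outdir = os.path.join(workdir, "out")
tar = tarfile.open(archive)
wanted = [m for m in tar.getmembers() if not m.name.endswith(".log")]
tar.extractall(outdir, members=wanted)
tar.close()

extracted = sorted(os.listdir(os.path.join(outdir, "project")))
```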
\begin{methoddesc}{extract}{member\optional{, path}}
Extract a member from the archive to the current working directory,
using its full name. Its file information is extracted as accurately as
possible.
\var{member} may be a filename or a \class{TarInfo} object.
You can specify a different directory using \var{path}.
\begin{notice}
Because the \method{extract()} method allows random access to a tar
archive, there are some issues you must take care of yourself. See the
description for \method{extractall()} above.
\end{notice}
\end{methoddesc}

\begin{methoddesc}{extractfile}{member}
Extract a member from the archive as a file object.
\var{member} may be a filename or a \class{TarInfo} object.
If \var{member} is a regular file, a file-like object is returned.
If \var{member} is a link, a file-like object is constructed from the
link's target.
If \var{member} is none of the above, \code{None} is returned.
\begin{notice}
The file-like object is read-only and provides the following methods:
\method{read()}, \method{readline()}, \method{readlines()},
\method{seek()}, \method{tell()}.
\end{notice}
\end{methoddesc}

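\method{extractfile()} is convenient when a member's data is needed without
writing it to disk. The sketch below reads a regular file and shows the
\code{None} result for a directory; the member names and contents are made
up for illustration.

```python
import io
import tarfile

payload = b"hello\nworld\n"

# Build an in-memory archive with one regular file and one directory.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
info = tarfile.TarInfo("greeting.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
dirinfo = tarfile.TarInfo("docs")
dirinfo.type = tarfile.DIRTYPE
tar.addfile(dirinfo)
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf)
fobj = tar.extractfile("greeting.txt")   # read-only file-like object
first_line = fobj.readline()
rest = fobj.read()
directory_result = tar.extractfile("docs")  # neither a file nor a link
tar.close()
```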
\begin{methoddesc}{add}{name\optional{, arcname\optional{, recursive\optional{, exclude}}}}
Add the file \var{name} to the archive. \var{name} may be any type
of file (directory, fifo, symbolic link, etc.).
If given, \var{arcname} specifies an alternative name for the file in the
archive. Directories are added recursively by default.
This can be avoided by setting \var{recursive} to \constant{False}.
If \var{exclude} is given, it must be a function that takes one filename
argument and returns a boolean value. Depending on this value the
respective file is either excluded (\constant{True}) or added
(\constant{False}).
\versionchanged[Added the \var{exclude} parameter]{2.6}
\end{methoddesc}

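The \var{arcname} and \var{recursive} arguments of \method{add()} can be
compared by adding the same directory twice. This is a minimal sketch; the
directory \code{data}, the file \code{inner.txt} and the archive names
\code{shallow} and \code{deep} are hypothetical.

```python
import os
import tempfile
import tarfile

workdir = tempfile.mkdtemp()

# Hypothetical directory containing a single file.
srcdir = os.path.join(workdir, "data")
os.mkdir(srcdir)
with open(os.path.join(srcdir, "inner.txt"), "w") as f:
    f.write("content\n")

archive = os.path.join(workdir, "sample.tar")
tar = tarfile.open(archive, "w")
# Only the directory entry itself, stored under a different name.
tar.add(srcdir, arcname="shallow", recursive=False)
# The directory and everything below it (the default behaviour).
tar.add(srcdir, arcname="deep")
tar.close()

tar = tarfile.open(archive)
names = sorted(tar.getnames())
tar.close()
```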
\begin{methoddesc}{addfile}{tarinfo\optional{, fileobj}}
Add the \class{TarInfo} object \var{tarinfo} to the archive.
If \var{fileobj} is given, \code{\var{tarinfo}.size} bytes are read
from it and added to the archive. You can create \class{TarInfo} objects
using \method{gettarinfo()}.
\begin{notice}
On Windows platforms, \var{fileobj} should always be opened with mode
\code{'rb'} to avoid problems with the file size.
\end{notice}
\end{methoddesc}

\begin{methoddesc}{gettarinfo}{\optional{name\optional{,
arcname\optional{, fileobj}}}}
Create a \class{TarInfo} object for either the file \var{name} or
the file object \var{fileobj} (using \function{os.fstat()} on its
file descriptor). You can modify some of the \class{TarInfo}'s
attributes before you add it using \method{addfile()}. If given,
\var{arcname} specifies an alternative name for the file in the
archive.
\end{methoddesc}

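\method{gettarinfo()} and \method{addfile()} are typically used together
when a member's metadata should be adjusted before it is stored. The sketch
below uses the \var{fileobj} form of \method{gettarinfo()}; the file names
and the user name \code{nobody} are assumptions for illustration.

```python
import os
import tempfile
import tarfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "original.txt")
with open(source, "wb") as f:
    f.write(b"some data\n")

archive = os.path.join(workdir, "sample.tar")
tar = tarfile.open(archive, "w")
# Binary mode keeps the byte count consistent with the size from os.fstat().
with open(source, "rb") as f:
    info = tar.gettarinfo(arcname="renamed.txt", fileobj=f)
    info.uname = "nobody"        # adjust metadata before storing the member
    tar.addfile(info, f)
tar.close()

tar = tarfile.open(archive)
stored = tar.getmember("renamed.txt")
stored_size = stored.size
stored_uname = stored.uname
tar.close()
```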
\begin{methoddesc}{close}{}
Close the \class{TarFile}. In write mode, two finishing zero
blocks are appended to the archive.
\end{methoddesc}

\begin{memberdesc}{posix}
Setting this to \constant{True} is equivalent to setting the
\member{format} attribute to \constant{USTAR_FORMAT},
\constant{False} is equivalent to \constant{GNU_FORMAT}.
\versionchanged[\var{posix} defaults to \constant{False}]{2.4}
\deprecated{2.6}{Use the \member{format} attribute instead.}
\end{memberdesc}

\begin{memberdesc}{pax_headers}
A dictionary containing key-value pairs of pax global headers.
\versionadded{2.6}
\end{memberdesc}

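A pax global header written through the \var{pax_headers} argument can be
read back from the \member{pax_headers} attribute. The following is a
sketch under assumptions: the key \code{comment}, its value and the member
name are chosen freely for illustration.

```python
import io
import tarfile

# Write a pax archive with one global header record and one empty member.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                   pax_headers={u"comment": u"nightly build"})
tar.addfile(tarfile.TarInfo("empty.txt"))
tar.close()

# The global header is available again once the archive is opened.
buf.seek(0)
tar = tarfile.open(fileobj=buf)
names = tar.getnames()
headers = dict(tar.pax_headers)
tar.close()
```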
%-----------------
% TarInfo Objects
%-----------------

\subsection{TarInfo Objects \label{tarinfo-objects}}

A \class{TarInfo} object represents one member in a
\class{TarFile}. Aside from storing all required attributes of a file
(like file type, size, time, permissions, owner etc.), it provides
some useful methods to determine its type. It does \emph{not} contain
the file's data itself.

\class{TarInfo} objects are returned by \class{TarFile}'s methods
\method{getmember()}, \method{getmembers()} and \method{gettarinfo()}.

\begin{classdesc}{TarInfo}{\optional{name}}
Create a \class{TarInfo} object.
\end{classdesc}

\begin{methoddesc}{frombuf}{buf}
Create and return a \class{TarInfo} object from string buffer \var{buf}.
\versionadded[Raises \exception{HeaderError} if the buffer is
invalid.]{2.6}
\end{methoddesc}

\begin{methoddesc}{fromtarfile}{tarfile}
Read the next member from the \class{TarFile} object \var{tarfile} and
return it as a \class{TarInfo} object.
\versionadded{2.6}
\end{methoddesc}

\begin{methoddesc}{tobuf}{\optional{format\optional{, encoding
\optional{, errors}}}}
Create a string buffer from a \class{TarInfo} object. For information
on the arguments see the constructor of the \class{TarFile} class.
\versionchanged[The arguments were added]{2.6}
\end{methoddesc}

A \code{TarInfo} object has the following public data attributes:

\begin{memberdesc}{name}
Name of the archive member.
\end{memberdesc}

\begin{memberdesc}{size}
Size in bytes.
\end{memberdesc}

\begin{memberdesc}{mtime}
Time of last modification.
\end{memberdesc}

\begin{memberdesc}{mode}
Permission bits.
\end{memberdesc}

\begin{memberdesc}{type}
File type. \var{type} is usually one of these constants:
\constant{REGTYPE}, \constant{AREGTYPE}, \constant{LNKTYPE},
\constant{SYMTYPE}, \constant{DIRTYPE}, \constant{FIFOTYPE},
\constant{CONTTYPE}, \constant{CHRTYPE}, \constant{BLKTYPE},
\constant{GNUTYPE_SPARSE}. To determine the type of a
\class{TarInfo} object more conveniently, use the \code{is_*()}
methods below.
\end{memberdesc}

\begin{memberdesc}{linkname}
Name of the target file, which is only present in
\class{TarInfo} objects of type \constant{LNKTYPE} and
\constant{SYMTYPE}.
\end{memberdesc}

\begin{memberdesc}{uid}
User ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{gid}
Group ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{uname}
User name.
\end{memberdesc}

\begin{memberdesc}{gname}
Group name.
\end{memberdesc}

\begin{memberdesc}{pax_headers}
A dictionary containing key-value pairs of an associated pax
extended header.
\versionadded{2.6}
\end{memberdesc}

A \class{TarInfo} object also provides some convenient query methods:

\begin{methoddesc}{isfile}{}
Return \constant{True} if the \class{TarInfo} object is a regular
file.
\end{methoddesc}

\begin{methoddesc}{isreg}{}
Same as \method{isfile()}.
\end{methoddesc}

\begin{methoddesc}{isdir}{}
Return \constant{True} if it is a directory.
\end{methoddesc}

\begin{methoddesc}{issym}{}
Return \constant{True} if it is a symbolic link.
\end{methoddesc}

\begin{methoddesc}{islnk}{}
Return \constant{True} if it is a hard link.
\end{methoddesc}

\begin{methoddesc}{ischr}{}
Return \constant{True} if it is a character device.
\end{methoddesc}

\begin{methoddesc}{isblk}{}
Return \constant{True} if it is a block device.
\end{methoddesc}

\begin{methoddesc}{isfifo}{}
Return \constant{True} if it is a FIFO.
\end{methoddesc}

\begin{methoddesc}{isdev}{}
Return \constant{True} if it is one of character device, block
device or FIFO.
\end{methoddesc}

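The query methods can be combined into a small classifier. The sketch below
works on \class{TarInfo} objects created by hand, so no archive is needed;
the function \code{describe()} and all member names are hypothetical.

```python
import tarfile

def describe(info):
    """Hypothetical helper: map a TarInfo object to a short label."""
    if info.isfile():
        return "regular file"
    if info.isdir():
        return "directory"
    if info.issym():
        return "symbolic link to " + info.linkname
    if info.islnk():
        return "hard link to " + info.linkname
    if info.isdev():
        return "device or FIFO"
    return "other"

regular = tarfile.TarInfo("setup.py")   # the type defaults to REGTYPE

folder = tarfile.TarInfo("docs")
folder.type = tarfile.DIRTYPE

link = tarfile.TarInfo("latest")
link.type = tarfile.SYMTYPE
link.linkname = "v1.0"

pipe = tarfile.TarInfo("queue")
pipe.type = tarfile.FIFOTYPE

labels = [describe(x) for x in (regular, folder, link, pipe)]
```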
%------------------------
% Examples
%------------------------

\subsection{Examples \label{tar-examples}}

How to extract an entire tar archive to the current working directory:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
\end{verbatim}

How to create an uncompressed tar archive from a list of filenames:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar", "w")
for name in ["foo", "bar", "quux"]:
    tar.add(name)
tar.close()
\end{verbatim}

How to read a gzip compressed tar archive and display some member information:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz", "r:gz")
for tarinfo in tar:
    print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
    if tarinfo.isreg():
        print "a regular file."
    elif tarinfo.isdir():
        print "a directory."
    else:
        print "something else."
tar.close()
\end{verbatim}

How to create a tar archive with faked information:
\begin{verbatim}
import tarfile
namelist = ["foo", "bar", "quux"]   # names of existing files to archive
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in namelist:
    tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name)
    tarinfo.uid = 123
    tarinfo.gid = 456
    tarinfo.uname = "johndoe"
    tarinfo.gname = "fake"
    # Open in binary mode so the data matches tarinfo.size on all platforms.
    tar.addfile(tarinfo, open(name, "rb"))
tar.close()
\end{verbatim}

The \emph{only} way to extract an uncompressed tar stream from
\code{sys.stdin}:
\begin{verbatim}
import sys
import tarfile
tar = tarfile.open(mode="r|", fileobj=sys.stdin)
for tarinfo in tar:
    tar.extract(tarinfo)
tar.close()
\end{verbatim}

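The stream modes can also be tried without a pipe or a tape device by using
an in-memory buffer as \var{fileobj}. The following sketch writes a gzip
compressed stream with \code{'w|gz'} and reads it back sequentially with
\code{'r|*'}; the member names and contents are made up for illustration.

```python
import io
import tarfile

files = [("a.txt", b"alpha"), ("b.txt", b"beta")]

# Write a gzip compressed stream; the buffer is never seeked while writing.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w|gz")
for name, payload in files:
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
tar.close()

# Read the stream back; members must be processed in archive order.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r|*")
contents = {}
for member in tar:
    contents[member.name] = tar.extractfile(member).read()
tar.close()
```

Each member's data is read before moving on to the next member, which is
the access pattern a stream supports.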
%------------
% Tar format
%------------

\subsection{Supported tar formats \label{tar-formats}}

There are three tar formats that can be created with the \module{tarfile}
module:

\begin{itemize}

\item
The \POSIX{}.1-1988 ustar format (\constant{USTAR_FORMAT}). It supports
filenames up to a length of at best 256 characters and linknames up to 100
characters. The maximum file size is 8 gigabytes. This is an old and limited
but widely supported format.

\item
The GNU tar format (\constant{GNU_FORMAT}). It supports long filenames and
linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
standard on GNU/Linux systems. \module{tarfile} fully supports the GNU tar
extensions for long names; sparse file support is read-only.

\item
The \POSIX{}.1-2001 pax format (\constant{PAX_FORMAT}). It is the most
flexible format with virtually no limits. It supports long filenames and
linknames, large files and stores pathnames in a portable way. However, not
all tar implementations today are able to handle pax archives properly.

The \emph{pax} format is an extension to the existing \emph{ustar} format. It
uses extra headers for information that cannot be stored otherwise. There are
two flavours of pax headers: extended headers only affect the subsequent file
header, whereas global headers are valid for the complete archive and affect
all following files. All the data in a pax header is encoded in \emph{UTF-8}
for portability reasons.

\end{itemize}

There are some more variants of the tar format which can be read, but not
created:

\begin{itemize}

\item
The ancient V7 format. This is the first tar format from \UNIX{} Seventh
Edition, storing only regular files and directories. Names must not be longer
than 100 characters, and there is no user/group name information. Some archives
have miscalculated header checksums in case of fields with non-\ASCII{}
characters.

\item
The SunOS tar extended format. This format is a variant of the \POSIX{}.1-2001
pax format, but is not compatible.

\end{itemize}

%----------------
% Unicode issues
%----------------

\subsection{Unicode issues \label{tar-unicode}}

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings.
For example, an ordinary tar archive created on a \emph{UTF-8} system cannot be
read correctly on a \emph{Latin-1} system if it contains non-\ASCII{}
characters. Names (i.e. filenames, linknames, user/group names) containing
these characters will appear damaged. Unfortunately, there is no way to
autodetect the encoding of an archive.

The pax format was designed to solve this problem. It stores non-\ASCII{} names
using the universal character encoding \emph{UTF-8}. When a pax archive is
read, these \emph{UTF-8} names are converted to the encoding of the local
file system.

The details of unicode conversion are controlled by the \var{encoding} and
\var{errors} keyword arguments of the \class{TarFile} class.

The default value for \var{encoding} is the local character encoding. It is
deduced from \function{sys.getfilesystemencoding()} and
\function{sys.getdefaultencoding()}. In read mode, \var{encoding} is used
exclusively to convert unicode names from a pax archive to strings in the local
character encoding. In write mode, the use of \var{encoding} depends on the
chosen archive format. In case of \constant{PAX_FORMAT}, input names that
contain non-\ASCII{} characters need to be decoded before being stored as
\emph{UTF-8} strings. The other formats do not make use of \var{encoding}
unless unicode objects are used as input names. These are converted to
8-bit character strings before they are added to the archive.

The \var{errors} argument defines how characters are treated that cannot be
converted to or from \var{encoding}. Possible values are listed in section
\ref{codec-base-classes}. In read mode, there is an additional scheme
\code{'utf-8'} which means that bad characters are replaced by their
\emph{UTF-8} representation. This is the default scheme. In write mode the
default value for \var{errors} is \code{'strict'} to ensure that name
information is not altered unnoticed.
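That the pax format stores non-\ASCII{} names as \emph{UTF-8} can be checked
by looking at the raw bytes of an archive. The sketch below is illustrative
only: the member name is made up, and the check inspects the archive data
directly instead of relying on how a particular Python version returns names
when reading.

```python
import io
import tarfile

# A hypothetical member name containing non-ASCII characters.
name = u"gr\u00fc\u00dfe.txt"

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT)
tar.addfile(tarfile.TarInfo(name))
tar.close()

# The UTF-8 encoded name appears in the archive data (pax extended header).
raw = buf.getvalue()
stored_as_utf8 = name.encode("utf-8") in raw
```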