Passing a unicode filename to tarfile.open() along with mode "w|gz" failed
with a UnicodeError because the filename was not encoded properly before being
written to the gzipped stream in the FNAME extra field.
tarfile unnecessarily checked the existence of numerical user and group ids on
extraction. If one of them did not exist the respective id of the current user
(i.e. root) was used for the file and ownership information was lost. (Patch
by Sebastien Luttringer)
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r85211 | lars.gustaebel | 2010-10-04 17:18:47 +0200 (Mon, 04 Oct 2010) | 14 lines
Issue #9065: no longer use "root" as the default for the
uname and gname field.
If tarfile creates a new archive and adds a file with a
uid/gid that doesn't have a corresponding name on the
system (e.g. because the user/group account was deleted) it
uses the empty string in the uname/gname field now instead
of "root". Using "root" as the default was a bad idea
because on extraction the uname/gname fields are supposed
to override the uid/gid fields. So, all archive members
with nameless uids/gids belonged to the root user after
extraction.
........
for extracting symbolic and hard link entries as regular files as a
work-around on platforms that do not support filesystem links.
This stopped working reliably after a change in r74571. I also added
a few tests for this functionality.
to "sys.platform == 'mac'" and that is
dead code because it refers to a platform
that is no longer supported (and hasn't been
supported for several releases).
Fixes issue #7908 for the trunk.
default.
TarFile's errorlevel argument controls how errors are
handled that occur during extraction. There are three
possible levels 0, 1 and 2. If errorlevel is set to 1 or 2
fatal errors (e.g. a full filesystem) are raised as
exceptions. If it is set to 0, which is the default value,
extraction errors are suppressed, and error messages are
written to the debug log instead. But, if the debug log is
not activated, which is the default as well, all these
errors go unnoticed.
The original intention was to imitate GNU tar which tries
to extract as many members as possible instead of stopping
on the first error. It turns out that this is no good
default behaviour for a tar library. This patch simply
changes the default value for the errorlevel argument from
0 to 1, so that fatal extraction errors are raised as
EnvironmentError exceptions.
(Note that an empty archive is not the same as an empty file. An
empty archive contains no members and is correctly terminated with an
EOF block full of zeros. An empty file contains no data at all.)
The problem was that although tarfile was able to create empty
archives, it failed to open them raising a ReadError. On the other
hand, tarfile opened empty files without error in most read modes and
presented them as empty archives. (However, some modes still raised
errors: "r|gz" raised ReadError, but "r:gz" worked, "r:bz2" even
raised EOFError.)
In order to get a more fine-grained control over the various internal
error conditions I now split up the HeaderError exception into a
number of meaningful sub-exceptions. This makes it easier in the
TarFile.next() method to react to the different conditions in the
correct way.
The visible change in its behaviour now is that tarfile will open
empty archives correctly and raise ReadError consistently for empty
files.
The filter argument must be a function that takes a TarInfo
object argument, changes it and returns it again. If the
function returns None the TarInfo object will be excluded
from the archive.
The exclude argument is deprecated from now on, because it
does something similar but is not as flexible.
No longer use tarfile.normpath() on pathnames. Store pathnames
unchanged, i.e. do not remove "./", "../" and "//" occurrences.
However, still convert absolute to relative paths.
forever on incomplete input. That caused tarfile.open() to hang when used
with mode 'r' or 'r:bz2' and a fileobj argument that contained no data or
partial bzip2 compressed data.
specify an error handling scheme for character conversion. Additional
scheme "utf-8" in read mode. Unicode input filenames are now
supported by design. The values of the pax_headers dictionary are now
limited to unicode objects.
Fixed: The prefix field is no longer used in PAX_FORMAT (in
conformance with POSIX).
Fixed: In read mode use a possible pax header size field.
Fixed: Strip trailing slashes from pax header name values.
Fixed: Give values in user-specified pax_headers precedence when
writing.
Added unicode tests. Added pax/regtype4 member to testtar.tar all
possible number fields in a pax header.
Added two chapters to the documentation about the different formats
tarfile.py supports and how unicode issues are handled.
support.
The TarInfo class now contains all necessary logic to process and
create tar header data which has been moved there from the TarFile
class. The fromtarfile() method was added. The new path and linkpath
properties are aliases for the name and linkname attributes in
correspondence to the pax naming scheme.
The TarFile constructor and classmethods now accept a number of
keyword arguments which could only be set as attributes before (e.g.
dereference, ignore_zeros). The encoding and pax_headers arguments
were added for pax support. There is a new tarinfo keyword argument
that allows using subclassed TarInfo objects in TarFile.
The boolean TarFile.posix attribute is deprecated, because now three
tar formats are supported. Instead, the desired format for writing is
specified using the constants USTAR_FORMAT, GNU_FORMAT and PAX_FORMAT
as the format keyword argument. This change affects TarInfo.tobuf()
as well.
The test suite has been heavily reorganized and partially rewritten.
A new testtar.tar was added that contains sample data in many formats
from 4 different tar programs.
Some bugs and quirks that also have been fixed:
Directory names do no longer have a trailing slash in TarInfo.name or
TarFile.getnames().
Adding the same file twice does not create a hardlink file member.
The TarFile constructor does no longer need a name argument.
The TarFile._mode attribute was renamed to mode and contains either
'r', 'w' or 'a'.
writing the crc to file on the "PPC64 Debian trunk" buildbot
when running test_tarfile.
This is again a case where the native zlib crc is an unsigned
32-bit int, but the Python wrapper implicitly casts it to
signed C long, so that "the sign bit looks different" on
different platforms.
already verified in .frombuf() on the lines above. If there was
a problem an exception is raised, so there was no way this condition
could have been true.