buildbot (& possibly other 64-bit boxes) during test_gzip.
The native zlib crc32 function returns an unsigned 32-bit integer,
which the Python wrapper implicitly casts to C long. Therefore the
same crc can "look negative" on a 32-bit box but "look positive" on
a 64-bit box. This patch papers over that platform difference when
writing the crc to file.
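To illustrate the platform difference being papered over (a sketch, not the
actual patch; the masking is the whole point):

    import zlib, struct

    crc = zlib.crc32(b"example data")
    # On an old 32-bit build this can come back negative (it's a C long);
    # masking gives the same unsigned 32-bit value on every box, so the
    # trailer bytes written are identical everywhere.
    crc &= 0xffffffff
    trailer_bytes = struct.pack("<L", crc)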
It may be better to change the Python wrapper, either to make
the result "look positive" on all platforms (which means it may
have to return a Python long at times on a 32-bit box), or to
keep the sign the same across boxes. But that would be a visible
change in what users see, while the current hack changes no
visible behavior (well, apart from stopping the struct deprecation
warning).
Note that the module-level write32() function is no longer used.
gzip shouldn't raise ValueError on corrupt files
Currently the gzip module will raise a ValueError if the file was
corrupt (bad crc or bad size). I can't see how ValueError applies to
reading a corrupt file. IOError seems better, and it's what calling code
will likely be looking for.
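Something along these lines for the trailer check (a sketch only; crc32 and
isize stand for the values read from the member trailer, self.crc and
self.size for the running values computed while decompressing):

    if crc32 != self.crc:
        raise IOError("CRC check failed")
    elif isize != self.size:
        raise IOError("Incorrect length of data produced")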
The last round boosted "the limit" from 2GB to 4GB. This round gets
rid of the 4GB limit. For files > 4GB, gzip stores just the last 32
bits of the file size, and now we play along with that too. Tested
by hand (on a 6+GB file) on Win2K.
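For concreteness, the arithmetic for a 6GB file like the one tested by hand
(illustrative numbers only):

    size = 6 * 2**30            # 6442450944 bytes of input
    isize = size & 0xffffffff   # what the gzip trailer stores: 2147483648
    assert isize == size % 2**32
    # so the length check has to compare sizes modulo 2**32, not directly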
Boosting from 2GB to 4GB was arguably enough of "a bugfix". Going beyond
that smells more like "new feature" to me.
Fixed the signed/unsigned confusions when dealing with files >= 2GB.
4GB is still a hard limitation of the gzip file format, though.
Testing this was a bitch on Win98SE due to frequent system freezes. It
didn't freeze while running gzip; it kept freezing while trying to *create*
a > 2GB test file! This wasn't Python's doing. I don't know of a
reasonable way to test this functionality in regrtest.py, so I'm not
checking in a test case (a test case would necessarily require creating
a 2GB+ file first, using gzip to zip it, using gzip to unzip it again,
and then comparing before-and-after; so >4GB of free space would be required,
and a loooong time; I did all this "by hand" once).
Bugfix candidate, I guess.
*this* set of patches is Ka-Ping's final sweep:
The attached patches update the standard library so that all modules
have docstrings beginning with one-line summaries.
A new docstring was added to formatter. The docstring for os.py
was updated to mention nt, os2, ce in addition to posix, dos, mac.
Ka-Ping writes:
Here is batch 2, as a big collection of CVS context diffs.
Along with moving comments into docstrings, i've added a
couple of missing docstrings and attempted to make sure more
module docstrings begin with a one-line summary.
I did not add docstrings to the methods in profile.py for
fear of upsetting any careful optimizations there, though
i did move class documentation into class docstrings.
The convention i'm using is to leave credits/version/copyright
type of stuff in # comments, and move the rest of the descriptive
stuff about module usage into module docstrings. Hope this is
okay.
1. Jack Jansen reports that on the Mac, the time may be negative, and
solves this by adding a write32u() function that writes an unsigned
long.
2. On 64-bit platforms the CRC comparison fails; I've fixed this by
casting both values being compared to "unsigned long", i.e. modulo
0x100000000L.
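A sketch of both fixes (write32u mirrors the function named above;
crc_equal is just a hypothetical name for the comparison):

    import struct

    def write32u(output, value):
        # Always pack as an unsigned 32-bit little-endian integer, so a
        # negative-looking time or CRC still writes the same four bytes.
        output.write(struct.pack("<L", value & 0xffffffff))

    def crc_equal(a, b):
        # Compare modulo 0x100000000 so 32-bit and 64-bit boxes agree.
        return (a & 0xffffffff) == (b & 0xffffffff)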
allow using the 'a' flag as a mode for opening a GzipFile. gzip
files, surprisingly enough, can be concatenated and then decompressed;
the effect is to concatenate the two chunks of data.
If we support it on writing, it should also be supported on reading.
This *wasn't* trivial, and required rearranging the code in the
reading path, particularly the _read() method.
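For example, on a reasonably modern Python both directions work (a usage
sketch, not a test from the patch; the filename is made up):

    import gzip

    with gzip.open("parts.gz", "wb") as f:    # first member
        f.write(b"first chunk\n")
    with gzip.open("parts.gz", "ab") as f:    # 'a' appends a second member
        f.write(b"second chunk\n")

    with gzip.open("parts.gz", "rb") as f:    # reading yields both, joined
        assert f.read() == b"first chunk\nsecond chunk\n"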
Raise IOError instead of RuntimeError in two cases: 'Not a gzipped file'
and 'Unknown compression method'.
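Callers can then treat all the bad-input cases uniformly (a sketch; the
filename is made up):

    import gzip

    try:
        with gzip.open("maybe.gz", "rb") as f:
            data = f.read()
    except IOError as exc:
        # covers 'Not a gzipped file', 'Unknown compression method',
        # and the CRC/length failures mentioned earlier
        print("bad gzip input:", exc)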
The problem was a couple of bugs in the readline implementation.
1. Include the '\n' in the string returned by readline
2. Bug calculating new buffer size in _unread
Also remove unnecessary import of StringIO.
Here's my suggested replacement for gzip.py for 1.5.1. I've
re-implemented methods readline and readlines, added an _unread, and
tweaked read and _read.
I tried a fancier buffer scheme for unread (using a list of
strings and string.join), but it was more complicated and slower.
This version is a lot faster than the current version and is still
pretty simple.
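Very roughly, the shape of the scheme (a standalone sketch of the
read/_unread idea, not the submitted code):

    class _Buffered:
        def __init__(self, raw):
            self.raw = raw          # object with a .read(size) method
            self.extrabuf = b""     # data read from raw but not yet consumed

        def read(self, size):
            data = self.extrabuf + self.raw.read(max(size - len(self.extrabuf), 0))
            data, self.extrabuf = data[:size], data[size:]
            return data

        def _unread(self, data):
            # Push unconsumed bytes back in front of whatever is buffered.
            self.extrabuf = data + self.extrabuf

        def readline(self):
            chunks = []
            while True:
                chunk = self.read(512)
                i = chunk.find(b"\n")
                if i >= 0 or chunk == b"":
                    # keep the '\n', push the rest of the chunk back
                    chunks.append(chunk[:i + 1] if i >= 0 else chunk)
                    self._unread(chunk[i + 1:] if i >= 0 else b"")
                    return b"".join(chunks)
                chunks.append(chunk)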