Change summary:
+ There is now a `gzip.READ_BUFFER_SIZE` constant that is 128KB. Other programs that read in 128KB chunks: pigz and cat. So this seems best practice among good programs. Also it is faster than 8 kb chunks.
+ a zlib._ZlibDecompressor was added. This is the _bz2.BZ2Decompressor ported to zlib. Since the zlib.Decompress object is better for in-memory decompression, the _ZlibDecompressor is hidden. It only makes sense in file decompression, and that is already implemented now in the gzip library. No need to bother the users with this.
+ The ZlibDecompressor uses the older Cpython arrange_output_buffer functions, as those are faster and more appropriate for the use case.
+ GzipFile.read has been optimized. There is no longer a `unconsumed_tail` member to write back to padded file. This is instead handled by the ZlibDecompressor itself, which has an internal buffer. `_add_read_data` has been inlined, as it was just two calls.
EDIT: While I am adding improvements anyway, I figured I could add another one-liner optimization now to the python -m gzip application. That read chunks in io.DEFAULT_BUFFER_SIZE previously, but has been updated now to use READ_BUFFER_SIZE chunks.
kB (*kilo* byte) unit means 1000 bytes, whereas KiB ("kibibyte")
means 1024 bytes. KB was misused: replace kB or KB with KiB when
appropriate.
Same change for MB and GB which become MiB and GiB.
Change the output of Tools/iobench/iobench.py.
Round also the size of the documentation from 5.5 MB to 5 MiB.
The underlying zlib library stores sizes in “unsigned int”. The corresponding
Python parameters are all sizes of buffers filled in by zlib, so it is okay
to reduce higher values to the UINT_MAX internal cap. OverflowError is still
raised for sizes that do not fit in Py_ssize_t.
Sizes are now limited to Py_ssize_t rather than unsigned long, because Python
byte strings cannot be larger than Py_ssize_t. Previously this could result
in a SystemError on 32-bit platforms.
This resolves a regression in the gzip module when reading more than UINT_MAX
or LONG_MAX bytes in one call, introduced by revision 62723172412c.
I have compared output between pre- and post-patch runs of these tests
to make sure there's nothing missing and nothing broken, on both
Windows and Linux. The only differences I found were actually tests
that were previously *not* run.
Additionally, fix a bug where a MemoryError in allocating a bytes object could
leave the decompressor object in an invalid state (with its unconsumed_tail
member being NULL).
Patch by Serhiy Storchaka.
Additionally, fix a bug where a MemoryError in allocating a bytes object could
leave the decompressor object in an invalid state (with its unconsumed_tail
member being NULL).
Patch by Serhiy Storchaka.