Commit Graph

139 Commits

Author SHA1 Message Date
Arjun 9af485436b
gh-89550: Buffer GzipFile.write to reduce execution time by ~15% (#101251)
Use `io.BufferedWriter` to buffer gzip writes.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
2023-05-08 17:55:59 +00:00
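A minimal sketch of the workload this commit targets: many small `GzipFile.write` calls. The helper name `write_in_small_chunks` and the sample data are illustrative and not part of the commit. The speed-up depends on the Python version; the code behaves the same either way.

```python
import gzip
import io


def write_in_small_chunks(chunks):
    """Compress an iterable of small bytes objects into one gzip stream."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
        for chunk in chunks:
            # Each call is tiny; Pythons that include this change buffer
            # these writes before handing them to the compressor.
            gz.write(chunk)
    return raw.getvalue()


lines = [f"row {i}\n".encode() for i in range(1000)]
blob = write_in_small_chunks(lines)
```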
Ruben Vorderman eae7dad402
gh-95534: Improve gzip reading speed by 10% (#97664)
Change summary:
+ There is now a `gzip.READ_BUFFER_SIZE` constant set to 128 KB. Other programs that read in 128 KB chunks include pigz and cat, so this appears to be established practice. It is also faster than reading 8 KB chunks.
+ A `zlib._ZlibDecompressor` was added. This is the `_bz2.BZ2Decompressor` ported to zlib. Since the `zlib.Decompress` object is better for in-memory decompression, the `_ZlibDecompressor` is hidden. It only makes sense for file decompression, which is already implemented in the gzip library, so there is no need to bother users with it.
+ The `_ZlibDecompressor` uses the older CPython `arrange_output_buffer` functions, as those are faster and more appropriate for this use case.
+ `GzipFile.read` has been optimized. There is no longer an `unconsumed_tail` member to write back to the padded file. This is instead handled by the `_ZlibDecompressor` itself, which has an internal buffer. `_add_read_data` has been inlined, as it was just two calls.

EDIT: While I am adding improvements anyway, I figured I could add another one-liner optimization to the `python -m gzip` application. It previously read chunks of `io.DEFAULT_BUFFER_SIZE`, but now reads `READ_BUFFER_SIZE` chunks.
2022-10-16 19:10:58 -07:00
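The chunked reading described above can be sketched from the caller's side as follows. `iter_decompressed` and `CHUNK_SIZE` are hypothetical names, and the `getattr` fallback is an assumption for interpreters that predate `gzip.READ_BUFFER_SIZE`.

```python
import gzip
import io

# READ_BUFFER_SIZE only exists on Pythons that include this change;
# fall back to the same 128 KiB figure elsewhere.
CHUNK_SIZE = getattr(gzip, "READ_BUFFER_SIZE", 128 * 1024)


def iter_decompressed(fileobj, chunk_size=CHUNK_SIZE):
    """Yield decompressed data from a binary file object in fixed-size chunks."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            yield chunk


payload = b"abc" * 100_000
restored = b"".join(iter_decompressed(io.BytesIO(gzip.compress(payload))))
```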
Victor Stinner d3a27e4c93
gh-94196: Remove gzip.GzipFile.filename attribute (#94197)
gzip: Remove the filename attribute of gzip.GzipFile,
deprecated since Python 2.6, use the name attribute instead. In write
mode, the filename attribute added '.gz' file extension if it was not
present.
2022-06-24 11:59:32 +02:00
Ilya Leoshkevich 943ca5e1d6
gh-90839: Forward gzip.compress() compresslevel to zlib (gh-31215) 2022-04-12 22:46:40 +09:00
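As a rough illustration of why the level passed to `gzip.compress()` matters, the sketch below compares level 0 (stored, no compression) with level 9 on repetitive sample data. The payload is made up for the example.

```python
import gzip

payload = b"the quick brown fox jumps over the lazy dog\n" * 2000

# Level 0 stores the data uncompressed, so the result is slightly
# larger than the input; level 9 asks for the best compression ratio.
stored = gzip.compress(payload, compresslevel=0)
best = gzip.compress(payload, compresslevel=9)
```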
Ruben Vorderman 0ff3d95b98
bpo-45507: EOFErrors should be thrown for truncated gzip members (GH-29029) 2021-11-19 19:07:05 +01:00
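A small sketch of the behavior named in this commit title, assuming a stream cut off before its end-of-stream marker. `is_truncated` is a hypothetical helper, and cutting 12 bytes is an arbitrary choice that removes the 8-byte trailer plus part of the compressed data.

```python
import gzip


def is_truncated(blob):
    """Return True if decompressing `blob` fails with EOFError."""
    try:
        gzip.decompress(blob)
    except EOFError:
        return True
    return False


whole = gzip.compress(b"example payload " * 1000)
cut = whole[:-12]  # drop the trailer and the tail of the deflate stream
```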
Inada Naoki 0a4c82ddd3
bpo-45475: Revert `__iter__` optimization for GzipFile, BZ2File, and LZMAFile. (GH-29016)
This reverts commit d2a8e69c2c.
2021-10-19 11:51:48 +09:00
Ruben Vorderman ea23e7820f
bpo-43613: Faster implementation of gzip.compress and gzip.decompress (GH-27941)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2021-09-02 17:02:59 +02:00
Ma Lin bc6c12c72a
bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly (GH-26764)
No longer use len() to get the length of the input data. For some buffer protocol objects,
the length obtained by using len() is wrong.
2021-06-22 10:04:23 +03:00
Ashwin Ramaswami de367378f6
Fix typo in comment (GH-26162) 2021-05-16 16:35:41 +01:00
Inada Naoki d2a8e69c2c
bpo-43787: Add __iter__ to GzipFile, BZ2File, and LZMAFile (GH-25353) 2021-04-13 13:51:49 +09:00
Inada Naoki 4827483f47
bpo-43510: Implement PEP 597 opt-in EncodingWarning. (GH-19481)
See [PEP 597](https://www.python.org/dev/peps/pep-0597/).

* Add `-X warn_default_encoding` and `PYTHONWARNDEFAULTENCODING`.
* Add EncodingWarning
* Add io.text_encoding()
* open() and TextIOWrapper() emit EncodingWarning when encoding is omitted and warn_default_encoding is enabled.
* _pyio.TextIOWrapper() uses UTF-8 as the fallback default encoding when the locale module cannot be imported (this happens while building Python).
* bz2, configparser, gzip, lzma, pathlib, tempfile modules use io.text_encoding().
* What's new entry
2021-03-29 12:28:14 +09:00
Ruben Vorderman 7956ef8849
bpo-43317: Use io.DEFAULT_BUFFER_SIZE instead of 1024 in gzip CLI (#24645)
This improves the performance slightly.
2021-02-26 21:17:51 +09:00
Inada Naoki 9525a18b5b
bpo-43316: gzip: Fix sys.exit() usage. (GH-24652) 2021-02-26 11:09:06 +09:00
Ruben Vorderman cc3df6368d
bpo-43316: gzip: CLI uses non-zero return code on error. (GH-24647)
Exit code is now 1 instead of 0. A message is printed to stderr instead of stdout. This is
the proper behaviour for a tool that can be used in scripts.
2021-02-25 20:30:24 +09:00
William Chargin eab3b3f1c6 bpo-39389: gzip: fix compression level metadata (GH-18077)
As described in RFC 1952, section 2.3.1, the XFL (eXtra FLags) byte of a
gzip member header should indicate whether the DEFLATE algorithm was
tuned for speed or compression ratio. Prior to this patch, archives
emitted by the `gzip` module always indicated maximum compression.
2020-01-21 13:25:24 +02:00
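The XFL byte sits at offset 8 of a gzip member header (after the 2-byte magic, the method byte, the flags byte, and the 4-byte mtime). The sketch below reads it back for a few levels; `xfl_byte` is a hypothetical helper, and the expected values assume a Python that includes this patch.

```python
import gzip
import io


def xfl_byte(compresslevel):
    """Return the XFL header byte written by GzipFile for a given level."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb",
                       compresslevel=compresslevel, mtime=0) as gz:
        gz.write(b"payload")
    # Offset 8: magic (2) + method (1) + flags (1) + mtime (4).
    return buf.getvalue()[8]


# Per RFC 1952: 2 = maximum compression, 4 = fastest algorithm.
flags = {level: xfl_byte(level) for level in (1, 6, 9)}
```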
Serhiy Storchaka a0652328a2
bpo-28286: Deprecate opening GzipFile for writing implicitly. (GH-16417)
Always specify the mode argument for writing.
2019-11-16 18:56:57 +02:00
Zackery Spytz cf599f6f6f bpo-6584: Add a BadGzipFile exception to the gzip module. (GH-13022)
Co-Authored-By: Filip Gruszczyński <gruszczy@gmail.com>
Co-Authored-By: Michele Orrù <maker@tumbolandia.net>
2019-05-13 10:50:52 +03:00
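A brief sketch of catching the exception added here. The input bytes are invented; any data that does not start with the gzip magic number should do. `looks_like_gzip` is a hypothetical helper name.

```python
import gzip


def looks_like_gzip(blob):
    """Return False if the gzip module rejects `blob` as not being gzip data."""
    try:
        gzip.decompress(blob)
    except gzip.BadGzipFile:
        return False
    return True


rejected = looks_like_gzip(b"this is plainly not a gzip stream")
accepted = looks_like_gzip(gzip.compress(b"hello"))
```

`BadGzipFile` subclasses `OSError`, so code that already catches `OSError` continues to work.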
Maximilian Nöthe 4f5a3493b5 fix typo in gzip.py (GH-12928) 2019-04-24 18:21:02 +09:00
guoci 0e7497cb46 bpo-34898: Add mtime parameter to gzip.compress(). (GH-9704)
If mtime is not set, time.time() is used as the timestamp, which ends up in
the compressed data, so the output of the compress() function varies from
one invocation to the next.
2018-11-07 11:50:23 +02:00
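The effect of pinning the timestamp can be sketched as follows: with a fixed `mtime`, two calls on the same input give byte-identical output. The payload is illustrative.

```python
import gzip

payload = b"reproducible build artifact\n" * 100

# With a fixed mtime the header timestamp no longer depends on the clock.
first = gzip.compress(payload, mtime=0)
second = gzip.compress(payload, mtime=0)

# The 4-byte MTIME field occupies header offsets 4..7.
header_mtime = first[4:8]
```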
Stéphane Wirtel 3e28eed9ec bpo-34969: Add --fast, --best on the gzip CLI (GH-9833) 2018-11-03 16:24:23 +01:00
Stéphane Wirtel e8bbc52deb bpo-23596: Use argparse for the command line of gzip (GH-9781)
Co-authored-by: Antony Lee <anntzer.lee@gmail.com>
2018-10-10 00:41:33 +02:00
Victor Stinner 8c663fd60e
Replace KB unit with KiB (#4293)
kB (*kilo* byte) unit means 1000 bytes, whereas KiB ("kibibyte")
means 1024 bytes. KB was misused: replace kB or KB with KiB when
appropriate.

Same change for MB and GB which become MiB and GiB.

Change the output of Tools/iobench/iobench.py.

Also round the size of the documentation from 5.5 MB to 5 MiB.
2017-11-08 14:44:44 -08:00
Berker Peksag 03020cfa97 Issue #28227: gzip now supports pathlib
Patch by Ethan Furman.
2016-10-02 13:47:58 +03:00
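A minimal sketch of passing a `pathlib.Path` straight to `gzip.open()`. The file name and the helper `roundtrip_via_path` are made up for the example.

```python
import gzip
import tempfile
from pathlib import Path


def roundtrip_via_path(payload):
    """Write and read back a gzip file addressed by a Path object."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "example.txt.gz"  # hypothetical file name
        with gzip.open(target, "wb") as gz:
            gz.write(payload)
        with gzip.open(target, "rb") as gz:
            return gz.read()


result = roundtrip_via_path(b"hello from pathlib\n")
```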
Serhiy Storchaka 5f1a5187f7 Use sequence repetition instead of bytes constructor with integer argument. 2016-09-11 14:41:02 +03:00
Martin Panter 8f26565ba9 Fix spelling (inital), grammar (may translates) in documentation, comments 2016-04-19 04:03:41 +00:00
Martin Panter b82032f935 Issue #22341: Drop Python 2 workaround and document CRC initial value
Also align the parameter naming in binascii to be consistent with zlib.
2015-12-11 05:19:29 +00:00
Antoine Pitrou 2dbc6e6bce Issue #23529: Limit the size of decompressed data when reading from
GzipFile, BZ2File or LZMAFile.  This defeats denial of service attacks
using compressed bombs (i.e. compressed payloads which decompress to a huge
size).

Patch by Martin Panter and Nikolaus Rath.
2015-04-11 00:31:01 +02:00
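The commit above bounds memory inside the library. A caller can additionally cap how much decompressed data it accepts, as in this sketch. `read_capped` and its `limit` parameter are hypothetical, and this is a caller-side pattern, not the mechanism the patch implements.

```python
import gzip
import io


def read_capped(blob, limit):
    """Decompress `blob`, raising ValueError if it expands beyond `limit` bytes."""
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
        # Ask for one byte more than allowed so oversize input is detectable
        # without decompressing the whole payload.
        data = gz.read(limit + 1)
    if len(data) > limit:
        raise ValueError("decompressed data exceeds limit")
    return data


small = gzip.compress(b"x" * 100)
bomb = gzip.compress(b"\0" * 1_000_000)  # tiny on disk, large when expanded
```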
Serhiy Storchaka 2116b12da5 Issue #23865: close() methods in multiple modules now are idempotent and more
robust at shutdown. If multiple resources need to be released, they are released
even if errors occur.
2015-04-10 13:29:28 +03:00
Serhiy Storchaka 7e7a3dba5f Issue #23865: close() methods in multiple modules now are idempotent and more
robust at shutdown. If multiple resources need to be released, they are released
even if errors occur.
2015-04-10 13:24:41 +03:00
Serhiy Storchaka bca63b362d Issue #23688: Added support of arbitrary bytes-like objects and avoided
unnecessary copying of memoryview in gzip.GzipFile.write().
Original patch by Wolfgang Maier.
2015-03-23 14:59:48 +02:00
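A short sketch of writing several kinds of bytes-like objects through `GzipFile.write()`. `compress_buffers` and the sample values are illustrative only.

```python
import array
import gzip
import io


def compress_buffers(buffers):
    """Write each bytes-like object in `buffers` into one gzip stream."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
        for buf in buffers:
            gz.write(buf)
    return raw.getvalue()


samples = array.array("H", [1, 2, 3])  # multi-byte items, not just bytes
blob = compress_buffers(
    [b"bytes", bytearray(b"bytearray"), memoryview(b"view"), samples]
)
```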
Serhiy Storchaka d4c2ac8394 Issue #21560: An attempt to write data of the wrong type no longer causes
GzipFile corruption.  Original patch by Wolfgang Maier.
2015-03-23 15:25:43 +02:00
Ned Deily e5127299c8 Issue #20875: Merge from 3.3 2014-03-09 14:47:58 -07:00
Ned Deily 6120739f0c Issue #20875: Prevent possible gzip "'read' is not defined" NameError.
Patch by Claudiu Popa.
2014-03-09 14:44:34 -07:00
Nadeem Vawda ee1be99e05 Issue #19222: Add support for the 'x' mode to the gzip module.
Original patch by Tim Heaney.
2013-10-19 00:11:13 +02:00
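Exclusive-creation mode can be sketched like this: the first call creates the file, and a second call on the same path fails instead of overwriting it. `create_exclusively` and the file name are hypothetical.

```python
import gzip
import tempfile
from pathlib import Path


def create_exclusively(path, payload):
    """Create a gzip file only if it does not already exist."""
    try:
        with gzip.open(path, "xb") as gz:
            gz.write(payload)
    except FileExistsError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "once.gz"  # hypothetical file name
    first_attempt = create_exclusively(target, b"original")
    second_attempt = create_exclusively(target, b"overwrite?")
    with gzip.open(target, "rb") as gz:
        surviving = gz.read()
```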
Serhiy Storchaka 48e6a8c88a Issue #18743: Fix references to nonexistent "StringIO" module
in docstrings and comments.
2013-08-29 11:39:48 +03:00
Serhiy Storchaka 50254c57cd Issue #18743: Fix references to nonexistent "StringIO" module
in docstrings and comments.
2013-08-29 11:35:43 +03:00
Georg Brandl b3bd624a55 Back out patch for #1159051, which caused backwards compatibility problems. 2013-05-12 11:57:26 +02:00
Serhiy Storchaka ffcd339aac Close #17666: Fix reading gzip files with an extra field. 2013-04-08 22:37:15 +03:00
Serhiy Storchaka 7e69f0085e Close #17666: Fix reading gzip files with an extra field. 2013-04-08 22:35:02 +03:00
Serhiy Storchaka cc0172c007 Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
with truncated header or footer.
Added tests for reading truncated gzip, bzip2, and lzma files.
2013-01-22 17:11:07 +02:00
Serhiy Storchaka 57f9b7a124 Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
with truncated header or footer.
Added tests for reading truncated gzip, bzip2, and lzma files.
2013-01-22 17:07:49 +02:00
Serhiy Storchaka 7c3922f44c Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
with truncated header or footer.
Added tests for reading truncated gzip and bzip2 files.
2013-01-22 17:01:59 +02:00
Serhiy Storchaka fc6e8aabf5 #15546: Fix GzipFile.peek()'s handling of pathological input data.
This is a backport of changeset 8c07ff7f882f.
2013-01-22 15:54:48 +02:00
Andrew Svetlov f7a17b48d7 Replace IOError with OSError (#16715) 2012-12-25 16:47:37 +02:00
Nadeem Vawda 6ff262e18f Issue #15677: Document that zlib and gzip accept a compression level of 0 to mean 'no compression'.
Patch by Brian Brazil.
2012-11-11 14:14:47 +01:00
Nadeem Vawda 19e568d254 Issue #15677: Document that zlib and gzip accept a compression level of 0 to mean 'no compression'.
Patch by Brian Brazil.
2012-11-11 14:04:14 +01:00
Antoine Pitrou 2a021c80ce Issue #15800: fix the closing of input / output files when gzip is used as a script. 2012-08-30 00:30:14 +02:00
Antoine Pitrou ecc4757b79 Issue #15800: fix the closing of input / output files when gzip is used as a script. 2012-08-30 00:29:24 +02:00
Nadeem Vawda 043540088a #15546: Also fix GzipFile.peek(). 2012-08-05 14:45:41 +02:00
Nadeem Vawda 37d3ff1487 #15546: Fix {GzipFile,LZMAFile}.read1()'s handling of pathological input data. 2012-08-05 02:19:09 +02:00