When compiled with `USE_ZLIB_CRC32` defined (`configure` sets this on POSIX systems), `binascii.crc32(...)` failed to compute the correct value when the input data was >= 4GiB. Because the zlib crc32 API is limited to a 32-bit length.
This lines it up with the `zlib.crc32(...)` implementation that doesn't have that flaw.
**Performance:** This also adopts the same GIL releasing for larger inputs logic that `zlib.crc32` has, and causes the Windows build to always use zlib's crc32 instead of our slow C code as zlib is a required build dependency on Windows.
Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier.
Also cleans up an unnecessary intermediate variable in the implementation.
Authored-By: Ma Lin / animalize
Co-authored-by: Gregory P. Smith <greg@krypto.org>
* zlib uses an UINT32_MAX sliding window for the output buffer
These funtions have an initial output buffer size parameter:
- zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
- zlib.Decompress.flush([length])
If the initial size > UINT32_MAX, use an UINT32_MAX sliding window, instead of clamping to UINT32_MAX.
Speed up when (the initial size == the actual size).
This fixes a memory consumption and copying performance regression in earlier 3.10 beta releases if someone used an output buffer larger than 4GiB with zlib.decompress.
Reviewed-by: Gregory P. Smith
* Fix initial buffer size can't > UINT32_MAX in zlib module
After commit f9bedb630e, in 64-bit build,
if the initial buffer size > UINT32_MAX, ValueError will be raised.
These two functions are affected:
1. zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
2. zlib.Decompress.flush([length])
This commit re-allows the size > UINT32_MAX.
* adds curly braces per PEP 7.
* Renames `Buffer_*` to `OutputBuffer_*` for clarity
Faster bz2/lzma/zlib via new output buffering.
Also adds .readall() function to _compression.DecompressReader class
to take best advantage of this in the consume-all-output at once scenario.
Often a 5-20% speedup in common scenarios due to less data copying.
Contributed by Ma Lin.
No longer use deprecated aliases to functions:
* Replace PyObject_MALLOC() with PyObject_Malloc()
* Replace PyObject_REALLOC() with PyObject_Realloc()
* Replace PyObject_FREE() with PyObject_Free()
* Replace PyObject_Del() with PyObject_Free()
* Replace PyObject_DEL() with PyObject_Free()
* Fix potential division by zero in BZ2_Malloc()
* Avoid division by zero in PyLzma_Malloc()
* Avoid division by zero and integer overflow in PyZlib_Malloc()
Reported by Svace static analyzer.
When a single .c file contains several functions and/or methods with
the same name, a safety _METHODDEF #define statement is generated
only for one of them.
This fixes the bug by using the full name of the function to avoid
duplicates rather than just the name.
The underlying zlib library stores sizes in “unsigned int”. The corresponding
Python parameters are all sizes of buffers filled in by zlib, so it is okay
to reduce higher values to the UINT_MAX internal cap. OverflowError is still
raised for sizes that do not fit in Py_ssize_t.
Sizes are now limited to Py_ssize_t rather than unsigned long, because Python
byte strings cannot be larger than Py_ssize_t. Previously this could result
in a SystemError on 32-bit platforms.
This resolves a regression in the gzip module when reading more than UINT_MAX
or LONG_MAX bytes in one call, introduced by revision 62723172412c.