cpython

Commit Graph

Author	SHA1	Message	Date
Johan Förberg	2b2d607095	gh-121267: Improve performance of tarfile (#121267) (#121269) Tarfile in the default write mode spends much of its time resolving UIDs into usernames and GIDs into group names. By caching these mappings, a significant speedup can be achieved. In my simple benchmark[1], this extra caching speeds up tarfile by 8x. [1] https://gist.github.com/jforberg/86af759c796199740c31547ae828aef2 --------- Co-authored-by: Tian Gao <gaogaotiantian@hotmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>	2024-10-30 15:08:30 -07:00
Seth Michael Larson	34ddb64d08	gh-121285: Remove backtracking when parsing tarfile headers (GH-121286) * Remove backtracking when parsing tarfile headers * Rewrite PAX header parsing to be stricter * Optimize parsing of GNU extended sparse headers v0.0 Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru> Co-authored-by: Gregory P. Smith <greg@krypto.org>	2024-08-31 15:17:05 -07:00
WilliamRoyNelson	dcafb362f7	gh-121999: Change default tarfile filter to 'data' (GH-122002) Co-authored-by: Tomas R <tomas.roun8@gmail.com> Co-authored-by: Scott Odle <scott@sjodle.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>	2024-07-26 16:34:13 +02:00
Jason R. Coombs	c8b45a385a	gh-118673: Remove shebang and executable bits from stdlib modules. (#119658) * gh-118673: Remove shebang and executable bits from stdlib modules. * Removed shebangs and exe bits on turtledemo scripts. The setting was inappropriate for '__main__' and inconsistent across the other modules. The scripts can still be executed directly by invoking with the desired interpreter.	2024-05-29 12:43:19 -04:00
Geoffrey Thomas	ef172521a9	Remove almost all unpaired backticks in docstrings (#119231) As reported in #117847 and #115366, an unpaired backtick in a docstring tends to confuse e.g. Sphinx running on subclasses of standard library objects, and the typographic style of using a backtick as an opening quote is no longer in favor. Convert almost all uses of the form The variable `foo' should do xyz to The variable 'foo' should do xyz and also fix up miscellaneous other unpaired backticks (extraneous / missing characters). No functional change is intended here other than in human-readable docstrings.	2024-05-22 12:35:18 -04:00
Serhiy Storchaka	51ef89cd9a	gh-115961: Add name and mode attributes for compressed file-like objects (GH-116036) * Add name and mode attributes for compressed and archived file-like objects in modules bz2, lzma, tarfile and zipfile. * Change the value of the mode attribute of GzipFile from integer (1 or 2) to string ('rb' or 'wb'). * Change the value of the mode attribute of ZipExtFile from 'r' to 'rb'.	2024-04-21 11:46:39 +03:00
lyc8503	15b3555e4a	gh-116931: Add fileobj parameter check for Tarfile.addfile (GH-117988) Tarfile.addfile now throws a ValueError when the user passes in a non-zero size tarinfo but does not provide a fileobj, instead of writing an incomplete entry.	2024-04-19 11:41:51 +00:00
Alex Waygood	cff0a2db00	gh-117691: Add an appropriate stacklevel for PEP-706 tarfile deprecation warnings (GH-117872)	2024-04-16 13:36:00 +02:00
pan324	0dfa7ce346	gh-115256: Remove refcycles from tarfile writing (GH-115257)	2024-03-04 13:26:32 +00:00
Serhiy Storchaka	5d2794a16b	gh-67837, gh-112998: Fix dirs creation in concurrent extraction (GH-115082) Avoid race conditions in the creation of directories during concurrent extraction in tarfile and zipfile. Co-authored-by: Samantha Hughes <shughes-uk@users.noreply.github.com> Co-authored-by: Peder Bergebakken Sundt <pbsds@hotmail.com>	2024-02-11 12:38:07 +02:00
Serhiy Storchaka	96bce033c4	gh-114959: tarfile: do not ignore errors when extracting a directory on top of a file (GH-114960) Also, add tests common to tarfile and zipfile.	2024-02-03 16:18:46 +00:00
Stanley	0651936ae2	gh-67641: Clarify documentation on bytes vs text with non-seeking tarfile stream (GH-31610)	2023-12-27 17:16:36 +00:00
Marat Idrisov	e1117cb886	gh-87264: Convert tarinfo type to stat type (GH-113230) Co-authored-by: val-shkolnikov <val@nvsoft.net>	2023-12-19 11:04:43 -08:00
Alex Waygood	bfe7e72522	gh-109653: Defer importing `warnings` in several modules (#110286)	2023-10-04 06:09:43 +01:00
Petr Viktorin	5d18715765	gh-107811: tarfile: treat overflow in UID/GID as failure to set it (#108369)	2023-08-23 20:00:07 +02:00
balmeida-nokia	37135d25e2	gh-107396: tarfiles: set self.exception before _init_read_gz() (GH-107485) In the stack call of: _init_read_gz() ``` _read, tarfile.py:548 read, tarfile.py:526 _init_read_gz, tarfile.py:491 ``` a try;except exists that uses `self.exception`, so it needs to be set before calling _init_read_gz().	2023-08-21 11:39:06 +00:00
Petr Viktorin	acbd3f9c5c	gh-107845: Fix symlink handling for tarfile.data_filter (GH-107846) Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: Lumír 'Frenzy' Balhar <frenzy.madness@gmail.com>	2023-08-21 12:56:46 +02:00
Robert O'Shea	50fce89d12	gh-102120: [TarFile] Add an iter function that doesn't cache (GH-102128)	2023-05-23 13:44:40 -07:00
Petr Viktorin	af53046995	gh-102950: Implement PEP 706 – Filter for tarfile.extractall (#102953)	2023-04-24 10:58:06 +02:00
Oleg Iarygin	56d055a0d8	gh-74468: [tarfile] Fix incorrect name attribute of ExFileObject (GH-102424) Co-authored-by: Simeon Visser <svisser@users.noreply.github.com>	2023-03-27 16:21:07 -07:00
Nick Drozd	024ac542d7	bpo-45975: Simplify some while-loops with walrus operator (GH-29347)	2022-11-26 14:33:25 -08:00
Sam Ezeh	78365b8e28	gh-91078: Return None from TarFile.next when the tarfile is empty (GH-91850) Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>	2022-11-26 09:57:05 -08:00
Nikita Sobolev	faf7dfa656	gh-99325: Remove unused `NameError` handling (#99326)	2022-11-11 09:56:57 +00:00
Yaron de Leeuw	50cd4b6959	bpo-26253: Add compressionlevel to tarfile stream (GH-2962) `tarfile` already accepts a compressionlevel argument for creating files. This patch adds the same for stream-based tarfile usage. The default is 9, the value that was previously hard-coded.	2022-06-25 11:43:54 +03:00
Chris Fernald	c1e19421c2	gh-91387: Strip trailing slash from tarfile longname directories (GH-32423) Co-authored-by: Brett Cannon <brett@python.org>	2022-06-17 15:38:41 -07:00
Joshua Root	bf2d44ffb0	bpo-45863: tarfile: don't zero out header fields unnecessarily (GH-29693) Numeric fields of type float, notably mtime, can't be represented exactly in the ustar header, so the pax header is used. But it is helpful to set them to the nearest int (i.e. second rather than nanosecond precision mtimes) in the ustar header as well, for the benefit of unarchivers that don't understand the pax header. Add test for tarfile.TarInfo.create_pax_header to confirm correct behaviour.	2022-02-09 18:06:19 +01:00
Andrzej Mateja	128ab092ca	bpo-44289: Keep argument file object's current position in tarfile.is_tarfile (GH-26488)	2022-02-09 08:19:16 -08:00
andrei kulakov	cfadcc31ea	bpo-21987: Fix TarFile.getmember getting a dir with a trailing slash (GH-30283)	2022-01-21 09:40:32 +02:00
Jack DeVries	b6fe857250	bpo-39039: tarfile raises descriptive exception from zlib.error (GH-27766) * during tarfile parsing, a zlib error indicates invalid data * tarfile.open now raises a descriptive exception from the zlib error * this makes it clear to the user that they may be trying to open a corrupted tar file	2021-09-29 11:25:48 +02:00
Anthony Sottile	9aea31dedd	bpo-8978: improve tarfile.open error message when lzma / bz2 are missing (GH-24850) Automerge-Triggered-By: GH:pablogsal	2021-04-27 10:39:01 -07:00
Ethan Furman	b5a6db9111	bpo-39717: [tarfile] update nested exception raising (GH-23739) - `from None` if the new exception uses, or doesn't need, the previous one - `from e` if the previous exception is still relevant	2020-12-12 13:26:44 -08:00
Julien Palard	4fedd7123e	bpo-12800: tarfile: Restore fix from `011525ee9` (GH-21409) Restore fix from `011525ee92`.	2020-11-25 10:23:17 +01:00
Andrey Doroschenko	ec42789e6e	bpo-39693: mention KeyError in tarfile extractfile documentation (GH-18639) Co-authored-by: Andrey Darascheka <andrei.daraschenka@leverx.com>	2020-10-20 10:05:01 -04:00
Artem Bulgakov	22748a83d9	bpo-41316: Make tarfile follow specs for FNAME (GH-21511) tarfile writes the full path to the FNAME field of the GZIP format instead of just the basename if the user specified an absolute path. Some archive viewers may process the file incorrectly. It also creates a security issue, because anyone can learn the directory structure of the system, the username, or other personal information. RFC1952 says about FNAME: This is the original name of the file being compressed, with any directory components removed. So tarfile must remove directory names from FNAME and write only the basename of the file. Automerge-Triggered-By: @jaraco	2020-09-07 09:46:33 -07:00
Rishi	5a8d121a1f	bpo-39017: Avoid infinite loop in the tarfile module (GH-21454) Avoid infinite loop when reading specially crafted TAR files using the tarfile module (CVE-2019-20907).	2020-07-15 13:51:00 +02:00
William Chargin	674935b8ca	bpo-18819: tarfile: only set device fields for device files (GH-18080) The GNU docs describe the `devmajor` and `devminor` fields of the tar header struct only in the context of character and block special files, suggesting that in other cases they are not populated. Typical utilities behave accordingly; this patch teaches `tarfile` to do the same.	2020-02-12 11:56:02 -08:00
Serhiy Storchaka	9017e0bd5e	bpo-39430: Fix race condition in lazy imports in tarfile. (GH-18161) Use `from ... import ...` to ensure module is fully loaded before accessing its attributes.	2020-01-24 09:55:52 -08:00
William Woodruff	dd754caf14	bpo-29435: Allow is_tarfile to take a filelike obj (GH-18090) `is_tarfile()` now supports `name` being a file or file-like object.	2020-01-22 18:24:16 -08:00
Raymond Hettinger	a694f23948	Add missing docstrings for TarInfo objects (#12555)	2019-03-27 13:16:34 -07:00
CAM Gerlach	e680c3db80	bpo-36268: Change default tar format to pax from GNU. (GH-12355)	2019-03-21 16:44:51 +02:00
Anthony Sottile	8377cd4fcd	Clean up code which checked presence of os.{stat,lstat,chmod} (#11643)	2019-02-25 23:32:27 +01:00
INADA Naoki	8d130913cb	bpo-34043: Optimize tarfile uncompress performance (GH-8089) tarfile._Stream has two buffers, for compressed and uncompressed data. These buffers are not aligned, so unnecessary bytes slicing happens for every chunk read. This commit bypasses compressed buffering. In this benchmark [1], user time drops from 300ms to 250ms. [1]: https://bugs.python.org/msg320763	2018-07-06 14:06:00 +09:00
hajoscher	12a08c4760	bpo-34010: Fix tarfile read performance regression (GH-8020) During buffered read, use a list followed by join instead of extending a bytes object. This is how it was done before but changed in commit `b506dc32c1`.	2018-07-04 17:13:18 +09:00
INADA Naoki	461a1c4b49	bpo-33842: Remove tarfile.filemode (GH-7661)	2018-06-28 17:10:36 +09:00
Joffrey F	72d9b2be36	bpo-32713: Fix tarfile.itn for large/negative float values. (GH-5434)	2018-02-27 02:02:21 +02:00
Bernhard M. Wiedemann	84521047e4	bpo-30693: zip+tarfile: sort directory listing (#2263) tarfile and zipfile now sort directory listing to generate tar and zip archives in a more reproducible way. See also https://reproducible-builds.org/docs/stable-inputs/ on that topic.	2018-01-31 11:17:10 +01:00
Mike	53f7a7c281	bpo-32297: Few misspellings found in Python source code comments. (#4803) * Fix multiple typos in code comments * Add spacing in comments (test_logging.py, test_math.py) * Fix spaces at the beginning of comments in test_logging.py	2017-12-14 13:04:53 +02:00
Alex Gaynor	c7cc14a825	Remove two legacy constants which hopefully have no consumers (#1087) The data contained in them is nonsensical	2017-04-11 22:41:42 -04:00
Serhiy Storchaka	150cd1916a	bpo-29958: Minor improvements to zipfile and tarfile CLI. (#944)	2017-04-07 18:56:12 +03:00
Serhiy Storchaka	bdf6b910f9	bpo-29776: Use decorator syntax for properties. (#585)	2017-03-19 08:40:32 +02:00
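
The caching idea described in commit 2b2d607095 (gh-121267), resolving each UID or GID to a name at most once, might be sketched roughly as follows. This is an illustration only and not the code from that commit: the `NameCache` class and its `lookup` parameter are hypothetical names introduced here, and the sketch assumes the lookup function raises `KeyError` for an unknown id, as `pwd.getpwuid` and `grp.getgrgid` do.

```python
from typing import Callable, Dict, Optional


class NameCache:
    """Hypothetical helper: memoize id -> name lookups (not tarfile's real code)."""

    def __init__(self, lookup: Callable[[int], str]) -> None:
        # `lookup` is an assumed callable, e.g. lambda uid: pwd.getpwuid(uid).pw_name
        self._lookup = lookup
        self._names: Dict[int, Optional[str]] = {}

    def get(self, ident: int) -> Optional[str]:
        # Resolve each id at most once; failed lookups are cached as None too,
        # so an unknown id does not trigger a new lookup for every archive member.
        if ident not in self._names:
            try:
                self._names[ident] = self._lookup(ident)
            except KeyError:
                self._names[ident] = None
        return self._names[ident]
```

On a Unix system, such a cache could be built with `NameCache(lambda uid: pwd.getpwuid(uid).pw_name)` and consulted once per archive member, so repeated owners cost only a dictionary lookup.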

243 Commits