Commit Graph

243 Commits

Author SHA1 Message Date
Johan Förberg 2b2d607095
gh-121267: Improve performance of tarfile (#121267) (#121269)
Tarfile in the default write mode spends much of its time resolving UIDs
into usernames and GIDs into group names. By caching these mappings, a
significant speedup can be achieved.

In my simple benchmark[1], this extra caching speeds up tarfile by 8x.

[1] https://gist.github.com/jforberg/86af759c796199740c31547ae828aef2

---------

Co-authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
2024-10-30 15:08:30 -07:00
Seth Michael Larson 34ddb64d08
gh-121285: Remove backtracking when parsing tarfile headers (GH-121286)
* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0

Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
2024-08-31 15:17:05 -07:00
WilliamRoyNelson dcafb362f7
gh-121999: Change default tarfile filter to 'data' (GH-122002)
Co-authored-by: Tomas R <tomas.roun8@gmail.com>
Co-authored-by: Scott Odle <scott@sjodle.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
2024-07-26 16:34:13 +02:00
Jason R. Coombs c8b45a385a
gh-118673: Remove shebang and executable bits from stdlib modules. (#119658)
* gh-118673: Remove shebang and executable bits from stdlib modules.

* Removed shebangs and exe bits on turtledemo scripts.

The setting was inappropriate for '__main__' and inconsistent across the other modules. The scripts can still be executed directly by invoking with the desired interpreter.
2024-05-29 12:43:19 -04:00
Geoffrey Thomas ef172521a9
Remove almost all unpaired backticks in docstrings (#119231)
As reported in #117847 and #115366, an unpaired backtick in a docstring
tends to confuse e.g. Sphinx running on subclasses of standard library
objects, and the typographic style of using a backtick as an opening
quote is no longer in favor. Convert almost all uses of the form

    The variable `foo' should do xyz

to

    The variable 'foo' should do xyz

and also fix up miscellaneous other unpaired backticks (extraneous /
missing characters).

No functional change is intended here other than in human-readable
docstrings.
2024-05-22 12:35:18 -04:00
Serhiy Storchaka 51ef89cd9a
gh-115961: Add name and mode attributes for compressed file-like objects (GH-116036)
* Add name and mode attributes for compressed and archived file-like objects
  in modules bz2, lzma, tarfile and zipfile.
* Change the value of the mode attribute of GzipFile from integer (1 or 2)
  to string ('rb' or 'wb').
* Change the value of the mode attribute of ZipExtFile from 'r' to 'rb'.
2024-04-21 11:46:39 +03:00
lyc8503 15b3555e4a
gh-116931: Add fileobj parameter check for Tarfile.addfile (GH-117988)
Tarfile.addfile now throws an ValueError when the user passes
in a non-zero size tarinfo but does not provide a fileobj,
instead of writing an incomplete entry.
2024-04-19 11:41:51 +00:00
Alex Waygood cff0a2db00
gh-117691: Add an appropriate stacklevel for PEP-706 tarfile deprecation warnings (GH-117872) 2024-04-16 13:36:00 +02:00
pan324 0dfa7ce346
gh-115256: Remove refcycles from tarfile writing (GH-115257) 2024-03-04 13:26:32 +00:00
Serhiy Storchaka 5d2794a16b
gh-67837, gh-112998: Fix dirs creation in concurrent extraction (GH-115082)
Avoid race conditions in the creation of directories during concurrent
extraction in tarfile and zipfile.

Co-authored-by: Samantha Hughes <shughes-uk@users.noreply.github.com>
Co-authored-by: Peder Bergebakken Sundt <pbsds@hotmail.com>
2024-02-11 12:38:07 +02:00
Serhiy Storchaka 96bce033c4
gh-114959: tarfile: do not ignore errors when extract a directory on top of a file (GH-114960)
Also, add tests common to tarfile and zipfile.
2024-02-03 16:18:46 +00:00
Stanley 0651936ae2
gh-67641: Clarify documentation on bytes vs text with non-seeking tarfile stream (GH-31610) 2023-12-27 17:16:36 +00:00
Marat Idrisov e1117cb886
gh-87264: Convert tarinfo type to stat type (GH-113230)
Co-authored-by: val-shkolnikov <val@nvsoft.net>
2023-12-19 11:04:43 -08:00
Alex Waygood bfe7e72522
gh-109653: Defer importing `warnings` in several modules (#110286) 2023-10-04 06:09:43 +01:00
Petr Viktorin 5d18715765
gh-107811: tarfile: treat overflow in UID/GID as failure to set it (#108369) 2023-08-23 20:00:07 +02:00
balmeida-nokia 37135d25e2
gh-107396: tarfiles: set self.exception before _init_read_gz() (GH-107485)
In the stack call of: _init_read_gz()
```
_read, tarfile.py:548
read, tarfile.py:526
_init_read_gz, tarfile.py:491
```
a try;except exists that uses `self.exception`, so it needs to be set before
calling _init_read_gz().
2023-08-21 11:39:06 +00:00
Petr Viktorin acbd3f9c5c
gh-107845: Fix symlink handling for tarfile.data_filter (GH-107846)
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Lumír 'Frenzy' Balhar <frenzy.madness@gmail.com>
2023-08-21 12:56:46 +02:00
Robert O'Shea 50fce89d12
gh-102120: [TarFile] Add an iter function that doesn't cache (GH-102128) 2023-05-23 13:44:40 -07:00
Petr Viktorin af53046995
gh-102950: Implement PEP 706 – Filter for tarfile.extractall (#102953) 2023-04-24 10:58:06 +02:00
Oleg Iarygin 56d055a0d8
gh-74468: [tarfile] Fix incorrect name attribute of ExFileObject (GH-102424)
Co-authored-by: Simeon Visser <svisser@users.noreply.github.com>
2023-03-27 16:21:07 -07:00
Nick Drozd 024ac542d7
bpo-45975: Simplify some while-loops with walrus operator (GH-29347) 2022-11-26 14:33:25 -08:00
Sam Ezeh 78365b8e28
gh-91078: Return None from TarFile.next when the tarfile is empty (GH-91850)
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
2022-11-26 09:57:05 -08:00
Nikita Sobolev faf7dfa656
gh-99325: Remove unused `NameError` handling (#99326) 2022-11-11 09:56:57 +00:00
Yaron de Leeuw 50cd4b6959
bpo-26253: Add compressionlevel to tarfile stream (GH-2962)
`tarfile` already accepts a compressionlevel argument for creating
files. This patch adds the same for stream-based tarfile usage.
The default is 9, the value that was previously hard-coded.
2022-06-25 11:43:54 +03:00
Chris Fernald c1e19421c2
gh-91387: Strip trailing slash from tarfile longname directories (GH-32423)
Co-authored-by: Brett Cannon <brett@python.org>
2022-06-17 15:38:41 -07:00
Joshua Root bf2d44ffb0
bpo-45863: tarfile: don't zero out header fields unnecessarily (GH-29693)
Numeric fields of type float, notably mtime, can't be represented
exactly in the ustar header, so the pax header is used. But it is
helpful to set them to the nearest int (i.e. second rather than
nanosecond precision mtimes) in the ustar header as well, for the
benefit of unarchivers that don't understand the pax header.

Add test for tarfile.TarInfo.create_pax_header to confirm correct
behaviour.
2022-02-09 18:06:19 +01:00
Andrzej Mateja 128ab092ca
bpo-44289: Keep argument file object's current position in tarfile.is_tarfile (GH-26488) 2022-02-09 08:19:16 -08:00
andrei kulakov cfadcc31ea
bpo-21987: Fix TarFile.getmember getting a dir with a trailing slash (GH-30283) 2022-01-21 09:40:32 +02:00
Jack DeVries b6fe857250
bpo-39039: tarfile raises descriptive exception from zlib.error (GH-27766)
* during tarfile parsing, a zlib error indicates invalid data
* tarfile.open now raises a descriptive exception from the zlib error
* this makes it clear to the user that they may be trying to open a
  corrupted tar file
2021-09-29 11:25:48 +02:00
Anthony Sottile 9aea31dedd
bpo-8978: improve tarfile.open error message when lzma / bz2 are missing (GH-24850)
Automerge-Triggered-By: GH:pablogsal
2021-04-27 10:39:01 -07:00
Ethan Furman b5a6db9111
bpo-39717: [tarfile] update nested exception raising (GH-23739)
- `from None` if the new exception uses, or doesn't need, the previous one
- `from e` if the previous exception is still relevant
2020-12-12 13:26:44 -08:00
Julien Palard 4fedd7123e
bpo-12800: tarfile: Restore fix from 011525ee9 (GH-21409)
Restore fix from 011525ee92.
2020-11-25 10:23:17 +01:00
Andrey Doroschenko ec42789e6e
bpo-39693: mention KeyError in tarfile extractfile documentation (GH-18639)
Co-authored-by: Andrey Darascheka <andrei.daraschenka@leverx.com>
2020-10-20 10:05:01 -04:00
Artem Bulgakov 22748a83d9
bpo-41316: Make tarfile follow specs for FNAME (GH-21511)
tarfile writes full path to FNAME field of GZIP format instead of just basename if user specified absolute path. Some archive viewers may process file incorrectly. Also it creates security issue because anyone can know structure of directories on system and know username or other personal information.

RFC1952 says about FNAME:
This is the original name of the file being compressed, with any directory components removed.

So tarfile must remove directory names from FNAME and write only basename of file.

Automerge-Triggered-By: @jaraco
2020-09-07 09:46:33 -07:00
Rishi 5a8d121a1f
bpo-39017: Avoid infinite loop in the tarfile module (GH-21454)
Avoid infinite loop when reading specially crafted TAR files using the tarfile module
(CVE-2019-20907).
2020-07-15 13:51:00 +02:00
William Chargin 674935b8ca
bpo-18819: tarfile: only set device fields for device files (GH-18080)
The GNU docs describe the `devmajor` and `devminor` fields of the tar
header struct only in the context of character and block special files,
suggesting that in other cases they are not populated. Typical utilities
behave accordingly; this patch teaches `tarfile` to do the same.
2020-02-12 11:56:02 -08:00
Serhiy Storchaka 9017e0bd5e bpo-39430: Fix race condition in lazy imports in tarfile. (GH-18161)
Use `from ... import ...` to ensure module is fully loaded before accessing its attributes.
2020-01-24 09:55:52 -08:00
William Woodruff dd754caf14 bpo-29435: Allow is_tarfile to take a filelike obj (GH-18090)
`is_tarfile()` now supports `name` being a file or file-like object.
2020-01-22 18:24:16 -08:00
Raymond Hettinger a694f23948
Add missing docstrings for TarInfo objects (#12555) 2019-03-27 13:16:34 -07:00
CAM Gerlach e680c3db80 bpo-36268: Change default tar format to pax from GNU. (GH-12355) 2019-03-21 16:44:51 +02:00
Anthony Sottile 8377cd4fcd Clean up code which checked presence of os.{stat,lstat,chmod} (#11643) 2019-02-25 23:32:27 +01:00
INADA Naoki 8d130913cb
bpo-34043: Optimize tarfile uncompress performance (GH-8089)
tarfile._Stream has two buffer for compressed and uncompressed data.
Those buffers are not aligned so unnecessary bytes slicing happens
for every reading chunks.

This commit bypass compressed buffering.

In this benchmark [1], user time become 250ms from 300ms.

[1]: https://bugs.python.org/msg320763
2018-07-06 14:06:00 +09:00
hajoscher 12a08c4760 bpo-34010: Fix tarfile read performance regression (GH-8020)
During buffered read, use a list followed by join instead of extending a bytes object.
This is how it was done before but changed in commit b506dc32c1.
2018-07-04 17:13:18 +09:00
INADA Naoki 461a1c4b49
bpo-33842: Remove tarfile.filemode (GH-7661) 2018-06-28 17:10:36 +09:00
Joffrey F 72d9b2be36 bpo-32713: Fix tarfile.itn for large/negative float values. (GH-5434) 2018-02-27 02:02:21 +02:00
Bernhard M. Wiedemann 84521047e4 bpo-30693: zip+tarfile: sort directory listing (#2263)
tarfile and zipfile now sort directory listing to generate tar and zip archives
in a more reproducible way.

See also https://reproducible-builds.org/docs/stable-inputs/ on that topic.
2018-01-31 11:17:10 +01:00
Mike 53f7a7c281 bpo-32297: Few misspellings found in Python source code comments. (#4803)
* Fix multiple typos in code comments

* Add spacing in comments (test_logging.py, test_math.py)

* Fix spaces at the beginning of comments in test_logging.py
2017-12-14 13:04:53 +02:00
Alex Gaynor c7cc14a825 Remove two legacy constants which hopefully have no consumers (#1087)
The data contained in them is nonsensical
2017-04-11 22:41:42 -04:00
Serhiy Storchaka 150cd1916a bpo-29958: Minor improvements to zipfile and tarfile CLI. (#944) 2017-04-07 18:56:12 +03:00
Serhiy Storchaka bdf6b910f9 bpo-29776: Use decorator syntax for properties. (#585) 2017-03-19 08:40:32 +02:00