* gh-117151: increase default buffer size of shutil.copyfileobj() to 256k.
it was set to 16k in the 1990s.
it was raised to 64k in 2019. the discussion at the time mentioned another 5% improvement by raising to 128k and settled for a very conservative setting.
it's 2024 now, I think it should be revisited to match modern hardware. I am measuring 0-15% performance improvement when raising to 256k on various types of disk. there is no downside as far as I can tell.
this function is only intended for sequential copy of full files (or file like objects). it's the typical use case that benefits from larger operations.
for reference, I came across this function while trying to profile pip that is using it to copy files when installing python packages.
* add news
---------
Co-authored-by: rmorotti <romain.morotti@man.com>
Preparatory work for moving `_rmtree_unsafe()` and `_rmtree_safe_fd()` to
`pathlib._os` so that they can be used from both `shutil` and `pathlib`.
Move implementation-specific setup from `rmtree()` into the safe/unsafe
functions, and give them the same signature `(path, dir_fd, onexc)`.
In the tests, mock `os.open` rather than `_rmtree_safe_fd()` to ensure the
FD-based walk is used, and replace a couple references to
`shutil._use_fd_functions` with `shutil.rmtree.avoids_symlink_attacks`
(which has the same value).
No change of behaviour.
Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.
`shutil._rmtree_unsafe()` was fixed in a150679f90.
Make `shutil._rmtree_unsafe()` call `os.walk()`, which is implemented
without recursion.
`shutil._rmtree_safe_fd()` is not affected and can still raise a recursion
error.
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
* docs: tiny grammar change: "pointed by" -> "pointed to by"
This commit uses "file pointed to by" to replace "file pointed by" in
- doc for shutil.copytree
- docstring for shutil.copytree
- docstring _abc.PathBase.open
- docstring for pathlib.Path.open
- doc for os.copy_file_range
- doc for os.splice
The docs use "file pointed to by" more frequently than
"file pointed by". So, this commit replaces the uses of
"file pointed by" in order to make the uses consistent
through the docs.
```bash
$ grep -ri 'pointed to by' cpython/
```
yields more results than
```bash
$ grep -ri 'pointed by' cpython/
```
Separately:
There are two occurrences of "tree pointed by":
- cpython/Doc/library/xml.etree.elementtree.rst for
`xml.etree.ElementInclude.include`
- cpython/Lib/xml/etree/ElementInclude.py for `include`
For those uses of "tree pointed by", I expect "tree pointed to by"
instead. However, I found enough uses online of (a) "tree pointed by"
rather than (b) "tree pointed to by" to convince me that (a) is in
common use.
So, this commit does not replace those occurrences of "tree pointed by"
to "tree pointed to by". But I will replace them if a reviewer
believes it is correct to replace them.
* docs: typo: "exists and executable" -> "exists and is executable"
---------
Co-authored-by: Andrew-Zipperer <atzipperer@gmail.com>
Previously it worked differently if dst is a symbolic link:
it modified the permission bits of dst itself rather than the file
it points to if follow_symlinks is true or src is not a symbolic link,
and did nothing if follow_symlinks is false and src is a symbolic link.
Also document similar changes in shutil.copystat().
* Ignore os.close() errors when ignore_errors is True.
* Pass os.close() errors to the error handler if specified.
* os.close no longer retried after error.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Previously a symlink attack resistant version of shutil.rmtree() could ignore
or pass to the error handler arbitrary exception when invalid arguments
were provided.
Call also copy_python_src_ignore() on listdir() names.
shutil.copytree(): replace set() with an empty tuple. An empty tuple
becomes a constant in the compiler and checking if an item is in an
empty tuple is cheap.
Restore following CPython <= 3.10.5 behavior of shutil.make_archive()
that went away as part of gh-93160:
Do not create an empty archive if root_dir is not a directory, and, in
that case, raise FileNotFoundError or NotADirectoryError regardless
of format choice. Beyond the brought-back behavior, the function may
now also raise these exceptions in dry_run mode.
gh-82814: Adds `errno.EACCES` to the list of ignored errors on
`_copyxattr`. EPERM and EACCES are different constants but
in general should be treated the same.
News entry authored by: Gregory P. Smith <greg@krypto.org>
It is no longer changed when create a zip or tar archive.
It is still changed for custom archivers registered with shutil.register_archive_format()
if root_dir is not None.
Co-authored-by: Éric <merwok@netwok.org>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
It fixes the "Text File Busy" OSError when using 'rmtree' on a
windows-managed filesystem in via the VirtualBox shared folder
(and possible other scenarios like a windows-managed network file
system).
I considered only falling back when both were 0, but that still seems
wrong, and the highly popular rich[1] library does it this way, so I
thought we should probably inherit that behavior.
[1] https://github.com/willmcgugan/rich
Signed-off-by: Filipe Laíns <lains@riseup.net>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
`shutil.unpack_archive()` tries to read the whole file into memory, making no use of any kind of smaller buffer. Process crashes for really large files: I.e. archive: ~1.7G, unpacked: ~10G. Before the crash it can easily take away all available RAM on smaller systems. Had to pull the code form `zipfile.Zipfile.extractall()` to fix this
Automerge-Triggered-By: GH:gpshead
The onerror is supposed to be called with failed function, but in this case lstat is wrongly used instead of open.
Not sure if this needs bug or not...
Automerge-Triggered-By: GH:hynek