In `PathBase.resolve()`, raise `UnsupportedOperation` if a non-POSIX path
parser is used (our implementation uses `posixpath._realpath()`, which
produces incorrect results for non-POSIX path flavours.) Also tweak code to
call `self.absolute()` upfront rather than supplying an emulated `getcwd()`
function.
Adjust `PathBase.absolute()` to work somewhat like `resolve()`. If a POSIX
path parser is used, we treat the root directory as the current directory.
This is the simplest useful behaviour for concrete path types without a
current directory cursor.
In the past I've equivocated about whether to require at least one argument
in the `PurePathBase` (and `PathBase`) initializer, and what the default
should be if we make it optional. I now have a local use case that has
persuaded me to make it optional and default to the empty string (a
`zipp.Path`-like class that treats relative and absolute paths similarly.)
Happily this brings the base class more in line with `PurePath` and `Path`.
Defer joining of path segments in the private `PurePathBase` ABC. The new
behaviour matches how the public `PurePath` class handles path segments.
This removes a hard-to-grok difference between the ABCs and the main
classes. It also slightly reduces the size of `PurePath` objects by
eliminating a `_raw_path` slot.
Use the new `PathBase.scandir()` method in `PathBase.walk()`, which greatly
reduces the number of `PathBase.stat()` calls needed when walking.
There are no user-facing changes, because the pathlib ABCs are still
private and `Path.walk()` doesn't use the implementation in its superclass.
Use the new `PathBase.scandir()` method in `PathBase.glob()`, which greatly
reduces the number of `PathBase.stat()` calls needed when globbing.
There are no user-facing changes, because the pathlib ABCs are still
private and `Path.glob()` doesn't use the implementation in its superclass.
Add `pathlib.Path.scandir()` as a trivial wrapper of `os.scandir()`. This
will be used to implement several `PathBase` methods more efficiently,
including methods that provide `Path.copy()`.
`PurePath.__init__()` incorrectly uses the `_raw_paths` of a given
`PurePath` object with a different flavour, even though the procedure to
join path segments can differ between flavours.
This change makes the `_raw_paths`-enabled deferred joining apply _only_
when the path flavours match.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Remove *ignore* and *on_error* arguments from `pathlib.Path.copy[_into]()`,
because these arguments are under-designed. Specifically:
- *ignore* is appropriated from `shutil.copytree()`, but it's not clear
how it should apply when the user copies a non-directory. We've changed
the callback signature from the `shutil` version, but I'm not confident
the new signature is as good as it can be.
- *on_error* is a generalisation of `shutil.copytree()`'s error handling,
which is to accumulate exceptions and raise a single `shutil.Error` at
the end. It's not obvious which solution is better.
Additionally, this arguments may be challenging to implement in future user
subclasses of `PathBase`, which might utilise a native recursive copying
method.
Per feedback from Paul Moore on GH-123158, it's better to defer making
`Path.delete()` public than ship it with under-designed error handling
capabilities.
We leave a remnant `_delete()` method, which is used by `move()`. Any
functionality not needed by `move()` is deleted.
These two methods accept an *existing* directory path, onto which we join
the source path's base name to form the final target path.
A possible alternative implementation is to check for directories in
`copy()` and `move()` and adjust the target path, which is done in several
`shutil` functions. This behaviour is helpful in a shell context, but
less so in a stored program that explicitly specifies destinations. For
example, a user that calls `Path('foo.py').copy('bar.py')` might not
imagine that `bar.py/foo.py` would be created, but under the alternative
implementation this will happen if `bar.py` is an existing directory.
Add a `Path.move()` method that moves a file or directory tree, and returns a new `Path` instance pointing to the target.
This method is similar to `shutil.move()`, except that it doesn't accept a *copy_function* argument, and it doesn't check whether the destination is an existing directory.
`Path.read_bytes()` is used to read a whole file. buffering /
BufferedIO is focused around making small, possibly interleaved,
read/write efficient which doesn't add value in this case.
On my Mac, running the benchmark:
```python
import pyperf
from pathlib import Path
def read_all(all_paths):
for p in all_paths:
p.read_bytes()
def read_file(path_obj):
path_obj.read_bytes()
all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found rst files"
assert all_py, "Should have found python source files"
runner = pyperf.Runner()
runner.bench_func("read_file_small", read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func("read_file_large", read_file, Path("Doc/c-api/typeobj.rst"))
```
before:
```python
.....................
read_file_small: Mean +- std dev: 6.80 us +- 0.07 us
.....................
read_file_large: Mean +- std dev: 10.8 us +- 0.2 us
````
after:
```python
.....................
read_file_small: Mean +- std dev: 5.67 us +- 0.05 us
.....................
read_file_large: Mean +- std dev: 9.77 us +- 0.52 us
```
Replace `umask(0)` with `umask(0o002)` so the created files are not
world-writable, and replace `umask(0o022)` with `umask(0o026)` to check
that permissions for 'others' can still be set.
Rename `pathlib.Path.copy()` to `_copy_file()` (i.e. make it private.)
Rename `pathlib.Path.copytree()` to `copy()`, and add support for copying
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `delete()` methods (which will also
accept any type of file.)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Rename `pathlib.Path.rmtree()` to `delete()`, and add support for deleting
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `copy()` methods (which will also
accept any type of file.)
Add *preserve_metadata* keyword-only argument to `pathlib.Path.copytree()`,
defaulting to false. When set to true, we copy timestamps, permissions,
extended attributes and flags where available, like `shutil.copystat()`.
Add a `Path.rmtree()` method that removes an entire directory tree, like
`shutil.rmtree()`. The signature of the optional *on_error* argument
matches the `Path.walk()` argument of the same name, but differs from the
*onexc* and *onerror* arguments to `shutil.rmtree()`. Consistency within
pathlib is probably more important.
In the private pathlib ABCs, we add an implementation based on `walk()`.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Add *preserve_metadata* keyword-only argument to `pathlib.Path.copy()`, defaulting to false. When set to true, we copy timestamps, permissions, extended attributes and flags where available, like `shutil.copystat()`. The argument has no effect on Windows, where metadata is always copied.
Internally (in the pathlib ABCs), path types gain `_readable_metadata` and `_writable_metadata` attributes. These sets of strings describe what kinds of metadata can be retrieved and stored. We take an intersection of `source._readable_metadata` and `target._writable_metadata` to minimise reads/writes. A new `_read_metadata()` method accepts a set of metadata keys and returns a dict with those keys, and a new `_write_metadata()` method accepts a dict of metadata. We *might* make these public in future, but it's hard to justify while the ABCs are still private.
Check for `ERROR_INVALID_PARAMETER` when calling `_winapi.CopyFile2()` and
raise `UnsupportedOperation`. In `Path.copy()`, handle this exception and
fall back to the `PathBase.copy()` implementation.
Add `pathlib.Path.copytree()` method, which recursively copies one
directory to another.
This differs from `shutil.copytree()` in the following respects:
1. Our method has a *follow_symlinks* argument, whereas shutil's has a
*symlinks* argument with an inverted meaning.
2. Our method lacks something like a *copy_function* argument. It always
uses `Path.copy()` to copy files.
3. Our method lacks something like a *ignore_dangling_symlinks* argument.
Instead, users can filter out danging symlinks with *ignore*, or
ignore exceptions with *on_error*
4. Our *ignore* argument is a callable that accepts a single path object,
whereas shutil's accepts a path and a list of child filenames.
5. We add an *on_error* argument, which is a callable that accepts
an `OSError` instance. (`Path.walk()` also accepts such a callable).
Co-authored-by: Nice Zombies <nineteendo19d0@gmail.com>
Add support for not following symlinks in `pathlib.Path.copy()`.
On Windows we add the `COPY_FILE_COPY_SYMLINK` flag is following symlinks is disabled. If the source is symlink to a directory, this call will fail with `ERROR_ACCESS_DENIED`. In this case we add `COPY_FILE_DIRECTORY` to the flags and retry. This can fail on old Windowses, which we note in the docs.
No news as `copy()` was only just added.
In preparation for the addition of `PathBase.rmtree()`, implement
`DummyPath.unlink()` and `rmdir()`, and move corresponding tests into
`test_pathlib_abc` so they're run against `DummyPath`.
Add a `Path.copy()` method that copies the content of one file to another.
This method is similar to `shutil.copyfile()` but differs in the following ways:
- Uses `fcntl.FICLONE` where available (see GH-81338)
- Uses `os.copy_file_range` where available (see GH-81340)
- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.
The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.
Incorporates code from GH-81338 and GH-93152.
Co-authored-by: Eryk Sun <eryksun@gmail.com>
pathlib now treats "`.`" as a valid file extension (suffix). This brings
it in line with `os.path.splitext()`.
In the (private) pathlib ABCs, we add a new `ParserBase.splitext()` method
that splits a path into a `(root, ext)` pair, like `os.path.splitext()`.
This method is called by `PurePathBase.stem`, `suffix`, etc. In a future
version of pathlib, we might make these base classes public, and so users
will be able to define their own `splitext()` method to control file
extension splitting.
In `pathlib.PurePath` we add optimised `stem`, `suffix` and `suffixes`
properties that don't use `splitext()`, which avoids computing the path
base name twice.
Remove support for supplying additional positional arguments to
`PurePath.relative_to()` and `is_relative_to()`. This has been deprecated
since Python 3.12.
Since 6258844c, paths that might not exist can be fed into pathlib's
globbing implementation, which will call `os.scandir()` / `os.lstat()` only
when strictly necessary. This allows us to drop an initial `self.is_dir()`
call, which saves a `stat()`.
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Don't bother calling `os.scandir()` to scan for literal pattern segments,
like `foo` in `foo/*.py`. Instead, append the segment(s) as-is and call
through to the next selector with `exists=False`, which signals that the
path might not exist. Subsequent selectors will call `os.scandir()` or
`os.lstat()` to filter out missing paths as needed.
Replace tri-state `follow_symlinks` with boolean `recurse_symlinks` argument. The new argument controls whether symlinks are followed when expanding recursive `**` wildcards. The possible argument values correspond as follows:
follow_symlinks recurse_symlinks
=============== ================
False N/A
None False
True True
We therefore drop support for not following symlinks when expanding non-recursive pattern parts; it wasn't requested in the original issue, and it's a feature not found in any shells.
This makes the API a easier to grok by eliminating `None` as an option.
No news blurb as `follow_symlinks` was new in 3.13.
Stop raising `ValueError` from `glob.translate()` when a `**` sub-string
appears in a non-recursive pattern segment. This matches `glob.glob()`
behaviour.
Switch the default value of *follow_symlinks* from `None` to `True` in
`pathlib._abc.PathBase.glob()` and `rglob()`. This speeds up recursive
globbing.
No change to the public pathlib classes.
When expanding and filtering paths for a `**` wildcard segment, build an `re.Pattern` object from the subsequent pattern parts, rather than the entire pattern, and match against the `os.DirEntry` object prior to instantiating a path object. Also skip compiling a pattern when expanding a `*` wildcard segment.
When expanding `**` wildcards, ensure we add a trailing slash to the
topmost directory path. This matches `glob.glob()` behaviour:
>>> glob.glob('dirA/**', recursive=True)
['dirA/', 'dirA/dirB', 'dirA/dirB/dirC']
This does not affect `pathlib.Path.glob()`, because trailing slashes aren't
supported in pathlib proper.
Methods like `full_match()`, `glob()`, etc, are difficult to make work with
byte paths, and it's not worth the effort. This patch makes `PurePathBase`
raise `TypeError` when given non-`str` path segments.
Return files and directories from `pathlib.Path.glob()` if the pattern ends
with `**`. This is more compatible with `PurePath.full_match()` and with
other glob implementations such as bash and `glob.glob()`. Users can add a
trailing slash to match only directories.
In my previous patch I added a `FutureWarning` with the intention of fixing
this in Python 3.15. Upon further reflection I think this was an
unnecessarily cautious remedy to a clear bug.
Raise `ValueError` if `with_suffix('.ext')` is called on a path without a
stem. Paths may only have a non-empty suffix if they also have a non-empty
stem.
ABC-only bugfix; no effect on public classes.
Remove `self.joinpath('')` call that should have been removed in 6313cdde.
This makes `PathBase.glob('')` yield itself *without* adding a trailing slash. It's hard to say whether this is more or less correct, but at least everything else is faster, and there's no behaviour change in the public classes where empty glob patterns are disallowed.