`Path.read_bytes()` is used to read a whole file. buffering /
BufferedIO is focused around making small, possibly interleaved,
read/write efficient which doesn't add value in this case.
On my Mac, running the benchmark:
```python
import pyperf
from pathlib import Path
def read_all(all_paths):
for p in all_paths:
p.read_bytes()
def read_file(path_obj):
path_obj.read_bytes()
all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found rst files"
assert all_py, "Should have found python source files"
runner = pyperf.Runner()
runner.bench_func("read_file_small", read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func("read_file_large", read_file, Path("Doc/c-api/typeobj.rst"))
```
before:
```python
.....................
read_file_small: Mean +- std dev: 6.80 us +- 0.07 us
.....................
read_file_large: Mean +- std dev: 10.8 us +- 0.2 us
````
after:
```python
.....................
read_file_small: Mean +- std dev: 5.67 us +- 0.05 us
.....................
read_file_large: Mean +- std dev: 9.77 us +- 0.52 us
```
Replace `umask(0)` with `umask(0o002)` so the created files are not
world-writable, and replace `umask(0o022)` with `umask(0o026)` to check
that permissions for 'others' can still be set.
Rename `pathlib.Path.copy()` to `_copy_file()` (i.e. make it private.)
Rename `pathlib.Path.copytree()` to `copy()`, and add support for copying
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `delete()` methods (which will also
accept any type of file.)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Rename `pathlib.Path.rmtree()` to `delete()`, and add support for deleting
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `copy()` methods (which will also
accept any type of file.)
Add *preserve_metadata* keyword-only argument to `pathlib.Path.copytree()`,
defaulting to false. When set to true, we copy timestamps, permissions,
extended attributes and flags where available, like `shutil.copystat()`.
Add a `Path.rmtree()` method that removes an entire directory tree, like
`shutil.rmtree()`. The signature of the optional *on_error* argument
matches the `Path.walk()` argument of the same name, but differs from the
*onexc* and *onerror* arguments to `shutil.rmtree()`. Consistency within
pathlib is probably more important.
In the private pathlib ABCs, we add an implementation based on `walk()`.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Add *preserve_metadata* keyword-only argument to `pathlib.Path.copy()`, defaulting to false. When set to true, we copy timestamps, permissions, extended attributes and flags where available, like `shutil.copystat()`. The argument has no effect on Windows, where metadata is always copied.
Internally (in the pathlib ABCs), path types gain `_readable_metadata` and `_writable_metadata` attributes. These sets of strings describe what kinds of metadata can be retrieved and stored. We take an intersection of `source._readable_metadata` and `target._writable_metadata` to minimise reads/writes. A new `_read_metadata()` method accepts a set of metadata keys and returns a dict with those keys, and a new `_write_metadata()` method accepts a dict of metadata. We *might* make these public in future, but it's hard to justify while the ABCs are still private.
Check for `ERROR_INVALID_PARAMETER` when calling `_winapi.CopyFile2()` and
raise `UnsupportedOperation`. In `Path.copy()`, handle this exception and
fall back to the `PathBase.copy()` implementation.
Add `pathlib.Path.copytree()` method, which recursively copies one
directory to another.
This differs from `shutil.copytree()` in the following respects:
1. Our method has a *follow_symlinks* argument, whereas shutil's has a
*symlinks* argument with an inverted meaning.
2. Our method lacks something like a *copy_function* argument. It always
uses `Path.copy()` to copy files.
3. Our method lacks something like a *ignore_dangling_symlinks* argument.
Instead, users can filter out danging symlinks with *ignore*, or
ignore exceptions with *on_error*
4. Our *ignore* argument is a callable that accepts a single path object,
whereas shutil's accepts a path and a list of child filenames.
5. We add an *on_error* argument, which is a callable that accepts
an `OSError` instance. (`Path.walk()` also accepts such a callable).
Co-authored-by: Nice Zombies <nineteendo19d0@gmail.com>
Add support for not following symlinks in `pathlib.Path.copy()`.
On Windows we add the `COPY_FILE_COPY_SYMLINK` flag is following symlinks is disabled. If the source is symlink to a directory, this call will fail with `ERROR_ACCESS_DENIED`. In this case we add `COPY_FILE_DIRECTORY` to the flags and retry. This can fail on old Windowses, which we note in the docs.
No news as `copy()` was only just added.
In preparation for the addition of `PathBase.rmtree()`, implement
`DummyPath.unlink()` and `rmdir()`, and move corresponding tests into
`test_pathlib_abc` so they're run against `DummyPath`.
Add a `Path.copy()` method that copies the content of one file to another.
This method is similar to `shutil.copyfile()` but differs in the following ways:
- Uses `fcntl.FICLONE` where available (see GH-81338)
- Uses `os.copy_file_range` where available (see GH-81340)
- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.
The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.
Incorporates code from GH-81338 and GH-93152.
Co-authored-by: Eryk Sun <eryksun@gmail.com>
pathlib now treats "`.`" as a valid file extension (suffix). This brings
it in line with `os.path.splitext()`.
In the (private) pathlib ABCs, we add a new `ParserBase.splitext()` method
that splits a path into a `(root, ext)` pair, like `os.path.splitext()`.
This method is called by `PurePathBase.stem`, `suffix`, etc. In a future
version of pathlib, we might make these base classes public, and so users
will be able to define their own `splitext()` method to control file
extension splitting.
In `pathlib.PurePath` we add optimised `stem`, `suffix` and `suffixes`
properties that don't use `splitext()`, which avoids computing the path
base name twice.
Remove support for supplying additional positional arguments to
`PurePath.relative_to()` and `is_relative_to()`. This has been deprecated
since Python 3.12.
Since 6258844c, paths that might not exist can be fed into pathlib's
globbing implementation, which will call `os.scandir()` / `os.lstat()` only
when strictly necessary. This allows us to drop an initial `self.is_dir()`
call, which saves a `stat()`.
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Don't bother calling `os.scandir()` to scan for literal pattern segments,
like `foo` in `foo/*.py`. Instead, append the segment(s) as-is and call
through to the next selector with `exists=False`, which signals that the
path might not exist. Subsequent selectors will call `os.scandir()` or
`os.lstat()` to filter out missing paths as needed.
Replace tri-state `follow_symlinks` with boolean `recurse_symlinks` argument. The new argument controls whether symlinks are followed when expanding recursive `**` wildcards. The possible argument values correspond as follows:
follow_symlinks recurse_symlinks
=============== ================
False N/A
None False
True True
We therefore drop support for not following symlinks when expanding non-recursive pattern parts; it wasn't requested in the original issue, and it's a feature not found in any shells.
This makes the API a easier to grok by eliminating `None` as an option.
No news blurb as `follow_symlinks` was new in 3.13.
Stop raising `ValueError` from `glob.translate()` when a `**` sub-string
appears in a non-recursive pattern segment. This matches `glob.glob()`
behaviour.
Switch the default value of *follow_symlinks* from `None` to `True` in
`pathlib._abc.PathBase.glob()` and `rglob()`. This speeds up recursive
globbing.
No change to the public pathlib classes.
When expanding and filtering paths for a `**` wildcard segment, build an `re.Pattern` object from the subsequent pattern parts, rather than the entire pattern, and match against the `os.DirEntry` object prior to instantiating a path object. Also skip compiling a pattern when expanding a `*` wildcard segment.
When expanding `**` wildcards, ensure we add a trailing slash to the
topmost directory path. This matches `glob.glob()` behaviour:
>>> glob.glob('dirA/**', recursive=True)
['dirA/', 'dirA/dirB', 'dirA/dirB/dirC']
This does not affect `pathlib.Path.glob()`, because trailing slashes aren't
supported in pathlib proper.
Methods like `full_match()`, `glob()`, etc, are difficult to make work with
byte paths, and it's not worth the effort. This patch makes `PurePathBase`
raise `TypeError` when given non-`str` path segments.
Return files and directories from `pathlib.Path.glob()` if the pattern ends
with `**`. This is more compatible with `PurePath.full_match()` and with
other glob implementations such as bash and `glob.glob()`. Users can add a
trailing slash to match only directories.
In my previous patch I added a `FutureWarning` with the intention of fixing
this in Python 3.15. Upon further reflection I think this was an
unnecessarily cautious remedy to a clear bug.
Raise `ValueError` if `with_suffix('.ext')` is called on a path without a
stem. Paths may only have a non-empty suffix if they also have a non-empty
stem.
ABC-only bugfix; no effect on public classes.
Remove `self.joinpath('')` call that should have been removed in 6313cdde.
This makes `PathBase.glob('')` yield itself *without* adding a trailing slash. It's hard to say whether this is more or less correct, but at least everything else is faster, and there's no behaviour change in the public classes where empty glob patterns are disallowed.
Add `@needs_symlinks` decorator for tests that require symlink support in
the path class.
Also add `@needs_windows` and `@needs_posix` decorators for tests that
require a specific a specific path flavour. These aren't much used yet, but
will be later.
Add `ntpath.isreserved()`, which identifies reserved pathnames such as "NUL", "AUX" and "CON".
Deprecate `pathlib.PurePath.is_reserved()`.
---------
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!
This commit reverts the changes to `match()`, and instead adds a new
`full_match()` method that:
- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
Allow `os.PathLike` objects to be passed as patterns to `pathlib.Path.glob()` and `rglob()`. (It's already possible to use them in `PurePath.match()`)
While we're in the area:
- Allow empty glob patterns in `PathBase` (but not `Path`)
- Speed up globbing in `PathBase` by generating paths with trailing slashes only as a final step, rather than for every intermediate directory.
- Simplify and speed up handling of rare patterns involving both `**` and `..` segments.
This allows users of the `pathlib-abc` PyPI package to use `posixpath` or
`ntpath` as a path module in versions of Python lacking
`os.path.splitroot()` (3.11 and before).
Path modules provide a subset of the `os.path` API, specifically those
functions needed to provide `PurePathBase` functionality. Each
`PurePathBase` subclass references its path module via a `pathmod` class
attribute.
This commit adds a new `PathModuleBase` class, which provides abstract
methods that unconditionally raise `UnsupportedOperation`. An instance of
this class is assigned to `PurePathBase.pathmod`, replacing `posixpath`.
As a result, `PurePathBase` is no longer POSIX-y by default, and
all its methods raise `UnsupportedOperation` courtesy of `pathmod`.
Users who subclass `PurePathBase` or `PathBase` should choose the path
syntax by setting `pathmod` to `posixpath`, `ntpath`, `os.path`, or their
own subclass of `PathModuleBase`, as circumstances demand.
On Windows, `os.path.isabs()` now returns `False` when given a path that
starts with exactly one (back)slash. This is more compatible with other
functions in `os.path`, and with Microsoft's own documentation.
Also adjust `pathlib.PureWindowsPath.is_absolute()` to call
`ntpath.isabs()`, which corrects its handling of partial UNC/device paths
like `//foo`.
Co-authored-by: Jon Foster <jon@jon-foster.co.uk>
Refuse to guess what a user means when they initialise a pathlib ABC
without any positional arguments. In mainline pathlib it's normalised to
`.`, but in the ABCs this guess isn't appropriate; for example, the path
type may not represent the current directory as `.`, or may have no concept
of a "current directory" at all.
Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`.
With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. Implementors can set their own path domain-specific normalization scheme by overriding `__str__()`
Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code.
- Add `__slots__` to dummy path classes.
- Return namedtuple rather than `os.stat_result` from `DummyPath.stat()`.
- Reduce maximum symlink count in `DummyPathWithSymlinks.resolve()`.
`PurePathBase` does not define `__eq__()`, and so we have no business checking path equality in `test_eq_common` and `test_equivalences`. The tests only pass at the moment because we define the test class's `__eq__()` for use elsewhere.
Also move `test_parse_path_common` into the main pathlib test suite. It exercises a private `_parse_path()` method that will be moved to `PurePath` soon.
Lastly move a couple more tests concerned with optimisations and path normalisation.
Slightly improve `pathlib.Path.glob()` tests for symlink loop handling
When filtering results, ignore paths with more than one `linkD/` segment,
rather than all paths below the first `linkD/` segment. This allows us
to test that other paths under `linkD/` are correctly returned.
Split test cases for invalid names into dedicated test methods. This will
make it easier to refactor tests for invalid name handling in ABCs later.
No change of coverage, just a change of test suite organisation.