Ensure that `PurePath('foo/a').with_name('.')` raises `ValueError`
Ensure that `PureWindowsPath('foo/a').with_name('a:b')` does not raise
`ValueError`.
This method supports file URIs (including variants) as described in RFC 8089, such as URIs generated by `pathlib.Path.as_uri()` and `urllib.request.pathname2url()`.
The method is added to `Path` rather than `PurePath` because it uses `os.fsdecode()`, and so its results vary from system to system. I intend to deprecate `PurePath.as_uri()` and move it to `Path` for the same reason.
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Add private `pathlib._PathBase` class. This will be used by an experimental PyPI package to incubate a `tarfile.TarPath` class.
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
We match paths using the `_lines` attribute, which is derived from the
path's string representation. The bug arises because an empty path's string
representation is `'.'` (not `''`), which is matched by the `'*'` wildcard.
Make assertions about the `st_mode`, `st_ino` and `st_dev` attributes of
the stat results from two files and a directory, rather than checking if
`chmod()` affects `st_mode` (which is already tested elsewhere).
- Split out dedicated test for unbuffered `open()`
- Split out dedicated test for `is_mount()` at the filesystem root
- Avoid `os.stat()` when checking that empty paths point to '.'
- Remove unnecessary `rmtree()` call
- Remove unused `assertSame()` method
Adjust the pathlib tests to add a new `PathTest.can_symlink` class
attribute, which allows us to enable or disable symlink support in tests.
A (near-)future commit will add an `AbstractPath` class; its tests will
hard-code the value to `True` or `False` depending on a stub subclass's
capabilities.
Remove `PathTest.dirlink()` function. Symlinks in `PathTest.setUp()` are
created using `os.symlink()` directly; symlinks in test functions use
`Path.symlink_to()` in order to make the tests applicable to a
(near-)future `AbstractPath` class.
Brings `pathlib.Path.is_dir()` and `in line with `os.DirEntry.is_dir()`, which
will be important for implementing generic path walking and globbing.
Likewise `is_file()`.
This new exception type is raised instead of `NotImplementedError` when
a path operation is not supported. It can be raised from `Path.readlink()`,
`symlink_to()`, `hardlink_to()`, `owner()` and `group()`. In a future
version of pathlib, it will be raised by `AbstractPath` for these methods
and others, such as `AbstractPath.mkdir()` and `unlink()`.
Re-arrange `pathlib.Path` test methods in source code. No other changes.
The test methods are arranged in two groups. The first group checks
`stat()`, `open()`, `iterdir()`, `readlink()`, and derived methods like
`exists()`, `read_text()`, `glob()` and `resolve()`. The second group
checks all other `Path` methods. To minimise the diff I've maintained the
method order within groups where possible.
This patch prepares the ground for a new `_AbstractPath` class, which will
support methods in the first group above. By churning the test methods
here, subsequent patches will be easier to review and less likely to break
things.
This commit introduces a 'walk-and-match' strategy for handling glob patterns that include a non-terminal `**` wildcard, such as `**/*.py`. For this example, the previous implementation recursively walked directories using `os.scandir()` when it expanded the `**` component, and then **scanned those same directories again** when expanded the `*.py` component. This is wasteful.
In the new implementation, any components following a `**` wildcard are used to build a `re.Pattern` object, which is used to filter the results of the recursive walk. A pattern like `**/*.py` uses half the number of `os.scandir()` calls; a pattern like `**/*/*.py` a third, etc.
This new algorithm does not apply if either:
1. The *follow_symlinks* argument is set to `None` (its default), or
2. The pattern contains `..` components.
In these cases we fall back to the old implementation.
This commit also replaces selector classes with selector functions. These generators directly yield results rather calling through to their successors. A new internal `Path._glob()` method takes care to chain these generators together, which simplifies the lazy algorithm and slightly improves performance. It should also be easier to understand and maintain.
`PurePath.match()` now handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments.
We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and`rglob()`.
When *follow_symlinks* is `None` (the default), these methods follow symlinks except when evaluating "`**`" wildcards. When set to true or false, symlinks are always or never followed, respectively.
For backwards compatibility, accept backslashes as path separators in
`PurePosixPath` if an instance of `PureWindowsPath` is supplied.
This restores behaviour from Python 3.11.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Use `str.lower()` rather than `ntpath.normcase()` to normalize case of
Windows paths. This restores behaviour from Python 3.11.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
In Python 3.8 and prior, `pathlib.Path.__exit__()` marked a path as closed;
some subsequent attempts to perform I/O would raise an IOError. This
functionality was never documented, and had the effect of making `Path`
objects mutable, contrary to PEP 428. In Python 3.9 we made `__exit__()` a
no-op, and in 3.11 `__enter__()` began raising deprecation warnings. Here
we remove both methods.
`pathlib.Path.glob()` now suppresses all OSError exceptions, except
those raised from calling `is_dir()` on the top-level path.
Previously, `glob()` suppressed ENOENT, ENOTDIR, EBADF and ELOOP
errors and their Windows equivalents. PermissionError was also
suppressed unless it occurred when calling `is_dir()` on the
top-level path. However, the selector would abort prematurely
if a PermissionError was raised, and so `glob()` could return
incomplete results.
Stop de-duplicating results in `_RecursiveWildcardSelector`. A new
`_DoubleRecursiveWildcardSelector` class is introduced which performs
de-duplication, but this is used _only_ for patterns with multiple
non-adjacent `**` segments, such as `path.glob('**/foo/**')`. By avoiding
the use of a set, `PurePath.__hash__()` is not called, and so paths do not
need to be stringified and case-normalised.
Also merge adjacent '**' segments in patterns.
Add `pathlib.PurePath.with_segments()`, which creates a path object from arguments. This method is called whenever a derivative path is created, such as from `pathlib.PurePath.parent`. Subclasses may override this method to share information between path objects.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
This argument allows case-sensitive matching to be enabled on Windows, and
case-insensitive matching to be enabled on Posix.
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
We now use `_WildcardSelector` to evaluate literal pattern segments, which
allows us to retrieve the real filesystem case.
This change is necessary in order to implement a *case_sensitive* argument
(see GH-81079) and a *follow_symlinks* argument (see GH-77609).
These segments do not require a `stat()` call, as the selector's
`_select_from()` method is called after we've established that the
parent is a directory.
Check that arguments are strings before calling `os.path.join()`.
Also improve performance of `PurePath(PurePath(...))` while we're in the
area: we now use the *unnormalized* string path of such arguments.
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
We no longer add a root to device paths such as `//./PhysicalDrive0`,
`//?/BootPartition` and `//./c:` while normalizing. We also avoid adding a
root to incomplete UNC share paths, like `//`, `//a` and `//a/`.
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Improve performance of path construction by skipping the addition of the path anchor (`drive + root`) to the internal `_parts` list. Rename this attribute to `_tail` for clarity.
Fix an issue where `__new__()` and `__init__()` were not called on subclasses of `pathlib.PurePath` and `Path` in some circumstances.
Paths are now normalized on-demand. This speeds up path construction, `p.joinpath(q)`, and `p / q`.
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Use a stack to implement `pathlib.Path.walk()` iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
Co-authored-by: Barney Gale <barney.gale@gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
Since Mercurial removal from bitbucket.org, some links are broken.
They are replaced by github.com or webarchive.org links if available. Otherwise, they are removed.
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized!
This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%.
Automerge-Triggered-By: GH:AlexWaygood