Brings `pathlib.Path.is_dir()` and `in line with `os.DirEntry.is_dir()`, which
will be important for implementing generic path walking and globbing.
Likewise `is_file()`.
This new exception type is raised instead of `NotImplementedError` when
a path operation is not supported. It can be raised from `Path.readlink()`,
`symlink_to()`, `hardlink_to()`, `owner()` and `group()`. In a future
version of pathlib, it will be raised by `AbstractPath` for these methods
and others, such as `AbstractPath.mkdir()` and `unlink()`.
This commit introduces a 'walk-and-match' strategy for handling glob patterns that include a non-terminal `**` wildcard, such as `**/*.py`. For this example, the previous implementation recursively walked directories using `os.scandir()` when it expanded the `**` component, and then **scanned those same directories again** when expanded the `*.py` component. This is wasteful.
In the new implementation, any components following a `**` wildcard are used to build a `re.Pattern` object, which is used to filter the results of the recursive walk. A pattern like `**/*.py` uses half the number of `os.scandir()` calls; a pattern like `**/*/*.py` a third, etc.
This new algorithm does not apply if either:
1. The *follow_symlinks* argument is set to `None` (its default), or
2. The pattern contains `..` components.
In these cases we fall back to the old implementation.
This commit also replaces selector classes with selector functions. These generators directly yield results rather calling through to their successors. A new internal `Path._glob()` method takes care to chain these generators together, which simplifies the lazy algorithm and slightly improves performance. It should also be easier to understand and maintain.
`PurePath.match()` now handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments.
We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and`rglob()`.
When *follow_symlinks* is `None` (the default), these methods follow symlinks except when evaluating "`**`" wildcards. When set to true or false, symlinks are always or never followed, respectively.
Add `pathlib.PurePath.with_segments()`, which creates a path object from arguments. This method is called whenever a derivative path is created, such as from `pathlib.PurePath.parent`. Subclasses may override this method to share information between path objects.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
This argument allows case-sensitive matching to be enabled on Windows, and
case-insensitive matching to be enabled on Posix.
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized!
This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%.
Automerge-Triggered-By: GH:AlexWaygood
The documentation for `rglob` did not mention what `pattern` actually
is.
Mentioning and linking to `fnmatch` makes this explicit, as the
documentation for `fnmatch` both shows the syntax and some explanation.
The behaviour is fully explained a couple paragraphs above, but it may be useful to have a brief example to cover the behaviour.
Automerge-Triggered-By: GH:hauntsaninja
By default, :meth:`pathlib.PurePath.relative_to` doesn't deal with paths that are not a direct prefix of the other, raising an exception in that instance. This change adds a *walk_up* parameter that can be set to allow for using ``..`` to calculate the relative path.
example:
```
>>> p = PurePosixPath('/etc/passwd')
>>> p.relative_to('/etc')
PurePosixPath('passwd')
>>> p.relative_to('/usr')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pathlib.py", line 940, in relative_to
raise ValueError(error_message.format(str(self), str(formatted)))
ValueError: '/etc/passwd' does not start with '/usr'
>>> p.relative_to('/usr', strict=False)
PurePosixPath('../etc/passwd')
```
https://bugs.python.org/issue40358
Automerge-Triggered-By: GH:brettcannon
Have `pathlib.WindowsPath.is_mount()` call `ntpath.ismount()`. Previously it raised `NotImplementedError` unconditionally.
https://bugs.python.org/issue42777
The r in `rglob` stands for "recursively", so use the word in the description. Also, glob and rglob can usefully be mentioned as the pathlib equivalent of os.walk.
Automerge-Triggered-By: GH:brettcannon
* Add additional pointers to pathlib's mapping to os.path functions
os.path.splitext has a somewhat quirky signature since it mixes the path and filename components but I wanted the documentation to mention `PurePath.stem` as the natural counterpart to `PurePath.suffix` for the common use of `os.path.splitext` to turn "file.py" into "file" and "py".
Technically this could have some discussion of how to handle the parent directory hierarchy but that seems a bit out of keeping with the spirit of this table so I omitted mentioning `PurePath.parents` here.
* Update Doc/library/pathlib.rst
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Documentation for `pathlib` says:
> Spurious slashes and single dots are collapsed, but double dots ('..') are not, since this would change the meaning of a path in the face of symbolic links:
However, it omits that initial double slashes also aren't collapsed.
Later, in documentation of `PurePath.drive`, `PurePath.root`, and `PurePath.name` it mentions UNC but:
- this abbreviation says nothing to a person who is unaware about existence of UNC (Wikipedia doesn't help either by [giving a disambiguation page](https://en.wikipedia.org/wiki/UNC))
- it shows up only if a person needs to use a specific property or decides to fully learn what the module provides.
For context, see the BPO entry.
These are currently broken as they refer to :meth:`Path.relative_to` rather than :meth:`PurePath.relative_to`, and `relative_to` is a method on `PurePath`.
We could try to remedy this by taking a slice, but we then run into an issue where the empty string will match altsep on POSIX. That rabbit hole could keep getting deeper.
A proper fix for the original issue involves making pathlib's path normalisation more configurable - in this case we want to retain trailing slashes, but in other we might want to preserve `./` prefixes, or elide `../` segments when we're sure we won't encounter symlinks.
This reverts commit ea2f5bcda1.
The argument order of `link_to()` is reversed compared to what one may expect, so:
a.link_to(b)
Might be expected to create *a* as a link to *b*, in fact it creates *b* as a link to *a*, making it function more like a "link from". This doesn't match `symlink_to()` nor the documentation and doesn't seem to be the original author's intent.
This PR deprecates `link_to()` and introduces `hardlink_to()`, which has the same argument order as `symlink_to()`.
This makes `ntpath.expanduser()` match `pathlib.Path.expanduser()` in this regard, and is more in line with `posixpath.expanduser()`'s cautious approach.
Also remove the near-duplicate implementation of `expanduser()` in pathlib, and by doing so fix a bug where KeyError could be raised when expanding another user's home directory.
This commit also fixes up some of the overlapping documentation changed
in bpo-35498, which added support for indexing with slices.
Fixes bpo-21041.
https://bugs.python.org/issue21041
Co-authored-by: Paul Ganssle <p.ganssle@gmail.com>
Co-authored-by: Rémi Lapeyre <remi.lapeyre@henki.fr>
Added slice support to the `pathlib.Path.parents` sequence. For a `Path` `p`, slices of `p.parents` should return the same thing as slices of `tuple(p.parents)`.