Construct only one new list object (using `list.copy()`) when creating a
new path object with a modified tail. This slightly speeds up
`with_name()` and `with_suffix()`
- Add fast path to `_split_stack()`
- Skip unnecessarily resolution of the current directory when a relative
path is given to `resolve()`
- Remove stat and target caches, which slow down most `resolve()` calls in
practice.
- Slightly refactor code for clarity.
Re-arrange `pathlib.PurePath` methods in source code. No other changes.
The `PurePath` implementations of certain special methods, such as
`__eq__()` and `__hash__()`, are not usually applicable to user subclasses
of `_PathBase`. To facilitate their removal, another patch will split the
`PurePath` class into `_PurePathBase` and `PurePath`, with the latter
providing these special methods.
This patch prepares the ground for splitting `PurePath`. It's similar to
e8d77b0, which preceded splitting `Path`. By churning the methods here,
subsequent patches will be easier to review and less likely to break
things.
Add `glob.translate()` function that converts a pathname with shell wildcards to a regular expression. The regular expression is used by pathlib to implement `match()` and `glob()`.
This function differs from `fnmatch.translate()` in that wildcards do not match path separators by default, and that a `*` pattern segment matches precisely one path segment. When *recursive* is set to true, `**` pattern segments match any number of path segments, and `**` cannot appear outside its own segment.
In pathlib, this change speeds up directory walking (because `_make_child_relpath()` does less work), makes path objects smaller (they don't need a `_lines` slot), and removes the need for some gnarly code.
Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Ensure that `PurePath('foo/a').with_name('.')` raises `ValueError`
Ensure that `PureWindowsPath('foo/a').with_name('a:b')` does not raise
`ValueError`.
This method supports file URIs (including variants) as described in RFC 8089, such as URIs generated by `pathlib.Path.as_uri()` and `urllib.request.pathname2url()`.
The method is added to `Path` rather than `PurePath` because it uses `os.fsdecode()`, and so its results vary from system to system. I intend to deprecate `PurePath.as_uri()` and move it to `Path` for the same reason.
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Add private `pathlib._PathBase` class. This will be used by an experimental PyPI package to incubate a `tarfile.TarPath` class.
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
We match paths using the `_lines` attribute, which is derived from the
path's string representation. The bug arises because an empty path's string
representation is `'.'` (not `''`), which is matched by the `'*'` wildcard.
Brings `pathlib.Path.is_dir()` and `in line with `os.DirEntry.is_dir()`, which
will be important for implementing generic path walking and globbing.
Likewise `is_file()`.
This new exception type is raised instead of `NotImplementedError` when
a path operation is not supported. It can be raised from `Path.readlink()`,
`symlink_to()`, `hardlink_to()`, `owner()` and `group()`. In a future
version of pathlib, it will be raised by `AbstractPath` for these methods
and others, such as `AbstractPath.mkdir()` and `unlink()`.
This commit introduces a 'walk-and-match' strategy for handling glob patterns that include a non-terminal `**` wildcard, such as `**/*.py`. For this example, the previous implementation recursively walked directories using `os.scandir()` when it expanded the `**` component, and then **scanned those same directories again** when expanded the `*.py` component. This is wasteful.
In the new implementation, any components following a `**` wildcard are used to build a `re.Pattern` object, which is used to filter the results of the recursive walk. A pattern like `**/*.py` uses half the number of `os.scandir()` calls; a pattern like `**/*/*.py` a third, etc.
This new algorithm does not apply if either:
1. The *follow_symlinks* argument is set to `None` (its default), or
2. The pattern contains `..` components.
In these cases we fall back to the old implementation.
This commit also replaces selector classes with selector functions. These generators directly yield results rather calling through to their successors. A new internal `Path._glob()` method takes care to chain these generators together, which simplifies the lazy algorithm and slightly improves performance. It should also be easier to understand and maintain.
`PurePath.match()` now handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments.
We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Subclassing `os.PathLike` rather than using `register()` makes
initialisation slower, due to the additional `__isinstance__` work.
This partially reverts commit bd1b6228d1.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and`rglob()`.
When *follow_symlinks* is `None` (the default), these methods follow symlinks except when evaluating "`**`" wildcards. When set to true or false, symlinks are always or never followed, respectively.
For backwards compatibility, accept backslashes as path separators in
`PurePosixPath` if an instance of `PureWindowsPath` is supplied.
This restores behaviour from Python 3.11.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Use `str.lower()` rather than `ntpath.normcase()` to normalize case of
Windows paths. This restores behaviour from Python 3.11.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
In Python 3.8 and prior, `pathlib.Path.__exit__()` marked a path as closed;
some subsequent attempts to perform I/O would raise an IOError. This
functionality was never documented, and had the effect of making `Path`
objects mutable, contrary to PEP 428. In Python 3.9 we made `__exit__()` a
no-op, and in 3.11 `__enter__()` began raising deprecation warnings. Here
we remove both methods.
`pathlib.Path.glob()` now suppresses all OSError exceptions, except
those raised from calling `is_dir()` on the top-level path.
Previously, `glob()` suppressed ENOENT, ENOTDIR, EBADF and ELOOP
errors and their Windows equivalents. PermissionError was also
suppressed unless it occurred when calling `is_dir()` on the
top-level path. However, the selector would abort prematurely
if a PermissionError was raised, and so `glob()` could return
incomplete results.
Stop de-duplicating results in `_RecursiveWildcardSelector`. A new
`_DoubleRecursiveWildcardSelector` class is introduced which performs
de-duplication, but this is used _only_ for patterns with multiple
non-adjacent `**` segments, such as `path.glob('**/foo/**')`. By avoiding
the use of a set, `PurePath.__hash__()` is not called, and so paths do not
need to be stringified and case-normalised.
Also merge adjacent '**' segments in patterns.
Re-arrange `pathlib.Path` methods in source code. No other changes.
The methods are arranged as follows:
1. `stat()` and dependants (`exists()`, `is_dir()`, etc)
2. `open()` and dependants (`read_text()`, `write_bytes()`, etc)
3. `iterdir()` and dependants (`glob()`, `walk()`, etc)
4. All other `Path` methods
This patch prepares the ground for a new `_AbstractPath` class, which will
support the methods in groups 1, 2 and 3 above. By churning the methods
here, subsequent patches will be easier to review and less likely to break
things.
Improve performance of `pathlib.Path.absolute()` and `cwd()` by joining paths only when necessary. Also improve
performance of `PurePath.is_absolute()` on Posix by skipping path parsing and normalization.
Add `pathlib.PurePath.with_segments()`, which creates a path object from arguments. This method is called whenever a derivative path is created, such as from `pathlib.PurePath.parent`. Subclasses may override this method to share information between path objects.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
This argument allows case-sensitive matching to be enabled on Windows, and
case-insensitive matching to be enabled on Posix.
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
We now use `_WildcardSelector` to evaluate literal pattern segments, which
allows us to retrieve the real filesystem case.
This change is necessary in order to implement a *case_sensitive* argument
(see GH-81079) and a *follow_symlinks* argument (see GH-77609).
These segments do not require a `stat()` call, as the selector's
`_select_from()` method is called after we've established that the
parent is a directory.
Check that arguments are strings before calling `os.path.join()`.
Also improve performance of `PurePath(PurePath(...))` while we're in the
area: we now use the *unnormalized* string path of such arguments.
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
We no longer add a root to device paths such as `//./PhysicalDrive0`,
`//?/BootPartition` and `//./c:` while normalizing. We also avoid adding a
root to incomplete UNC share paths, like `//`, `//a` and `//a/`.
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Improve performance of path construction by skipping the addition of the path anchor (`drive + root`) to the internal `_parts` list. Rename this attribute to `_tail` for clarity.
Fix an issue where `__new__()` and `__init__()` were not called on subclasses of `pathlib.PurePath` and `Path` in some circumstances.
Paths are now normalized on-demand. This speeds up path construction, `p.joinpath(q)`, and `p / q`.
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Use a stack to implement `pathlib.Path.walk()` iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
Co-authored-by: Barney Gale <barney.gale@gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized!
This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%.
Automerge-Triggered-By: GH:AlexWaygood