Commit Graph

185 Commits

Author SHA1 Message Date
Barney Gale b5f7777cb3
GH-110488: Fix two small issues in `pathlib.PurePath.with_name()` (#110651)
Ensure that `PurePath('foo/a').with_name('.')` raises `ValueError`

Ensure that `PureWindowsPath('foo/a').with_name('a:b')` does not raise
`ValueError`.
2023-10-11 04:45:11 +01:00
Barney Gale 15de493395
GH-107465: Add `pathlib.Path.from_uri()` classmethod. (#107640)
This method supports file URIs (including variants) as described in RFC 8089, such as URIs generated by `pathlib.Path.as_uri()` and `urllib.request.pathname2url()`.

The method is added to `Path` rather than `PurePath` because it uses `os.fsdecode()`, and so its results vary from system to system. I intend to deprecate `PurePath.as_uri()` and move it to `Path` for the same reason.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2023-10-01 16:14:02 +01:00
Barney Gale 89966a694b
GH-89812: Add `pathlib._PathBase` (#106337)
Add private `pathlib._PathBase` class. This will be used by an experimental PyPI package to incubate a `tarfile.TarPath` class.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2023-09-30 15:45:01 +01:00
Barney Gale ecd813f054
GH-109187: Improve symlink loop handling in `pathlib.Path.resolve()` (GH-109192)
Treat symlink loops like other errors: in strict mode, raise `OSError`, and
in non-strict mode, do not raise any exception.
2023-09-26 17:57:17 +01:00
Barney Gale bdc3c884cd
GH-78722: Raise exceptions from `pathlib.Path.iterdir()` without delay. (#107320)
`pathlib.Path.iterdir()` now immediately raises any `OSError`
exception from `os.listdir()`, rather than waiting until its
result is iterated over.
2023-09-02 16:08:03 +01:00
Barney Gale ec0a0d2bd9
GH-70303: Emit FutureWarning when pathlib glob pattern ends with `**` (GH-105413)
In a future Python release, patterns with this ending will match both files
and directories. Users may add a trailing slash to remove the warning.
2023-08-04 23:12:12 +00:00
János Kukovecz e7e6e4b035
gh-105002: [pathlib] Fix relative_to with walk_up=True using ".." (#107014)
It makes sense to raise an Error because ".." can not
be resolved and the current working directory is unknown.
2023-07-26 20:44:55 +01:00
Barney Gale c6c5665ee0
GH-100502: Add `pathlib.PurePath.pathmod` attribute (GH-106533)
This instance attribute stores the implementation of `os.path` used for
low-level path operations: either `posixpath` or `ntpath`.
2023-07-19 18:59:55 +01:00
Barney Gale b4efdf8cda
GH-106330: Fix matching of empty path in `pathlib.PurePath.match()` (GH-106331)
We match paths using the `_lines` attribute, which is derived from the
path's string representation. The bug arises because an empty path's string
representation is `'.'` (not `''`), which is matched by the `'*'` wildcard.
2023-07-03 21:29:44 +01:00
Barney Gale 219effa876
GH-105793: Add follow_symlinks argument to `pathlib.Path.is_dir()` and `is_file()` (GH-105794)
Brings `pathlib.Path.is_dir()` and `in line with `os.DirEntry.is_dir()`, which
will be important for implementing generic path walking and globbing.
Likewise `is_file()`.
2023-06-26 17:58:17 +01:00
Barney Gale a8006706f7
GH-89812: Add `pathlib.UnsupportedOperation` (GH-105926)
This new exception type is raised instead of `NotImplementedError` when
a path operation is not supported. It can be raised from `Path.readlink()`,
`symlink_to()`, `hardlink_to()`, `owner()` and `group()`. In a future
version of pathlib, it will be raised by `AbstractPath` for these methods
and others, such as `AbstractPath.mkdir()` and `unlink()`.
2023-06-22 14:35:51 +01:00
Barney Gale ffeaec7e60
GH-104996: Defer joining of `pathlib.PurePath()` arguments. (GH-104999)
Joining of arguments is moved to `_load_parts`, which is called when a
normalized path is needed.
2023-06-07 23:27:06 +01:00
Barney Gale 24af45172f
GH-102613: Fast recursive globbing in `pathlib.Path.glob()` (GH-104512)
This commit introduces a 'walk-and-match' strategy for handling glob patterns that include a non-terminal `**` wildcard, such as `**/*.py`. For this example, the previous implementation recursively walked directories using `os.scandir()` when it expanded the `**` component, and then **scanned those same directories again** when expanded the `*.py` component. This is wasteful.

In the new implementation, any components following a `**` wildcard are used to build a `re.Pattern` object, which is used to filter the results of the recursive walk. A pattern like `**/*.py` uses half the number of `os.scandir()` calls; a pattern like `**/*/*.py` a third, etc.

This new algorithm does not apply if either:

1. The *follow_symlinks* argument is set to `None` (its default), or
2. The pattern contains `..` components.

In these cases we fall back to the old implementation.

This commit also replaces selector classes with selector functions. These generators directly yield results rather calling through to their successors. A new internal `Path._glob()` method takes care to chain these generators together, which simplifies the lazy algorithm and slightly improves performance. It should also be easier to understand and maintain.
2023-06-06 23:50:36 +01:00
Barney Gale 49f90ba1ea
GH-73435: Implement recursive wildcards in `pathlib.PurePath.match()` (#101398)
`PurePath.match()` now handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments.

We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.

Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-05-30 20:18:09 +00:00
Barney Gale d593074494
GH-104898: Revert pathlib os.PathLike registration change. (GH-105073)
Subclassing `os.PathLike` rather than using `register()` makes
initialisation slower, due to the additional `__isinstance__` work.

This partially reverts commit bd1b6228d1.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-05-29 21:44:51 +00:00
Barney Gale ace676e2c2
GH-77609: Add follow_symlinks argument to `pathlib.Path.glob()` (GH-102616)
Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and`rglob()`.

When *follow_symlinks* is `None` (the default), these methods follow symlinks except when evaluating "`**`" wildcards. When set to true or false, symlinks are always or never followed, respectively.
2023-05-29 16:59:52 +01:00
Barney Gale 328422ce61
GH-103631: Fix `PurePosixPath(PureWindowsPath(...))` separator handling (GH-104949)
For backwards compatibility, accept backslashes as path separators in
`PurePosixPath` if an instance of `PureWindowsPath` is supplied.
This restores behaviour from Python 3.11.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
2023-05-26 18:05:43 +00:00
Barney Gale ad0be361c9
GH-104947: Make pathlib.PureWindowsPath comparisons consistent across platforms (GH-104948)
Use `str.lower()` rather than `ntpath.normcase()` to normalize case of
Windows paths. This restores behaviour from Python 3.11.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
2023-05-26 18:04:02 +00:00
Barney Gale bd1b6228d1
GH-104898: Add __slots__ to os.PathLike (GH-104899) 2023-05-25 21:24:20 +01:00
Barney Gale 6b1510cf11
GH-83863: Drop support for using `pathlib.Path` objects as context managers (GH-104807)
In Python 3.8 and prior, `pathlib.Path.__exit__()` marked a path as closed;
some subsequent attempts to perform I/O would raise an IOError. This
functionality was never documented, and had the effect of making `Path`
objects mutable, contrary to PEP 428. In Python 3.9 we made `__exit__()` a
no-op, and in 3.11 `__enter__()` began raising deprecation warnings. Here
we remove both methods.
2023-05-23 22:31:59 +00:00
thirumurugan dcdc90d384
GH-104484: Add case_sensitive argument to `pathlib.PurePath.match()` (GH-104565)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
2023-05-18 18:59:31 +01:00
Barney Gale cb88ae635e
GH-102613: Fix recursion error from `pathlib.Path.glob()` (GH-104373)
Use `Path.walk()` to implement the recursive wildcard `**`. This method
uses an iterative (rather than recursive) walk - see GH-100282.
2023-05-15 18:33:32 +01:00
Barney Gale 94f30c7557
GH-90208: Suppress OSError exceptions from `pathlib.Path.glob()` (GH-104141)
`pathlib.Path.glob()` now suppresses all OSError exceptions, except
those raised from calling `is_dir()` on the top-level path.

Previously, `glob()` suppressed ENOENT, ENOTDIR, EBADF and ELOOP
errors and their Windows equivalents. PermissionError was also
suppressed unless it occurred when calling `is_dir()` on the
top-level path. However, the selector would abort prematurely
if a PermissionError was raised, and so `glob()` could return
incomplete results.
2023-05-11 01:01:39 +01:00
Barney Gale a33ce66dca
GH-87695: Fix OSError from `pathlib.Path.glob()` (GH-104292)
Fix issue where `pathlib.Path.glob()` raised `OSError` when it encountered
a symlink to an overly long path.
2023-05-10 17:17:08 +00:00
Barney Gale c0ece3dc97
GH-102613: Improve performance of `pathlib.Path.rglob()` (GH-104244)
Stop de-duplicating results in `_RecursiveWildcardSelector`. A new
`_DoubleRecursiveWildcardSelector` class is introduced which performs
de-duplication, but this is used _only_ for patterns with multiple
non-adjacent `**` segments, such as `path.glob('**/foo/**')`. By avoiding
the use of a set, `PurePath.__hash__()` is not called, and so paths do not
need to be stringified and case-normalised.

Also merge adjacent '**' segments in patterns.
2023-05-07 22:12:50 +01:00
Barney Gale e8d77b03e0
GH-89812: Churn `pathlib.Path` methods (GH-104243)
Re-arrange `pathlib.Path` methods in source code. No other changes.

The methods are arranged as follows:

1. `stat()` and dependants (`exists()`, `is_dir()`, etc)
2. `open()` and dependants (`read_text()`, `write_bytes()`, etc)
3. `iterdir()` and dependants (`glob()`, `walk()`, etc)
4. All other `Path` methods

This patch prepares the ground for a new `_AbstractPath` class, which will
support the methods in groups 1, 2 and 3 above. By churning the methods
here, subsequent patches will be easier to review and less likely to break
things.
2023-05-07 20:07:07 +01:00
Barney Gale de7f694e3c
GH-103548: Improve performance of `pathlib.Path.[is_]absolute()` (GH-103549)
Improve performance of `pathlib.Path.absolute()` and `cwd()` by joining paths only when necessary. Also improve
performance of `PurePath.is_absolute()` on Posix by skipping path parsing and normalization.
2023-05-06 18:03:07 +00:00
Barney Gale d00d942149
GH-100479: Add `pathlib.PurePath.with_segments()` (GH-103975)
Add `pathlib.PurePath.with_segments()`, which creates a path object from arguments. This method is called whenever a derivative path is created, such as from `pathlib.PurePath.parent`. Subclasses may override this method to share information between path objects.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-05-05 19:04:53 +00:00
Barney Gale 8100be5535
GH-81079: Add case_sensitive argument to `pathlib.Path.glob()` (GH-102710)
This argument allows case-sensitive matching to be enabled on Windows, and
case-insensitive matching to be enabled on Posix.

Co-authored-by: Steve Dower <steve.dower@microsoft.com>
2023-05-04 16:44:36 +00:00
Barney Gale da1980afcb
GH-104114: Fix `pathlib.WindowsPath.glob()` use of literal pattern segment case (GH-104116)
We now use `_WildcardSelector` to evaluate literal pattern segments, which
allows us to retrieve the real filesystem case.

This change is necessary in order to implement a *case_sensitive* argument
(see GH-81079) and a *follow_symlinks* argument (see GH-77609).
2023-05-03 17:28:44 +01:00
andrei kulakov af886ffa06
GH-89769: `pathlib.Path.glob()`: do not follow symlinks when checking for precise match (GH-29655)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
2023-05-03 04:50:10 +01:00
Barney Gale 65a49c6553
GH-104102: Optimize `pathlib.Path.glob()` handling of `../` pattern segments (GH-104103)
These segments do not require a `stat()` call, as the selector's
`_select_from()` method is called after we've established that the
parent is a directory.
2023-05-02 23:16:04 +00:00
Barney Gale 47770a1e91
GH-104104: Optimize `pathlib.Path.glob()` by avoiding repeated calls to `os.path.normcase()` (GH-104105)
Use `re.IGNORECASE` to implement case-insensitive matching. This
restores behaviour from before GH-31691.
2023-05-02 22:51:18 +01:00
Barney Gale 8611e7bf5c
GH-103525: Improve exception message from `pathlib.PurePath()` (GH-103526)
Check that arguments are strings before calling `os.path.join()`.

Also improve performance of `PurePath(PurePath(...))` while we're in the
area: we now use the *unnormalized* string path of such arguments.

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2023-05-02 19:08:19 +01:00
Barney Gale 8af8f52d17
GH-78079: Fix UNC device path root normalization in pathlib (GH-102003)
We no longer add a root to device paths such as `//./PhysicalDrive0`,
`//?/BootPartition` and `//./c:` while normalizing. We also avoid adding a
root to incomplete UNC share paths, like `//`, `//a` and `//a/`.

Co-authored-by: Eryk Sun <eryksun@gmail.com>
2023-04-14 21:55:41 +01:00
Barney Gale 2c673d5e93
GH-101362: Omit path anchor from `pathlib.PurePath()._parts` (GH-102476)
Improve performance of path construction by skipping the addition of the path anchor (`drive + root`) to the internal `_parts` list. Rename this attribute to `_tail` for clarity.
2023-04-09 18:40:03 +01:00
Barney Gale 11c302055a
GH-76846, GH-85281: Call `__new__()` and `__init__()` on pathlib subclasses (GH-102789)
Fix an issue where `__new__()` and `__init__()` were not called on subclasses of `pathlib.PurePath` and `Path` in some circumstances.

Paths are now normalized on-demand. This speeds up path construction, `p.joinpath(q)`, and `p / q`.

Co-authored-by: Steve Dower <steve.dower@microsoft.com>
2023-04-03 19:57:11 +01:00
Stanislav Zmiev 713df2c534
GH-89727: Fix pathlib.Path.walk RecursionError on deep trees (GH-100282)
Use a stack to implement `pathlib.Path.walk()` iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.

Co-authored-by: Barney Gale <barney.gale@gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
2023-03-22 14:45:25 +00:00
Barney Gale 90f1d77717
GH-80486: Fix handling of NTFS alternate data streams in pathlib (GH-102454)
Co-authored-by: Maor Kleinberger <kmaork@gmail.com>
2023-03-10 17:29:04 +00:00
Barney Gale 6716254e71
GH-101362: Optimise PurePath(PurePath(...)) (GH-101667)
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized!

This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%.

Automerge-Triggered-By: GH:AlexWaygood
2023-03-05 15:50:21 -08:00
Barney Gale 3e60e0213e
GH-101362: Check pathlib.Path flavour compatibility at import time (GH-101664)
This saves a comparison in `pathlib.Path.__new__()` and reduces the time taken to run `Path()` by ~5%.

Automerge-Triggered-By: GH:AlexWaygood
2023-03-05 14:46:45 -08:00
Barney Gale 3572c861d8
GH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (#101665)
GH-101362: Call join() only when >1 argument supplied to pathlib.PurePath

This reduces the time taken to run `PurePath("foo")` by ~15%
2023-03-05 22:00:56 +00:00
Barney Gale 072011b3c3
gh-100809: Fix handling of drive-relative paths in pathlib.Path.absolute() (GH-100812)
Resolving the drive independently uses the OS API, which ensures it starts from the current directory on that drive.
2023-02-17 14:08:14 +00:00
Barney Gale d401b20630
gh-101360: Fix anchor matching in pathlib.PureWindowsPath.match() (GH-101363)
Use `fnmatch` to match path and pattern anchors, just as we do for other
path parts. This allows patterns such as `'*:/Users/*'` to be matched.
2023-02-17 14:05:38 +00:00
Barney Gale e5b08ddddf
gh-101000: Add os.path.splitroot() (#101002)
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-01-27 00:28:27 +00:00
Yurii Karabas 080cb27829
gh-74033: Fix bug when Path takes and ignores **kwargs (GH-19632)
Fix a bug where `Path` takes and ignores `**kwargs` by adding to `PurePath`  class `__init__` method which can take only positional arguments.

Automerge-Triggered-By: GH:brettcannon
2023-01-13 16:05:43 -08:00
Barney Gale 7fba99eadb
gh-100562: improve performance of `pathlib.Path.absolute()` (GH-100563)
Increase performance of the `absolute()` method by calling `os.getcwd()` directly, rather than using the `Path.cwd()` class method. This avoids constructing an extra `Path` object (and the parsing/normalization that comes with it).

Decrease performance of the `cwd()` class method by calling the `Path.absolute()` method, rather than using `os.getcwd()` directly. This involves constructing an extra `Path` object. We do this to maintain a longstanding pattern where `os` functions are called from only one place, which allows them to be more readily replaced by users. As `cwd()` is generally called at most once within user programs, it's a good bargain.

```shell
# before
$ ./python -m timeit -s 'from pathlib import Path; p = Path("foo", "bar")' 'p.absolute()'
50000 loops, best of 5: 9.04 usec per loop
# after
$ ./python -m timeit -s 'from pathlib import Path; p = Path("foo", "bar")' 'p.absolute()'
50000 loops, best of 5: 5.02 usec per loop
```

Automerge-Triggered-By: GH:AlexWaygood
2023-01-05 14:11:50 -08:00
Barney Gale a68e585c8b
gh-68320, gh-88302 - Allow for private `pathlib.Path` subclassing (GH-31691)
Users may wish to define subclasses of `pathlib.Path` to add or modify
existing methods. Before this change, attempting to instantiate a subclass
raised an exception like:

    AttributeError: type object 'PPath' has no attribute '_flavour'

Previously the `_flavour` attribute was assigned as follows:

    PurePath._flavour        = xxx not set!! xxx
    PurePosixPath._flavour   = _PosixFlavour()
    PureWindowsPath._flavour = _WindowsFlavour()

This change replaces it with a `_pathmod` attribute, set as follows:

    PurePath._pathmod        = os.path
    PurePosixPath._pathmod   = posixpath
    PureWindowsPath._pathmod = ntpath

Functionality from `_PosixFlavour` and `_WindowsFlavour` is moved into
`PurePath` as underscored-prefixed classmethods. Flavours are removed.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Eryk Sun <eryksun@gmail.com>
2022-12-23 14:52:23 -08:00
Barney Gale 5a991da329
gh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (GH-94469)
This brings `relative_to()` and `is_relative_to()` more in line with other pathlib methods like `rename()` and `symlink_to()`.

Resolves #78707.
2022-12-16 16:14:27 -08:00
Barney Gale ae234fbc5c
gh-99029: Fix handling of `PureWindowsPath('C:\<blah>').relative_to('C:')` (GH-99031)
`relative_to()` now treats naked drive paths as relative. This brings its
behaviour in line with other parts of pathlib, and with `ntpath.relpath()`,
and so allows us to factor out the pathlib-specific implementation.
2022-11-25 11:15:57 -08:00