Since 6258844c, paths that might not exist can be fed into pathlib's
globbing implementation, which will call `os.scandir()` / `os.lstat()` only
when strictly necessary. This allows us to drop an initial `self.is_dir()`
call, which saves a `stat()`.
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Don't bother calling `os.scandir()` to scan for literal pattern segments,
like `foo` in `foo/*.py`. Instead, append the segment(s) as-is and call
through to the next selector with `exists=False`, which signals that the
path might not exist. Subsequent selectors will call `os.scandir()` or
`os.lstat()` to filter out missing paths as needed.
Replace tri-state `follow_symlinks` with boolean `recurse_symlinks` argument. The new argument controls whether symlinks are followed when expanding recursive `**` wildcards. The possible argument values correspond as follows:
follow_symlinks recurse_symlinks
=============== ================
False N/A
None False
True True
We therefore drop support for not following symlinks when expanding non-recursive pattern parts; it wasn't requested in the original issue, and it's a feature not found in any shells.
This makes the API a easier to grok by eliminating `None` as an option.
No news blurb as `follow_symlinks` was new in 3.13.
Stop raising `ValueError` from `glob.translate()` when a `**` sub-string
appears in a non-recursive pattern segment. This matches `glob.glob()`
behaviour.
Switch the default value of *follow_symlinks* from `None` to `True` in
`pathlib._abc.PathBase.glob()` and `rglob()`. This speeds up recursive
globbing.
No change to the public pathlib classes.
When expanding `**` wildcards, ensure we add a trailing slash to the
topmost directory path. This matches `glob.glob()` behaviour:
>>> glob.glob('dirA/**', recursive=True)
['dirA/', 'dirA/dirB', 'dirA/dirB/dirC']
This does not affect `pathlib.Path.glob()`, because trailing slashes aren't
supported in pathlib proper.
Methods like `full_match()`, `glob()`, etc, are difficult to make work with
byte paths, and it's not worth the effort. This patch makes `PurePathBase`
raise `TypeError` when given non-`str` path segments.
Return files and directories from `pathlib.Path.glob()` if the pattern ends
with `**`. This is more compatible with `PurePath.full_match()` and with
other glob implementations such as bash and `glob.glob()`. Users can add a
trailing slash to match only directories.
In my previous patch I added a `FutureWarning` with the intention of fixing
this in Python 3.15. Upon further reflection I think this was an
unnecessarily cautious remedy to a clear bug.
Raise `ValueError` if `with_suffix('.ext')` is called on a path without a
stem. Paths may only have a non-empty suffix if they also have a non-empty
stem.
ABC-only bugfix; no effect on public classes.
Remove `self.joinpath('')` call that should have been removed in 6313cdde.
This makes `PathBase.glob('')` yield itself *without* adding a trailing slash. It's hard to say whether this is more or less correct, but at least everything else is faster, and there's no behaviour change in the public classes where empty glob patterns are disallowed.
Add `@needs_symlinks` decorator for tests that require symlink support in
the path class.
Also add `@needs_windows` and `@needs_posix` decorators for tests that
require a specific a specific path flavour. These aren't much used yet, but
will be later.
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!
This commit reverts the changes to `match()`, and instead adds a new
`full_match()` method that:
- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
Allow `os.PathLike` objects to be passed as patterns to `pathlib.Path.glob()` and `rglob()`. (It's already possible to use them in `PurePath.match()`)
While we're in the area:
- Allow empty glob patterns in `PathBase` (but not `Path`)
- Speed up globbing in `PathBase` by generating paths with trailing slashes only as a final step, rather than for every intermediate directory.
- Simplify and speed up handling of rare patterns involving both `**` and `..` segments.
This allows users of the `pathlib-abc` PyPI package to use `posixpath` or
`ntpath` as a path module in versions of Python lacking
`os.path.splitroot()` (3.11 and before).
Path modules provide a subset of the `os.path` API, specifically those
functions needed to provide `PurePathBase` functionality. Each
`PurePathBase` subclass references its path module via a `pathmod` class
attribute.
This commit adds a new `PathModuleBase` class, which provides abstract
methods that unconditionally raise `UnsupportedOperation`. An instance of
this class is assigned to `PurePathBase.pathmod`, replacing `posixpath`.
As a result, `PurePathBase` is no longer POSIX-y by default, and
all its methods raise `UnsupportedOperation` courtesy of `pathmod`.
Users who subclass `PurePathBase` or `PathBase` should choose the path
syntax by setting `pathmod` to `posixpath`, `ntpath`, `os.path`, or their
own subclass of `PathModuleBase`, as circumstances demand.
Refuse to guess what a user means when they initialise a pathlib ABC
without any positional arguments. In mainline pathlib it's normalised to
`.`, but in the ABCs this guess isn't appropriate; for example, the path
type may not represent the current directory as `.`, or may have no concept
of a "current directory" at all.
Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`.
With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. Implementors can set their own path domain-specific normalization scheme by overriding `__str__()`
Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code.
- Add `__slots__` to dummy path classes.
- Return namedtuple rather than `os.stat_result` from `DummyPath.stat()`.
- Reduce maximum symlink count in `DummyPathWithSymlinks.resolve()`.
`PurePathBase` does not define `__eq__()`, and so we have no business checking path equality in `test_eq_common` and `test_equivalences`. The tests only pass at the moment because we define the test class's `__eq__()` for use elsewhere.
Also move `test_parse_path_common` into the main pathlib test suite. It exercises a private `_parse_path()` method that will be moved to `PurePath` soon.
Lastly move a couple more tests concerned with optimisations and path normalisation.
Slightly improve `pathlib.Path.glob()` tests for symlink loop handling
When filtering results, ignore paths with more than one `linkD/` segment,
rather than all paths below the first `linkD/` segment. This allows us
to test that other paths under `linkD/` are correctly returned.
Split test cases for invalid names into dedicated test methods. This will
make it easier to refactor tests for invalid name handling in ABCs later.
No change of coverage, just a change of test suite organisation.
The `DummyPurePath` and `DummyPath` test classes are simple subclasses of
`PurePathBase` and `PathBase`. This commit adds `__repr__()` methods to the
dummy classes, which makes debugging test failures less painful.
Raise auditing events in `pathlib.Path.glob()`, `rglob()` and `walk()`,
but not in `pathlib._abc.PathBase` methods. Also move generation of a
deprecation warning into `pathlib.Path` so it gets the right stack level.
Run expensive tests for walking and globbing from `test_pathlib` but not
`test_pathlib_abc`. The ABCs are not as tightly optimised as the classes
in top-level `pathlib`, and so these tests are taking rather a long time on
some buildbots. Coverage of the main `pathlib` classes should suffice.
Change the value of `pathlib._abc.PurePathBase.pathmod` from `os.path` to
`posixpath`.
User subclasses of `PurePathBase` and `PathBase` previously used the host
OS's path syntax, e.g. backslashes as separators on Windows. This is wrong
in most use cases, and likely to catch developers out unless they test on
both Windows and non-Windows machines.
In this patch we change the default to POSIX syntax, regardless of OS. This
is somewhat arguable (why not make all aspects of syntax abstract and
individually configurable?) but an improvement all the same.
This change has no effect on `PurePath`, `Path`, nor their subclasses. Only
private APIs are affected.
`PurePathBase.__repr__()` produces a string like `MyPath('/foo')`. This
repr is incorrect/misleading when a subclass's `__init__()` method is
customized, which I expect to be the very common.
This commit moves the `__repr__()` method to `PurePath`, leaving
`PurePathBase` with the default `object` repr.
No user-facing changes because the `pathlib._abc` module remains private.
Store the test base directory as a class attribute named `base` rather than
module constants named `BASE`.
The base directory is a local file path, and therefore not ideally suited
to the pathlib ABC tests. In a future commit we'll change its value in
`test_pathlib_abc.py` such that it points to a totally fictitious path, which
will help to ensure we're not touching the local filesystem.