Commit Graph

50 Commits

Author SHA1 Message Date
Barney Gale 7bd6ebf696
GH-73991: Prune `pathlib.Path.copy()` and `copy_into()` arguments (#123337)
Remove *ignore* and *on_error* arguments from `pathlib.Path.copy[_into]()`,
because these arguments are under-designed. Specifically:

- *ignore* is appropriated from `shutil.copytree()`, but it's not clear
  how it should apply when the user copies a non-directory. We've changed
  the callback signature from the `shutil` version, but I'm not confident
  the new signature is as good as it can be.
- *on_error* is a generalisation of `shutil.copytree()`'s error handling,
  which is to accumulate exceptions and raise a single `shutil.Error` at
  the end. It's not obvious which solution is better.

Additionally, this arguments may be challenging to implement in future user
subclasses of `PathBase`, which might utilise a native recursive copying
method.
2024-08-26 17:05:34 +01:00
Barney Gale 033d537cd4
GH-73991: Make `pathlib.Path.delete()` private. (#123315)
Per feedback from Paul Moore on GH-123158, it's better to defer making
`Path.delete()` public than ship it with under-designed error handling
capabilities.

We leave a remnant `_delete()` method, which is used by `move()`. Any
functionality not needed by `move()` is deleted.
2024-08-26 16:26:34 +01:00
Barney Gale c68a93c582
GH-73991: Add `pathlib.Path.copy_into()` and `move_into()` (#123314)
These two methods accept an *existing* directory path, onto which we join
the source path's base name to form the final target path.

A possible alternative implementation is to check for directories in
`copy()` and `move()` and adjust the target path, which is done in several
`shutil` functions. This behaviour is helpful in a shell context, but
less so in a stored program that explicitly specifies destinations. For
example, a user that calls `Path('foo.py').copy('bar.py')` might not
imagine that `bar.py/foo.py` would be created, but under the alternative
implementation this will happen if `bar.py` is an existing directory.
2024-08-26 14:14:23 +01:00
Barney Gale 625d0705b9
GH-73991: Add `pathlib.Path.move()` (#122073)
Add a `Path.move()` method that moves a file or directory tree, and returns a new `Path` instance pointing to the target.

This method is similar to `shutil.move()`, except that it doesn't accept a *copy_function* argument, and it doesn't check whether the destination is an existing directory.
2024-08-25 16:51:51 +01:00
Barney Gale d7ae4dc5c1
GH-73991: Disallow copying directory into itself via `pathlib.Path.copy()` (#122924) 2024-08-23 20:03:11 +01:00
Cody Maloney 35d8ac7cd7
GH-120754: Disable buffering in Path.read_bytes (#122111)
`Path.read_bytes()` is used to read a whole file. buffering /
BufferedIO is focused around making small, possibly interleaved,
read/write efficient which doesn't add value in this case.

On my Mac, running the benchmark:

```python
import pyperf
from pathlib import Path

def read_all(all_paths):
    for p in all_paths:
        p.read_bytes()

def read_file(path_obj):
    path_obj.read_bytes()

all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found rst files"
assert all_py, "Should have found python source files"

runner = pyperf.Runner()
runner.bench_func("read_file_small", read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func("read_file_large", read_file, Path("Doc/c-api/typeobj.rst"))
```

before:
```python
.....................
read_file_small: Mean +- std dev: 6.80 us +- 0.07 us
.....................
read_file_large: Mean +- std dev: 10.8 us +- 0.2 us
````

after:
```python
.....................
read_file_small: Mean +- std dev: 5.67 us +- 0.05 us
.....................
read_file_large: Mean +- std dev: 9.77 us +- 0.52 us
```
2024-08-16 13:52:41 -07:00
Barney Gale a6644d4464
GH-73991: Rework `pathlib.Path.copytree()` into `copy()` (#122369)
Rename `pathlib.Path.copy()` to `_copy_file()` (i.e. make it private.)

Rename `pathlib.Path.copytree()` to `copy()`, and add support for copying
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `delete()` methods (which will also
accept any type of file.)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2024-08-11 22:43:18 +01:00
Barney Gale 98dba73010
GH-73991: Rework `pathlib.Path.rmtree()` into `delete()` (#122368)
Rename `pathlib.Path.rmtree()` to `delete()`, and add support for deleting
non-directories. This simplifies the interface for users, and nicely
complements the upcoming `move()` and `copy()` methods (which will also
accept any type of file.)
2024-08-07 01:34:44 +01:00
Barney Gale 094375b9b7
GH-73991: Add `pathlib.Path.rmtree()` (#119060)
Add a `Path.rmtree()` method that removes an entire directory tree, like
`shutil.rmtree()`. The signature of the optional *on_error* argument
matches the `Path.walk()` argument of the same name, but differs from the
*onexc* and *onerror* arguments to `shutil.rmtree()`. Consistency within
pathlib is probably more important.

In the private pathlib ABCs, we add an implementation based on `walk()`.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-07-20 20:14:13 +00:00
Barney Gale f09d184821
GH-73991: Support copying directory symlinks on older Windows (#120807)
Check for `ERROR_INVALID_PARAMETER` when calling `_winapi.CopyFile2()` and
raise `UnsupportedOperation`. In `Path.copy()`, handle this exception and
fall back to the `PathBase.copy()` implementation.
2024-07-03 04:30:29 +01:00
Barney Gale 35e998f560
GH-73991: Add `pathlib.Path.copytree()` (#120718)
Add `pathlib.Path.copytree()` method, which recursively copies one
directory to another.

This differs from `shutil.copytree()` in the following respects:

1. Our method has a *follow_symlinks* argument, whereas shutil's has a
   *symlinks* argument with an inverted meaning.
2. Our method lacks something like a *copy_function* argument. It always
   uses `Path.copy()` to copy files.
3. Our method lacks something like a *ignore_dangling_symlinks* argument.
   Instead, users can filter out danging symlinks with *ignore*, or
   ignore exceptions with *on_error*
4. Our *ignore* argument is a callable that accepts a single path object,
   whereas shutil's accepts a path and a list of child filenames.
5. We add an *on_error* argument, which is a callable that accepts
   an `OSError` instance. (`Path.walk()` also accepts such a callable).

Co-authored-by: Nice Zombies <nineteendo19d0@gmail.com>
2024-06-23 22:01:12 +01:00
Barney Gale 20d5b84f57
GH-73991: Add follow_symlinks argument to `pathlib.Path.copy()` (#120519)
Add support for not following symlinks in `pathlib.Path.copy()`.

On Windows we add the `COPY_FILE_COPY_SYMLINK` flag is following symlinks is disabled. If the source is symlink to a directory, this call will fail with `ERROR_ACCESS_DENIED`. In this case we add `COPY_FILE_DIRECTORY` to the flags and retry. This can fail on old Windowses, which we note in the docs.

No news as `copy()` was only just added.
2024-06-19 00:59:54 +00:00
Barney Gale 9f741e55c1
GH-73991: pathlib ABC tests: add `DummyPath.unlink()` and `rmdir()` (#120715)
In preparation for the addition of `PathBase.rmtree()`, implement
`DummyPath.unlink()` and `rmdir()`, and move corresponding tests into
`test_pathlib_abc` so they're run against `DummyPath`.
2024-06-18 22:13:45 +00:00
Barney Gale 7c38097add
GH-73991: Add `pathlib.Path.copy()` (#119058)
Add a `Path.copy()` method that copies the content of one file to another.

This method is similar to `shutil.copyfile()` but differs in the following ways:

- Uses `fcntl.FICLONE` where available (see GH-81338)
- Uses `os.copy_file_range` where available (see GH-81340)
- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.

The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.

Incorporates code from GH-81338 and GH-93152.

Co-authored-by: Eryk Sun <eryksun@gmail.com>
2024-06-14 17:15:49 +01:00
Barney Gale e418fc3a6e
GH-82805: Fix handling of single-dot file extensions in pathlib (#118952)
pathlib now treats "`.`" as a valid file extension (suffix). This brings
it in line with `os.path.splitext()`.

In the (private) pathlib ABCs, we add a new `ParserBase.splitext()` method
that splits a path into a `(root, ext)` pair, like `os.path.splitext()`.
This method is called by `PurePathBase.stem`, `suffix`, etc. In a future
version of pathlib, we might make these base classes public, and so users
will be able to define their own `splitext()` method to control file
extension splitting.

In `pathlib.PurePath` we add optimised `stem`, `suffix` and `suffixes`
properties that don't use `splitext()`, which avoids computing the path
base name twice.
2024-05-25 21:01:36 +01:00
Barney Gale 3c28510b98
GH-119113: Raise `TypeError` from `pathlib.PurePath.with_suffix(None)` (#119124)
Restore behaviour from 3.12 when `path.with_suffix(None)` is called.
2024-05-19 17:04:56 +01:00
Barney Gale a74f117dab
GH-115060: Speed up `pathlib.Path.glob()` by omitting initial `stat()` (#117831)
Since 6258844c, paths that might not exist can be fed into pathlib's
globbing implementation, which will call `os.scandir()` / `os.lstat()` only
when strictly necessary. This allows us to drop an initial `self.is_dir()`
call, which saves a `stat()`.

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
2024-04-14 00:08:03 +01:00
Barney Gale 0eb52f5f26
GH-115060: Speed up `pathlib.Path.glob()` by not scanning literal parts (#117732)
Don't bother calling `os.scandir()` to scan for literal pattern segments,
like `foo` in `foo/*.py`. Instead, append the segment(s) as-is and call
through to the next selector with `exists=False`, which signals that the
path might not exist. Subsequent selectors will call `os.scandir()` or
`os.lstat()` to filter out missing paths as needed.
2024-04-12 22:19:21 +01:00
Barney Gale 6150bb2412
GH-77609: Add recurse_symlinks argument to `pathlib.Path.glob()` (#117311)
Replace tri-state `follow_symlinks` with boolean `recurse_symlinks` argument. The new argument controls whether symlinks are followed when expanding recursive `**` wildcards. The possible argument values correspond as follows:

    follow_symlinks  recurse_symlinks
    ===============  ================
    False            N/A
    None             False
    True             True

We therefore drop support for not following symlinks when expanding non-recursive pattern parts; it wasn't requested in the original issue, and it's a feature not found in any shells.

This makes the API a easier to grok by eliminating `None` as an option.

No news blurb as `follow_symlinks` was new in 3.13.
2024-04-05 18:51:54 +00:00
Barney Gale 752e18389e
GH-114575: Rename `PurePath.pathmod` to `PurePath.parser` (#116513)
And rename the private base class from `PathModuleBase` to `ParserBase`.
2024-03-31 19:14:48 +01:00
Barney Gale 0634201f53
GH-116377: Stop raising `ValueError` from `glob.translate()`. (#116378)
Stop raising `ValueError` from `glob.translate()` when a `**` sub-string
appears in a non-recursive pattern segment. This matches `glob.glob()`
behaviour.
2024-03-17 17:09:35 +00:00
Barney Gale 1dce0073da
pathlib ABCs: follow all symlinks in `PathBase.glob()` (#116293)
Switch the default value of *follow_symlinks* from `None` to `True` in
`pathlib._abc.PathBase.glob()` and `rglob()`. This speeds up recursive
globbing.

No change to the public pathlib classes.
2024-03-04 02:26:33 +00:00
Barney Gale e3dedeae7a
GH-114610: Fix `pathlib.PurePath.with_stem('')` handling of file extensions (#114612)
Raise `ValueError` if `with_stem('')` is called on a path with a file
extension. Paths may only have an empty stem if they also have an empty
suffix.
2024-02-24 19:37:03 +00:00
Barney Gale 1b1f8398d0
GH-106747: Make pathlib ABC globbing more consistent with `glob.glob()` (#115056)
When expanding `**` wildcards, ensure we add a trailing slash to the
topmost directory path. This matches `glob.glob()` behaviour:

    >>> glob.glob('dirA/**', recursive=True)
    ['dirA/', 'dirA/dirB', 'dirA/dirB/dirC']

This does not affect `pathlib.Path.glob()`, because trailing slashes aren't
supported in pathlib proper.
2024-02-06 02:48:18 +00:00
Barney Gale 574291963f
pathlib ABCs: drop partial, broken, untested support for `bytes` paths. (#114777)
Methods like `full_match()`, `glob()`, etc, are difficult to make work with
byte paths, and it's not worth the effort. This patch makes `PurePathBase`
raise `TypeError` when given non-`str` path segments.
2024-01-31 00:59:33 +00:00
Barney Gale fda7445ca5
GH-70303: Make `pathlib.Path.glob('**')` return both files and directories (#114684)
Return files and directories from `pathlib.Path.glob()` if the pattern ends
with `**`. This is more compatible with `PurePath.full_match()` and with
other glob implementations such as bash and `glob.glob()`. Users can add a
trailing slash to match only directories.

In my previous patch I added a `FutureWarning` with the intention of fixing
this in Python 3.15. Upon further reflection I think this was an
unnecessarily cautious remedy to a clear bug.
2024-01-30 19:52:53 +00:00
Barney Gale 809eed4805
GH-114610: Fix `pathlib._abc.PurePathBase.with_suffix('.ext')` handling of stems (#114613)
Raise `ValueError` if `with_suffix('.ext')` is called on a path without a
stem. Paths may only have a non-empty suffix if they also have a non-empty
stem.

ABC-only bugfix; no effect on public classes.
2024-01-30 14:25:16 +00:00
Barney Gale 823a38a960
GH-79634: Speed up pathlib globbing by removing `joinpath()` call. (#114623)
Remove `self.joinpath('')` call that should have been removed in 6313cdde.

This makes `PathBase.glob('')` yield itself *without* adding a trailing slash. It's hard to say whether this is more or less correct, but at least everything else is faster, and there's no behaviour change in the public classes where empty glob patterns are disallowed.
2024-01-27 19:59:51 +00:00
Barney Gale 2d08af34b8
Cover OS-specific behaviour in `PurePathBase` and `PathBase` tests. (#114633)
Wherever possible, move tests for OS-specific behaviour from `PurePathTest`
and `PathTest` to `DummyPurePathTest` and `DummyPathTest`.
2024-01-27 02:16:17 +00:00
Barney Gale 7a9727e10c
pathlib tests: annotate tests needing symlinks with decorator (#114625)
Add `@needs_symlinks` decorator for tests that require symlink support in
the path class.

Also add `@needs_windows` and `@needs_posix` decorators for tests that
require a specific a specific path flavour. These aren't much used yet, but
will be later.
2024-01-26 22:29:28 +00:00
Barney Gale b69548a0f5
GH-73435: Add `pathlib.PurePath.full_match()` (#114350)
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`full_match()` method that:

- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
2024-01-26 01:12:46 +00:00
Barney Gale 6313cdde58
GH-79634: Accept path-like objects as pathlib glob patterns. (#114017)
Allow `os.PathLike` objects to be passed as patterns to `pathlib.Path.glob()` and `rglob()`. (It's already possible to use them in `PurePath.match()`)

While we're in the area:

- Allow empty glob patterns in `PathBase` (but not `Path`)
- Speed up globbing in `PathBase` by generating paths with trailing slashes only as a final step, rather than for every intermediate directory.
- Simplify and speed up handling of rare patterns involving both `**` and `..` segments.
2024-01-20 02:10:25 +00:00
Barney Gale 4de4e654e5
Replace `pathlib._abc.PathModuleBase.splitroot()` with `splitdrive()` (#114065)
This allows users of the `pathlib-abc` PyPI package to use `posixpath` or
`ntpath` as a path module in versions of Python lacking
`os.path.splitroot()` (3.11 and before).
2024-01-14 23:06:04 +00:00
Barney Gale ca6cf56330
Add `pathlib._abc.PathModuleBase` (#113893)
Path modules provide a subset of the `os.path` API, specifically those
functions needed to provide `PurePathBase` functionality. Each
`PurePathBase` subclass references its path module via a `pathmod` class
attribute.

This commit adds a new `PathModuleBase` class, which provides abstract
methods that unconditionally raise `UnsupportedOperation`. An instance of
this class is assigned to `PurePathBase.pathmod`, replacing `posixpath`.
As a result, `PurePathBase` is no longer POSIX-y by default, and
all its methods raise `UnsupportedOperation` courtesy of `pathmod`.

Users who subclass `PurePathBase` or `PathBase` should choose the path
syntax by setting `pathmod` to `posixpath`, `ntpath`, `os.path`, or their
own subclass of `PathModuleBase`, as circumstances demand.
2024-01-14 21:49:53 +00:00
Barney Gale 5d8a3e74b5
pathlib ABCs: Require one or more initialiser arguments (#113885)
Refuse to guess what a user means when they initialise a pathlib ABC
without any positional arguments. In mainline pathlib it's normalised to
`.`, but in the ABCs this guess isn't appropriate; for example, the path
type may not represent the current directory as `.`, or may have no concept
of a "current directory" at all.
2024-01-10 01:12:58 +00:00
Barney Gale beb80d11ec
GH-113528: Deoptimise `pathlib._abc.PurePathBase` (#113559)
Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`.

With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. Implementors can set their own path domain-specific normalization scheme by overriding `__str__()`

Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code.
2024-01-09 23:52:15 +00:00
Barney Gale b3dba18eab
GH-113528: Speed up pathlib ABC tests. (#113788)
- Add `__slots__` to dummy path classes.
- Return namedtuple rather than `os.stat_result` from `DummyPath.stat()`.
- Reduce maximum symlink count in `DummyPathWithSymlinks.resolve()`.
2024-01-08 19:31:52 +00:00
Barney Gale a9df076d7d
GH-113528: Move a few misplaced pathlib tests (#113527)
`PurePathBase` does not define `__eq__()`, and so we have no business checking path equality in `test_eq_common` and `test_equivalences`. The tests only pass at the moment because we define the test class's `__eq__()` for use elsewhere.

Also move `test_parse_path_common` into the main pathlib test suite. It exercises a private `_parse_path()` method that will be moved to `PurePath` soon.

Lastly move a couple more tests concerned with optimisations and path normalisation.
2024-01-08 19:17:18 +00:00
Barney Gale 2205510e7b
GH-113528: Slightly improve `pathlib.Path.glob()` tests for symlink loop handling (#113763)
Slightly improve `pathlib.Path.glob()` tests for symlink loop handling

When filtering results, ignore paths with more than one `linkD/` segment,
rather than all paths below the first `linkD/` segment. This allows us
to test that other paths under `linkD/` are correctly returned.
2024-01-06 17:03:39 +00:00
Barney Gale f1526356b1
GH-113528: Split up pathlib tests for invalid basenames. (#113776)
Split test cases for invalid names into dedicated test methods. This will
make it easier to refactor tests for invalid name handling in ABCs later.

No change of coverage, just a change of test suite organisation.
2024-01-06 17:03:07 +00:00
Barney Gale d429a5a8e7
GH-113528: pathlib ABC tests: add repr to dummy path classes. (#113777)
The `DummyPurePath` and `DummyPath` test classes are simple subclasses of
`PurePathBase` and `PathBase`. This commit adds `__repr__()` methods to the
dummy classes, which makes debugging test failures less painful.
2024-01-06 17:02:36 +00:00
Barney Gale 3375dfed40
GH-113568: Stop raising deprecation warnings from pathlib ABCs (#113757) 2024-01-05 22:56:04 +00:00
Barney Gale 3c4e972d6d
GH-113568: Stop raising auditing events from pathlib ABCs (#113571)
Raise auditing events in `pathlib.Path.glob()`, `rglob()` and `walk()`,
but not in `pathlib._abc.PathBase` methods. Also move generation of a
deprecation warning into `pathlib.Path` so it gets the right stack level.
2024-01-05 21:41:19 +00:00
Barney Gale 6ca0e6754e
GH-113528: Remove a couple of expensive pathlib ABC tests (#113534)
Run expensive tests for walking and globbing from `test_pathlib` but not
`test_pathlib_abc`. The ABCs are not as tightly optimised as the classes
in top-level `pathlib`, and so these tests are taking rather a long time on
some buildbots. Coverage of the main `pathlib` classes should suffice.
2023-12-28 22:44:29 +00:00
Barney Gale d1c711e757
GH-110109: Adjust `test_pathlib_abc` imports to ease backporting (#113411)
This very boring patch reduces the number of changes needed in
`test_pathlib_abc.py` when backporting to the external `pathlib_abc`
package.
2023-12-22 20:59:17 +00:00
Barney Gale a0d3d3ec9d
GH-110109: pathlib ABCs: do not vary path syntax by host OS. (#113219)
Change the value of `pathlib._abc.PurePathBase.pathmod` from `os.path` to
`posixpath`.

User subclasses of `PurePathBase` and `PathBase` previously used the host
OS's path syntax, e.g. backslashes as separators on Windows. This is wrong
in most use cases, and likely to catch developers out unless they test on
both Windows and non-Windows machines.

In this patch we change the default to POSIX syntax, regardless of OS. This
is somewhat arguable (why not make all aspects of syntax abstract and
individually configurable?) but an improvement all the same.

This change has no effect on `PurePath`, `Path`, nor their subclasses. Only
private APIs are affected.
2023-12-22 18:09:50 +00:00
Barney Gale ff5e131df5
GH-112855: Slightly improve tests for `pathlib.PurePath` pickling (#113243)
Add a few more simple test cases, like non-anchored paths. Remove misplaced
and indirect test that pickling doesn't change the `stat()` value.
2023-12-22 17:49:09 +00:00
Barney Gale 237e2cff00
GH-110109: Fix misleading `pathlib._abc.PurePathBase` repr (#113376)
`PurePathBase.__repr__()` produces a string like `MyPath('/foo')`. This
repr is incorrect/misleading when a subclass's `__init__()` method is
customized, which I expect to be the very common.

This commit moves the `__repr__()` method to `PurePath`, leaving
`PurePathBase` with the default `object` repr.

No user-facing changes because the `pathlib._abc` module remains private.
2023-12-22 15:11:16 +00:00
Barney Gale 2f0ec7fa94
GH-110109: pathlib tests: store base directory as test class attribute (#113221)
Store the test base directory as a class attribute named `base` rather than
module constants named `BASE`.

The base directory is a local file path, and therefore not ideally suited
to the pathlib ABC tests. In a future commit we'll change its value in
`test_pathlib_abc.py` such that it points to a totally fictitious path, which 
will help to ensure we're not touching the local filesystem.
2023-12-17 00:07:32 +00:00
Barney Gale d91e43ed78
GH-110109: Move tests for pathlib ABCs to new module. (#112904) 2023-12-16 19:04:33 +00:00