cpython

Commit Graph

Author	SHA1	Message	Date
Barney Gale	d8d94911e2	Move pathlib implementation out of `__init__.py` (#118582 ) Use the `__init__.py` file only for imports that define the API, following the example of asyncio.	2024-05-05 20:57:19 +01:00
Barney Gale	a40f557d7b	GH-116380: Move pathlib globbing implementation into `pathlib._glob` (#118562 ) Moving this code under the `pathlib` package makes it quite a lot easier to backport in the `pathlib-abc` PyPI package. It was a bit foolish of me to add it to `glob` in the first place. Also add `translate()` to `__all__` in `glob`. This function is new in 3.13, so there's no NEWS needed.	2024-05-03 20:29:25 +00:00
Andrew Zipperer	a6b610a94b	docs: typo: tiny grammar change: "pointed by" -> "pointed to by" (#118411 ) * docs: tiny grammar change: "pointed by" -> "pointed to by" This commit uses "file pointed to by" to replace "file pointed by" in - doc for shutil.copytree - docstring for shutil.copytree - docstring _abc.PathBase.open - docstring for pathlib.Path.open - doc for os.copy_file_range - doc for os.splice The docs use "file pointed to by" more frequently than "file pointed by". So, this commit replaces the uses of "file pointed by" in order to make the uses consistent through the docs. `grep -ri 'pointed to by' cpython/` yields more results than `grep -ri 'pointed by' cpython/`. Separately: There are two occurrences of "tree pointed by": - cpython/Doc/library/xml.etree.elementtree.rst for `xml.etree.ElementInclude.include` - cpython/Lib/xml/etree/ElementInclude.py for `include` For those uses of "tree pointed by", I expect "tree pointed to by" instead. However, I found enough uses online of (a) "tree pointed by" rather than (b) "tree pointed to by" to convince me that (a) is in common use. So, this commit does not replace those occurrences of "tree pointed by" to "tree pointed to by". But I will replace them if a reviewer believes it is correct to replace them. * docs: typo: "exists and executable" -> "exists and is executable" --------- Co-authored-by: Andrew-Zipperer <atzipperer@gmail.com>	2024-05-02 05:37:12 +00:00
Barney Gale	15fbd53ba9	GH-112855: Speed up `pathlib.PurePath` pickling (#112856 ) The second item in the tuple returned from `__reduce__()` is a tuple of arguments to supply to path constructor. Previously we returned the `parts` tuple here, which entailed joining, parsing and normalising the path object, and produced a compact pickle representation. With this patch, we instead return a tuple of paths that were originally given to the path constructor. This makes pickling much faster (at the expense of compactness). It's worth noting that, in the olden times, pathlib performed this parsing/normalization up-front in every case, and so using `parts` for pickling was almost free. Nowadays pathlib only parses/normalises paths when it's necessary or advantageous to do so (e.g. computing a path parent, or iterating over a directory, respectively).	2024-04-20 17:46:52 +01:00
Barney Gale	a74f117dab	GH-115060: Speed up `pathlib.Path.glob()` by omitting initial `stat()` (#117831 ) Since `6258844c`, paths that might not exist can be fed into pathlib's globbing implementation, which will call `os.scandir()` / `os.lstat()` only when strictly necessary. This allows us to drop an initial `self.is_dir()` call, which saves a `stat()`. Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>	2024-04-14 00:08:03 +01:00
Barney Gale	30f0643e36	GH-117727: Speed up `pathlib.Path.iterdir()` by using `os.scandir()` (#117728 ) Replace use of `os.listdir()` with `os.scandir()`. Forgo setting `_drv`, `_root` and `_tail_cached`, as these usually aren't needed. Use `os.DirEntry.path` to set `_str`.	2024-04-12 22:02:39 +00:00
Barney Gale	0cc71bde00	GH-117586: Speed up `pathlib.Path.walk()` by working with strings (#117726 ) Move `pathlib.Path.walk()` implementation into `glob._Globber`. The new `glob._Globber.walk()` classmethod works with strings internally, which is a little faster than generating `Path` objects and keeping them normalized. The `pathlib.Path.walk()` method converts the strings back to path objects. In the private pathlib ABCs, our existing subclass of `_Globber` ensures that `PathBase` instances are used throughout. Follow-up to #117589.	2024-04-11 01:26:53 +01:00
Barney Gale	6258844c27	GH-117586: Speed up `pathlib.Path.glob()` by working with strings (#117589 ) Move pathlib globbing implementation into a new private class: `glob._Globber`. This class implements fast string-based globbing. It's called by `pathlib.Path.glob()`, which then converts strings back to path objects. In the private pathlib ABCs, add a `pathlib._abc.Globber` subclass that works with `PathBase` objects rather than strings, and calls user-defined path methods like `PathBase.stat()` rather than `os.stat()`. This sets the stage for two more improvements: - GH-115060: Query non-wildcard segments with `lstat()` - GH-116380: Unify `pathlib` and `glob` implementations of globbing. No change to the implementations of `glob.glob()` and `glob.iglob()`.	2024-04-10 20:43:07 +01:00
Barney Gale	6150bb2412	GH-77609: Add recurse_symlinks argument to `pathlib.Path.glob()` (#117311 ) Replace tri-state `follow_symlinks` with boolean `recurse_symlinks` argument. The new argument controls whether symlinks are followed when expanding recursive `**` wildcards. The possible argument values correspond as follows: `follow_symlinks=False` has no `recurse_symlinks` equivalent (N/A), `follow_symlinks=None` corresponds to `recurse_symlinks=False`, and `follow_symlinks=True` corresponds to `recurse_symlinks=True`. We therefore drop support for not following symlinks when expanding non-recursive pattern parts; it wasn't requested in the original issue, and it's a feature not found in any shells. This makes the API easier to grok by eliminating `None` as an option. No news blurb as `follow_symlinks` was new in 3.13.	2024-04-05 18:51:54 +00:00
Barney Gale	752e18389e	GH-114575: Rename `PurePath.pathmod` to `PurePath.parser` (#116513 ) And rename the private base class from `PathModuleBase` to `ParserBase`.	2024-03-31 19:14:48 +01:00
Barney Gale	6f93b4df92	GH-115060: Speed up `pathlib.Path.glob()` by removing redundant regex matching (#115061 ) When expanding and filtering paths for a `**` wildcard segment, build an `re.Pattern` object from the subsequent pattern parts, rather than the entire pattern, and match against the `os.DirEntry` object prior to instantiating a path object. Also skip compiling a pattern when expanding a `*` wildcard segment.	2024-02-10 18:12:34 +00:00
Barney Gale	1667c28686	pathlib ABCs: raise `UnsupportedOperation` directly. (#114776 ) Raise `UnsupportedOperation` directly, rather than via an `_unsupported()` helper, to give human readers and IDEs/typecheckers/etc a bigger hint that these methods are abstract.	2024-01-31 00:38:01 +00:00
Barney Gale	fda7445ca5	GH-70303: Make `pathlib.Path.glob('**')` return both files and directories (#114684 ) Return files and directories from `pathlib.Path.glob()` if the pattern ends with `**`. This is more compatible with `PurePath.full_match()` and with other glob implementations such as bash and `glob.glob()`. Users can add a trailing slash to match only directories. In my previous patch I added a `FutureWarning` with the intention of fixing this in Python 3.15. Upon further reflection I think this was an unnecessarily cautious remedy to a clear bug.	2024-01-30 19:52:53 +00:00
Barney Gale	7e31d6dea2	gh-88569: add `ntpath.isreserved()` (#95486 ) Add `ntpath.isreserved()`, which identifies reserved pathnames such as "NUL", "AUX" and "CON". Deprecate `pathlib.PurePath.is_reserved()`. --------- Co-authored-by: Eryk Sun <eryksun@gmail.com> Co-authored-by: Brett Cannon <brett@python.org> Co-authored-by: Steve Dower <steve.dower@microsoft.com>	2024-01-26 18:14:24 +00:00
Barney Gale	b69548a0f5	GH-73435: Add `pathlib.PurePath.full_match()` (#114350 ) In `49f90ba` we added support for the recursive wildcard `**` in `pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a problem: for relative patterns only, `match()` implicitly inserts a `**` token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: `PurePath('foo/bar').match('bar/**')` is true! This commit reverts the changes to `match()`, and instead adds a new `full_match()` method that: - Allows empty patterns - Supports the recursive wildcard `**` - Matches the entire path when given a relative pattern	2024-01-26 01:12:46 +00:00
Barney Gale	6313cdde58	GH-79634: Accept path-like objects as pathlib glob patterns. (#114017 ) Allow `os.PathLike` objects to be passed as patterns to `pathlib.Path.glob()` and `rglob()`. (It's already possible to use them in `PurePath.match()`) While we're in the area: - Allow empty glob patterns in `PathBase` (but not `Path`) - Speed up globbing in `PathBase` by generating paths with trailing slashes only as a final step, rather than for every intermediate directory. - Simplify and speed up handling of rare patterns involving both `**` and `..` segments.	2024-01-20 02:10:25 +00:00
Barney Gale	ca6cf56330	Add `pathlib._abc.PathModuleBase` (#113893 ) Path modules provide a subset of the `os.path` API, specifically those functions needed to provide `PurePathBase` functionality. Each `PurePathBase` subclass references its path module via a `pathmod` class attribute. This commit adds a new `PathModuleBase` class, which provides abstract methods that unconditionally raise `UnsupportedOperation`. An instance of this class is assigned to `PurePathBase.pathmod`, replacing `posixpath`. As a result, `PurePathBase` is no longer POSIX-y by default, and all its methods raise `UnsupportedOperation` courtesy of `pathmod`. Users who subclass `PurePathBase` or `PathBase` should choose the path syntax by setting `pathmod` to `posixpath`, `ntpath`, `os.path`, or their own subclass of `PathModuleBase`, as circumstances demand.	2024-01-14 21:49:53 +00:00
Barney Gale	f20b151a1c	pathlib ABCs: add `_raw_path` property (#113976 ) It's wrong for the `PurePathBase` methods to rely so much on `__str__()`. Instead, they should treat the raw path(s) as opaque objects and leave the details to `pathmod`. This commit adds a `PurePathBase._raw_path` property and uses it through many of the other ABC methods. These methods are all redefined in `PurePath` and `Path`, so this has no effect on the public classes.	2024-01-13 08:03:21 +00:00
Barney Gale	beb80d11ec	GH-113528: Deoptimise `pathlib._abc.PurePathBase` (#113559 ) Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`. With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. Implementors can set their own domain-specific path normalization scheme by overriding `__str__()`. Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code.	2024-01-09 23:52:15 +00:00
Barney Gale	cdca0ce0ad	GH-113528: Deoptimise `pathlib._abc.PurePathBase.relative_to()` (again) (#113882 ) Restore full battle-tested implementations of `PurePath.[is_]relative_to()`. These were recently split up in `3375dfe` and `a15a773`. In `PurePathBase`, add entirely new implementations based on `_stack`, which itself calls `pathmod.split()` repeatedly to disassemble a path. These new implementations preserve features like trailing slashes where possible, while still observing that a `..` segment cannot be added to traverse an empty or `.` segment in walk_up mode. They do not rely on `parents` nor `__eq__()`, nor do they spin up temporary path objects. Unfortunately calling `pathmod.relpath()` isn't an option, as it calls `abspath()` and in turn `os.getcwd()`, which is impure.	2024-01-09 23:04:14 +00:00
Barney Gale	5c7bd0e398	GH-113528: Deoptimise `pathlib._abc.PurePathBase.parts` (#113883 ) Implement `parts` using `_stack`, which itself calls `pathmod.split()` repeatedly. This avoids use of `_tail`, which will be moved to `PurePath` shortly.	2024-01-09 22:46:50 +00:00
Barney Gale	9100fc407e	GH-113528: Deoptimise `pathlib._abc.PathBase._make_child_relpath()` (#113532 ) Call straight through to `joinpath()` in `PathBase._make_child_relpath()`. Move optimised/caching code to `pathlib.Path._make_child_relpath()`	2024-01-09 19:11:17 +00:00
Barney Gale	a15a7735e6	GH-113528: Deoptimise `pathlib._abc.PurePathBase.relative_to()` (#113529 ) Replace use of `_from_parsed_parts()` with `with_segments()` in `PurePathBase.relative_to()`, and move the assignment of `_drv`, `_root` and `_tail_cached` slots into `PurePath.relative_to()`.	2024-01-06 21:37:38 +00:00
Barney Gale	37bd893a22	GH-113528: Deoptimise `pathlib._abc.PurePathBase.parent` (#113530 ) Replace use of `_from_parsed_parts()` with `with_segments()`, and move assignments to `_drv`, `_root`, `_tail_cached` and `_str` slots into `PurePath`.	2024-01-06 21:17:51 +00:00
Barney Gale	1e914ad89d	GH-113528: Deoptimise `pathlib._abc.PurePathBase.name` (#113531 ) Replace usage of `_from_parsed_parts()` with `with_segments()` in `with_name()`, and take a similar approach in `name` for consistency's sake.	2024-01-06 20:50:25 +00:00
Barney Gale	3375dfed40	GH-113568: Stop raising deprecation warnings from pathlib ABCs (#113757 )	2024-01-05 22:56:04 +00:00
Barney Gale	3c4e972d6d	GH-113568: Stop raising auditing events from pathlib ABCs (#113571 ) Raise auditing events in `pathlib.Path.glob()`, `rglob()` and `walk()`, but not in `pathlib._abc.PathBase` methods. Also move generation of a deprecation warning into `pathlib.Path` so it gets the right stack level.	2024-01-05 21:41:19 +00:00
Barney Gale	c2e8298eba	GH-113225: Speed up `pathlib.Path.glob()` (#113226 ) Use `os.DirEntry.path` as the string representation of child paths, unless the parent path is empty, in which case we use the entry `name`.	2024-01-04 20:48:26 +00:00
Barney Gale	b664d91599	GH-113225: Speed up `pathlib._abc.PathBase.glob()` (#113556 ) `PathBase._scandir()` is implemented using `iterdir()`, so we can use its results directly, rather than passing them through `_make_child_relpath()`.	2023-12-28 22:23:01 +00:00
Barney Gale	f8b6e171ad	GH-110109: pathlib ABCs: drop use of `io.text_encoding()` (#113417 ) Do not use the locale-specific default encoding in `PathBase.read_text()` and `write_text()`. Locale settings shouldn't influence the operation of these base classes, which are intended mostly for implementing rich paths on nonlocal filesystems.	2023-12-27 15:32:35 +00:00
Barney Gale	a0d3d3ec9d	GH-110109: pathlib ABCs: do not vary path syntax by host OS. (#113219 ) Change the value of `pathlib._abc.PurePathBase.pathmod` from `os.path` to `posixpath`. User subclasses of `PurePathBase` and `PathBase` previously used the host OS's path syntax, e.g. backslashes as separators on Windows. This is wrong in most use cases, and likely to catch developers out unless they test on both Windows and non-Windows machines. In this patch we change the default to POSIX syntax, regardless of OS. This is somewhat arguable (why not make all aspects of syntax abstract and individually configurable?) but an improvement all the same. This change has no effect on `PurePath`, `Path`, nor their subclasses. Only private APIs are affected.	2023-12-22 18:09:50 +00:00
Barney Gale	237e2cff00	GH-110109: Fix misleading `pathlib._abc.PurePathBase` repr (#113376 ) `PurePathBase.__repr__()` produces a string like `MyPath('/foo')`. This repr is incorrect/misleading when a subclass's `__init__()` method is customized, which I expect to be very common. This commit moves the `__repr__()` method to `PurePath`, leaving `PurePathBase` with the default `object` repr. No user-facing changes because the `pathlib._abc` module remains private.	2024-01-22 15:11:16 +00:00
Barney Gale	23df46a1dd	GH-112906: Fix performance regression in pathlib path initialisation (#112907 ) This was caused by `76929fdeeb`, specifically its use of `super()` and its packing/unpacking `*args`. Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>	2023-12-10 00:06:27 +00:00
Barney Gale	a98e7a8112	GH-110109: Move pathlib ABCs to new `pathlib._abc` module. (#112881 ) Move `_PurePathBase` and `_PathBase` to a new `pathlib._abc` module, and drop the underscores from the class names. Tests are mostly left alone in this commit, but they'll be similarly split in a subsequent commit. The `pathlib._abc` module will be published as an independent PyPI package (similar to how `zipfile._path` is published as `zipp`), to be refined and stabilised prior to its possible addition to the standard library.	2023-12-09 16:07:40 +01:00
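
The `PurePath` pickling change (15fbd53ba9) hinges on the pair returned by `__reduce__()`: a constructor plus a tuple of arguments for it. The following is a minimal sketch of that contract, not the CPython implementation. The helper name `reduce_round_trip` is hypothetical, and the exact contents of the argument tuple (parsed parts or the originally supplied segments) are assumed to vary by Python version, so the sketch only checks that the rebuilt path compares equal.

```python
import pickle
from pathlib import PurePosixPath


def reduce_round_trip(path):
    """Rebuild a path from the (constructor, args) pair given by __reduce__().

    Hypothetical helper for illustration; pickle performs the same
    reconstruction internally when loading.
    """
    constructor, args = path.__reduce__()[:2]
    return constructor(*args)


original = PurePosixPath("/usr", "lib/../bin")

# Reconstruct by hand from the reduce tuple.
rebuilt = reduce_round_trip(original)

# Reconstruct through a real pickle round trip.
unpickled = pickle.loads(pickle.dumps(original))

print(rebuilt == original, unpickled == original)
```

Either way the result is an equal path object; what the commit changes is how much work is needed to produce the argument tuple.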

34 Commits
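
The difference between `PurePath.match()` and `PurePath.full_match()` described in b69548a0f5 can be sketched as follows. This is an illustration under assumptions rather than a definitive reference: `compare` is a hypothetical helper, and because `full_match()` is assumed to exist only on Python 3.13 and later, the helper returns `None` for it on older interpreters.

```python
from pathlib import PurePosixPath


def compare(path_str, pattern):
    """Return (match_result, full_match_result) for one path and pattern.

    Hypothetical helper: full_match_result is None when the running
    Python has no PurePath.full_match().
    """
    path = PurePosixPath(path_str)
    if hasattr(path, "full_match"):
        full = path.full_match(pattern)
    else:
        full = None
    return path.match(pattern), full


# A relative pattern is matched from the right by match(), so "*.py"
# matches "a/b.py"; full_match() must match the entire path.
print(compare("a/b.py", "*.py"))

# Spelling out every segment satisfies both methods.
print(compare("a/b.py", "a/*.py"))

# full_match() supports the recursive wildcard "**".
print(compare("/a/b/c.py", "/a/**"))
```

Use `full_match()` when a relative pattern must be anchored at the start of the path, and `match()` when matching from the right is the intent.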