This exposes `PyUnstable_Object_ClearWeakRefsNoCallbacks` as an unstable
C-API function to provide a thread-safe mechanism for clearing weakrefs
without executing callbacks.
Some C-API extensions need to clear weakrefs without calling callbacks,
such as after running finalizers like we do in subtype_dealloc.
Previously they could use `_PyWeakref_ClearRef` on each weakref, but
that's not thread-safe in the free-threaded build.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
The `inspect.ismethoddescriptor()` function did not check for the lack of
`__delete__()` and, consequently, erroneously returned True when applied
to *data* descriptors with only `__get__()` and `__delete__()` defined.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Alyssa Coghlan <ncoghlan@gmail.com>
Mocking only works if sys.modules['pydoc'] and pydoc are the same,
but some pydoc functions reload the module and change sys.modules.
Ensure that sys.modules['pydoc'] is always restored after the corresponding
tests.
This matches the output behavior in 3.10 and earlier; the optimization in 3.11 allowed the zlib library's "os" value to be filled in instead in the circumstance when mtime was 0. this keeps things consistent.
In gh-120009 I used an atexit hook to finalize the _datetime module's static types at interpreter shutdown. However, atexit hooks are executed very early in finalization, which is a problem in the few cases where a subclass of one of those static types is still alive until the final GC collection. The static builtin types don't have this probably because they are finalized toward the end, after the final GC collection. To avoid the problem for _datetime, I have applied a similar approach here.
Also, credit goes to @mgorny and @neonene for the new tests.
FYI, I would have liked to take a slightly cleaner approach with managed static types, but wanted to get a smaller fix in first for the sake of backporting. I'll circle back to the cleaner approach with a future change on the main branch.
Add __all__ to the following modules:
importlib.machinery, importlib.util and xml.sax.
Add also "# noqa: F401" in collections.abc,
subprocess and xml.sax.
* Sort __all__; remove collections.abc.__all__; remove private names
* Add tests
Add a `Path.copy()` method that copies the content of one file to another.
This method is similar to `shutil.copyfile()` but differs in the following ways:
- Uses `fcntl.FICLONE` where available (see GH-81338)
- Uses `os.copy_file_range` where available (see GH-81340)
- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.
The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.
Incorporates code from GH-81338 and GH-93152.
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Tools such as ruff can ignore "imported but unused" warnings if a
line ends with "# noqa: F401". It avoids the temptation to remove
an import which is used effectively.
Remove wheeldata from both sides of the `assertEqual`, so that we're
*actually* ignoring it from the test set.
This test is only making assertions about the source tree, no code is
being executed that would do anything different based on the value of
`WHEEL_PKG_DIR`.
The _strptime module object was cached in a static local variable (in the datetime.strptime() implementation). That's a problem when it crosses isolation boundaries, such as reinitializing the runtme or between interpreters. This change fixes the problem by dropping the static variable, instead always relying on the normal sys.modules cache (via PyImport_Import()).
* Remove pyrepl's optimization for self-insert
This will be replaced by a less specialized optimization.
* Use line-buffering when pyrepl echoes pastes
Previously echoing was totally suppressed until the entire command had
been pasted and the terminal ended paste mode, but this gives the user
no feedback to indicate that an operation is in progress. Drawing
something to the screen once per line strikes a balance between
perceived responsiveness and performance.
* Remove dead code from pyrepl
`msg_at_bottom` is always true.
* Speed up pyrepl's screen rendering computation
The Reader in pyrepl doesn't hold a complete representation of the
screen area being drawn as persistent state. Instead, it recomputes it,
on each keypress. This is fast enough for a few hundred bytes, but
incredibly slow as the input buffer grows into the kilobytes (likely
because of pasting).
Rather than making some expensive and expansive changes to the repl's
internal representation of the screen, add some caching: remember some
data from one refresh to the next about what was drawn to the screen
and, if we don't find anything that has invalidated the results that
were computed last time around, reuse them. To keep this caching as
simple as possible, all we'll do is look for lines in the buffer that
were above the cursor the last time we were asked to update the screen,
and that are still above the cursor now. We assume that nothing can
affect a line that comes before both the old and new cursor location
without us being informed. Based on this assumption, we can reuse old
lines, which drastically speeds up the overwhelmingly common case where
the user is typing near the end of the buffer.
* Speed up pyrepl prompt drawing
Cache the `can_colorize()` call rather than repeatedly recomputing it.
This call looks up an environment variable, and is called once per
character typed at the REPL. The environment variable lookup shows up as
a hot spot when profiling, and we don't expect this to change while the
REPL is running.
* Speed up pasting multiple lines into the REPL
Previously, we were checking whether the command should be accepted each
time a line break was encountered, but that's not the expected behavior.
In bracketed paste mode, we expect everything pasted to be part of
a single block of code, and encountering a newline shouldn't behave like
a user pressing <Enter> to execute a command. The user should always
have a chance to review the pasted command before running it.
* Use a read buffer for input in pyrepl
Previously we were reading one byte at a time, which causes much slower
IO than necessary. Instead, read in chunks, processing previously read
data before asking for more.
* Optimize finding width of a single character
`wlen` finds the width of a multi-character string by adding up the
width of each character, and then subtracting the width of any escape
sequences. It's often called for single character strings, however,
which can't possibly contain escape sequences. Optimize for that case.
* Optimize disp_str for ASCII characters
Since every ASCII character is known to display as single width, we can
avoid not only the Unicode data lookup in `disp_str` but also the one
hidden in `str_width` for them.
* Speed up cursor movements in long pyrepl commands
When the current pyrepl command buffer contains many lines, scrolling up
becomes slow. We have optimizations in place to reuse lines above the
cursor position from one refresh to the next, but don't currently try to
reuse lines below the cursor position in the same way, so we wind up
with quadratic behavior where all lines of the buffer below the cursor
are recomputed each time the cursor moves up another line.
Optimize this by only computing one screen's worth of lines beyond the
cursor position. Any lines beyond that can't possibly be shown by the
console, and bounding this makes scrolling up have linear time
complexity instead.
---------
Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
In order to patch flask.g e.g. as in #84982, that
proxies getattr must not be invoked. For that,
mock must not try to read from the original
object. In some cases that is unavoidable, e.g.
when doing autospec. However, patch("flask.g",
new_callable=MagicMock) should be entirely safe.
If the Helper() class was initialized with an output, the topics, keywords
and symbols help still use the pager instead of the output.
Change the behavior so the output is used if available while keeping the
previous behavior if no output was configured.
The tests were accidentally disabled by 2da0dc0, which didn't handle classes correctly.
I considered updating no_rerun() to support classes, but the way test_datetime.py works would have made things fairly messy. Plus, it looks like the refleaks we had encountered before have been resolved.
In `glob._Globber`, move pathlib-specific methods to `pathlib._abc.PathGlobber` and replace them with abstract methods. Rename `glob._Globber` to `glob._GlobberBase`. As a result, the `glob` module is no longer befouled by code that can only ever apply to pathlib.
No change of behaviour.
Adjust DeprecationWarning when testing element truth values in ElementTree, we're planning to go with the more natural True return rather than a disruptive harder to code around exception raise, and are deferring the behavior change for a few more releases.
Add private `posixpath._realpath()` function, which is a generic version of `realpath()` that can be parameterised with string tokens (`sep`, `curdir`, `pardir`) and query functions (`getcwd`, `lstat`, `readlink`). Also add support for limiting the number of symlink traversals.
In the private `pathlib._abc.PathBase` class, call `posixpath._realpath()` and remove our re-implementation of the same algorithm.
No change to any public APIs, either in `posixpath` or `pathlib`.
Co-authored-by: Nice Zombies <nineteendo19d0@gmail.com>
Some time strings that contain fractional hours or minutes are permitted
by ISO 8601, but such strings are very unlikely to be intentional. The
current parser does not parse such strings correctly or raise an error.
This change raises a ValueError when hours or minutes contain a decimal mark.
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
This matches the default GIL switch interval. It greatly speeds up the
free-threaded build: previously, it spent nearly all its time in
`gc.collect()`.
The `test_imaplib` was taking 40+ minutes in the refleak build bots because
the tests waiting on a client `self._setup()` was creating a client that
prevented progress until its connection timed out, which scaled with the
global timeout.
We should set `connect=False` for the tests that don't want `_setup()` to
create a client.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
The process is expected to time out. In the refleak builds,
`support.SHORT_TIMEOUT` is often five minutes and we run the tests six
times, so test_signal was taking >30 minutes.
The free-threaded build currently immortalizes objects that use deferred
reference counting (see gh-117783). This typically happens once the
first non-main thread is created, but the behavior can be suppressed for
tests, in subinterpreters, or during a compile() call.
This fixes a race condition involving the tracking of whether the
behavior is suppressed.
Remove the delegation of `int` to the `__trunc__` special method: `int` will now only delegate to `__int__` and `__index__` (in that order). `__trunc__` continues to exist, but its sole purpose is to support `math.trunc`.
---------
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.
`shutil._rmtree_unsafe()` was fixed in a150679f90.
* Add docs for new APIs
* Add soft-deprecation notices
* Add What's New porting entries
* Update comments referencing `PyFrame_LocalsToFast()` to mention the proxy instead
* Other related cleanups found when looking for refs to the deprecated APIs
Support non-dict globals in LOAD_FROM_DICT_OR_GLOBALS
The implementation basically copies LOAD_GLOBAL. Possibly it could be deduplicated,
but that seems like it may get hairy since the two operations have different operands.
This is important to fix in 3.14 for PEP 649, but it's a bug in earlier versions too,
and we should backport to 3.13 and 3.12 if possible.
Make sure that `gilstate_counter` is not zero in when calling
`PyThreadState_Clear()`. A destructor called from `PyThreadState_Clear()` may
call back into `PyGILState_Ensure()` and `PyGILState_Release()`. If
`gilstate_counter` is zero, it will try to create a new thread state before
the current active thread state is destroyed, leading to an assertion failure
or crash.
When using the ** operator or pow() with Fraction as the base
and an exponent that is not rational, a float, or a complex, the
fraction is no longer converted to a float.
* Passing a string as the "real" keyword argument is now an error;
it should only be passed as a single positional argument.
* Passing a complex number as the "real" or "imag" argument is now deprecated;
it should only be passed as a single positional argument.
* Share common classes.
* Use exactly representable floats and exact tests.
* Check the sign of zero components.
* Remove duplicated tests (mostly left after merging int and long).
* Reorder tests in more consistent way.
* Test more error messages.
* Add tests for missed cases.
The Full Grammar specification in the docs omits rule actions, so grammar rules that raise a syntax error looked like valid syntax.
This was solved in ef940de by hiding those rules in the custom syntax highlighter.
This moves all syntax-error alternatives to invalid rules, adds a validator that ensures that actions containing RAISE_SYNTAX_ERROR are in invalid rules, and reverts the syntax highlighter hack.
For silly reasons, pathlib's generic implementation of `walk()` currently
resides in `glob._Globber`. This commit moves it into
`pathlib._abc.PathBase.walk()` where it really belongs, and makes
`pathlib.Path.walk()` call `os.walk()`.
Make `shutil._rmtree_unsafe()` call `os.walk()`, which is implemented
without recursion.
`shutil._rmtree_safe_fd()` is not affected and can still raise a recursion
error.
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
If one calls pow(fractions.Fraction, x, module) with modulo not None, the error message now says that the types are incompatible rather than saying pow only takes 2 arguments. Implemented by having fractions.Fraction __pow__ accept optional modulo argument and return NotImplemented if not None. pow() then raises with appropriate message.
---------
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
* gh-118673: Remove shebang and executable bits from stdlib modules.
* Removed shebangs and exe bits on turtledemo scripts.
The setting was inappropriate for '__main__' and inconsistent across the other modules. The scripts can still be executed directly by invoking with the desired interpreter.
Structure layout, and especially bitfields, sometimes resulted in clearly
wrong behaviour like overlapping fields. This fixes
Co-authored-by: Gregory P. Smith <gps@python.org>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
This is minimal support. Subinterpreters are not supported yet. That will be addressed in a later change.
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
The fix in gh-119561 introduced an assertion that doesn't hold true if any of the three new test extension modules are loaded more than once. This is fine normally but breaks if the new test_check_state_first() is run more than once, which happens for refleak checking and with the regrtest --forever flag. We fix that here by clearing each of the three modules after loading them. We also tweak a check in _modules_by_index_check().
pathlib now treats "`.`" as a valid file extension (suffix). This brings
it in line with `os.path.splitext()`.
In the (private) pathlib ABCs, we add a new `ParserBase.splitext()` method
that splits a path into a `(root, ext)` pair, like `os.path.splitext()`.
This method is called by `PurePathBase.stem`, `suffix`, etc. In a future
version of pathlib, we might make these base classes public, and so users
will be able to define their own `splitext()` method to control file
extension splitting.
In `pathlib.PurePath` we add optimised `stem`, `suffix` and `suffixes`
properties that don't use `splitext()`, which avoids computing the path
base name twice.
The assertion was added in gh-118532 but was based on the invalid assumption that PyState_FindModule() would only be called with an already-initialized module def. I've added a test to make sure we don't make that assumption again.
``_fancy_replace()`` is no longer recursive. and a single call does a worst-case linear number of ratio() computations instead of quadratic. This renders toothless a universe of pathological cases. Some inputs may produce different output, but that's rare, and I didn't find a case where the final diff appeared to be of materially worse quality. To the contrary, by refusing to even consider synching on lines "far apart", there was more easy-to-digest locality in the output.
`Out-File -Encoding utf8` and similar commands in Windows Powershell 5.1 emit
UTF-8 with a BOM marker, which the regular `utf-8` codec decodes incorrectly.
`utf-8-sig` accepts a BOM, but also works correctly without one.
This change also makes .pth files match the way Python source files are handled.
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
Add socket.VMADDR_CID_LOCAL constant.
Fix ThreadedVSOCKSocketStreamTest: if get_cid() returns the host
address or the "any" address, use the local communication address
(loopback): VMADDR_CID_LOCAL.
On Linux 6.9, apparently, the /dev/vsock device is now available but
get_cid() returns VMADDR_CID_ANY (-1).
`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).
Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.
This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
Track all pairs achieving the best ratio in Differ(). This repairs the "very deep recursion and cubic time" bad cases in a way that preserves previous output.
Add `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST` and
`Py_END_CRITICAL_SECTION_SEQUENCE_FAST` macros and update `str.join` to use
them. Also add a regression test that would crash reliably without this
patch.
As reported in #117847 and #115366, an unpaired backtick in a docstring
tends to confuse e.g. Sphinx running on subclasses of standard library
objects, and the typographic style of using a backtick as an opening
quote is no longer in favor. Convert almost all uses of the form
The variable `foo' should do xyz
to
The variable 'foo' should do xyz
and also fix up miscellaneous other unpaired backticks (extraneous /
missing characters).
No functional change is intended here other than in human-readable
docstrings.
_PyArg_Parser holds static global data generated for modules by Argument Clinic. The _PyArg_Parser.kwtuple field is a tuple object, even though it's stored within a static global. In some cases the tuple is statically allocated and thus it's okay that it gets shared by multiple interpreters. However, in other cases the tuple is set lazily, allocated from the heap using the active interprepreter at the point the tuple is needed.
This is a problem once that interpreter is destroyed since _PyArg_Parser.kwtuple becomes at dangling pointer, leading to crashes. It isn't a problem if the tuple is allocated under the main interpreter, since its lifetime is bound to the lifetime of the runtime. The solution here is to temporarily switch to the main interpreter. The alternative would be to always statically allocate the tuple.
This change also fixes a bug where only the most recent parser was added to the global linked list.
Fix regression introduced in gh-100884: AttributeError when re-fold a long
address list.
Also fix more cases of incorrect encoding of the address separator in the
address list missed in gh-100884.
* bpo-15987: Implement ast.compare
Add a compare() function that compares two ASTs for structural equality. There are two set of attributes on AST node objects, fields and attributes. The fields are always compared, since they represent the actual structure of the code. The attributes can be optionally be included in the comparison. Attributes capture things like line numbers of column offsets, so comparing them involves test whether the layout of the program text is the same. Since whitespace seems inessential for comparing ASTs, the default is to compare fields but not attributes.
ASTs are just Python objects that can be modified in arbitrary ways. The API for ASTs is under-specified in the presence of user modifications to objects. The comparison respects modifications to fields and attributes, and to _fields and _attributes attributes. A user could create obviously malformed objects, and the code will probably fail with an AttributeError when that happens. (For example, adding "spam" to _fields but not adding a "spam" attribute to the object.)
Co-authored-by: Jeremy Hylton <jeremy@alum.mit.edu>
The PEP 649 implementation will require a way to load NotImplementedError
from the bytecode. @markshannon suggested implementing this by converting
LOAD_ASSERTION_ERROR into a more general mechanism for loading constants.
This PR adds this new opcode. I will work on the rest of the implementation
of the PEP separately.
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
* expand on What's New entry for PEP 667 (including porting notes)
* define 'optimized scope' as a glossary term
* cover comprehensions and generator expressions in locals() docs
* review all mentions of "locals" in documentation (updating if needed)
* review all mentions of "f_locals" in documentation (updating if needed)
regrtest test runner: Add XML support to the refleak checker
(-R option).
* run_unittest() now stores XML elements as string, rather than
objects, in support.junit_xml_list.
* runtest_refleak() now saves/restores XML strings before/after
checking for reference leaks. Save XML into a temporary file.
* Fix for email.generator.Generator with whitespace between encoded words.
email.generator.Generator currently does not handle whitespace between
encoded words correctly when the encoded words span multiple lines. The
current generator will create an encoded word for each line. If the end
of the line happens to correspond with the end real word in the
plaintext, the generator will place an unencoded space at the start of
the subsequent lines to represent the whitespace between the plaintext
words.
A compliant decoder will strip all the whitespace from between two
encoded words which leads to missing spaces in the round-tripped
output.
The fix for this is to make sure that whitespace between two encoded
words ends up inside of one or the other of the encoded words. This
fix places the space inside of the second encoded word.
A second problem happens with continuation lines. A continuation line that
starts with whitespace and is followed by a non-encoded word is fine because
the newline between such continuation lines is defined as condensing to
a single space character. When the continuation line starts with whitespace
followed by an encoded word, however, the RFCs specify that the word is run
together with the encoded word on the previous line. This is because normal
words are filded on syntactic breaks by encoded words are not.
The solution to this is to add the whitespace to the start of the encoded word
on the continuation line.
Test cases are from #92081
* Rename a variable so it's not confused with the final variable.
Code from https://github.com/pulkin, in PR
https://github.com/python/cpython/pull/119131
Greatly speeds `Differ` when there are many identically scoring pairs, by splitting the recursion near the inputs' midpoints instead of degenerating (as now) into just peeling off the first two lines.
Co-authored-by: Tim Peters <tim.peters@gmail.com>
Various test bots (outside the ones GH normally runs) are timing out during test_int after ecd8664 (asymptotically faster str->int). Best guess is that they don't build the C _decimal module. So require that module in the most likely tests to time out then. Flying mostly blind, though!
Asymptotically faster (O(n log n)) str->int for very large strings, leveraging the faster multiplication scheme in the C-coded `_decimal` when available. This is used instead of the current Karatsuba-limited method starting at 2 million digits.
Lots of opportunity remains for fine-tuning. Good targets include changing BYTELIM, and possibly changing the internal output base (from 256 to a higher number of bytes).
Doing this was substantial work, and many of the new lines are actually comments giving correctness proofs. The obvious approaches sticking to integers were too slow to be useful, so this is doing variable-precision decimal floating-point arithmetic. Much faster, but worst-possible rounding errors have to be wholly accounted for, using as little precision as possible.
Special thanks to Serhiy Storchaka for asking many good questions in his code reviews!
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: sstandre <43125375+sstandre@users.noreply.github.com>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Co-authored-by: Nice Zombies <nineteendo19d0@gmail.com>