The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalized(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.
However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.
Implement the standard's algorithm. This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
At a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:
$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 4.39 msec per loop
With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:
$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
5000000 loops, best of 5: 58.2 nsec per loop
This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.
With this, that case is actually faster than in master!
$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop
$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
Extending the hover delay in test_tooltip should avoid spurious test_idle failures.
One longer delay instead of two shorter delays results in a net speedup.
Fixes a case in which email._header_value_parser.get_unstructured hangs the system for some invalid headers. This covers the cases in which the header contains either:
- a case without trailing whitespace
- an invalid encoded word
https://bugs.python.org/issue37764
This fix should also be backported to 3.7 and 3.8
https://bugs.python.org/issue37764
Fix a ctypes regression of Python 3.8. When a ctypes.Structure is
passed by copy to a function, ctypes internals created a temporary
object which had the side effect of calling the structure finalizer
(__del__) twice. The Python semantics requires a finalizer to be
called exactly once. Fix ctypes internals to no longer call the
finalizer twice.
Create a new internal StructParam_Type which is only used by
_ctypes_callproc() to call PyMem_Free(ptr) on Py_DECREF(argument).
StructUnionType_paramfunc() creates such object.
Adds a link to `dateutil.parser.isoparse` in the documentation.
It would be nice to set up intersphinx for things like this, but I think we can leave that for a separate PR.
CC: @pitrou
[bpo-37979](https://bugs.python.org/issue37979)
https://bugs.python.org/issue37979
Automerge-Triggered-By: @pitrou
* Fix call_matcher for mock when using methods
* Add NEWS entry
* Use None check and convert doctest to unittest
* Use better name for mock in tests. Handle _SpecState when the attribute was not accessed and add tests.
* Use reset_mock instead of reinitialization. Change inner class constructor signature for check
* Reword comment regarding call object lookup logic
- drop TargetScopeError in favour of raising SyntaxError directly
as per the updated PEP 572
- comprehension iteration variables are explicitly local, but
named expression targets in comprehensions are nonlocal or
global. Raise SyntaxError as specified in PEP 572
- named expression targets in the outermost iterable of a
comprehension have an ambiguous target scope. Avoid resolving
that question now by raising SyntaxError. PEP 572
originally required this only for cases where the bound name
conflicts with the iteration variable in the comprehension,
but CPython can't easily restrict the exception to that case
(as it doesn't know the target variable names when visiting
the outermost iterator expression)
These were caused by keeping around a reference to the Squeezer
instance and calling it's load_font() upon config changes, which
sometimes happened even if the shell window no longer existed.
This change completely removes that mechanism, instead having the
editor window properly update its width attribute, which can then
be used by Squeezer.
* fix Path._add_implied_dirs to include all implied directories
* fix Path._add_implied_dirs to include all implied directories
* Optimize code by using sets instead of lists
* 📜🤖 Added by blurb_it.
* fix Path._add_implied_dirs to include all implied directories
* Optimize code by using sets instead of lists
* 📜🤖 Added by blurb_it.
* Add tests to zipfile.Path.iterdir() fix
* Update test for zipfile.Path.iterdir()
* remove whitespace from test file
* Rewrite NEWS blurb to describe the user-facing impact and avoid implementation details.
* remove redundant [] within set comprehension
* Update to use unique_everseen to maintain order and other suggestions in review
* remove whitespace and add back add_dirs in tests
* Add new standalone function parents using posixpath to get parents of a directory
* removing whitespace (sorry)
* Remove import pathlib from zipfile.py
* Rewrite _parents as a slice on a generator of the ancestry of a path.
* Remove check for '.' and '/', now that parents no longer returns those.
* Separate calculation of implied dirs from adding those
* Re-use _implied_dirs in tests for generating zipfile with dir entries.
* Replace three fixtures (abcde, abcdef, abde) with one representative example alpharep.
* Simplify implementation of _implied_dirs by collapsing the generation of parent directories for each name.
PyConfig_Read() is now responsible to handle early calls to
PySys_AddXOption() and PySys_AddWarnOption().
Options added by PySys_AddXOption() are now handled the same way than
PyConfig.xoptions and command line -X options.
For example, PySys_AddXOption(L"faulthandler") enables faulthandler
as expected.
empty_argv is no longer static in Python 3.8, but it is declared in
a temporary scope, whereas argv keeps a reference to it.
empty_argv memory (allocated on the stack) is reused by
make_sys_argv() code which is inlined when using gcc -O3.
Define empty_argv in PySys_SetArgvEx() body, to ensure
that it remains valid for the whole lifetime of
the PySys_SetArgvEx() call.
Special characters in email address header display names are normally
put within double quotes. However, encoded words (=?charset?x?...?=) are
not allowed withing double quotes. When the header contains a word with
special characters and another word that must be encoded, the first one
must also be encoded.
In the next example, the display name in the From header is quoted and
therefore the comma is allowed; in the To header, the comma is not
within quotes and not encoded, which is not allowed and therefore
rejected by some mail servers.
From: "Foo Bar, France" <foo@example.com>
To: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <foo@example.com>
https://bugs.python.org/issue37482
The activation scripts generated by venv were inconsistent in how they changed the shell's prompt. Some used `__VENV_PROMPT__` exclusively, some used `__VENV_PROMPT__` if it was set even though by default `__VENV_PROMPT__` is always set and the fallback matched the default, and one ignored `__VENV_PROMPT__` and used `__VENV_NAME__` instead (and even used a differing format to the default prompt). This change now has all activation scripts use `__VENV_PROMPT__` only and relies on the fact that venv sets that value by default.
The color of the customization is also now set in fish to the blue from the Python logo for as hex color support is built into that shell (much like PowerShell where the built-in green color is used).
bpo-37834: Normalise handling of reparse points on Windows
* ntpath.realpath() and nt.stat() will traverse all supported reparse points (previously was mixed)
* nt.lstat() will let the OS traverse reparse points that are not name surrogates (previously would not traverse any reparse point)
* nt.[l]stat() will only set S_IFLNK for symlinks (previous behaviour)
* nt.readlink() will read destinations for symlinks and junction points only
bpo-1311: os.path.exists('nul') now returns True on Windows
* nt.stat('nul').st_mode is now S_IFCHR (previously was an error)
Fix codecs.lookup() to normalize the encoding name the same way
than encodings.normalize_encoding(), except that codecs.lookup()
also converts the name to lower case.
The faulthandler module no longer allocates its alternative stack at
Python startup. Now the stack is only allocated at the first
faulthandler usage.
faulthandler no longer ignores memory allocation failure when
allocating the stack. sigaltstack() failure now raises an OSError
exception, rather than being ignored.
The alternative stack is no longer used if sigaction() is
not available. In practice, sigaltstack() should only be available
when sigaction() is avaialble, so this change should have no effect
in practice.
faulthandler.dump_traceback_later() internal locks are now only
allocated at the first dump_traceback_later() call, rather than
always being allocated at Python startup.
* Write a message when killing a worker process
* Put a timeout on the second popen.communicate() call
(after killing the process)
* Put a timeout on popen.wait() call
* Catch popen.kill() and popen.wait() exceptions
There are plenty of legitimate scripts in the tree that begin with a
`#!`, but also a few that seem to be marked executable by mistake.
Found them with this command -- it gets executable files known to Git,
filters to the ones that don't start with a `#!`, and then unmarks
them as executable:
$ git ls-files --stage \
| perl -lane 'print $F[3] if (!/^100644/)' \
| while read f; do
head -c2 "$f" | grep -qxF '#!' \
|| chmod a-x "$f"; \
done
Looking at the list by hand confirms that we didn't sweep up any
files that should have the executable bit after all. In particular
* The `.psd` files are images from Photoshop.
* The `.bat` files sure look like things that can be run.
But we have lots of other `.bat` files, and they don't have
this bit set, so it must not be needed for them.
Automerge-Triggered-By: @benjaminp
The fact that keyword names are strings is now part of the vectorcall and `METH_FASTCALL` protocols. The biggest concrete change is that `_PyStack_UnpackDict` now checks that and raises `TypeError` if not.
CC @markshannon @vstinner
https://bugs.python.org/issue37540
Base PR for other PRs that want to play with `type.__call__` such as #13930 and #14589.
The author is really @markshannon I just made the PR.
https://bugs.python.org/issue37207
Automerge-Triggered-By: @encukou
faulthandler now allocates a dedicated stack of SIGSTKSZ*2 bytes,
instead of just SIGSTKSZ bytes. Calling the previous signal handler
in faulthandler signal handler uses more than SIGSTKSZ bytes of stack
memory on some platforms.
FreeBSD implementation of poll(2) restricts the timeout argument to be
either zero, or positive, or equal to INFTIM (-1).
Unless otherwise overridden, socket timeout defaults to -1. This value
is then converted to milliseconds (-1000) and used as argument to the
poll syscall. poll returns EINVAL (22), and the connection fails.
This bug was discovered during the EINTR handling testing, and the
reproduction code can be found in
https://bugs.python.org/issue23618 (see connect_eintr.py,
attached). On GNU/Linux, the example runs as expected.
This change is trivial:
If the supplied timeout value is negative, truncate it to -1.
* bpo-37256: Wording in Request class docs
* 📜🤖 Added by blurb_it.
* Update Misc/NEWS.d/next/Documentation/2019-07-16-14-48-12.bpo-37256.qJTrBb.rst
Co-Authored-By: Kyle Stanley <aeros167@gmail.com>
- Remove use of replacement text in the script
- Make use of the pyvenv.cfg file for prompt value.
- Add parameters to allow more flexibility
- Make use of the current path, and assumptions about where env puts things, to compensate
- Make the script a bit more 'idiomatic' Powershell
- Add script documentation (Get-Help .\.venv\Scripts\Activate.ps1 shows PS help page now
This should fix the IndexError trying to retrieve `DisplayName.display_name` and `DisplayName.value` when the `value` is basically an empty string.
https://bugs.python.org/issue32178
DeprecationWarning will continue to be emitted for invalid escape
sequences in string and bytes literals just as it did in 3.7.
SyntaxWarning may be emitted in the future. But per mailing list
discussion, we don't yet know when because we haven't settled on how to
do so in a non-disruptive manner.
(Applies 4c5b6bac24 to the master branch).
(This is https://github.com/python/cpython/pull/15142 for master/3.9)
https://bugs.python.org/issue32912
Automerge-Triggered-By: @gpshead
This fixes an inconsistency between the Python and C implementations of
the datetime module. The pure python version of the code was not
accepting offsets greater than 23:59 but less than 24:00. This is an
accidental legacy of the original implementation, which was put in place
before tzinfo allowed sub-minute time zone offsets.
GH-14878
There was a discrepancy between the Python and C implementations.
Add singletons ALWAYS_EQ, LARGEST and SMALLEST in test.support
to test mixed type comparison.
Imports now raise `TypeError` instead of `ValueError` for relative import failures. This makes things consistent between `builtins.__import__` and `importlib.__import__` as well as using a more natural import for the failure.
https://bugs.python.org/issue37444
Automerge-Triggered-By: @brettcannon
Previously pdb checked the $HOME environmental variable
to find the user .pdbrc. If $HOME is not set, the user
.pdbrc would not be found.
Change pdb to use `os.path.expanduser('~')` to determine
the user's home directory. Thus, if $HOME is not set (as
in tox or on Windows), os.path.expanduser('~') falls
back on other techniques for locating the user's home
directory.
This follows pip's implementation for loading .piprc.
Co-authored-by: Dan Lidral-Porter <dlp@aperiodic.org>
Support for RFCOMM, L2CAP, HCI, SCO is based on the BTPROTO_* macros
being defined. Winsock only supports RFCOMM, even though it has a
BTHPROTO_L2CAP macro. L2CAP support would build on windows, but not
necessarily work.
This also adds some basic unittests for constants (all of which existed
prior to this commit, just not on windows) and creating sockets.
pair: Nate Duarte <slacknate@gmail.com>
* bpo-37742: Return the root logger when logging.getLogger('root') is called.
* Added type check guard on logger name in logging.getLogger() and refined a test.
BPO -16970: Adding error message for invalid args
Applied the patch argparse-v2 patch issue 16970, ran patch check and the test suite, test_argparse with 0 errors
https://bugs.python.org/issue16970
This changeset increases the default size of the stack
for threads on macOS to the size of the stack
of the main thread and reenables the relevant
recursion test.
Expose the CAN_BCM SocketCAN constants used in the bcm_msg_head struct
flags (provided by <linux/can/bcm.h>) under the socket library.
This adds the following constants with a CAN_BCM prefix:
* SETTIMER
* STARTTIMER
* TX_COUNTEVT
* TX_ANNOUNCE
* TX_CP_CAN_ID
* RX_FILTER_ID
* RX_CHECK_DLC
* RX_NO_AUTOTIMER
* RX_ANNOUNCE_RESUME
* TX_RESET_MULTI_IDX
* RX_RTR_FRAME
* CAN_FD_FRAME
The CAN_FD_FRAME flag was introduced in the 4.8 kernel, while the other
ones were present since SocketCAN drivers were mainlined in 2.6.25. As
such, it is probably unnecessary to guard against these constants being
missing.
Mark some individual tests to skip when --pgo is used. The tests
marked increase the PGO task time significantly and likely don't
help improve optimization of the final executable.
When scanning the string, most characters are valid, so
checking for invalid characters first means never needing
to check the value of strict on valid strings, and only
needing to check it on invalid characters when doing
non-strict parsing of invalid strings.
This provides a measurable reduction in per-character
processing time (~11% in the pre-merge patch testing).
Deprecate the parser module and add a deprecation warning triggered on import and a warning block in the documentation.
https://bugs.python.org/issue37268
Automerge-Triggered-By: @pablogsal
The boxes for the font and highlight samples are now constrained by the overall config dialog size. They gain scrollbars when the when a large font size makes the samples too large for the box.
Reduce the number of unit tests run for the PGO generation task. This
speeds up the task by a factor of about 15x. Running the full unit test
suite is slow. This change may result in a slightly less optimized build
since not as many code branches will be executed. If you are willing to
wait for the much slower build, the old behavior can be restored using
'./configure [..] PROFILE_TASK="-m test --pgo-extended"'. We make no
guarantees as to which PGO task set produces a faster build. Users who
care should run their own relevant benchmarks as results can depend on
the environment, workload, and compiler tool chain.
* Clear name and parent of mock in autospecced objects used with attach_mock
* Add NEWS entry
* Fix reversed order of comparison
* Test child and standalone function calls
* Use a helper function extracting mock to avoid code duplication and refactor tests.