The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalized(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.
However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.
Implement the standard's algorithm. This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
At a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:
$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 4.39 msec per loop
With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:
$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
5000000 loops, best of 5: 58.2 nsec per loop
This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.
With this, that case is actually faster than in master!
$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop
$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
Extending the hover delay in test_tooltip should avoid spurious test_idle failures.
One longer delay instead of two shorter delays results in a net speedup.
Fixes a case in which email._header_value_parser.get_unstructured hangs the system for some invalid headers. This covers the cases in which the header contains either:
- a case without trailing whitespace
- an invalid encoded word
https://bugs.python.org/issue37764
This fix should also be backported to 3.7 and 3.8
https://bugs.python.org/issue37764
Fix a ctypes regression of Python 3.8. When a ctypes.Structure is
passed by copy to a function, ctypes internals created a temporary
object which had the side effect of calling the structure finalizer
(__del__) twice. The Python semantics requires a finalizer to be
called exactly once. Fix ctypes internals to no longer call the
finalizer twice.
Create a new internal StructParam_Type which is only used by
_ctypes_callproc() to call PyMem_Free(ptr) on Py_DECREF(argument).
StructUnionType_paramfunc() creates such object.
Adds a link to `dateutil.parser.isoparse` in the documentation.
It would be nice to set up intersphinx for things like this, but I think we can leave that for a separate PR.
CC: @pitrou
[bpo-37979](https://bugs.python.org/issue37979)
https://bugs.python.org/issue37979
Automerge-Triggered-By: @pitrou
* Fix call_matcher for mock when using methods
* Add NEWS entry
* Use None check and convert doctest to unittest
* Use better name for mock in tests. Handle _SpecState when the attribute was not accessed and add tests.
* Use reset_mock instead of reinitialization. Change inner class constructor signature for check
* Reword comment regarding call object lookup logic
- drop TargetScopeError in favour of raising SyntaxError directly
as per the updated PEP 572
- comprehension iteration variables are explicitly local, but
named expression targets in comprehensions are nonlocal or
global. Raise SyntaxError as specified in PEP 572
- named expression targets in the outermost iterable of a
comprehension have an ambiguous target scope. Avoid resolving
that question now by raising SyntaxError. PEP 572
originally required this only for cases where the bound name
conflicts with the iteration variable in the comprehension,
but CPython can't easily restrict the exception to that case
(as it doesn't know the target variable names when visiting
the outermost iterator expression)
These were caused by keeping around a reference to the Squeezer
instance and calling it's load_font() upon config changes, which
sometimes happened even if the shell window no longer existed.
This change completely removes that mechanism, instead having the
editor window properly update its width attribute, which can then
be used by Squeezer.
* fix Path._add_implied_dirs to include all implied directories
* fix Path._add_implied_dirs to include all implied directories
* Optimize code by using sets instead of lists
* 📜🤖 Added by blurb_it.
* fix Path._add_implied_dirs to include all implied directories
* Optimize code by using sets instead of lists
* 📜🤖 Added by blurb_it.
* Add tests to zipfile.Path.iterdir() fix
* Update test for zipfile.Path.iterdir()
* remove whitespace from test file
* Rewrite NEWS blurb to describe the user-facing impact and avoid implementation details.
* remove redundant [] within set comprehension
* Update to use unique_everseen to maintain order and other suggestions in review
* remove whitespace and add back add_dirs in tests
* Add new standalone function parents using posixpath to get parents of a directory
* removing whitespace (sorry)
* Remove import pathlib from zipfile.py
* Rewrite _parents as a slice on a generator of the ancestry of a path.
* Remove check for '.' and '/', now that parents no longer returns those.
* Separate calculation of implied dirs from adding those
* Re-use _implied_dirs in tests for generating zipfile with dir entries.
* Replace three fixtures (abcde, abcdef, abde) with one representative example alpharep.
* Simplify implementation of _implied_dirs by collapsing the generation of parent directories for each name.
PyConfig_Read() is now responsible to handle early calls to
PySys_AddXOption() and PySys_AddWarnOption().
Options added by PySys_AddXOption() are now handled the same way than
PyConfig.xoptions and command line -X options.
For example, PySys_AddXOption(L"faulthandler") enables faulthandler
as expected.
empty_argv is no longer static in Python 3.8, but it is declared in
a temporary scope, whereas argv keeps a reference to it.
empty_argv memory (allocated on the stack) is reused by
make_sys_argv() code which is inlined when using gcc -O3.
Define empty_argv in PySys_SetArgvEx() body, to ensure
that it remains valid for the whole lifetime of
the PySys_SetArgvEx() call.
Special characters in email address header display names are normally
put within double quotes. However, encoded words (=?charset?x?...?=) are
not allowed withing double quotes. When the header contains a word with
special characters and another word that must be encoded, the first one
must also be encoded.
In the next example, the display name in the From header is quoted and
therefore the comma is allowed; in the To header, the comma is not
within quotes and not encoded, which is not allowed and therefore
rejected by some mail servers.
From: "Foo Bar, France" <foo@example.com>
To: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <foo@example.com>
https://bugs.python.org/issue37482
The activation scripts generated by venv were inconsistent in how they changed the shell's prompt. Some used `__VENV_PROMPT__` exclusively, some used `__VENV_PROMPT__` if it was set even though by default `__VENV_PROMPT__` is always set and the fallback matched the default, and one ignored `__VENV_PROMPT__` and used `__VENV_NAME__` instead (and even used a differing format to the default prompt). This change now has all activation scripts use `__VENV_PROMPT__` only and relies on the fact that venv sets that value by default.
The color of the customization is also now set in fish to the blue from the Python logo for as hex color support is built into that shell (much like PowerShell where the built-in green color is used).
bpo-37834: Normalise handling of reparse points on Windows
* ntpath.realpath() and nt.stat() will traverse all supported reparse points (previously was mixed)
* nt.lstat() will let the OS traverse reparse points that are not name surrogates (previously would not traverse any reparse point)
* nt.[l]stat() will only set S_IFLNK for symlinks (previous behaviour)
* nt.readlink() will read destinations for symlinks and junction points only
bpo-1311: os.path.exists('nul') now returns True on Windows
* nt.stat('nul').st_mode is now S_IFCHR (previously was an error)
Fix codecs.lookup() to normalize the encoding name the same way
than encodings.normalize_encoding(), except that codecs.lookup()
also converts the name to lower case.
The faulthandler module no longer allocates its alternative stack at
Python startup. Now the stack is only allocated at the first
faulthandler usage.
faulthandler no longer ignores memory allocation failure when
allocating the stack. sigaltstack() failure now raises an OSError
exception, rather than being ignored.
The alternative stack is no longer used if sigaction() is
not available. In practice, sigaltstack() should only be available
when sigaction() is avaialble, so this change should have no effect
in practice.
faulthandler.dump_traceback_later() internal locks are now only
allocated at the first dump_traceback_later() call, rather than
always being allocated at Python startup.
* Write a message when killing a worker process
* Put a timeout on the second popen.communicate() call
(after killing the process)
* Put a timeout on popen.wait() call
* Catch popen.kill() and popen.wait() exceptions
There are plenty of legitimate scripts in the tree that begin with a
`#!`, but also a few that seem to be marked executable by mistake.
Found them with this command -- it gets executable files known to Git,
filters to the ones that don't start with a `#!`, and then unmarks
them as executable:
$ git ls-files --stage \
| perl -lane 'print $F[3] if (!/^100644/)' \
| while read f; do
head -c2 "$f" | grep -qxF '#!' \
|| chmod a-x "$f"; \
done
Looking at the list by hand confirms that we didn't sweep up any
files that should have the executable bit after all. In particular
* The `.psd` files are images from Photoshop.
* The `.bat` files sure look like things that can be run.
But we have lots of other `.bat` files, and they don't have
this bit set, so it must not be needed for them.
Automerge-Triggered-By: @benjaminp
The fact that keyword names are strings is now part of the vectorcall and `METH_FASTCALL` protocols. The biggest concrete change is that `_PyStack_UnpackDict` now checks that and raises `TypeError` if not.
CC @markshannon @vstinner
https://bugs.python.org/issue37540
Base PR for other PRs that want to play with `type.__call__` such as #13930 and #14589.
The author is really @markshannon I just made the PR.
https://bugs.python.org/issue37207
Automerge-Triggered-By: @encukou
faulthandler now allocates a dedicated stack of SIGSTKSZ*2 bytes,
instead of just SIGSTKSZ bytes. Calling the previous signal handler
in faulthandler signal handler uses more than SIGSTKSZ bytes of stack
memory on some platforms.
FreeBSD implementation of poll(2) restricts the timeout argument to be
either zero, or positive, or equal to INFTIM (-1).
Unless otherwise overridden, socket timeout defaults to -1. This value
is then converted to milliseconds (-1000) and used as argument to the
poll syscall. poll returns EINVAL (22), and the connection fails.
This bug was discovered during the EINTR handling testing, and the
reproduction code can be found in
https://bugs.python.org/issue23618 (see connect_eintr.py,
attached). On GNU/Linux, the example runs as expected.
This change is trivial:
If the supplied timeout value is negative, truncate it to -1.
* bpo-37256: Wording in Request class docs
* 📜🤖 Added by blurb_it.
* Update Misc/NEWS.d/next/Documentation/2019-07-16-14-48-12.bpo-37256.qJTrBb.rst
Co-Authored-By: Kyle Stanley <aeros167@gmail.com>
- Remove use of replacement text in the script
- Make use of the pyvenv.cfg file for prompt value.
- Add parameters to allow more flexibility
- Make use of the current path, and assumptions about where env puts things, to compensate
- Make the script a bit more 'idiomatic' Powershell
- Add script documentation (Get-Help .\.venv\Scripts\Activate.ps1 shows PS help page now
This should fix the IndexError trying to retrieve `DisplayName.display_name` and `DisplayName.value` when the `value` is basically an empty string.
https://bugs.python.org/issue32178
DeprecationWarning will continue to be emitted for invalid escape
sequences in string and bytes literals just as it did in 3.7.
SyntaxWarning may be emitted in the future. But per mailing list
discussion, we don't yet know when because we haven't settled on how to
do so in a non-disruptive manner.
(Applies 4c5b6bac24 to the master branch).
(This is https://github.com/python/cpython/pull/15142 for master/3.9)
https://bugs.python.org/issue32912
Automerge-Triggered-By: @gpshead
This fixes an inconsistency between the Python and C implementations of
the datetime module. The pure python version of the code was not
accepting offsets greater than 23:59 but less than 24:00. This is an
accidental legacy of the original implementation, which was put in place
before tzinfo allowed sub-minute time zone offsets.
GH-14878
There was a discrepancy between the Python and C implementations.
Add singletons ALWAYS_EQ, LARGEST and SMALLEST in test.support
to test mixed type comparison.
Imports now raise `TypeError` instead of `ValueError` for relative import failures. This makes things consistent between `builtins.__import__` and `importlib.__import__` as well as using a more natural import for the failure.
https://bugs.python.org/issue37444
Automerge-Triggered-By: @brettcannon
Previously pdb checked the $HOME environmental variable
to find the user .pdbrc. If $HOME is not set, the user
.pdbrc would not be found.
Change pdb to use `os.path.expanduser('~')` to determine
the user's home directory. Thus, if $HOME is not set (as
in tox or on Windows), os.path.expanduser('~') falls
back on other techniques for locating the user's home
directory.
This follows pip's implementation for loading .piprc.
Co-authored-by: Dan Lidral-Porter <dlp@aperiodic.org>
Support for RFCOMM, L2CAP, HCI, SCO is based on the BTPROTO_* macros
being defined. Winsock only supports RFCOMM, even though it has a
BTHPROTO_L2CAP macro. L2CAP support would build on windows, but not
necessarily work.
This also adds some basic unittests for constants (all of which existed
prior to this commit, just not on windows) and creating sockets.
pair: Nate Duarte <slacknate@gmail.com>
* bpo-37742: Return the root logger when logging.getLogger('root') is called.
* Added type check guard on logger name in logging.getLogger() and refined a test.
BPO -16970: Adding error message for invalid args
Applied the patch argparse-v2 patch issue 16970, ran patch check and the test suite, test_argparse with 0 errors
https://bugs.python.org/issue16970
This changeset increases the default size of the stack
for threads on macOS to the size of the stack
of the main thread and reenables the relevant
recursion test.
Expose the CAN_BCM SocketCAN constants used in the bcm_msg_head struct
flags (provided by <linux/can/bcm.h>) under the socket library.
This adds the following constants with a CAN_BCM prefix:
* SETTIMER
* STARTTIMER
* TX_COUNTEVT
* TX_ANNOUNCE
* TX_CP_CAN_ID
* RX_FILTER_ID
* RX_CHECK_DLC
* RX_NO_AUTOTIMER
* RX_ANNOUNCE_RESUME
* TX_RESET_MULTI_IDX
* RX_RTR_FRAME
* CAN_FD_FRAME
The CAN_FD_FRAME flag was introduced in the 4.8 kernel, while the other
ones were present since SocketCAN drivers were mainlined in 2.6.25. As
such, it is probably unnecessary to guard against these constants being
missing.
Mark some individual tests to skip when --pgo is used. The tests
marked increase the PGO task time significantly and likely don't
help improve optimization of the final executable.
When scanning the string, most characters are valid, so
checking for invalid characters first means never needing
to check the value of strict on valid strings, and only
needing to check it on invalid characters when doing
non-strict parsing of invalid strings.
This provides a measurable reduction in per-character
processing time (~11% in the pre-merge patch testing).
Deprecate the parser module and add a deprecation warning triggered on import and a warning block in the documentation.
https://bugs.python.org/issue37268
Automerge-Triggered-By: @pablogsal
The boxes for the font and highlight samples are now constrained by the overall config dialog size. They gain scrollbars when the when a large font size makes the samples too large for the box.
Reduce the number of unit tests run for the PGO generation task. This
speeds up the task by a factor of about 15x. Running the full unit test
suite is slow. This change may result in a slightly less optimized build
since not as many code branches will be executed. If you are willing to
wait for the much slower build, the old behavior can be restored using
'./configure [..] PROFILE_TASK="-m test --pgo-extended"'. We make no
guarantees as to which PGO task set produces a faster build. Users who
care should run their own relevant benchmarks as results can depend on
the environment, workload, and compiler tool chain.
* Clear name and parent of mock in autospecced objects used with attach_mock
* Add NEWS entry
* Fix reversed order of comparison
* Test child and standalone function calls
* Use a helper function extracting mock to avoid code duplication and refactor tests.
Add two indent spec methods from editor and Rstrip to existing file.
Tests are not added for indent methods because they need change
in lights of 3.x's prohibition on mixing tabs and spaces.
* bpo-37461: Fix infinite loop in parsing of specially crafted email headers.
Some crafted email header would cause the get_parameter method to run in an
infinite loop causing a DoS attack surface when parsing those headers. This
patch fixes that by making sure the DQUOTE character is handled to prevent
going into an infinite loop.
PyObject_Malloc() and PyObject_Free() inlines pymalloc_alloc and
pymalloc_free partially.
But when PGO is not used, compiler don't know where is the hot part
in pymalloc_alloc and pymalloc_free.
* Only create CodeContext instances for "real" editors windows, but
not e.g. shell or output windows.
* Remove configuration update Tk event fired every second, by having
the editor window ask its code context widget to update when
necessary, i.e. upon font or highlighting updates.
* When code context isn't being shown, avoid having a Tk event fired
every 100ms to check whether the code context needs to be updated.
* Use the editor window's getlineno() method where applicable.
* Update font of the code context widget before the main text widget
As far as I can tell, this infinite loop would be triggered if:
1. The value being folded contains a single word (no spaces) longer than
max_line_length
2. The max_line_length is shorter than the encoding's name + 9
characters.
bpo-36564: https://bugs.python.org/issue36564
The `allow_abbrev` option for ArgumentParser is documented and intended to disable support for unique prefixes of --options, which may sometimes be ambiguous due to deferred parsing.
However, the initial implementation also broke parsing of grouped short flags, such as `-ab` meaning `-a -b` (or `-a=b`). Checking the argument for a leading `--` before rejecting it fixes this.
This was prompted by pytest-dev/pytest#5469, so a backport to at least 3.8 would be great 😄
And this is my first PR to CPython, so please let me know if I've missed anything!
https://bugs.python.org/issue26967
Hi,
I've faced an issue w/ `mailbox.Maildir()`. The case is following:
1. I create a folder with `tempfile.TemporaryDirectory()`, so it's empty
2. I pass that folder path as an argument when instantiating `mailbox.Maildir()`
3. Then I receive an exception happening because "there's no such file or directory" (namely `cur`, `tmp` or `new`) during interaction with Maildir
**Expected result:** subdirs are created during `Maildir()` instance creation.
**Actual result:** subdirs are assumed as existing which leads to exceptions during use.
**Workaround:** remove the actual dir before passing the path to `Maildir()`. It will be created automatically with all subdirs needed.
**Fix:** This PR. Basically it adds creation of subdirs regardless of whether the base dir existed before.
https://bugs.python.org/issue30088
Returns NotImplemented for timedelta and time in __eq__ for different types in Python implementation, which matches the C implementation.
This also adds tests to enforce that these objects will fall back to the right hand side's __eq__ and/or __ne__ implementation.
bpo-37579
Fix importlib examples to insert any newly created modules via importlib.util.module_from_spec() immediately into sys.modules instead of after calling loader.exec_module().
Thanks to Benjamin Mintz for finding the bug.
https://bugs.python.org/issue37521
With the addition of shared memory into Python 3.8, we now have three tests failing on Solaris, namely `test_multiprocessing_fork`, `test_multiprocessing_forkserver` and `test_multiprocessing_spawn`. The reason seems to be incorrect name handling which results in two slashes being prepended.
https://bugs.python.org/issue37558
Keeping an account of allocated blocks slows down _PyObject_Malloc()
and _PyObject_Free() by a measureable amount. Have
_Py_GetAllocatedBlocks() iterate over the arenas to sum up the
allocated blocks for pymalloc.
Nested BinOp instances (e.g. a+b+c) had a wrong col_offset for the
second BinOp (e.g. 2 instead of 0 in the example). Fix it by using the
correct st node to copy the line and col_offset from in ast.c.
This is done to compensate for the extra stack frames added by
IDLE itself, which cause problems when setting the recursion limit
to low values.
This wraps sys.setrecursionlimit() and sys.getrecursionlimit()
as invisibly as possible.
multiprocessing tests now stop the ForkServer instance if it's
running: close the "alive" file descriptor to ask the server to stop
and then remove its UNIX address.
Fix multiprocessing.util.get_temp_dir() finalizer: clear also the
'tempdir' configuration of the current process, so next call to
get_temp_dir() will create a new temporary directory, rather than
reusing the removed temporary directory.
Replacing the deprecated method "random.choose" to "random.choice" was technically not part of the original issue. However, it was discussed in the talk page and involved one of the files being moved. I assumed this was too minor to justify the creation of a separate issue.
Also, I added my name to the contributors list in Misc/ACKS. This will be my third PR to cpython, forgot to do it in the previous ones.
https://bugs.python.org/issue19696
test_distutils.test_build_ext() is now able to remove the temporary
directory on Windows: don't import the newly built C extension ("xx")
in the current process, but test it in a separated process.
test_concurrent_futures now cleans up multiprocessing to remove
immediately temporary directories created by
multiprocessing.util.get_temp_dir().
The test now uses setUpModule() and tearDownModule().
ssl.match_hostname() no longer accepts IPv4 addresses with additional text
after the address and only quad-dotted notation without trailing
whitespaces. Some inet_aton() implementations ignore whitespace and all data
after whitespace, e.g. '127.0.0.1 whatever'.
Short notations like '127.1' for '127.0.0.1' were already filtered out.
The bug was initially found by Dominik Czarnota and reported by Paul Kehrer.
Signed-off-by: Christian Heimes <christian@python.org>
https://bugs.python.org/issue37463
urllib.request tests now call urlcleanup() to remove temporary files
created by urlretrieve() tests and to clear the _opener global
variable set by urlopen() and functions calling indirectly urlopen().
regrtest now checks if urllib.request._url_tempfiles and
urllib.request._opener are changed by tests.
multiprocessing tests now call explicitly _run_finalizers() to remove
immediately temporary directories created by
multiprocessing.util.get_temp_dir().
Python initialization now ensures that sys stream encoding
names are always normalized by codecs.lookup(encoding).name.
Simplify test_c_locale_coercion: it doesn't have to normalize
encoding names anymore.
Under some conditions the earlier fix for bpo-18075, "Infinite recursion
tests triggering a segfault on Mac OS X", now causes failures on macOS
when attempting to change stack limit with resource.setrlimit
resource.RLIMIT_STACK, like regrtest does when running the test suite.
The reverted change had specified a non-default stack size when linking
the python executable on macOS. As of macOS 10.14.4, the previous
code causes a hard failure when running tests, although similar
failures had been seen under some conditions under some earlier
systems. Reverting the change to the interpreter stack size at link
time helped for release builds but caused some tests to fail when
built --with-pydebug. Try the opposite approach: continue to build
the interpreter with an increased stack size on macOS and remove
the failing setrlimit call in regrtest initialization. This will
definitely avoid the resource.RLIMIT_STACK error and should have
no, or fewer, side effects.