The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalized(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX GH-15.
However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.
Implement the standard's algorithm. This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
At a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:
$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 4.39 msec per loop
With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:
$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
5000000 loops, best of 5: 58.2 nsec per loop
This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.
With this, that case is actually faster than in master!
$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop
$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
(cherry picked from commit 2f09413947)
Co-authored-by: Greg Price <gnprice@gmail.com>
* [bpo-21315](https://bugs.python.org/issue21315): Fix parsing of encoded words with missing leading ws.
Because of missing leading whitespace, encoded word would get parsed as
unstructured token. This patch fixes that by looking for encoded words when
splitting tokens with whitespace.
Missing trailing whitespace around encoded word now register a defect
instead.
Original patch suggestion by David R. Murray on [bpo-21315](https://bugs.python.org/issue21315).
(cherry picked from commit 66c4f3f38b)
Co-authored-by: Abhilash Raj <maxking@users.noreply.github.com>
(cherry picked from commit dc20fc4311)
Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
https://bugs.python.org/issue21315
Extending the hover delay in test_tooltip should avoid spurious test_idle failures.
One longer delay instead of two shorter delays results in a net speedup.
(cherry picked from commit 132acaba5a)
Co-authored-by: Tal Einat <taleinat+github@gmail.com>
Fix a ctypes regression of Python 3.8. When a ctypes.Structure is
passed by copy to a function, ctypes internals created a temporary
object which had the side effect of calling the structure finalizer
(__del__) twice. The Python semantics requires a finalizer to be
called exactly once. Fix ctypes internals to no longer call the
finalizer twice.
Create a new internal StructParam_Type which is only used by
_ctypes_callproc() to call PyMem_Free(ptr) on Py_DECREF(argument).
StructUnionType_paramfunc() creates such object.
(cherry picked from commit 96b4087ce7)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
when built on non-Windows system without fd system call support,
like older versions of macOS.
(cherry picked from commit 7fcc2088a5)
Co-authored-by: Ned Deily <nad@python.org>
Adds a link to `dateutil.parser.isoparse` in the documentation.
It would be nice to set up intersphinx for things like this, but I think we can leave that for a separate PR.
CC: @pitrou
[bpo-37979](https://bugs.python.org/issue37979)
https://bugs.python.org/issue37979
Automerge-Triggered-By: @pitrou
(cherry picked from commit 59725f3bad)
Co-authored-by: Paul Ganssle <paul@ganssle.io>
* Fix call_matcher for mock when using methods
* Add NEWS entry
* Use None check and convert doctest to unittest
* Use better name for mock in tests. Handle _SpecState when the attribute was not accessed and add tests.
* Use reset_mock instead of reinitialization. Change inner class constructor signature for check
* Reword comment regarding call object lookup logic
(cherry picked from commit c96127821e)
Co-authored-by: Xtreak <tir.karthi@gmail.com>
* Define THREAD_STACK_SIZE for AIX to pass default recursion limit test
(cherry picked from commit 9670ce76b8)
Co-authored-by: Michael Felt <aixtools@users.noreply.github.com>
Special characters in email address header display names are normally
put within double quotes. However, encoded words (=?charset?x?...?=) are
not allowed withing double quotes. When the header contains a word with
special characters and another word that must be encoded, the first one
must also be encoded.
In the next example, the display name in the From header is quoted and
therefore the comma is allowed; in the To header, the comma is not
within quotes and not encoded, which is not allowed and therefore
rejected by some mail servers.
From: "Foo Bar, France" <foo@example.com>
To: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <foo@example.com>
https://bugs.python.org/issue37482
(cherry picked from commit df0c21ff46)
Co-authored-by: bsiem <52461103+bsiem@users.noreply.github.com>
- drop TargetScopeError in favour of raising SyntaxError directly
as per the updated PEP 572
- comprehension iteration variables are explicitly local, but
named expression targets in comprehensions are nonlocal or
global. Raise SyntaxError as specified in PEP 572
- named expression targets in the outermost iterable of a
comprehension have an ambiguous target scope. Avoid resolving
that question now by raising SyntaxError. PEP 572
originally required this only for cases where the bound name
conflicts with the iteration variable in the comprehension,
but CPython can't easily restrict the exception to that case
(as it doesn't know the target variable names when visiting
the outermost iterator expression)
(cherry picked from commit 5dbe0f59b7)
These were caused by keeping around a reference to the Squeezer
instance and calling it's load_font() upon config changes, which
sometimes happened even if the shell window no longer existed.
This change completely removes that mechanism, instead having the
editor window properly update its width attribute, which can then
be used by Squeezer.
(cherry picked from commit d4b4c00b57)
Co-authored-by: Tal Einat <taleinat+github@gmail.com>
* fix Path._add_implied_dirs to include all implied directories
* fix Path._add_implied_dirs to include all implied directories
* Optimize code by using sets instead of lists
* 📜🤖 Added by blurb_it.
* fix Path._add_implied_dirs to include all implied directories
* Optimize code by using sets instead of lists
* 📜🤖 Added by blurb_it.
* Add tests to zipfile.Path.iterdir() fix
* Update test for zipfile.Path.iterdir()
* remove whitespace from test file
* Rewrite NEWS blurb to describe the user-facing impact and avoid implementation details.
* remove redundant [] within set comprehension
* Update to use unique_everseen to maintain order and other suggestions in review
* remove whitespace and add back add_dirs in tests
* Add new standalone function parents using posixpath to get parents of a directory
* removing whitespace (sorry)
* Remove import pathlib from zipfile.py
* Rewrite _parents as a slice on a generator of the ancestry of a path.
* Remove check for '.' and '/', now that parents no longer returns those.
* Separate calculation of implied dirs from adding those
* Re-use _implied_dirs in tests for generating zipfile with dir entries.
* Replace three fixtures (abcde, abcdef, abde) with one representative example alpharep.
* Simplify implementation of _implied_dirs by collapsing the generation of parent directories for each name.
(cherry picked from commit a4e2991bdc)
Co-authored-by: shireenrao <shireenrao@gmail.com>
Fix compilation of "break" and "continue" in the
"finally" block when the corresponding "try" block
contains "return" with a non-constant value.
(cherry picked from commit ef61c524dd)
PyConfig_Read() is now responsible to handle early calls to
PySys_AddXOption() and PySys_AddWarnOption().
Options added by PySys_AddXOption() are now handled the same way than
PyConfig.xoptions and command line -X options.
For example, PySys_AddXOption(L"faulthandler") enables faulthandler
as expected.
(cherry picked from commit 120b707a6d)
empty_argv is no longer static in Python 3.8, but it is declared in
a temporary scope, whereas argv keeps a reference to it.
empty_argv memory (allocated on the stack) is reused by
make_sys_argv() code which is inlined when using gcc -O3.
Define empty_argv in PySys_SetArgvEx() body, to ensure
that it remains valid for the whole lifetime of
the PySys_SetArgvEx() call.
(cherry picked from commit c48682509d)