Fixes a case in which email._header_value_parser.get_unstructured hangs the system for some invalid headers. This covers the cases in which the header contains either:
- an encoded word without trailing whitespace
- an invalid encoded word
https://bugs.python.org/issue37764
This fix should also be backported to 3.7 and 3.8
(cherry picked from commit c5b242f87f)
Co-authored-by: Ashwin Ramaswami <aramaswamis@gmail.com>
The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalize(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.
However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.
Implement the standard's algorithm. This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
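As a rough illustration of the equivalence the function is meant to preserve (not of the C implementation), the sketch below compares `unicodedata.is_normalized` with the normalize-and-compare spelling. The helper name `is_normalized_slow` and the sample strings are assumptions made for this example; U+F900 is the character used in the timings in this message, and it has a canonical decomposition, so a string of it is not in NFD.

```python
import unicodedata


def is_normalized_slow(form, s):
    """Hypothetical reference helper: normalize fully, then compare."""
    return s == unicodedata.normalize(form, s)


# A few illustrative inputs: a decomposable CJK compatibility ideograph,
# plain ASCII, a decomposed "e + combining acute", and precomposed U+00E9.
samples = ["\uf900" * 1000, "abc", "e\u0301", "\u00e9"]

results = []
for s in samples:
    fast = unicodedata.is_normalized("NFD", s)
    slow = is_normalized_slow("NFD", s)
    # Both spellings must agree; only the cost of getting there differs.
    assert fast == slow
    results.append(fast)

print(results)
```

Only the running time should differ between the two spellings; the answers must always match.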
At a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:
$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 4.39 msec per loop
With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:
$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
5000000 loops, best of 5: 58.2 nsec per loop
This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.
With this, that case is actually faster than in master!
$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop
$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
(cherry picked from commit 2f09413947)
Co-authored-by: Greg Price <gnprice@gmail.com>
The HTML5 output from Sphinx 2.x adds '<p>' tags within list elements. Using a new prevtag attribute, ignore these instead of emitting unwanted '\n\n'.
Also stop looking for 'first' classes on tags (no longer present) and fix the bug of double-spacing instead of single spacing after <pre> blocks.
(cherry picked from commit 580bdb0ece)
Co-authored-by: Tal Einat <taleinat+github@gmail.com>
* [bpo-21315](https://bugs.python.org/issue21315): Fix parsing of encoded words with missing leading ws.
Because of the missing leading whitespace, an encoded word would get parsed as
an unstructured token. This patch fixes that by looking for encoded words when
splitting tokens with whitespace.
Missing trailing whitespace around an encoded word now registers a defect
instead.
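A small sketch of how the behaviour described above might be observed, assuming a Python version that includes this fix. It calls `get_unstructured` from the private `email._header_value_parser` module, which can change without notice; the helper name `parse_unstructured` and the sample header value are made up for illustration.

```python
from email import errors
from email import _header_value_parser as parser  # private module, may change


def parse_unstructured(value):
    """Hypothetical helper: return (decoded text, defects) for a header value."""
    tokens = parser.get_unstructured(value)
    return str(tokens), list(tokens.all_defects)


# An encoded word glued to the preceding text, with no leading whitespace.
text, defects = parse_unstructured("nowhitespace=?utf-8?q?somevalue?=")

print(repr(text))
for defect in defects:
    # Each defect records what was wrong with the input.
    print(type(defect).__name__, defect)
```

The encoded word is decoded rather than kept as literal text, and the missing whitespace is reported through the defects list instead of an exception.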
Original patch suggestion by David R. Murray on [bpo-21315](https://bugs.python.org/issue21315).
(cherry picked from commit 66c4f3f38b)
Co-authored-by: Abhilash Raj <maxking@users.noreply.github.com>
(cherry picked from commit dc20fc4311)
Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
https://bugs.python.org/issue21315
Extending the hover delay in test_tooltip should avoid spurious test_idle failures.
One longer delay instead of two shorter delays results in a net speedup.
(cherry picked from commit 132acaba5a)
Co-authored-by: Tal Einat <taleinat+github@gmail.com>
* Fix suspicious.py to actually print the unused rules
* Fix the other `self.warn` calls
(cherry picked from commit e1786b5416)
Co-authored-by: Anthony Sottile <asottile@umich.edu>
Fix a ctypes regression of Python 3.8. When a ctypes.Structure is
passed by copy to a function, ctypes internals created a temporary
object which had the side effect of calling the structure finalizer
(__del__) twice. Python semantics require a finalizer to be
called exactly once. Fix ctypes internals to no longer call the
finalizer twice.
Create a new internal StructParam_Type which is only used by
_ctypes_callproc() to call PyMem_Free(ptr) on Py_DECREF(argument).
StructUnionType_paramfunc() creates such an object.
(cherry picked from commit 96b4087ce7)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
when built on non-Windows system without fd system call support,
like older versions of macOS.
(cherry picked from commit 7fcc2088a5)
Co-authored-by: Ned Deily <nad@python.org>
Adds a link to `dateutil.parser.isoparse` in the documentation.
It would be nice to set up intersphinx for things like this, but I think we can leave that for a separate PR.
CC: @pitrou
[bpo-37979](https://bugs.python.org/issue37979)
https://bugs.python.org/issue37979
Automerge-Triggered-By: @pitrou
(cherry picked from commit 59725f3bad)
Co-authored-by: Paul Ganssle <paul@ganssle.io>
With `symtable_visit_expr` now correctly adjusting the recursion depth for named
expressions, `symtable_handle_namedexpr` should be leaving it alone.
Also adds a new check to `PySymtable_BuildObject` that raises `SystemError`
if a successful first symbol analysis pass fails to keep the stack depth
accounting clean.
(cherry picked from commit 06145230c8)
Co-authored-by: Nick Coghlan <ncoghlan@gmail.com>
* Fix call_matcher for mock when using methods
* Add NEWS entry
* Use None check and convert doctest to unittest
* Use better name for mock in tests. Handle _SpecState when the attribute was not accessed and add tests.
* Use reset_mock instead of reinitialization. Change inner class constructor signature for check
* Reword comment regarding call object lookup logic
(cherry picked from commit c96127821e)
Co-authored-by: Xtreak <tir.karthi@gmail.com>
* Define THREAD_STACK_SIZE for AIX to pass default recursion limit test
(cherry picked from commit 9670ce76b8)
Co-authored-by: Michael Felt <aixtools@users.noreply.github.com>
Special characters in email address header display names are normally
put within double quotes. However, encoded words (=?charset?x?...?=) are
not allowed within double quotes. When the header contains a word with
special characters and another word that must be encoded, the first one
must also be encoded.
In the next example, the display name in the From header is quoted and
therefore the comma is allowed; in the To header, the comma is not
within quotes and not encoded, which is not allowed and therefore
rejected by some mail servers.
From: "Foo Bar, France" <foo@example.com>
To: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <foo@example.com>
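The situation can be reproduced with the public `email` API, as in the sketch below. It assumes a Python version that includes this fix; the display name and address are the ones from the example above, and the exact folded form of the header is deliberately not asserted, since it can differ between versions.

```python
from email import message_from_bytes, policy
from email.headerregistry import Address
from email.message import EmailMessage

msg = EmailMessage()
# The display name has both a special character (the comma) and a
# non-ASCII word that has to be sent as an encoded word.
msg["To"] = Address(display_name="Foo Bar, España",
                    username="foo", domain="example.com")

raw = msg.as_bytes()
print(raw.decode("ascii"))

# Parse the serialized message back and look at the addresses found.
parsed = message_from_bytes(raw, policy=policy.default)
addresses = parsed["To"].addresses
for addr in addresses:
    print(repr(addr.display_name), addr.addr_spec)
```

If the comma were left unquoted and unencoded, a parser would see it as an address separator, so a single parsed address is a quick sanity check.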
https://bugs.python.org/issue37482
(cherry picked from commit df0c21ff46)
Co-authored-by: bsiem <52461103+bsiem@users.noreply.github.com>
These appeared in commit c5ae169e1. The comment on them, as well as
the presence among them of a rule for the .gitignore file itself,
indicate that the author intended these lines to remain only in their
own local working tree -- not to get committed even to their own repo,
let alone merged upstream.
They did nevertheless get committed, because it turns out that Git
takes no notice of what .gitignore says about files that it's already
tracking... for example, this .gitignore file itself.
Give effect to these lines' original intention, by deleting them. :-)
Git tip, for reference: the `.git/info/exclude` file is a handy way
to do exactly what these lines were originally intended to do. A
related handy file is `~/.config/git/ignore`. See gitignore(5),
aka `git help ignore`, for details.
https://bugs.python.org/issue37936
Automerge-Triggered-By: @zware
(cherry picked from commit 8c9e9b0cd5)
Co-authored-by: Greg Price <gnprice@gmail.com>