Fix regression introduced in gh-100884: AttributeError when re-fold a long
address list.
Also fix more cases of incorrect encoding of the address separator in the
address list missed in gh-100884.
* Fix for email.generator.Generator with whitespace between encoded words.
email.generator.Generator currently does not handle whitespace between
encoded words correctly when the encoded words span multiple lines. The
current generator will create an encoded word for each line. If the end
of the line happens to correspond with the end real word in the
plaintext, the generator will place an unencoded space at the start of
the subsequent lines to represent the whitespace between the plaintext
words.
A compliant decoder will strip all the whitespace from between two
encoded words which leads to missing spaces in the round-tripped
output.
The fix for this is to make sure that whitespace between two encoded
words ends up inside of one or the other of the encoded words. This
fix places the space inside of the second encoded word.
A second problem happens with continuation lines. A continuation line that
starts with whitespace and is followed by a non-encoded word is fine because
the newline between such continuation lines is defined as condensing to
a single space character. When the continuation line starts with whitespace
followed by an encoded word, however, the RFCs specify that the word is run
together with the encoded word on the previous line. This is because normal
words are filded on syntactic breaks by encoded words are not.
The solution to this is to add the whitespace to the start of the encoded word
on the continuation line.
Test cases are from #92081
* Rename a variable so it's not confused with the final variable.
Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email
messages. Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e'
and Unicode line separators '\x85', '\u2028' and '\u2029' as is.
Determine the support of the Kyiv timezone by checking the result of
astimezone() which uses the system tz database and not the one
populated by zoneinfo.
Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). This fixes or at least ameliorates CVE-2023-27043.
---------
Co-authored-by: Gregory P. Smith <greg@krypto.org>
It was raised if the charset itself contains characters not encodable
in UTF-8 (in particular \udcxx characters representing non-decodable
bytes in the source).
Similar to the rewrite of email/mime/image.py and associated test after the
deprecation of imghdr.py, thisrewrites email/mime/audio.py and associated
tests after the deprecation of sndhdr.py.
Closes#91885
* Rewrite imghdr inlining for clarity and completeness
* Move MIMEImage class back closer to the top of the file since it's the
important thing.
* Use a decorate to mark a given rule function and simplify the rule function
names for clarity.
* Copy over all the imghdr test data files into the email package's test data
directory. This way when imghdr is actually removed, it won't affect the
MIMEImage guessing tests.
* Rewrite and extend the MIMEImage tests to test for all supported
auto-detected MIME image subtypes.
* Remove the now redundant PyBanner048.gif data file.
* See https://github.com/python/cpython/pull/91461#discussion_r850313336
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
* Deprecate imghdr
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
* Update Doc/whatsnew/3.11.rst
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
* Inline `imghdr` into `email.mime.image`
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Barry Warsaw <barry@python.org>
Various date parsing utilities in the email module, such as
email.utils.parsedate(), are supposed to gracefully handle invalid
input, typically by raising an appropriate exception or by returning
None.
The internal email._parseaddr._parsedate_tz() helper used by some of
these date parsing routines tries to be robust against malformed input,
but unfortunately it can still crash ungracefully when a non-empty but
whitespace-only input is passed. This manifests as an unexpected
IndexError.
In practice, this can happen when parsing an email with only a newline
inside a ‘Date:’ header, which unfortunately happens occasionally in the
real world.
Here's a minimal example:
$ python
Python 3.9.6 (default, Jun 30 2021, 10:22:16)
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.utils
>>> email.utils.parsedate('foo')
>>> email.utils.parsedate(' ')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.9/email/_parseaddr.py", line 176, in parsedate
t = parsedate_tz(data)
File "/usr/lib/python3.9/email/_parseaddr.py", line 50, in parsedate_tz
res = _parsedate_tz(data)
File "/usr/lib/python3.9/email/_parseaddr.py", line 72, in _parsedate_tz
if data[0].endswith(',') or data[0].lower() in _daynames:
IndexError: list index out of range
The fix is rather straight-forward: guard against empty lists, after
splitting on whitespace, but before accessing the first element.
I am re-submitting an older PR which was abandoned but is still relevant, #10783 by @timb07.
The issue being solved () is still relevant. The original PR #10783 was closed as
the final request changes were not applied and since abandoned.
In this new PR I have re-used the original patch plus applied both comments from the review, by @maxking and @pganssle.
For reference, here is the original PR description:
In email.utils.parsedate_to_datetime(), a failure to parse the date, or invalid date components (such as hour outside 0..23) raises an exception. Document this behaviour, and add tests to test_email/test_utils.py to confirm this behaviour.
In email.headerregistry.DateHeader.parse(), check when parsedate_to_datetime() raises an exception and add a new defect InvalidDateDefect; preserve the invalid value as the string value of the header, but set the datetime attribute to None.
Add tests to test_email/test_headerregistry.py to confirm this behaviour; also added test to test_email/test_inversion.py to confirm emails with such defective date headers round trip successfully.
This pull request incorporates feedback gratefully received from @bitdancer, @brettcannon, @Mariatta and @warsaw, and replaces the earlier PR #2254.
Automerge-Triggered-By: GH:warsaw
This PR replaces #1977. The reason for the replacement is two-fold.
The fix itself is different is that if the CTE header doesn't exist in the original message, it is inserted. This is important because the new CTE could be quoted-printable whereas the original is implicit 8bit.
Also the tests are different. The test_nonascii_as_string_without_cte test in #1977 doesn't actually test the issue in that it passes without the fix. The test_nonascii_as_string_without_content_type_and_cte test is improved here, and even though it doesn't fail without the fix, it is included for completeness.
Automerge-Triggered-By: @warsaw