It was raised if the charset itself contains characters not encodable
in UTF-8 (in particular \udcxx characters representing non-decodable
bytes in the source).
Similar to the rewrite of email/mime/image.py and associated test after the
deprecation of imghdr.py, thisrewrites email/mime/audio.py and associated
tests after the deprecation of sndhdr.py.
Closes#91885
* Rewrite imghdr inlining for clarity and completeness
* Move MIMEImage class back closer to the top of the file since it's the
important thing.
* Use a decorate to mark a given rule function and simplify the rule function
names for clarity.
* Copy over all the imghdr test data files into the email package's test data
directory. This way when imghdr is actually removed, it won't affect the
MIMEImage guessing tests.
* Rewrite and extend the MIMEImage tests to test for all supported
auto-detected MIME image subtypes.
* Remove the now redundant PyBanner048.gif data file.
* See https://github.com/python/cpython/pull/91461#discussion_r850313336
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
* Deprecate imghdr
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
* Update Doc/whatsnew/3.11.rst
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
* Inline `imghdr` into `email.mime.image`
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Barry Warsaw <barry@python.org>
Various date parsing utilities in the email module, such as
email.utils.parsedate(), are supposed to gracefully handle invalid
input, typically by raising an appropriate exception or by returning
None.
The internal email._parseaddr._parsedate_tz() helper used by some of
these date parsing routines tries to be robust against malformed input,
but unfortunately it can still crash ungracefully when a non-empty but
whitespace-only input is passed. This manifests as an unexpected
IndexError.
In practice, this can happen when parsing an email with only a newline
inside a ‘Date:’ header, which unfortunately happens occasionally in the
real world.
Here's a minimal example:
$ python
Python 3.9.6 (default, Jun 30 2021, 10:22:16)
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.utils
>>> email.utils.parsedate('foo')
>>> email.utils.parsedate(' ')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.9/email/_parseaddr.py", line 176, in parsedate
t = parsedate_tz(data)
File "/usr/lib/python3.9/email/_parseaddr.py", line 50, in parsedate_tz
res = _parsedate_tz(data)
File "/usr/lib/python3.9/email/_parseaddr.py", line 72, in _parsedate_tz
if data[0].endswith(',') or data[0].lower() in _daynames:
IndexError: list index out of range
The fix is rather straight-forward: guard against empty lists, after
splitting on whitespace, but before accessing the first element.
This PR replaces #1977. The reason for the replacement is two-fold.
The fix itself is different is that if the CTE header doesn't exist in the original message, it is inserted. This is important because the new CTE could be quoted-printable whereas the original is implicit 8bit.
Also the tests are different. The test_nonascii_as_string_without_cte test in #1977 doesn't actually test the issue in that it passes without the fix. The test_nonascii_as_string_without_content_type_and_cte test is improved here, and even though it doesn't fail without the fix, it is included for completeness.
Automerge-Triggered-By: @warsaw
Fixes a case in which email._header_value_parser.get_unstructured hangs the system for some invalid headers. This covers the cases in which the header contains either:
- a case without trailing whitespace
- an invalid encoded word
https://bugs.python.org/issue37764
This fix should also be backported to 3.7 and 3.8
https://bugs.python.org/issue37764
* bpo-30835: email: Fix AttributeError when parsing invalid Content-Transfer-Encoding
Parsing an email containing a multipart Content-Type, along with a
Content-Transfer-Encoding containing an invalid (non-ASCII-decodable) byte
will fail. email.feedparser.FeedParser._parsegen() gets the header and
attempts to convert it to lowercase before comparing it with the accepted
encodings, but as the header contains an invalid byte, it's returned as a
Header object rather than a str.
Cast the Content-Transfer-Encoding header to a str to avoid this.
Found using the AFL fuzzer.
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Andrew Donnellan <andrew@donnellan.id.au>
* Add email and NEWS entry for the bugfix.
This changes the main documentation, doc strings, source code comments, and a
couple error messages in the test suite. In some cases the word was removed
or edited some other way to fix the grammar.