Commit Graph

23 Commits

Author SHA1 Message Date
Georges Toth 303aac8c56
bpo-30681: Support invalid date format or value in email Date header (GH-22090)
I am re-submitting an older PR which was abandoned but is still relevant, #10783 by @timb07.

The issue being solved () is still relevant. The original PR #10783 was closed as
the final request changes were not applied and since abandoned.

In this new PR I have re-used the original patch plus applied both comments from the review, by @maxking and @pganssle.


For reference, here is the original PR description:
In email.utils.parsedate_to_datetime(), a failure to parse the date, or invalid date components (such as hour outside 0..23) raises an exception. Document this behaviour, and add tests to test_email/test_utils.py to confirm this behaviour.

In email.headerregistry.DateHeader.parse(), check when parsedate_to_datetime() raises an exception and add a new defect InvalidDateDefect; preserve the invalid value as the string value of the header, but set the datetime attribute to None.

Add tests to test_email/test_headerregistry.py to confirm this behaviour; also added test to test_email/test_inversion.py to confirm emails with such defective date headers round trip successfully.

This pull request incorporates feedback gratefully received from @bitdancer, @brettcannon, @Mariatta and @warsaw, and replaces the earlier PR #2254.

Automerge-Triggered-By: GH:warsaw
2020-10-26 17:31:06 -07:00
Abhilash Raj 21017ed904
bpo-39040: Fix parsing of email mime headers with whitespace between encoded-words. (gh-17620)
* bpo-39040: Fix parsing of email headers with encoded-words inside a quoted string.

It is fairly common to find malformed mime headers (especially content-disposition
headers) where the parameter values, instead of being encoded to RFC
standards, are "encoded" by doing RFC 2047 "encoded word" encoding, and
then enclosing the whole thing in quotes.  The processing of these malformed
headers was incorrectly leaving the spaces between encoded words in the decoded
text (whitespace between adjacent encoded words is supposed to be stripped on
decoding).  This changeset fixes the encoded word processing inside quoted strings
(bare-quoted-string) to do correct RFC 2047 decoding by stripping that
whitespace.
2020-05-28 20:04:59 -04:00
Ashwin Ramaswami 614f17211c
bpo-39073: validate Address parts to disallow CRLF (#19007)
Disallow CR or LF in email.headerregistry.Address arguments to guard against header injection attacks.
2020-03-29 20:38:41 -04:00
bsiem df0c21ff46 bpo-37482: Fix email address name with encoded words and special chars (GH-14561)
Special characters in email address header display names are normally
put within double quotes. However, encoded words (=?charset?x?...?=) are
not allowed withing double quotes. When the header contains a word with
special characters and another word that must be encoded, the first one
must also be encoded.

In the next example, the display name in the From header is quoted and
therefore the comma is allowed; in the To header, the comma is not
within quotes and not encoded, which is not allowed and therefore
rejected by some mail servers.

From: "Foo Bar, France" <foo@example.com>
To: Foo Bar, =?utf-8?q?Espa=C3=B1a?= <foo@example.com>





https://bugs.python.org/issue37482
2019-08-21 16:00:39 -07:00
Serhiy Storchaka 662db125cd
bpo-37685: Fixed __eq__, __lt__ etc implementations in some classes. (GH-14952)
They now return NotImplemented for unsupported type of the other operand.
2019-08-08 08:42:54 +03:00
Abhilash Raj 66c4f3f38b bpo-21315: Fix parsing of encoded words with missing leading ws. (#13425)
* bpo-21315: Fix parsing of encoded words with missing leading ws.

Because of missing leading whitespace, encoded word would get parsed as
unstructured token. This patch fixes that by looking for encoded words when
splitting tokens with whitespace.

Missing trailing whitespace around encoded word now register a defect
instead.

Original patch suggestion by David R. Murray on bpo-21315.
2019-06-05 09:56:33 -07:00
Abhilash Raj 46d88a1131 bpo-35805: Add parser for Message-ID email header. (GH-13397)
* bpo-35805: Add parser for Message-ID header.

This parser is based on the definition of Identification Fields from RFC 5322
Sec 3.6.4.

This should also prevent folding of Message-ID header using RFC 2047 encoded
words and hence fix bpo-35805.

* Prevent folding of non-ascii message-id headers.
* Add fold method to MsgID token to prevent folding.
2019-06-04 10:41:34 -07:00
Krzysztof Wojcik c1f5667be1 bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) 2019-05-14 18:55:23 +02:00
R. David Murray 85d5c18c9d
bpo-27240 Rewrite the email header folding algorithm. (#3488)
The original algorithm tried to delegate the folding to the tokens so
that those tokens whose folding rules differed could specify the
differences.  However, this resulted in a lot of duplicated code because
most of the rules were the same.

The new algorithm moves all folding logic into a set of functions
external to the token classes, but puts the information about which
tokens can be folded in which ways on the tokens...with the exception of
mime-parameters, which are a special case (which was not even
implemented in the old folder).

This algorithm can still probably be improved and hopefully simplified
somewhat.

Note that some of the test expectations are changed.  I believe the
changes are toward more desirable and consistent behavior: in general
when (re) folding a line the canonical version of the tokens is
generated, rather than preserving errors or extra whitespace.
2017-12-03 18:51:41 -05:00
Serhiy Storchaka e437a10d15 Issue #23277: Remove unused imports in tests. 2016-04-24 21:41:02 +03:00
R David Murray 685b3495e1 #21991: make headerregistry params property MappingProxyType.
It is unlikely anyone is using the fact that the dictionary returned
by the 'params' attribute was previously writable, but even if someone
is the API is provisional so this kind of change is acceptable (and
needed, to get the API "right" before it becomes official).

Patch by Stéphane Wirtel.
2014-10-17 19:30:13 -04:00
R David Murray 01e46ee7e2 Merge: #16983: Apply postel's law to encoded words inside quoted strings. 2014-02-08 13:13:01 -05:00
R David Murray 0400d33928 #16983: Apply postel's law to encoded words inside quoted strings.
This applies only to the new parser.  The old parser decodes encoded words
inside quoted strings already, although it gets the whitespace wrong
when it does so.

This version of the patch only handles the most common case (a single encoded
word surrounded by quotes), but I haven't seen any other variations of this in
the wild yet, so its good enough for now.
2014-02-08 13:12:00 -05:00
R David Murray 3da240fd01 #18891: Complete new provisional email API.
This adds EmailMessage and, MIMEPart subclasses of Message
with new API methods, and a ContentManager class used by
the new methods.  Also a new policy setting, content_manager.

Patch was reviewed by Stephen J. Turnbull and Serhiy Storchaka,
and reflects their feedback.

I will ideally add some examples of using the new API to the
documentation before the final release.
2013-10-16 22:48:40 -04:00
Ezio Melotti b5bc353b88 #18741: fix more typos. Patch by Févry Thibault. 2013-08-17 16:11:40 +03:00
R David Murray 923512f327 #18431: Decode encoded words in atoms in new email parser.
There is more to be done here in terms of accepting RFC invalid
input that some mailers accept, but this covers the valid
RFC places where encoded words can occur in structured headers.
2013-07-12 16:00:28 -04:00
R David Murray 65171b28e7 #18044: Fix parsing of encoded words of the form =?utf8?q?=XX...?=
The problem was I was only checking for decimal digits after the third '?',
not for *hex* digits :(.

This changeset also fixes a couple of comment typos, deletes an unused
function relating to encoded word parsing, and removed an invalid
'if' test from the folding function that was revealed by the tests
written to validate this issue.
2013-07-11 15:52:57 -04:00
Ezio Melotti 3f5db3940f Fix a few typos and a double semicolon. Patch by Eitan Adler. 2013-01-27 06:20:14 +02:00
R David Murray 97f43c019f #15160: Extend the new email parser to handle MIME headers.
This code passes all the same tests that the existing RFC mime header
parser passes, plus a bunch of additional ones.

There are a couple of commented out tests where there are issues with the
folding.  The folding doesn't normally get invoked for headers parsed from
source, and the cases are marginal anyway (headers with invalid binary data)
so I'm not worried about them, but will fix them after the beta.

There are things that can be done to make this API even more convenient, but I
think this is a solid foundation worth having.  And the parser is a full RFC
parser, so it handles cases that the current parser doesn't.  (There are also
probably cases where it fails when the current parser doesn't, but I haven't
found them yet ;)

Oh, yeah, and there are some really ugly bits in the parser for handling some
'postel' cases that are unfortunately common.

I hope/plan to to eventually refactor a lot of the code in the parser which
should reduce the line count...but there is no escaping the fact that the
error recovery is welter of special cases.
2012-06-24 05:03:27 -04:00
R David Murray 1be413e366 Don't use metaclasses when class decorators can do the job.
Thanks to Nick Coghlan for pointing out that I'd forgotten about class
decorators.
2012-05-31 18:00:45 -04:00
R David Murray 56517e5cb9 Make parameterized tests in email less hackish.
Or perhaps more hackish, depending on your perspective.  But at least this
way it is now possible to run the individual tests using the unittest CLI.
2012-05-30 21:53:40 -04:00
R David Murray a7c9ddb59c Regularize test_email/test_headerregistry's references to policy. 2012-05-28 20:22:37 -04:00
R David Murray ea9766897b Make headerregistry fully part of the provisional api.
When I made the checkin of the provisional email policy, I knew that
Address and Group needed to be made accessible from somewhere.  The more
I looked at it, though, the more it became clear that since this is a
provisional API anyway, there's no good reason to hide headerregistry as
a private API.  It was designed to ultimately be part of the public API,
and so it should be part of the provisional API.

This patch fully documents the headerregistry API, and deletes the
abbreviated version of those docs I had added to the provisional policy
docs.
2012-05-27 15:03:38 -04:00