Patch #1464708 from William McVey: fixed handling of nested comments in mail
addresses. E.g.
"Foo ((Foo Bar)) <foo@example.com>"
Fixes for both rfc822.py and email package. This patch needs to be back
ported to Python 2.3 for email 2.5.
* Use +=
* Replace loop logic with str.splitlines equivalent
* Don't use variable names that shadow tuple, list, and str
* Use dict.get instead of equivalent try/except
* Minor loop logic simplications
Specifically, time.strftime() no longer accepts a 0 in the yday position of a
time tuple, since that can crash some platform strftime() implementations.
parsedate_tz(): Change the return value to return 1 in the yday position.
Update tests in test_rfc822.py and test_email.py
instead of calling the getaddrlist() method, since the latter doesn't
work with multiple calls (it will return the empty list for the second
and subsequent calls).
Closes SF bug #555035. Include a unittest.
rfc822.AddressList incorrectly handles empty address.
"<>" is converted to None and should be "".
AddressList.__str__() fails on None.
I got an email with such an address and my program
failed processing it.
Example:
>>> import rfc822
>>> rfc822.AddressList("<>").addresslist
[('', None)]
>>> str(rfc822.AddressList("<>"))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/lib/python2.1/rfc822.py", line 753, in __str__
return ", ".join(map(dump_address_pair,
self.addresslist))
TypeError: sequence item 0: expected string, None found
[His solution: in the internal routine AddrlistClass.getrouteaddr(),
initialize adlist to "".]
Modify rfc822.formatdate() to always generate English names,
regardless of locale. This is required by RFC 1123.
In open_local_file() of urllib and urllib2, use new formatdate() from
rfc822.
now allowed in unquoted RealName areas (technically, they are defined
as "obsolete syntax" we MUST accept in phrases, as part of the
obs-phrase production). Thus, parsing
To: User J. Person <person@dom.ain>
correctly returns "User J. Person" as the RealName.
AddrlistClass.__init__(): Add definition of self.phraseends which is
just self.atomends with `.' removed.
getatom(): Add an optional argument `atomends' which, if None (the
default) means use self.atomends.
getphraselist(): Pass self.phraseends to getatom() and break out of
the loop only when the current character is in phraseends instead of
atomends. This allows dots to continue to serve as atom delimiters in
all contexts except phrases.
Also, loads of docstring updates to document RFC 2822 conformance
(sorry, this should have been two separate patches).
setdefault() the empty string. In setdefault(), use + to join the value
to create the entry for the headers attribute so that TypeError is raised
if the value is of the wrong type.
also modified check_all function to suppress all warnings since they aren't
relevant to what this test is doing (allows quiet checking of regsub, for
instance)
rfc822 (Addresslist) modules. Also a preliminary testcase for augmented
assignment, which should actually be merged with the test_class testcase I
added last week.
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
It breaks Mailman, it was actually documented in the docstring, so it
was an intentional deviation from the usual del semantics. Let's
document the original behavior in Doc/lib/librfc822.tex.
for gotonext() pushing self.pos past the end of the string. This can
happen if the message has a To field like "To: :" and you call
msg.getaddrlist('to').
Problem: rfc822.py in 1.5.2 final loses the quotes around
quoted local-part names.
The fix is to preserve the quotes around a local-part
name in an address.
Test:
import rfc822
a = rfc822.AddrlistClass('(Comment stuff) "Quoted
name"@somewhere.com')
a.getaddrlist()
The correct result is:
[('Comment stuff', '"Quoted name"@somewhere.com')]
named header, so that if a message has, e.g. multiple CC: lines, all
will get returned by the call to getaddrlist(). It also correctly
handles addresses which show up in continuation lines.
AdderlistClass.__init__(): Added \n to self.CR which fixes a bug that
sometimes, an address would contain a bogus trailing newline.
Message.getaddress(): In final else clause, added a test for the
character we're at being in self.specials. Without this, such
characters never get consumed and we infloop. Case in point (as
posted to c.l.py):
To: <[smtp:dd47@mail.xxx.edu]_at_hmhq@hdq-mdm1-imgout.companay.com>
----------------------------^
otherwise we'd infloop here
Extended the rfc822 parsedate routines to handle the cases they failed
on in an archive of ~37,000 messages. I believe the changes are
compatible, in that all previously correct parsing are still correct.
[I still see problems with some messages, but no showstoppers.]