In Python2, if a unicode string was assigned as the value of a header,
email would automatically CTE encode it using the UTF8 charset.
This capability was lost in the Python3 translation, and this patch
restores it.
Patch by Ali Ikinci, assisted by R. David Murray.
I also added a fix for the mailbox test that was depending (with a comment
that it was a bad idea to so depend) on non-ASCII causing message_from_string
to raise an error. It now uses support.patch to induce an error during
message serialization.
Analogous to the decode_header fix, this fix makes Header.append and
make_header correctly handle the unknown-8bit charset introduced by email5.1,
when the input to them is binary strings. Previous to this fix the
make_header(decode_header(x)) == x invariant was broken in the face of the
unknown-8bit charset.
Why I consider this a bug rather than an API change: the API change was
to Message, which didn't used to return Headers unless you added them
yourself. Now it does (for 8bit binary header input), so decode_header
needs to be able to handle them.
The fix is to charset.py, which was not doing the encoding to the
correct output character set when doing a body_encode for either
the shift-jis or euc-jp charsets. There's also a fix for handling
a bytes input in encoders.py.
Patch by Michael Henry, comment changes by me.
When a header was long enough to need to be split across lines, the
input charset name was used instead of the output charset name in
the encoded words. This make a difference only for the two charsets
above.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r87136 | r.david.murray | 2010-12-08 17:53:00 -0500 (Wed, 08 Dec 2010) | 6 lines
Have script_helper._assert_python strip refcount strings from stderr.
This makes the output of the function and those that depend on it
independent of whether or not they are being run under a debug
build.
........
r87221 | r.david.murray | 2010-12-13 19:55:46 -0500 (Mon, 13 Dec 2010) | 4 lines
#10699: fix docstring for tzset: it does not take a parameter
Thanks to Garrett Cooper for the fix.
........
r87256 | r.david.murray | 2010-12-14 21:19:14 -0500 (Tue, 14 Dec 2010) | 2 lines
#10705: document what the values of debuglevel are and mean.
........
r87337 | r.david.murray | 2010-12-17 11:11:40 -0500 (Fri, 17 Dec 2010) | 2 lines
#10559: provide instructions for accessing sys.argv when first mentioned.
........
r87338 | r.david.murray | 2010-12-17 11:29:07 -0500 (Fri, 17 Dec 2010) | 2 lines
#10454: clarify the compileall docs and help messages.
[compileall.py changes not backported.]
........
r87571 | r.david.murray | 2010-12-29 14:06:48 -0500 (Wed, 29 Dec 2010) | 2 lines
Fix same typo in docs.
........
r87839 | r.david.murray | 2011-01-07 16:57:25 -0500 (Fri, 07 Jan 2011) | 9 lines
Fix formatting of values with embedded newlines when rfc2047 encoding
Before this patch if a value being encoded had an embedded newline,
the line following the newline would have no leading whitespace,
and the whitespace it did have was encoded into the word. Now
the existing whitespace gets turned into a blank, the way it does
in other header reformatting, and the _continuation_ws gets added
at the beginning of the encoded line.
........
r88164 | r.david.murray | 2011-01-24 14:34:58 -0500 (Mon, 24 Jan 2011) | 12 lines
#10960: fix 'stat' links, link to lstat from stat, general tidy of stat doc.
Original patch by Michal Nowikowski, with some additions and wording
fixes by me.
I changed the wording from 'Performs a stat system call' to 'Performs
the equivalent of a stat system call', since on Windows there are no
stat/lstat system calls involved. I also extended Michal's breakout
of the attributes into a list to the other paragraphs, and rearranged
the order of the paragraphs in the 'stat' docs to make it flow
better and put it in what I think is a more logical/useful order.
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r87873 | r.david.murray | 2011-01-08 21:35:24 -0500 (Sat, 08 Jan 2011) | 12 lines
#5871: protect against header injection attacks.
This makes Header.encode throw a HeaderParseError if it winds up
formatting a header such that a continuation line has no leading
whitespace and looks like a header. Since Header accepts values
containing newlines and preserves them (and this is by design), without
this fix any program that took user input (say, a subject in a web form)
and passed it to the email package as a header was vulnerable to header
injection attacks. (As far as we know this has never been exploited.)
Thanks to Jakub Wilk for reporting this vulnerability.
........
This makes Header.encode throw a HeaderParseError if it winds up
formatting a header such that a continuation line has no leading
whitespace and looks like a header. Since Header accepts values
containing newlines and preserves them (and this is by design), without
this fix any program that took user input (say, a subject in a web form)
and passed it to the email package as a header was vulnerable to header
injection attacks. (As far as we know this has never been exploited.)
Thanks to Jakub Wilk for reporting this vulnerability.
This applies only when generating strings from non-RFC compliant binary
input; it makes the existing recoding behavior more consistent (ie:
now no data is lost when recoding).
Before this patch if a value being encoded had an embedded newline,
the line following the newline would have no leading whitespace,
and the whitespace it did have was encoded into the word. Now
the existing whitespace gets turned into a blank, the way it does
in other header reformatting, and the _continuation_ws gets added
at the beginning of the encoded line.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r87750 | r.david.murray | 2011-01-04 20:39:32 -0500 (Tue, 04 Jan 2011) | 5 lines
#10790: make append work when output codec is different from input codec
There's still a bug here (the encode call shouldn't use the 'errors'
paramter), but I'll fix that later.
........
The RFC is bit hard to understand on this point, but the examples
clearly show that parameter values that are encoded according
to its charset/language rules don't have surrounding quotes, and
the ABNF does not allow for quotes. So when we produce such
encoded values, we no longer add quotes.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r87415 | r.david.murray | 2010-12-21 13:07:59 -0500 (Tue, 21 Dec 2010) | 4 lines
Fix the change made for issue 1243654.
Surprisingly, it turns out there was no test that exercised this code path.
........
Such addresses are not RFC compliant except under the 'obsolete syntax'
rules, but before this fix the whitespace was dropped from the input,
concatenating the pieces. That breaks one of the principles of the
email package, that of preserving the input as much as possible.
It also denies the application program the opportunity to apply its
own heuristics to interpretation of such non-compliant addresses.
It is possible users of the email package were depending on the local
part always being a single token, so this fix will not be backported.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r87217 | r.david.murray | 2010-12-13 18:51:19 -0500 (Mon, 13 Dec 2010) | 5 lines
#1078919: make add_header automatically do RFC2231 encoding when needed.
Also document the use of three-tuples if control of the charset
and language is desired.
........
The tests that were failing on (some) windows machines, where the
msg_XX.txt files used native \r\n lineseps are now also run on machines
that use \n natively, and conversely the \n tests are run on Windows.
The failing tests revealed one place where linesep needed to be added
to a flatten call in generator. There was also another that the tests
didn't catch, so I added a test for that case as well.