long header lines is now (properly) in the Header class. So we no
longer need _split_header() and we'll just defer to Header.encode()
when we have a plain string.
_encode_chunks(): Pass maxlinelen in instead of always using
self._maxlinelen, so we can adjust for shorter initial lines.
Pass this value through to _max_append().
encode(): Weave maxlinelen through to the _encode_chunks() call.
_split_ascii(): When recursively splitting a line on spaces
(i.e. lower level syntactic split), don't append the whole returned
string. Instead, split it on linejoiners and extend the lines up to
the last line (for proper packing). Calculate the linelen based on
the last element in the this list.
part itself is longer than maxlen, and we aren't already splitting on
whitespace, then we recursively split the part on whitespace and
append that to the this list.
preserve spaces in the encoded/unencoded word boundaries. RFC 2047 is
ambiguous here, but most people expect the space to be preserved.
Really closes SF bug # 640110.
_split(): New implementation of ASCII line splitting which should do a
better job and not be subject to the various weird artifacts (bugs)
reported. This should also do a better job of higher-level syntactic
splits by trying first to split on semis, then commas, then
whitespace.
Use a Timbot-ly binary search for optimal non-ASCII split points for
better packing of header lines. This also lets us remove one
recursion call. Don't pass in firstline, but instead pass in the
actual line length we're shooting for. Also pass in the list of split
characters.
encode(): Pass in the list of split characters so applications can
have some control over what "higher level syntactic breaks" are.
Also,
decode_header(): Transform binascii.Errors which can occur when
decoding a base64 RFC 2047 header with bogus data, into an
email.Errors.HeaderParseError. Closes SF bug #696712.
_handle_multipart(): Ensure that if the preamble exists but does not
end in a newline, a newline is still added. Without this, the
boundary separator will end up on the preamble line, breaking the MIME
structure.
_make_boundary(): Handle differences in the decimal point character
based on the locale.
Charset: Alias __repr__ to __str__ for debugging.
header_encode(): When calling quopriMIME.header_encode(), set
maxlinelen=None so that the lower level function doesn't (also) try to
wrap/fold the line.
_max_append(): Change the comparison so that the new string is
concatenated if it's less than or equal to the max length.
header_encode(): Allow for maxlinelen == None to mean, don't do any
line splitting. This is because this module is mostly used by higher
level abstractions (Header.py) which already ensures line lengths. We
do this in a cheapo way by setting the max_encoding to some insanely
<100k wink> large number.
because the test file, msg_26.txt which has \r\n line endings, was
getting munged by cvs, which knows to do line ending conversions for
text files. But we want \r\n to be preserved on all platforms, so we
cvs admin'd the file to be -kb (binary), which means we have to open
the file in binary mode to preserve these line ends. Hopefully this
will be the end of the thrashing on this issue (but probably not).
Test passes on *nix now, and Tim confirms it passes on Windows. We'll
leave it to Jack to test MacOS.
is passed straight through to the unicode() and ustr.encode() calls.
I think it's the best we can do to address the UnicodeErrors in badly
encoded headers such as is described in SF bug #648119.