Broytmann in SF patch #600096. Specifically, the former function now
encodes the triplets, while the latter adds optional charset and
language arguments.
2045, section 5.2 states that if the Content-Type: header is
syntactically invalid, the default type should be text/plain.
Implement minimal sanity checking of the header -- it must have
exactly one slash in it. This closes SF patch #597593 by Skip, but in
a different way.
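
For illustration, a minimal sketch of the fallback described above,
assuming a current Python 3 email package. The helper content_type_of()
and the header values are made up for this example.

```python
from email.message import Message


def content_type_of(raw_value):
    """Hypothetical helper: build a message and report its content type."""
    msg = Message()
    msg["Content-Type"] = raw_value
    return msg.get_content_type()


valid = content_type_of("image/png")               # exactly one slash
no_slash = content_type_of("garbage")              # no slash at all
two_slashes = content_type_of("text/plain/extra")  # too many slashes

print(valid, no_slash, two_slashes)
```

Only the value with exactly one slash is reported as given; the other
two fall back to the text/plain default.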
Note that these methods used to raise ValueError for invalid ctypes,
but now they won't.
get the MIME main and sub types, instead of getting the whole ctype
and splitting it here. The two more specific methods now correctly
implement RFC 2045, section 5.2.
imports e.g. test_support must do so using an absolute package name
such as "import test.test_support" or "from test import test_support".
This also updates the README in Lib/test, and gets rid of the
duplicate data directory in Lib/test/data (replaced by
Lib/email/test/data).
Now Tim and Jack can have at it. :)
from test.test_support import TestSkipped, run_unittest
to
from test_support import TestSkipped, run_unittest
Otherwise, if the Japanese codecs aren't installed, regrtest doesn't
believe the TestSkipped exception raised by this test matches the
except (ImportError, test_support.TestSkipped), msg:
it's looking for, and reports the skip as a crash failure instead of
as a skipped test.
I suppose this will make it harder to run this test outside of
regrtest, but under the assumption only Barry does that, better to
make it skip cleanly for everyone else.
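
The mismatch can be reproduced in isolation. The sketch below does not
use regrtest at all; it loads the same made-up source under two module
names (load() is a hypothetical helper) to show that the two resulting
exception classes are unrelated, so an except clause naming one does
not catch the other.

```python
import types

# Stand-in source for a support module defining the skip exception.
SOURCE = "class TestSkipped(Exception):\n    pass\n"


def load(name):
    """Hypothetical helper: execute SOURCE as a fresh module object."""
    mod = types.ModuleType(name)
    exec(SOURCE, mod.__dict__)
    return mod


as_package = load("test.test_support")  # imported via the package name
as_toplevel = load("test_support")      # imported as a top-level module

try:
    raise as_package.TestSkipped("Japanese codecs not installed")
except as_toplevel.TestSkipped:
    outcome = "skipped"
except Exception:
    outcome = "crashed"

print(outcome)
```

The first except clause never matches, because each load produced its
own TestSkipped class.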
(i.e. email.test), so move the guts of them here from Lib/test. The
latter directory will retain stubs to run the email.test tests using
Python's standard regression test.
test_email_torture.py is a torture tester which will not run under
Python's test suite because I don't want to commit megs of data to
that project (it will fail cleanly there). When run under the mimelib
project it'll stress test the package with megs of message samples
collected from various locations in the wild.
email/test/data is a copy of Lib/test/data. The fate of the latter is
still undecided.
backwards compatibility, we're silently deprecating get_type(),
get_subtype() and get_main_type(). We may eventually noisily
deprecate these. For now, we'll just fix a bug in the splitting of
the main and subtypes.
get_content_type(), get_content_maintype(), get_content_subtype(): New
methods which replace the above. These /always/ return a content type
string and do not take a failobj, because an email message always at
least has a default content type.
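
A short sketch of the new accessors, assuming a current Python 3 email
package. The message objects here are built by hand purely for
illustration.

```python
from email.message import Message

msg = Message()  # no Content-Type: header at all

# No failobj is needed: the default content type is always available.
ctype = msg.get_content_type()
maintype = msg.get_content_maintype()
subtype = msg.get_content_subtype()
print(ctype, maintype, subtype)

# With a header present, its value is reported in lower case.
msg["Content-Type"] = "Application/PDF"
lowered = msg.get_content_type()
print(lowered)
```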
set_default_type(): Someday there may be additional default content
types, so don't hard code an assertion about the value of the ctype
argument.
quoting:
In non-strict mode, messages with a missing end-terminator no longer
require a blank line at the end. A single newline is sufficient now.
Handle trailing whitespace at the end of a boundary. Had to switch
from using string.split() to re.split().
Handle whitespace on the end of a parameter list for Content-Type.
Handle whitespace on the end of a plain content-type header.
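
As an illustration of the boundary case, the sketch below feeds a
current Python 3 parser a multipart message whose boundary lines carry
trailing spaces or tabs. The message text is invented for the example,
and today's parser is not the _parsebody() implementation discussed
here.

```python
from email import message_from_string

# Each boundary line ends in stray whitespace (spaces or a tab).
raw = (
    'Content-Type: multipart/mixed; boundary="B"\n'
    "\n"
    "--B  \n"
    "Content-Type: text/plain\n"
    "\n"
    "one\n"
    "--B\t\n"
    "Content-Type: text/plain\n"
    "\n"
    "two\n"
    "--B--  \n"
)

msg = message_from_string(raw)
parts = msg.get_payload()
bodies = [part.get_payload() for part in parts]
print(len(parts), bodies)
```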
Specifically,
get_type(): Strip the content type string.
_get_params_preserve(): Strip the parameter names and values on both
sides.
_parsebody(): Lots of changes as described above, with some stylistic
changes by Barry (who hopefully didn't screw things up ;).
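
The stripping can be observed through the public accessors. This is a
sketch against a current Python 3 email package, using
get_content_type() and get_param() rather than the older get_type();
the header value is made up.

```python
from email.message import Message

msg = Message()
# Whitespace after the type, around the "=", and at the end of the list.
msg["Content-Type"] = "text/html ; charset = us-ascii "

ctype = msg.get_content_type()
charset = msg.get_param("charset")
print(repr(ctype), repr(charset))
```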
create a Header instance. Closes feature request #539481.
Header.__init__(): Allow the initial string to be omitted.
__eq__(), __ne__(): Support rich comparisons for equality of Header
instances with Header instances or strings.
Also, update a bunch of docstrings.
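
A brief sketch of both changes, assuming a current Python 3
email.header module. The strings are arbitrary.

```python
from email.header import Header

h = Header()       # the initial string may be omitted
h.append("Hello")  # content can be added afterwards

same_as_str = (h == "Hello")             # compare against a string
same_as_header = (h == Header("Hello"))  # compare against a Header
differs = (h != "Goodbye")

print(same_as_str, same_as_header, differs)
```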
argument to the constructor -- defaulting to true -- which is
different from Anthony's approach of using global state.
parse(), parsestr(): Grow a `headersonly' argument which stops parsing
once the header block has been seen, i.e. it does /not/ parse or even
read the body of the message. This is used for parsing message/rfc822
type messages.
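
A usage sketch of the headersonly argument, assuming a current Python 3
email.parser module. In the versions I am aware of the body is kept as
an unparsed string rather than skipped entirely, so the example only
checks that no sub-parts were built. The message text is invented.

```python
from email.parser import Parser

text = (
    "Subject: weekly digest\n"
    'Content-Type: multipart/mixed; boundary="B"\n'
    "\n"
    "--B\n"
    "Content-Type: text/plain\n"
    "\n"
    "part one\n"
    "--B--\n"
)

full = Parser().parsestr(text)
quick = Parser().parsestr(text, headersonly=True)

print(full.is_multipart())   # body parsed into sub-parts
print(quick.is_multipart())  # body left unparsed
print(quick["Subject"])      # headers are still available
```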
We need test cases for the non-strict parsing. Anthony will supply
these.
_parsebody(): We can get rid of the isdigest end-of-line kludges,
although we still need to know if we're parsing a multipart/digest so
we can set the default type accordingly.
text/plain but the RFCs state that inside a multipart/digest, the
default type is message/rfc822. To preserve idempotency, we need a
separate place to define the default type than the Content-Type:
header.
get_default_type(), set_default_type(): Accessor and mutator methods
for the default type.
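
A minimal sketch of the accessor and mutator, assuming a current
Python 3 email package. It shows that the default type lives apart from
the Content-Type: header, which is what keeps generation idempotent.

```python
from email.message import Message

part = Message()
before = part.get_default_type()

# What a parser would do for a sub-part of a multipart/digest.
part.set_default_type("message/rfc822")
after = part.get_content_type()

# No header was added as a side effect.
headers = part.keys()
print(before, after, headers)
```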
recursive generation).
_dispatch(): If the message object doesn't have a Content-Type:
header, check its default type instead of assuming it's text/plain.
This makes for correct generation of message/rfc822 containers.
_handle_multipart(): We can get rid of the isdigest kludge. Just
print the message as normal and everything will work out correctly.
_handle_multipart_digest(): We don't need this anymore either.
Specifically,
decode_rfc2231(), encode_rfc2231(): Functions to encode and decode RFC
2231 style parameters.
decode_params(): Function to decode a list of parameters.
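
A round-trip sketch using the functions as they exist in a current
Python 3 email.utils module. The filename is made up. Note that in
these versions decode_rfc2231() only splits the triplet; the value
stays percent-encoded.

```python
from email.utils import decode_rfc2231, encode_rfc2231

# With a charset and language, the result is charset'language'value.
encoded = encode_rfc2231("hello world.txt", charset="us-ascii", language="en")
charset, language, value = decode_rfc2231(encoded)

# Without them, only the percent-encoding is applied.
plain = encode_rfc2231("hello world.txt")

print(encoded)
print(charset, language, value)
print(plain)
```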
Specifically,
_formatparam(): Teach this about encoded `param' arguments, which are
a 3-tuple of items (charset, language, value). language is ignored.
_unquotevalue(): Handle both 3-tuple RFC 2231 values and unencoded
values.
_get_params_preserve(): Decode the parameters before returning them.
get_params(), get_param(): Use _unquotevalue().
get_filename(), get_boundary(): Teach these about encoded (3-tuple)
parameters.
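
To illustrate the 3-tuple form, here is a sketch against a current
Python 3 email package with an invented Content-Disposition: header.
get_param() hands back the (charset, language, value) triple, while
get_filename() collapses it to a plain string.

```python
from email.message import Message

msg = Message()
msg["Content-Disposition"] = (
    "attachment; filename*=us-ascii'en'hello%20world.txt"
)

param = msg.get_param("filename", header="content-disposition")
filename = msg.get_filename()

print(param)
print(filename)
```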
headers with no charset or 'us-ascii' charsets. Actually this is only
partially true: we know about semicolons (but not true parameters) and
we know about whitespace (but not technically folding whitespace).
Still it should be good enough for all practical purposes.
Other changes include:
__init__(): Add a continuation_ws argument, which defaults to a single
space. Set this to change the whitespace used for continuation lines
when a header must be split. Also, changed the way header line
lengths are calculated, so that they take into account continuation_ws
(when tabs-expanded) and any provided header_name parameter. This
should do much better on returning split headers for which the first
and subsequent lines must fit into a specified width.
guess_maxlinelen(): Removed. I don't think we need this method as
part of the public API.
encode_chunks() -> _encode_chunks(): I don't think we need this one as
part of the public API either.
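
A sketch of the line-length accounting, assuming a current Python 3
email.header module. The text and the 40-column limit are arbitrary,
and exactly where the folds fall may differ between versions; the
example only checks that every line fits, including the first one once
the "Subject: " prefix is counted.

```python
from email.header import Header

text = ("The quick brown fox jumps over the lazy dog "
        "and then keeps running for a while")

h = Header(text, maxlinelen=40, header_name="Subject", continuation_ws=" ")
folded = h.encode()
lines = folded.split("\n")

for line in lines:
    print(repr(line), len(line))
```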
know anything about RFC 2047 encoded headers. Fortunately we have a
perfectly good header splitter in Header.encode(). So we just call
that to give us a properly formatted and split header.
Header.encode() didn't know about "highest-level syntactic breaks" but
that's been fixed now too.
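
The delegation can be checked from the outside. The sketch below
assumes a current Python 3 email package, where the generator reaches
Header.encode() through a policy object rather than directly; the
subject text and the 40-column limit are arbitrary.

```python
import io
from email.generator import Generator
from email.header import Header
from email.message import Message

subject = ("The quick brown fox jumps over the lazy dog "
           "and then keeps running for a while")

msg = Message()
msg["Subject"] = subject
msg.set_payload("body\n")

# Flatten the message with a short maximum header length.
buf = io.StringIO()
Generator(buf, maxheaderlen=40).flatten(msg)
generated = buf.getvalue()

# Split the same header by hand with Header.encode().
expected_header = "Subject: " + Header(
    subject, maxlinelen=40, header_name="Subject").encode()

print(generated)
```

The generated header block matches what Header.encode() produces for
the same value and width.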