Commit Graph

112 Commits

Author SHA1 Message Date
Barry Warsaw 15aefa94d0 Fixing some RFC 2231 related issues as reported in the Spambayes
project, and with assistance from Oleg Broytmann.  Specifically,

get_param(), get_params(): Document that these methods may return
parameter values that are either strings, or 3-tuples in the case of
RFC 2231 encoded parameters.  The application should be prepared to
deal with such return values.

get_boundary(): Be prepared to deal with RFC 2231 encoded boundary
parameters.  It makes little sense to have boundaries that are
anything but ascii, so if we get back a 3-tuple from get_param() we
will decode it into ascii and let any failures percolate up.

get_content_charset(): New method which treats the charset parameter
just like the boundary parameter in get_boundary().  Note that
"get_charset()" was already taken to return the default Charset
object.

get_charsets(): Rewrite to use get_content_charset().
2002-09-26 17:19:34 +00:00
Barry Warsaw 6f30a8ab62 __version__: Bump to 2.4
Move the imports of Parser and Message inside the
message_from_string() and message_from_file() functions.  This way
just "import email" won't suck in most of the submodules of the
package.

Note: this will break code that relied on "import email" giving you a
bunch of the submodules, but that was never documented and should not
have been relied on.
2002-09-25 22:07:50 +00:00
Barry Warsaw 40363b63f0 Open the test files in binary mode so the \r\n files won't cause
failures on Windows.  Closes SF bug # 609988.
2002-09-18 22:17:57 +00:00
Barry Warsaw 78170048f9 Bump to 2.3.1 to pick up the missing file. 2002-09-12 03:44:50 +00:00
Barry Warsaw fbcde75c70 get_payload(): Document that calling it with no arguments returns a
reference to the payload.
2002-09-11 14:11:35 +00:00
Barry Warsaw bc6edac8df test_utils_quote_unquote(): Test for unquote() properly
de-backslash-ifying.
2002-09-11 02:31:24 +00:00
Barry Warsaw 184d55a897 rfc822.unquote() doesn't properly de-backslash-ify in Python prior to
2.3.  This patch (adapted from Quinn Dunkan's SF patch #573204) fixes
the problem and should get ported to rfc822.py.
2002-09-11 02:22:48 +00:00
Barry Warsaw 034b47acfe _parsebody(): Instead of raising a BoundaryError when no start
boundary could be found -- in a lax parser -- the entire body is
assigned to the message payload.
2002-09-10 16:14:56 +00:00
Barry Warsaw b1c1de3805 Import _isstring() from the compatibility layer.
_handle_text(): Use _isstring() for stringiness test.

_handle_multipart(): Add a test before the ListType test, checking for
stringiness of the payload.  String payloads for multitypes means a
message with broken MIME chrome was parsed by a lax parser.  Instead
of raising a BoundaryError in those cases, the entire body is assigned
to the message payload (but since the content type is still
multipart/*, the Generator needs to be updated too).
2002-09-10 16:13:45 +00:00
Barry Warsaw 356afac41f _isstring(): Factor out "stringiness" test, e.g. for StringType or
UnicodeType, which is different between Python 2.1 and 2.2.
2002-09-10 16:09:06 +00:00
Barry Warsaw 45d9bde6c1 _ascii_split(): Don't lstrip continuation lines. Closes SF bug #601392. 2002-09-10 15:57:29 +00:00
Barry Warsaw 24d45df3f2 test_splitting_first_line_only_is_long(): New test for SF bug #601392,
broken wrapping of long ASCII headers.
2002-09-10 15:46:44 +00:00
Barry Warsaw dad90c202a A sample message with broken MIME boundaries. 2002-09-10 15:43:30 +00:00
Barry Warsaw e99e2f53e7 test_set_param(), test_del_param(): Test RFC 2231 encoding support by
Oleg Broytmann in SF patch #600096.  Whitespace normalized by Barry.
2002-09-06 03:56:26 +00:00
Barry Warsaw 3c25535dc8 _formatparam(), set_param(): RFC 2231 encoding support by Oleg
Broytmann in SF patch #600096.  Specifically, the former function now
encodes the triplets, while the latter adds optional charset and
language arguments.
2002-09-06 03:55:04 +00:00
Barry Warsaw 470288c54e test_mondo_message(): "binary" is not a legal content type, so with
the previous RFC 2045, $5.2 repair to get_content_type() this
subpart's type will now be text/plain.
2002-09-06 03:41:27 +00:00
Barry Warsaw 58fb61cce5 test_replace_header(): New test for Message.replace_header(). 2002-09-06 03:39:59 +00:00
Barry Warsaw 229727fa07 replace_header(): New method given by Skip Montanaro in SF patch
#601959.  Modified slightly by Barry (who added the KeyError in case
the header is missing.
2002-09-06 03:38:12 +00:00
Barry Warsaw a4ce1cf34c _structure(): Use .get_content_type() 2002-09-01 21:04:43 +00:00
Barry Warsaw 1a1607546c Whitespace normalization. 2002-08-27 22:38:50 +00:00
Barry Warsaw 48b0d36b4d Typo 2002-08-27 22:34:44 +00:00
Tim Peters 280488b9a3 Whitespace normalization. 2002-08-23 18:19:30 +00:00
Barry Warsaw 4d5ef6aed6 Bump version number to 2.3 2002-08-20 14:51:34 +00:00
Barry Warsaw 3328136e3c Added tests for SF patch #597593, syntactically invalid Content-Type: headers. 2002-08-20 14:51:10 +00:00
Barry Warsaw f36d804b3b get_content_type(), get_content_maintype(), get_content_subtype(): RFC
2045, section 5.2 states that if the Content-Type: header is
syntactically invalid, the default type should be text/plain.
Implement minimal sanity checking of the header -- it must have
exactly one slash in it.  This closes SF patch #597593 by Skip, but in
a different way.

Note that these methods used to raise ValueError for invalid ctypes,
but now they won't.
2002-08-20 14:50:09 +00:00
Barry Warsaw dfea3b3963 _dispatch(): Use get_content_maintype() and get_content_subtype() to
get the MIME main and sub types, instead of getting the whole ctype
and splitting it here.   The two more specific methods now correctly
implement RFC 2045, section 5.2.
2002-08-20 14:47:30 +00:00
Barry Warsaw b404bb7813 test_three_lines(): Test case reported by Andrew McNamara. Works in
email 2.2 but fails in email 1.0.
2002-08-20 12:54:07 +00:00
Barry Warsaw 9e4e050c59 Use full package paths in imports. 2002-07-23 20:35:58 +00:00
Barry Warsaw 10d0d595e0 Added a couple of more tests for Header charset handling. 2002-07-23 19:46:35 +00:00
Barry Warsaw 04f357cffe Get rid of relative imports in all unittests. Now anything that
imports e.g. test_support must do so using an absolute package name
such as "import test.test_support" or "from test import test_support".

This also updates the README in Lib/test, and gets rid of the
duplicate data dirctory in Lib/test/data (replaced by
Lib/email/test/data).

Now Tim and Jack can have at it. :)
2002-07-23 19:04:11 +00:00
Barry Warsaw 92825a9a52 append(): Bite the bullet and let charset be the string name of a
character set, which we'll convert to a Charset instance.  Sigh.
2002-07-23 06:08:10 +00:00
Barry Warsaw 15d3739446 make_header(): Watch out for charset is None, which decode_header()
will return as the charset if implicit us-ascii is used.
2002-07-23 04:29:54 +00:00
Tim Peters 53d019cf5a Changed import from
from test.test_support import TestSkipped, run_unittest
to
    from test_support import TestSkipped, run_unittest

Otherwise, if the Japanese codecs aren't installed, regrtest doesn't
believe the TestSkipped exception raised by this test matches the

    except (ImportError, test_support.TestSkipped), msg:

it's looking for, and reports the skip as a crash failure instead of
as a skipped test.

I suppose this will make it harder to run this test outside of
regrtest, but under the assumption only Barry does that, better to
make it skip cleanly for everyone else.
2002-07-21 06:06:30 +00:00
Barry Warsaw 190390b026 The email package's tests live much better in a subpackage
(i.e. email.test), so move the guts of them here from Lib/test.  The
latter directory will retain stubs to run the email.test tests using
Python's standard regression test.

test_email_torture.py is a torture tester which will not run under
Python's test suite because I don't want to commit megs of data to
that project (it will fail cleanly there).  When run under the mimelib
project it'll stress test the package with megs of message samples
collected from various locations in the wild.
2002-07-19 22:31:10 +00:00
Barry Warsaw 629038093c The email package's tests live much better in a subpackage
(i.e. email.test), so move the guts of them here from Lib/test.  The
latter directory will retain stubs to run the email.test tests using
Python's standard regression test.

test_email_torture.py is a torture tester which will not run under
Python's test suite because I don't want to commit megs of data to
that project (it will fail cleanly there).  When run under the mimelib
project it'll stress test the package with megs of message samples
collected from various locations in the wild.

email/test/data is a copy of Lib/test/data.  The fate of the latter is
still undecided.
2002-07-19 22:29:49 +00:00
Barry Warsaw d8e8e54c2b message_from_string(), message_from_file(): The consensus on the
mimelib-devel list is that non-strict parsing should be the default.
Make it so.
2002-07-19 22:26:01 +00:00
Barry Warsaw bb26b4530b Parser.__init__(): The consensus on the mimelib-devel list is that
non-strict parsing should be the default.  Make it so.
2002-07-19 22:25:34 +00:00
Barry Warsaw c10686426e To better support default content types, fix an API wart, and preserve
backwards compatibility, we're silently deprecating get_type(),
get_subtype() and get_main_type().  We may eventually noisily
deprecate these.  For now, we'll just fix a bug in the splitting of
the main and subtypes.

get_content_type(), get_content_maintype(), get_content_subtype(): New
methods which replace the above.  These /always/ return a content type
string and do not take a failobj, because an email message always at
least has a default content type.

set_default_type(): Someday there may be additional default content
types, so don't hard code an assertion about the value of the ctype
argument.
2002-07-19 22:24:55 +00:00
Barry Warsaw d43857455e _structure(): Take an optional `fp' argument which would be the object
to print>> the structure to.  Defaults to sys.stdout.
2002-07-19 22:21:47 +00:00
Barry Warsaw 1cecdc6bcb _dispatch(): Use the new Message.get_content_type() method as hashed
out on the mimelib-devel list.
2002-07-19 22:21:02 +00:00
Barry Warsaw 7aeac9180e Anthony Baxter's cleanup patch. Python project SF patch # 583190,
quoting:

  in non-strict mode, messages don't require a blank line at the end
  with a missing end-terminator. A single newline is sufficient now.

  Handle trailing whitespace at the end of a boundary. Had to switch
  from using string.split() to re.split()

  Handle whitespace on the end of a parameter list for Content-type.

  Handle whitespace on the end of a plain content-type header.

Specifically,

get_type(): Strip the content type string.

_get_params_preserve(): Strip the parameter names and values on both
sides.

_parsebody(): Lots of changes as described above, with some stylistic
changes by Barry (who hopefully didn't screw things up ;).
2002-07-18 23:09:09 +00:00
Barry Warsaw 2d2fc229a0 Anthony Baxter's patch to expose the parser's `strict' flag in these
convenience functions.  Closes SF # 583188 (python project).
2002-07-18 21:29:17 +00:00
Barry Warsaw 4ef1c7d85b _structure(): Don't get the whole Content-Type: header, just get the
type with get_type().
2002-07-11 20:24:36 +00:00
Barry Warsaw f488b2c6d5 _dispatch(): Comment improvements. 2002-07-11 18:48:40 +00:00
Barry Warsaw 8da39aa56a make_header(): New function to take the output of decode_header() and
create a Header instance.  Closes feature request #539481.

Header.__init__(): Allow the initial string to be omitted.

__eq__(), __ne__(): Support rich comparisons for equality of Header
instances withy Header instances or strings.

Also, update a bunch of docstrings.
2002-07-09 16:33:47 +00:00
Barry Warsaw f6caeba03a Anthony Baxter's patch for non-strict parsing. This adds a `strict'
argument to the constructor -- defaulting to true -- which is
different than Anthony's approach of using global state.

parse(), parsestr(): Grow a `headersonly' argument which stops parsing
once the header block has been seen, i.e. it does /not/ parse or even
read the body of the message.  This is used for parsing message/rfc822
type messages.

We need test cases for the non-strict parsing.  Anthony will supply
these.

_parsebody(): We can get rid of the isdigest end-of-line kludges,
although we still need to know if we're parsing a multipart/digest so
we can set the default type accordingly.
2002-07-09 02:50:02 +00:00
Barry Warsaw a0c8b9d4d5 Add the concept of a "default type". Normally the default type is
text/plain but the RFCs state that inside a multipart/digest, the
default type is message/rfc822.  To preserve idempotency, we need a
separate place to define the default type than the Content-Type:
header.

get_default_type(), set_default_type(): Accessor and mutator methods
for the default type.
2002-07-09 02:46:12 +00:00
Barry Warsaw bb493a7039 __init__(): Don't attach the subparts if its an empty tuple. If the
boundary was given in the arguments, call set_boundary().
2002-07-09 02:44:26 +00:00
Barry Warsaw 93c40f0c3a clone(): A new method for creating a clone of this generator (for
recursive generation).

_dispatch(): If the message object doesn't have a Content-Type:
header, check its default type instead of assuming it's text/plain.
This makes for correct generation of message/rfc822 containers.

_handle_multipart(): We can get rid of the isdigest kludge.  Just
print the message as normal and everything will work out correctly.

_handle_mulitpart_digest(): We don't need this anymore either.
2002-07-09 02:43:47 +00:00
Barry Warsaw ed53bdb02d __init__(): Be sure to set the default type to message/rfc822. 2002-07-09 02:40:35 +00:00