85 Commits

Author SHA1 Message Date
Barry Warsaw 9e4e050c59 Use full package paths in imports. 2002-07-23 20:35:58 +00:00
Barry Warsaw 10d0d595e0 Added a couple more tests for Header charset handling. 2002-07-23 19:46:35 +00:00
Barry Warsaw 04f357cffe Get rid of relative imports in all unittests. Now anything that
imports e.g. test_support must do so using an absolute package name
such as "import test.test_support" or "from test import test_support".

This also updates the README in Lib/test, and gets rid of the
duplicate data directory in Lib/test/data (replaced by
Lib/email/test/data).

Now Tim and Jack can have at it. :)
2002-07-23 19:04:11 +00:00
Barry Warsaw 92825a9a52 append(): Bite the bullet and let charset be the string name of a
character set, which we'll convert to a Charset instance.  Sigh.
2002-07-23 06:08:10 +00:00
Barry Warsaw 15d3739446 make_header(): Watch out for charset is None, which decode_header()
will return as the charset if implicit us-ascii is used.
2002-07-23 04:29:54 +00:00
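
A minimal sketch of the charset-is-None case mentioned above. It assumes the modern Python 3 `email.header` module rather than the 2002 code this commit touched, so details may differ from the original behaviour.

```python
from email.header import decode_header, make_header

# A plain ASCII header has no RFC 2047 encoded words, so decode_header()
# reports its charset as None (the implicit us-ascii case).
pairs = decode_header("Hello world")
charset = pairs[0][1]

# make_header() has to accept that None and fall back to the default charset.
header = make_header(pairs)
text = str(header)
print(charset, text)
```
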
Tim Peters 53d019cf5a Changed import from
    from test.test_support import TestSkipped, run_unittest
to
    from test_support import TestSkipped, run_unittest

Otherwise, if the Japanese codecs aren't installed, regrtest doesn't
believe the TestSkipped exception raised by this test matches the

    except (ImportError, test_support.TestSkipped), msg:

it's looking for, and reports the skip as a crash failure instead of
as a skipped test.

I suppose this will make it harder to run this test outside of
regrtest, but under the assumption only Barry does that, better to
make it skip cleanly for everyone else.
2002-07-21 06:06:30 +00:00
Barry Warsaw 190390b026 The email package's tests live much better in a subpackage
(i.e. email.test), so move the guts of them here from Lib/test.  The
latter directory will retain stubs to run the email.test tests using
Python's standard regression test.

test_email_torture.py is a torture tester which will not run under
Python's test suite because I don't want to commit megs of data to
that project (it will fail cleanly there).  When run under the mimelib
project it'll stress test the package with megs of message samples
collected from various locations in the wild.
2002-07-19 22:31:10 +00:00
Barry Warsaw 629038093c The email package's tests live much better in a subpackage
(i.e. email.test), so move the guts of them here from Lib/test.  The
latter directory will retain stubs to run the email.test tests using
Python's standard regression test.

test_email_torture.py is a torture tester which will not run under
Python's test suite because I don't want to commit megs of data to
that project (it will fail cleanly there).  When run under the mimelib
project it'll stress test the package with megs of message samples
collected from various locations in the wild.

email/test/data is a copy of Lib/test/data.  The fate of the latter is
still undecided.
2002-07-19 22:29:49 +00:00
Barry Warsaw d8e8e54c2b message_from_string(), message_from_file(): The consensus on the
mimelib-devel list is that non-strict parsing should be the default.
Make it so.
2002-07-19 22:26:01 +00:00
Barry Warsaw bb26b4530b Parser.__init__(): The consensus on the mimelib-devel list is that
non-strict parsing should be the default.  Make it so.
2002-07-19 22:25:34 +00:00
Barry Warsaw c10686426e To better support default content types, fix an API wart, and preserve
backwards compatibility, we're silently deprecating get_type(),
get_subtype() and get_main_type().  We may eventually noisily
deprecate these.  For now, we'll just fix a bug in the splitting of
the main and subtypes.

get_content_type(), get_content_maintype(), get_content_subtype(): New
methods which replace the above.  These /always/ return a content type
string and do not take a failobj, because an email message always at
least has a default content type.

set_default_type(): Someday there may be additional default content
types, so don't hard code an assertion about the value of the ctype
argument.
2002-07-19 22:24:55 +00:00
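
The new accessors can be tried out roughly as follows. This sketch assumes the Python 3 `email` package, where these methods still exist; the sample message text is made up for illustration.

```python
import email

# A message with no Content-Type: header at all.
msg = email.message_from_string("Subject: no content type here\n\nJust a body.\n")

# The accessors take no failobj: a message always has at least a default type.
content_type = msg.get_content_type()
maintype = msg.get_content_maintype()
subtype = msg.get_content_subtype()
print(content_type, maintype, subtype)
```
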
Barry Warsaw d43857455e _structure(): Take an optional `fp' argument which would be the object
to print>> the structure to.  Defaults to sys.stdout.
2002-07-19 22:21:47 +00:00
Barry Warsaw 1cecdc6bcb _dispatch(): Use the new Message.get_content_type() method as hashed
out on the mimelib-devel list.
2002-07-19 22:21:02 +00:00
Barry Warsaw 7aeac9180e Anthony Baxter's cleanup patch. Python project SF patch # 583190,
quoting:

  in non-strict mode, messages don't require a blank line at the end
  with a missing end-terminator. A single newline is sufficient now.

  Handle trailing whitespace at the end of a boundary. Had to switch
  from using string.split() to re.split()

  Handle whitespace on the end of a parameter list for Content-type.

  Handle whitespace on the end of a plain content-type header.

Specifically,

get_type(): Strip the content type string.

_get_params_preserve(): Strip the parameter names and values on both
sides.

_parsebody(): Lots of changes as described above, with some stylistic
changes by Barry (who hopefully didn't screw things up ;).
2002-07-18 23:09:09 +00:00
Barry Warsaw 2d2fc229a0 Anthony Baxter's patch to expose the parser's `strict' flag in these
convenience functions.  Closes SF # 583188 (python project).
2002-07-18 21:29:17 +00:00
Barry Warsaw 4ef1c7d85b _structure(): Don't get the whole Content-Type: header, just get the
type with get_type().
2002-07-11 20:24:36 +00:00
Barry Warsaw f488b2c6d5 _dispatch(): Comment improvements. 2002-07-11 18:48:40 +00:00
Barry Warsaw 8da39aa56a make_header(): New function to take the output of decode_header() and
create a Header instance.  Closes feature request #539481.

Header.__init__(): Allow the initial string to be omitted.

__eq__(), __ne__(): Support rich comparisons for equality of Header
instances with Header instances or strings.

Also, update a bunch of docstrings.
2002-07-09 16:33:47 +00:00
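
A short sketch of the round trip described above, written against the Python 3 `email.header` module (an assumption; the 2002 API returned plain strings rather than bytes from decode_header()). The encoded sample value is illustrative only.

```python
from email.header import Header, decode_header, make_header

# decode_header() splits an RFC 2047 header into (value, charset) pairs;
# make_header() turns those pairs back into a Header instance.
pairs = decode_header("=?iso-8859-1?q?p=F6stal?=")
header = make_header(pairs)
text = str(header)

# Rich comparison: a Header compares equal to a string with the same text.
same = Header("hello") == "hello"
different = Header("hello") != "world"

# The initial string may be omitted.
empty = str(Header())
print(text, same, different)
```
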
Barry Warsaw f6caeba03a Anthony Baxter's patch for non-strict parsing. This adds a `strict'
argument to the constructor -- defaulting to true -- which is
different than Anthony's approach of using global state.

parse(), parsestr(): Grow a `headersonly' argument which stops parsing
once the header block has been seen, i.e. it does /not/ parse or even
read the body of the message.  This is used for parsing message/rfc822
type messages.

We need test cases for the non-strict parsing.  Anthony will supply
these.

_parsebody(): We can get rid of the isdigest end-of-line kludges,
although we still need to know if we're parsing a multipart/digest so
we can set the default type accordingly.
2002-07-09 02:50:02 +00:00
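
The effect of `headersonly' can be sketched like this, assuming the Python 3 `email.parser.Parser`. The `strict' flag is not shown, since it appears to have been dropped from later versions of the parser, and the sample message is made up. Note that the modern parser still reads the body and stores it as an unparsed string, which differs slightly from the description above.

```python
from email.parser import Parser

raw = (
    "Subject: digest\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "part one\n"
    "--XYZ--\n"
)

# A full parse builds the tree of subparts.
full = Parser().parsestr(raw)

# With headersonly=True the parser stops interpreting after the header block,
# so the body is never split into subparts.
headers_only = Parser().parsestr(raw, headersonly=True)

print(full.is_multipart(), headers_only.is_multipart())
```
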
Barry Warsaw a0c8b9d4d5 Add the concept of a "default type". Normally the default type is
text/plain but the RFCs state that inside a multipart/digest, the
default type is message/rfc822.  To preserve idempotency, we need a
separate place to define the default type than the Content-Type:
header.

get_default_type(), set_default_type(): Accessor and mutator methods
for the default type.
2002-07-09 02:46:12 +00:00
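
A minimal sketch of the default-type accessors, assuming the Python 3 `email.message.Message` class, which still carries both methods.

```python
from email.message import Message

part = Message()

# With no Content-Type: header, the default type is what gets reported.
before = part.get_default_type()

# Inside a multipart/digest the default would be message/rfc822 instead.
part.set_default_type("message/rfc822")
after = part.get_content_type()

# The default lives outside the headers, so flattening the message does not
# invent a Content-Type: header that was not there (this is the idempotency point).
has_header = "Content-Type" in part
print(before, after, has_header)
```
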
Barry Warsaw bb493a7039 __init__(): Don't attach the subparts if its an empty tuple. If the
boundary was given in the arguments, call set_boundary().
2002-07-09 02:44:26 +00:00
Barry Warsaw 93c40f0c3a clone(): A new method for creating a clone of this generator (for
recursive generation).

_dispatch(): If the message object doesn't have a Content-Type:
header, check its default type instead of assuming it's text/plain.
This makes for correct generation of message/rfc822 containers.

_handle_multipart(): We can get rid of the isdigest kludge.  Just
print the message as normal and everything will work out correctly.

_handle_multipart_digest(): We don't need this anymore either.
2002-07-09 02:43:47 +00:00
Barry Warsaw ed53bdb02d __init__(): Be sure to set the default type to message/rfc822. 2002-07-09 02:40:35 +00:00
Barry Warsaw 8fa06b55f6 _structure(): A handy little debugging aid that I don't (yet) intend
to make public, but that others might still find useful.
2002-07-09 02:39:07 +00:00
Barry Warsaw 27b168ca7c With the addition of Oleg's support for RFC 2231, it's time to bump
the version number to 2.1.
2002-07-09 02:13:10 +00:00
Barry Warsaw 6ee7156996 append(): Clarify the expected type of charset. 2002-07-03 05:04:04 +00:00
Barry Warsaw 12566a8826 Oleg Broytmann's support for RFC 2231 encoded parameters, SF patch #549133
Specifically,

decode_rfc2231(), encode_rfc2231(): Functions to encode and decode RFC
2231 style parameters.

decode_params(): Function to decode a list of parameters.
2002-06-29 05:58:04 +00:00
Barry Warsaw 908dc4bea8 Oleg Broytmann's support for RFC 2231 encoded parameters, SF patch #549133
Specifically,

_formatparam(): Teach this about encoded `param' arguments, which are
a 3-tuple of items (charset, language, value).  language is ignored.

_unquotevalue(): Handle both 3-tuple RFC 2231 values and unencoded
values.

_get_params_preserve(): Decode the parameters before returning them.

get_params(), get_param(): Use _unquotevalue().

get_filename(), get_boundary(): Teach these about encoded (3-tuple)
parameters.
2002-06-29 05:56:15 +00:00
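
The 3-tuple handling can be illustrated roughly as below. The sketch assumes the Python 3 `email` package with its default compatibility policy; the file name and message are invented for the example.

```python
import email

raw = (
    "Content-Type: application/octet-stream\n"
    "Content-Disposition: attachment; filename*=utf-8''na%C3%AFve.txt\n"
    "\n"
    "data\n"
)
msg = email.message_from_string(raw)

# An RFC 2231 encoded parameter comes back as (charset, language, value).
param = msg.get_param("filename", header="content-disposition")

# get_filename() knows about the 3-tuple form and collapses it to a string.
filename = msg.get_filename()
print(param[0], filename)
```
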
Barry Warsaw 8e69bdac33 __unicode__(): Patch # 541263 by Mikhail Zabaluev, implementation
modified by Barry.
2002-06-29 03:26:58 +00:00
Barry Warsaw ba2577b7f1 _max_append(): When adding the string `s' to its own line, it should
be lstrip'd so that old continuation whitespace is replaced by that
specified in Header's continuation_ws parameter.
2002-06-28 23:48:23 +00:00
Barry Warsaw 766125080f Teach this class about "highest-level syntactic breaks" but only for
headers with no charset or 'us-ascii' charsets.  Actually this is only
partially true: we know about semicolons (but not true parameters) and
we know about whitespace (but not technically folding whitespace).
Still it should be good enough for all practical purposes.

Other changes include:

__init__(): Add a continuation_ws argument, which defaults to a single
space.  Set this to change the whitespace used for continuation lines
when a header must be split.  Also, changed the way header line
lengths are calculated, so that they take into account continuation_ws
(when tabs-expanded) and any provided header_name parameter.  This
should do much better on returning split headers for which the first
and subsequent lines must fit into a specified width.

guess_maxlinelen(): Removed.  I don't think we need this method as
part of the public API.

encode_chunks() -> _encode_chunks(): I don't think we need this one as
part of the public API either.
2002-06-28 23:46:53 +00:00
Barry Warsaw 062749ac57 _split_header(): The code here was terminally broken because it didn't
know anything about RFC 2047 encoded headers.  Fortunately we have a
perfectly good header splitter in Header.encode().  So we just call
that to give us a properly formatted and split header.
Header.encode() didn't know about "highest-level syntactic breaks" but
that's been fixed now too.
2002-06-28 23:41:42 +00:00
Barry Warsaw 69e18af968 _parsebody(): Fix for the new message/rfc822 tree structure (the
parent is now a multipart with one element, the sub-message object).
2002-06-02 19:12:03 +00:00
Barry Warsaw d2b2e533c0 header_encode(), encode(): Use _floordiv() from the appropriate
compatibility module.
2002-06-02 19:08:31 +00:00
Barry Warsaw 21f77ac0bc Use absolute import paths for intrapackage imports. 2002-06-02 19:07:16 +00:00
Barry Warsaw 8ba76e8929 Use absolute import paths for intrapackage imports.
as_string(): Use Generator.flatten() for better performance.
2002-06-02 19:05:51 +00:00
Barry Warsaw 524af6f382 Use absolute import paths for intrapackage imports.
Use MIMENonMultipart as the base class so that you can't attach() to
these non-multipart message types.
2002-06-02 19:05:08 +00:00
Barry Warsaw 7dc865ad72 flatten(): Renamed from __call__() which is (silently) deprecated.
__call__() can be 2-3x slower than the equivalent normal method.

_handle_message(): The structure of message/rfc822 message has
changed.  Now parent's payload is a list of length 1, and the zeroth
element is the Message sub-object.  Adjust the printing of such
message trees to reflect this change.
2002-06-02 19:02:37 +00:00
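
A small sketch of flatten() and the message/rfc822 structure described above, assuming the Python 3 module layout (`email.generator`, `email.mime.message`), which postdates this commit.

```python
import io
from email.generator import Generator
from email.message import Message
from email.mime.message import MIMEMessage

inner = Message()
inner["Subject"] = "inner"
inner.set_payload("hello\n")

# The message/rfc822 container holds a list of length 1 whose zeroth
# element is the sub-message object.
outer = MIMEMessage(inner)
payload = outer.get_payload()

# flatten() is the normal-method replacement for calling the generator.
buf = io.StringIO()
Generator(buf).flatten(outer)
text = buf.getvalue()
print(text)
```
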
Barry Warsaw ff49279f7c _intdiv2() -> _floordiv(), merge of uncommitted changes. 2002-06-02 18:59:06 +00:00
Neal Norwitz 1fab9ee085 Get email test to pass. Barry, hope this is what you had in mind 2002-06-02 16:38:14 +00:00
Barry Warsaw 9d5e4aa414 Bump to version 2.0.5, and also use absolute import paths. 2002-06-01 06:03:09 +00:00
Barry Warsaw 2f514a806d These two classes provide bases for more specific content type
subclasses.

MIMENonMultipart: Base class for non-multipart/* content type subclass
specializations, e.g. image/gif.  This class overrides attach() which
raises an exception, since it makes no sense to attach a subpart to
e.g. an image/gif message.

MIMEMultipart: Base class for multipart/* content type subclass
specializations, e.g. multipart/mixed.  Does little more than provide
a useful constructor.
2002-06-01 05:59:12 +00:00
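
How the two base classes behave can be sketched as follows, assuming the Python 3 `email.mime` subpackage. MIMEText stands in for any non-multipart subclass, and the exception class named here is the one used by current Python, which may not match the 2002 code.

```python
from email import errors
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A multipart container accepts subparts.
container = MIMEMultipart("mixed")
leaf = MIMEText("hello")
container.attach(leaf)

# A non-multipart message refuses them.
try:
    leaf.attach(MIMEText("nested"))
    attach_rejected = False
except errors.MultipartConversionError:
    attach_rejected = True

print(container.get_content_type(), attach_rejected)
```
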
Barry Warsaw 1c30aa2292 The _compat modules now export _floordiv() instead of _intdiv2() for
better code reuse.

_split() Use _floordiv().
2002-06-01 05:49:17 +00:00
Barry Warsaw c5d1c045ab Slightly better docstring 2002-06-01 05:45:37 +00:00
Barry Warsaw bb98c8cff0 _is_unicode(): Use UnicodeType instead of the unicode builtin for
Python 2.1 compatibility.
2002-06-01 03:56:07 +00:00
Guido van Rossum ca948b40b4 Use floor division where appropriate. 2002-05-29 20:38:21 +00:00
Guido van Rossum 1a7ac359a0 Importing Charset should not fail when Unicode is disabled. (XXX
Using Unicode-aware methods may still die with a NameError on unicode.
Maybe there's a more elegant solution but I doubt anybody cares.)
2002-05-28 18:49:03 +00:00
Tim Peters 8ac1495a6a Whitespace normalization. 2002-05-23 15:15:30 +00:00
Barry Warsaw 43193150ee Bump to version 2.0.4 2002-05-22 01:52:33 +00:00
Barry Warsaw 4be9eccbc4 getaddresses(): Like the change in rfc822.py, this one needs to access
the AddressList.addresslist attribute directly.

Also, add a test case for the email.Utils.getaddresses() interface.
2002-05-22 01:52:10 +00:00