Commit Graph

388 Commits

Author SHA1 Message Date
Barry Warsaw db6888b7df _make_boundary(): Fix for SF bug #745478, broken boundary calculation
in some locales.  This code simplifies the boundary algorithm to use
randint() which is what we wanted anyway.

Bump package version to 2.5.3.

Backport candidate for Python 2.2.3
2003-05-29 19:39:33 +00:00
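
A minimal sketch of a randint()-based boundary, for illustration only. The name `make_boundary_sketch`, the `width` parameter, and the exact layout of the string are assumptions and are not taken from the package's `_make_boundary()`. The point is that formatting an integer never involves a locale-dependent decimal point.

```python
import random
import sys


def make_boundary_sketch(width=19):
    """Return a boundary-like string built from a random integer.

    Hypothetical helper: an integer is formatted instead of a float,
    so no decimal point (which varies by locale) can appear.
    """
    token = random.randint(0, sys.maxsize)
    return "=" * 15 + str(token).zfill(width) + "=="


boundary = make_boundary_sketch()
```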
Barry Warsaw 362310df81 Bump version number 2003-05-08 03:34:58 +00:00
Barry Warsaw f8b3e1f76e A couple of new parsedate test cases. 2003-05-08 03:34:01 +00:00
Barry Warsaw b5dc39f02c parsedate_tz(): Be slightly more lenient when there's no day of the
week.  Patch given by Daniel Berlin in SF bug # 732761.  Also closes
SF bug # 727719.

Backport candidate.
2003-05-08 03:33:15 +00:00
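
A short illustration of the leniency described above, assuming the `email.utils.parsedate_tz()` of a current Python 3 standard library (the 2003 code lived in a different module layout). The date string has no day of the week.

```python
from email.utils import parsedate_tz

# No leading "Tue," here; the parser is expected to cope.
parsed = parsedate_tz("25 Feb 2003 13:47:26 -0800")

date_fields = parsed[:6]         # year, month, day, hour, minute, second
utc_offset_seconds = parsed[9]   # offset from UTC in seconds
```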
Barry Warsaw 0822ff7cca Get rid of some hard coded tabs 2003-04-24 15:58:47 +00:00
Barry Warsaw 482c5f7eb7 as_string(): Added some text to the docstring to make it clear that
it's a convenience only and give hints on what to do for more
flexibility.
2003-04-18 23:04:35 +00:00
Barry Warsaw 7ba256f039 Fix a comment 2003-04-02 04:51:33 +00:00
Barry Warsaw 1a99cf045d Bump to version 2.5.1 2003-03-30 20:47:48 +00:00
Barry Warsaw 9c505ae3da test_whitespace_eater_unicode_2(): Test case for SF bug #710498. 2003-03-30 20:47:22 +00:00
Barry Warsaw ba1548a736 __unicode__(): Fix the logic for calculating whether to add a
separating space or not between encoded chunks.  Closes SF bug
#710498.
2003-03-30 20:46:47 +00:00
Barry Warsaw e159d584d2 Temporary bump of the version number. 2003-03-26 17:58:11 +00:00
Barry Warsaw cd7051f698 typed_subpart_iterator(): Fix these to use non-deprecated APIs,
i.e. get_content_maintype() and get_content_subtype().

Also, add True, False for Python 2.2.x where x < 2 compatibility.
2003-03-26 17:57:25 +00:00
Barry Warsaw 8af56778fd typed_subpart_iterator(): Fix these to use non-deprecated APIs,
i.e. get_content_maintype() and get_content_subtype().
2003-03-26 17:56:21 +00:00
Barry Warsaw 5fe9ead82c Email version 2.5 -- I will now backport this to Python 2.2.3. 2003-03-21 18:57:59 +00:00
Barry Warsaw 6613fb8412 _encode_chunks(): Throw out empty chunks. 2003-03-17 20:36:20 +00:00
Barry Warsaw 240754933e test_long_lines_with_different_header(): Another test from Jason. 2003-03-17 20:35:14 +00:00
Barry Warsaw ab75840cd0 test_getaddresses_nasty(): A test for mimelib SF bug # 697641. 2003-03-17 18:36:37 +00:00
Barry Warsaw fa348c876f getaddrlist(): Make sure this consumes all the data, and if there is
no address there (perhaps because of invalid characters), it appends
('', '') to the result set.

Closes mimelib SF bug # 697641.
2003-03-17 18:35:42 +00:00
Barry Warsaw ea8f6fa094 test_whitespace_eater_unicode(): Make this test Python 2.1 compatible. 2003-03-12 03:14:11 +00:00
Barry Warsaw ca53c12c8b Python 2.1 doesn't have True and False 2003-03-12 02:54:17 +00:00
Barry Warsaw f9e0bd8df8 Adjust tests for no newline appending to MIMEText.__init__()'s _text
argument.
2003-03-11 05:10:46 +00:00
Barry Warsaw df6c70b454 beta 1 2003-03-11 05:05:21 +00:00
Barry Warsaw bd757ba1ed Adjust tests for no newline appending to MIMEText.__init__()'s _text
argument.
2003-03-11 05:04:54 +00:00
Barry Warsaw cbec700b49 __init__(): Don't add a newline to _text if it doesn't already end in
one.  Possibly controversial.
2003-03-11 05:04:09 +00:00
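
A quick check of the behavior this change describes, assuming the `email.mime.text.MIMEText` class of a current Python 3 standard library (the module path differed in 2003).

```python
from email.mime.text import MIMEText

# The text does not end in a newline, and none should be appended.
msg = MIMEText("hello")
payload = msg.get_payload()
```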
Barry Warsaw 12dc230c00 body_line_iterator(): Accept optional decode argument, pass through to
Message.get_payload().
2003-03-11 04:41:35 +00:00
Barry Warsaw 08898499b2 get_payload(): Teach this about various uuencoded
Content-Transfer-Encodings
2003-03-11 04:33:30 +00:00
Barry Warsaw 3840b49d9c test_get_decoded_uu_payload(): A new test for
Content-Transfer-Encoding: x-uuencode
2003-03-11 04:31:37 +00:00
Barry Warsaw a2369928b5 specialsre, escapesre: In SF bug #663369, Matthew Woodcraft points out
that backslashes must be escaped in character sets.
2003-03-10 19:20:18 +00:00
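
To illustrate the point about character sets, here is a small sketch. The names `escapes` and `backslash_escape` are made up for this example and are not the package's `specialsre` or `escapesre`. Inside `[...]` a literal backslash has to be written as `\\`, even in a raw string.

```python
import re

# Matches a backslash or a double quote; the backslash is escaped
# inside the character set.
escapes = re.compile(r'[\\"]')


def backslash_escape(text):
    """Prefix every backslash and double quote with a backslash."""
    return escapes.sub(r"\\\g<0>", text)


escaped = backslash_escape('a\\b"c')
```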
Barry Warsaw a2e64702ca test_escape_backslashes(): A test for SF bug #663369 by Matthew Woodcraft. 2003-03-10 19:18:34 +00:00
Barry Warsaw 59e98ae1c5 _bdecode(): Remove redundant check. 2003-03-10 17:36:04 +00:00
Barry Warsaw 513af770d7 Fix base class 2003-03-10 17:00:43 +00:00
Barry Warsaw e1ff4bbce6 Use ndiffAssertEqual in a couple of places for better error reporting. 2003-03-10 16:59:34 +00:00
Barry Warsaw 21191d3e31 get_payload(): If we get a low-level binascii.Error when base64
decoding the payload, just return it as-is.
2003-03-10 16:13:14 +00:00
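
The fallback idea can be sketched with a standalone helper. `decode_base64_or_raw` is a hypothetical name, and this is not the body of `Message.get_payload()`; it only shows the pattern of catching `binascii.Error` and returning the input unchanged.

```python
import binascii


def decode_base64_or_raw(payload: bytes) -> bytes:
    """Decode base64 data, or return it as-is if decoding fails."""
    try:
        return binascii.a2b_base64(payload)
    except binascii.Error:
        # Broken base64 (for example bad padding): hand back the raw data.
        return payload


good = decode_base64_or_raw(b"aGVsbG8=")
broken = decode_base64_or_raw(b"abc")
```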
Barry Warsaw 3efb651ea3 test_broken_base64_payload(): Test for crash in low-level binascii
module when decoding a message with broken base64.
2003-03-10 16:09:51 +00:00
Barry Warsaw 5b8c69f11e _split_ascii() [method and function]: Don't join the lines just to
split them again.  Simply return them as chunk lists.

_encode_chunks(): Don't add more folding whitespace than necessary.
2003-03-10 15:14:08 +00:00
Barry Warsaw 796376338f test_another_long_multiline_header(): Yet another formatting test. 2003-03-10 15:11:29 +00:00
Barry Warsaw 33975eac3d _split_ascii(): lstrip the individual lines in the ascii split lines,
since we'll be adding our own continuation whitespace later.
2003-03-07 23:24:34 +00:00
Barry Warsaw 28ffcef4e6 test_long_unbreakable_lines_with_continuation(): Another funky example
from Jason Mastaler :)
2003-03-07 23:23:04 +00:00
Barry Warsaw 8e1e7f5468 decode_rfc2231(): RFC 2231 allows leaving out both the charset and
language without including any single quotes.
2003-03-07 22:46:41 +00:00
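
A short illustration, assuming the `email.utils.decode_rfc2231()` of a current Python 3 standard library. A value without any single quotes comes back with no charset and no language.

```python
from email.utils import decode_rfc2231

# No charset'language' prefix at all.
bare = tuple(decode_rfc2231("hello"))

# Full form: charset'language'value.
full = tuple(decode_rfc2231("us-ascii'en'hello"))
```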
Barry Warsaw 21fcc4e287 test_rfc2231_no_language_or_charset(): RFC 2231 allows leaving out
both the charset and language without including any single quotes.
2003-03-07 22:45:55 +00:00
Barry Warsaw bf7e241397 whitespace normalization 2003-03-07 15:58:51 +00:00
Barry Warsaw ce6bf59b2d _write_headers(), _split_header(): All of the smarts for splitting
long header lines is now (properly) in the Header class.  So we no
longer need _split_header() and we'll just defer to Header.encode()
when we have a plain string.
2003-03-07 15:43:17 +00:00
Barry Warsaw 9f3fcd9c23 More internal refinements of the ascii splitting algorithm.
_encode_chunks(): Pass maxlinelen in instead of always using
self._maxlinelen, so we can adjust for shorter initial lines.
Pass this value through to _max_append().

encode(): Weave maxlinelen through to the _encode_chunks() call.

_split_ascii(): When recursively splitting a line on spaces
(i.e. lower level syntactic split), don't append the whole returned
string.  Instead, split it on linejoiners and extend the lines up to
the last line (for proper packing).  Calculate the linelen based on
the last element in the this list.
2003-03-07 15:39:37 +00:00
Barry Warsaw 82783e6f33 test_string_headerinst_eq(): Another Jason test :) 2003-03-07 15:35:47 +00:00
Tim Peters 2b4821347f Repaired a misleading comment Barry inherited from me. 2003-03-06 23:41:58 +00:00
Barry Warsaw bd836dfba3 _split_ascii(): In the clause where curlen + partlen > maxlen, if the
part itself is longer than maxlen, and we aren't already splitting on
whitespace, then we recursively split the part on whitespace and
append that to the this list.
2003-03-06 20:33:04 +00:00
Barry Warsaw f0d3585669 test_long_received_header(): Another test case for folding long
Received headers (first on semis then on whitespace), given by Jason
Mastaler.
2003-03-06 20:31:02 +00:00
Barry Warsaw c79ffb022f test_whitespace_eater_unicode(): Test of the last outstanding bug in
SF # 640110.
2003-03-06 16:11:14 +00:00
Barry Warsaw 4848805341 __unicode__(): When converting to a unicode string, we need to
preserve spaces in the encoded/unencoded word boundaries.  RFC 2047 is
ambiguous here, but most people expect the space to be preserved.
Really closes SF bug # 640110.
2003-03-06 16:10:30 +00:00
Barry Warsaw 28ffcb6f84 test_rfc2047_multiline(): Test case for SF bug #640110. 2003-03-06 06:38:29 +00:00
Barry Warsaw 671c3e6373 decode_header(): Typo when appending an unencoded chunk to the
previous unencoded chunk (e.g. when they appear on separate lines).
Closes the 2nd bug in SF #640110 (the first one's already been
fixed).
2003-03-06 06:37:42 +00:00
Barry Warsaw 10627ba9b8 Merge of the folding-reimpl-branch. Specific changes,
Update tests for email 2.5.
2003-03-06 05:41:07 +00:00
Barry Warsaw e899e51c06 Merge of the folding-reimpl-branch. Specific changes,
_split(): New implementation of ASCII line splitting which should do a
better job and not be subject to the various weird artifacts (bugs)
reported.  This should also do a better job of higher-level syntactic
splits by trying first to split on semis, then commas, then
whitespace.

Use a Timbot-ly binary search for optimal non-ASCII split points for
better packing of header lines.  This also lets us remove one
recursion call.  Don't pass in firstline, but instead pass in the
actual line length we're shooting for.  Also pass in the list of split
characters.

encode(): Pass in the list of split characters so applications can
have some control over what "higher level syntactic breaks" are.

Also,

decode_header(): Transform binascii.Errors which can occur when
decoding a base64 RFC 2047 header with bogus data, into an
email.Errors.HeaderParseError.  Closes SF bug #696712.
2003-03-06 05:39:46 +00:00
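
The binary search for a split point can be sketched as follows. `longest_fitting_prefix` and its parameters are hypothetical, and the sketch measures plain encoded byte length rather than RFC 2047 encoded-word length; it assumes the encoded length never shrinks as the prefix grows.

```python
def longest_fitting_prefix(text, max_bytes, charset="utf-8"):
    """Return the length of the longest prefix that encodes within max_bytes."""
    low, high = 0, len(text)
    while low < high:
        # Bias the midpoint upward so the loop always makes progress.
        mid = (low + high + 1) // 2
        if len(text[:mid].encode(charset)) <= max_bytes:
            low = mid          # prefix fits, try a longer one
        else:
            high = mid - 1     # prefix too long, try a shorter one
    return low


# Each "é" takes two bytes in UTF-8, so only two characters fit in 5 bytes.
split_at = longest_fitting_prefix("ééééé", 5)
```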
Barry Warsaw 0e4570bcb0 Merge of the folding-reimpl-branch. Specific changes,
Rename a constant.
2003-03-06 05:25:35 +00:00
Barry Warsaw 5c2f1536d0 Merge of the folding-reimpl-branch. Specific changes,
Remove a senseless comment.
2003-03-06 05:25:00 +00:00
Barry Warsaw 5d384ef069 Merge of the folding-reimpl-branch. Specific changes,
_handle_multipart(): Ensure that if the preamble exists but does not
end in a newline, a newline is still added.  Without this, the
boundary separator will end up on the preamble line, breaking the MIME
structure.

_make_boundary(): Handle differences in the decimal point character
based on the locale.
2003-03-06 05:22:02 +00:00
Barry Warsaw 784cf6ae88 Merge of the folding-reimpl-branch. Specific changes,
Charset: Alias __repr__ to __str__ for debugging.

header_encode(): When calling quopriMIME.header_encode(), set
maxlinelen=None so that the lower level function doesn't (also) try to
wrap/fold the line.
2003-03-06 05:16:29 +00:00
Barry Warsaw 0ed81c35a7 Merge of the folding-reimpl-branch. Specific changes,
_max_append(): Change the comparison so that the new string is
concatenated if it's less than or equal to the max length.

header_encode(): Allow for maxlinelen == None to mean, don't do any
line splitting.  This is because this module is mostly used by higher
level abstractions (Header.py) which already ensures line lengths.  We
do this in a cheapo way by setting the max_encoding to some insanely
<100k wink> large number.
2003-03-06 05:14:20 +00:00
Barry Warsaw 4e68a1ec6c CHARSETS, ALIASES, CODEC_MAP: SF feature request 633543, Korean
support and other charset defaults.  See also:

http://article.gmane.org/gmane.comp.python.mime.devel/250

(this just commits the last bit of the article that wasn't part of
email 2.4.3.)
2003-01-07 00:29:07 +00:00
Barry Warsaw 3d597812b6 Jack complained that test_crlf_separation() was failing on MacOS9
because the test file, msg_26.txt which has \r\n line endings, was
getting munged by cvs, which knows to do line ending conversions for
text files.  But we want \r\n to be preserved on all platforms, so we
cvs admin'd the file to be -kb (binary), which means we have to open
the file in binary mode to preserve these line ends.  Hopefully this
will be the end of the thrashing on this issue (but probably not).

Test passes on *nix now, and Tim confirms it passes on Windows.  We'll
leave it to Jack to test MacOS.
2003-01-02 22:48:36 +00:00
Barry Warsaw 10ee7a7f15 test_bad_8bit_header(): Tests for optional argument `errors'. See SF
bug #648119.
2002-12-30 19:14:38 +00:00
Barry Warsaw f4fdff715a Header.__init__(), .append(): Add an optional argument `errors' which
is passed straight through to the unicode() and ustr.encode() calls.
I think it's the best we can do to address the UnicodeErrors in badly
encoded headers such as is described in SF bug #648119.
2002-12-30 19:13:00 +00:00
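
A sketch of the `errors` argument in use, assuming the `email.header.Header` class of a current Python 3 standard library, where byte strings are `bytes` objects rather than Python 2 `str`.

```python
from email.header import Header

bad_bytes = b"caf\xe9"  # not valid us-ascii

# With the default strict handling the constructor raises a UnicodeError.
try:
    Header(bad_bytes, charset="us-ascii")
    strict_failed = False
except UnicodeError:
    strict_failed = True

# With errors="replace" the undecodable byte becomes U+FFFD instead.
lenient = Header(bad_bytes, charset="us-ascii", errors="replace")
lenient_text = str(lenient)
```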
Barry Warsaw 72261c9dfb Actually, make this 2.5a1 since it will include API changes that may
need more vetting, and it will be included in Python 2.3a1.
2002-12-30 19:08:38 +00:00
Barry Warsaw 207d1c2065 Bump to 2.5 2002-12-30 17:45:41 +00:00
Barry Warsaw f29ffbdbf5 TestMIMEAudio.setUp(): Use the email package's copy of the audio test
file, needed because some binary distros (read RPMs) don't include the
test module in their standard Python package.  This eliminates an
external dependency and closes SF bug # 650441.
2002-12-30 17:45:02 +00:00
Barry Warsaw c99c08c764 A copy of the audio test file from Lib/test, needed because some
binary distros (read RPMs) don't include the test module in their
standard Python package.  This eliminates an external dependency and
closes SF bug # 650441.
2002-12-30 17:44:27 +00:00
Barry Warsaw ba97659f5f parsedate_tz(): Fix SF bug #552345, optional FWS between the comma and
the day in an RFC 2822 date.
2002-12-30 17:21:36 +00:00
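
For illustration, assuming `email.utils.parsedate()` from a current Python 3 standard library: the compact form, with no whitespace after the comma, parses to the same result as the spaced form.

```python
from email.utils import parsedate

compact = parsedate("Wed,3 Apr 2002 14:58:26 +0800")
spaced = parsedate("Wed, 3 Apr 2002 14:58:26 +0800")
```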
Barry Warsaw 795833fbc6 test_parsedate_compact(): A test for optional FWS between the comma
and the day number in an RFC 2822 date specification.  See bug
#552345.
2002-12-30 17:20:53 +00:00
Barry Warsaw 5c8fef903d A code cleansing pass 2002-12-30 16:43:42 +00:00
Barry Warsaw 1fb22bb24f Port rfc822.py changes that didn't make it into this copy,
specifically that dots are allowed in obs-phrase.  This fixes parsing
of dots in realnames.
2002-12-30 16:21:07 +00:00
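
A small example of a dot in a realname, assuming `email.utils.parseaddr()` from a current Python 3 standard library.

```python
from email.utils import parseaddr

# The middle initial carries a dot, which obs-phrase permits.
realname, address = parseaddr("John X. Doe <jxd@example.com>")
```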
Barry Warsaw edb59c1ee8 test_name_with_dots(): A new test to ensure that we're implementing
RFC 2822's rules w.r.t. dots in the realname part of address fields.
2002-12-30 16:19:52 +00:00
Tim Peters 6578dc925f Whitespace normalization. 2002-12-24 18:31:27 +00:00
Barry Warsaw da2525ed2a parse(), _parseheaders(), _parsebody(): A fix for SF bug #633527,
where in lax parsing, the first non-header line after a header block
(e.g. the first line not containing a colon, and not a continuation),
can be treated as the first body line, even without the RFC mandated
blank line separator.

rfc822 had this behavior, and I vaguely remember problems with this,
but can't remember details.  In any event, all the tests still pass,
so I guess we'll find out. ;/

This patch works by returning the non-header, non-continuation line
from _parseheader() and using that as the first header line prepended
to fp.read() if given.  It's usually None.

We use this approach instead of trying to seek/tell the file-like
object.
2002-11-05 21:44:06 +00:00
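
The lax behavior can be observed with a current Python 3 standard library, which is an assumption here: today's parser is a later rewrite, not the `_parseheaders()` code this commit touches. The sample message is made up.

```python
import email

# No blank line separates the header block from the body text.
raw = "Subject: hi\nbody line\n"
msg = email.message_from_string(raw)

subject = msg["Subject"]
body = msg.get_payload()
```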
Barry Warsaw a0a00761a5 test_no_separating_blank_line(): A test for SF bug #633527, no
separating blank line between a header block and body text.

Tests both lax and strict parsing.
2002-11-05 21:36:17 +00:00
Barry Warsaw 847fdbbe71 A message with no separating blank line between the headers and the
body.  A test message for SF bug #633527.
2002-11-05 21:29:47 +00:00
Barry Warsaw 48b0a1c603 test_text_plain_in_a_multipart_digest(): A test of the fix for SF bug
#631350, where a subobject in a multipart/digest isn't a
message/rfc822.
2002-11-05 21:04:52 +00:00
Barry Warsaw 5c9130ec46 _parsebody(): A fix for SF bug #631350, where a subobject in a
multipart/digest isn't a message/rfc822.  This is legal, but counter
to recommended practice in RFC 2046, $5.1.5.

The fix is to look at the content type after setting the default
content type.  If the maintype is then message or multipart, attach
the parsed subobject, otherwise use set_payload() to set the data of
the other object.
2002-11-05 20:54:37 +00:00
Barry Warsaw 00e6a02ef8 Test case, distilled from SF bug #631350, where a subobject in a
multipart/digest isn't a message/rfc822.  This is legal, but counter
to recommended practice in RFC 2046, $5.1.5.
2002-11-05 20:53:18 +00:00
Barry Warsaw 8f4dcbd3f6 Bump __version__ (yes, to 2.5 "minus") 2002-11-05 19:56:47 +00:00
Barry Warsaw 030ddf794f Jason Mastaler's patch to break the dependence on rfc822.py for the
address parsing routines.  Closes SF patch #613434.
2002-11-05 19:54:52 +00:00
Barry Warsaw 4111804548 test_body_encoding(): a new test for Charset.body_encode(), especially
one that tests the obscure bug reported in SF # 625509.
2002-10-21 05:43:58 +00:00
Barry Warsaw 34aa44538d test_body_encoding(): a new test 2002-10-21 05:31:08 +00:00
Barry Warsaw 3d57589f0f body_encode(): Fixed typo reported by Chris Lawrence, closing SF bug
#625509.  This isn't a huge problem because at the moment there are no
built-in charsets for which header_encoding is QP but body_encoding is
not.
2002-10-21 05:29:53 +00:00
Barry Warsaw 67f8f2fe2a append(): Fixing the test for convertibility after consultation with
Ben.  If s is a byte string, make sure it can be converted to unicode
with the input codec, and from unicode with the output codec, or raise
a UnicodeError exception early.  Skip this test (and the unicode->byte
string conversion) when the charset is our faux 8bit raw charset.
2002-10-14 16:52:41 +00:00
Barry Warsaw a74771c0b9 Two new tests for splitting (or not splitting) 8-bit header data. 2002-10-14 15:26:17 +00:00
Barry Warsaw 1a6ea3398e Bump the __version__ 2002-10-14 15:24:18 +00:00
Barry Warsaw 5e3bcff651 __init__(): Fix an invariant, that the charset item in a chunk tuple
must be a Charset instance, not a string.  The bug here was that
self._charset wasn't being converted to a Charset instance so later
.append() calls which used the default charset would break.

_split(): If the charset of the chunk is '8bit', return the chunk
unchanged.  We can't safely split it, so this is the avenue of least
harm.
2002-10-14 15:13:17 +00:00
Barry Warsaw 6c2bc46355 _split_header(): If we have a header which is a byte string containing
8-bit data, we cannot split it safely, so return the original string
unchanged.

_is8bitstring(): Helper function which returns True when we have a
byte string that contains non-ascii characters (i.e. mysterious 8-bit
data).
2002-10-14 15:09:30 +00:00
Barry Warsaw 7cd724049f CHARSETS: Add faux '8bit' encoding for representing raw 8-bit data for
which we know nothing else.
2002-10-14 15:06:55 +00:00
Barry Warsaw 0c358258c9 _encode_chunks(), encode(): Don't modify self._chunks. As Ben says:
    Also, it fixes a really egregious error in Header.encode() (really
    in Header._encode_chunks()) that could cause a header to grow and
    grow each time encode() was called if output_codec was different
    from input_codec.

Also, fix a typo.
2002-10-13 04:06:28 +00:00
Barry Warsaw ab9439fdd4 Update the urls and other information about the add-on Japanese,
Korean, and Chinese codecs.
2002-10-13 04:00:45 +00:00
Barry Warsaw c986e54733 Bump version number to 2.4.2 to pick up the latest minor bug fixes. 2002-10-10 15:19:46 +00:00
Barry Warsaw dc8087b26e New tests to verify that charsets are case insensitive, and that by
default get_body_encoding() cannot be SHORTEST.
2002-10-10 15:14:22 +00:00
Barry Warsaw ee07cb1d70 get_content_charset(): RFC 2046 $4.1.2 says charsets are not case
sensitive.  Coerce the argument to lower case.
2002-10-10 15:13:26 +00:00
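
A brief check of the lower-casing, assuming `Message.get_content_charset()` from a current Python 3 standard library and a made-up sample message.

```python
import email

raw = 'Content-Type: text/plain; charset="US-ASCII"\n\nhi\n'
msg = email.message_from_string(raw)

# The parameter was written in upper case in the header.
charset = msg.get_content_charset()
```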
Barry Warsaw 14fc464ec9 __init__(): RFC 2046 $4.1.2 says charsets are not case sensitive.
Coerce the argument to lower case.  Also, since body encodings can't
be SHORTEST, default the CHARSETS failobj's second item to BASE64.
2002-10-10 15:11:20 +00:00
Barry Warsaw 08c82b8086 openfile(): Go back to opening the files in text mode. This undoes
the change in revision 1.11 (test_email.py) in response to SF bug
#609988.  We now think that was the wrong fix and that WinZip was the
real culprit there.
2002-10-07 17:27:55 +00:00
Barry Warsaw 487fe6ac39 _parsebody(): Use get_content_type() instead of the deprecated
get_type().  Also, one of the regular expressions is constant so might
as well make it a module global.  And, when splitting up digests,
handle lineseps that are longer than 1 character in length
(e.g. \r\n).
2002-10-07 17:27:35 +00:00
Barry Warsaw 1d475d3452 Bump the version to 2.4.1 (not 2.5 as previously mentioned) to sync it
with the standalone mimelib package.
2002-10-07 17:20:25 +00:00
Barry Warsaw 0ac885e821 test__all__(): Fix the import list. 2002-10-01 17:57:06 +00:00
Barry Warsaw 2d7fab1a45 Docstring consistency with the updated .tex files. 2002-10-01 00:52:27 +00:00
Barry Warsaw 1f84ff1d40 _structure(): Swap fp and level arguments. 2002-10-01 00:51:47 +00:00
Barry Warsaw 0ebc5c96c5 Docstring consistency with the updated .tex files. 2002-10-01 00:44:13 +00:00
Barry Warsaw 12272a2f22 Docstring consistency with the updated .tex files. 2002-10-01 00:05:24 +00:00
Barry Warsaw 48330687f3 Docstring consistency with the updated .tex files. 2002-09-30 23:07:35 +00:00
Barry Warsaw 0031982c21 Docstring consistency with the updated .tex files. 2002-09-30 22:15:00 +00:00
Barry Warsaw 03a7559654 Docstring consistency with the updated .tex files. 2002-09-30 21:29:10 +00:00
Barry Warsaw fd2e8f7ea6 Docstring consistency with the updated .tex files. 2002-09-30 21:24:00 +00:00
Barry Warsaw 419b284b7c __all__: Updated 2002-09-30 20:41:33 +00:00
Barry Warsaw 057b8428d0 Docstring consistency with the updated .tex files. 2002-09-30 20:07:22 +00:00
Barry Warsaw 42d1d3edc0 __contains__(): Change the second argument to `name' for consistency.
I seriously doubt this will break any deployed code.

Docstring consistency with the updated .tex files.
2002-09-30 18:17:35 +00:00
Barry Warsaw 174aa49a88 With help from Martin v. Loewis, clarification is added for the
semantics of header chunks using byte and Unicode strings.
Specifically,

append(): When the given string is a byte string, charset (whether
specified explicitly in the argument list or implicitly via the
constructor default) is the encoding of the byte string, and a
UnicodeError will be raised if the string cannot be decoded with that
charset.  If s is a Unicode string, then charset is a hint specifying
the character set of the characters in the string.  In this case, when
producing an RFC 2822 compliant header using RFC 2047 rules, the
Unicode string will be encoded using the following charsets in order:
us-ascii, the charset hint, utf-8.

__init__(): Use the global USASCII Charset instance when the charset
argument is None.  Also, clarification in the docstring.

Also, use True/False where appropriate.
2002-09-30 15:51:31 +00:00
Barry Warsaw d20b66537c The ansi_x3.4_1968 encoding is an alias for ascii, but isn't known in
Python 2.1.3.  However it's required by the email test suite, so poke
it into the encodings aliases if it's missing.  This is apparently the
approved API for doing so.

Now we can remove the hexversion shortcircuits in the test suite.
2002-09-30 15:23:17 +00:00
Barry Warsaw d63071b05f Make the tests pass under Python 2.1 but only by cheating. Python 2.1
doesn't know about the ansi-x3.4-1968 charset so skip two tests that
rely on that (msg_32.txt and msg_33.txt).
2002-09-28 21:22:52 +00:00
Barry Warsaw eecdc742f5 Add a test for SHORTEST encoding of utf-8 headers, and also update
some of the test values which change because of this.
2002-09-28 21:04:19 +00:00
Barry Warsaw c202d93e0e Use True/False everywhere, and other code cleanups. 2002-09-28 21:02:51 +00:00
Barry Warsaw f776e6922c Code cleanup and add docstrings. 2002-09-28 20:52:26 +00:00
Barry Warsaw 5bdb2bee37 Use True/False everywhere, and other code cleanups. 2002-09-28 20:49:57 +00:00
Barry Warsaw e03e8f09eb Use True/False everywhere. 2002-09-28 20:44:58 +00:00
Barry Warsaw 4ece778bbc is_multipart(): Use isinstance() instead of type equality. 2002-09-28 20:41:39 +00:00
Barry Warsaw c494549566 Docstring and code cleanups, e.g. use True/False everywhere. 2002-09-28 20:40:25 +00:00
Barry Warsaw bba6b0243e __init__(): Minor code cleanup. 2002-09-28 20:27:28 +00:00
Barry Warsaw 5f253279d6 Add a pychecker suppression. 2002-09-28 20:25:15 +00:00
Barry Warsaw 56835dd961 Use True/False everywhere. 2002-09-28 18:04:55 +00:00
Barry Warsaw 5932c9bedd Added a feature suggested by Martin v Loewis, where a new header
encoding flag SHORTEST means to return the shortest encoding between
base64 and qp.  This is used for the header_enc for utf-8.  SHORTEST
isn't legal for body_enc.

Also some code cleanup:

- use True/False everywhere
- use == instead of `is' in a few places
- added _unicode() and make consistent the "is unicode" checks
- update docstrings
2002-09-28 17:47:56 +00:00
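
The SHORTEST idea can be sketched by comparing rough encoded lengths. All names below are hypothetical, and the Q-encoding length is an approximation (unencoded safe characters and spaces count as one, everything else as three), not the package's own calculation.

```python
import base64
import string

# Characters assumed safe to leave unencoded in a Q-encoded word.
Q_SAFE = (string.ascii_letters + string.digits + "!*+-/").encode("ascii")


def q_encoded_len(data: bytes) -> int:
    """Approximate length of data in RFC 2047 Q encoding."""
    return sum(1 if (b in Q_SAFE or b == 0x20) else 3 for b in data)


def b_encoded_len(data: bytes) -> int:
    """Length of data in base64."""
    return len(base64.b64encode(data))


def shortest_encoding(data: bytes) -> str:
    """Return 'q' or 'b', whichever yields the shorter encoded text."""
    return "q" if q_encoded_len(data) <= b_encoded_len(data) else "b"


mostly_ascii = shortest_encoding(b"hello world")
mostly_non_ascii = shortest_encoding("日本語".encode("utf-8"))
```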
Barry Warsaw 09f7424f3a test_unicode_error(): Comment this test out, since we still have
controversy.
2002-09-26 17:21:53 +00:00
Barry Warsaw 9c74569ec9 Fixing some RFC 2231 related issues as reported in the Spambayes
project, and with assistance from Oleg Broytmann.  Specifically,
added some new tests to make sure we handle RFC 2231 encoded
parameters correctly.  Two new data files were added which contain RFC
2231 encoded parameters.
2002-09-26 17:21:02 +00:00
Barry Warsaw 15aefa94d0 Fixing some RFC 2231 related issues as reported in the Spambayes
project, and with assistance from Oleg Broytmann.  Specifically,

get_param(), get_params(): Document that these methods may return
parameter values that are either strings, or 3-tuples in the case of
RFC 2231 encoded parameters.  The application should be prepared to
deal with such return values.

get_boundary(): Be prepared to deal with RFC 2231 encoded boundary
parameters.  It makes little sense to have boundaries that are
anything but ascii, so if we get back a 3-tuple from get_param() we
will decode it into ascii and let any failures percolate up.

get_content_charset(): New method which treats the charset parameter
just like the boundary parameter in get_boundary().  Note that
"get_charset()" was already taken to return the default Charset
object.

get_charsets(): Rewrite to use get_content_charset().
2002-09-26 17:19:34 +00:00
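
A sketch of handling the 3-tuple return value, assuming a current Python 3 standard library. `collapse_rfc2231_value()` is a later addition to `email.utils` and is not part of this commit; the header below is a made-up example.

```python
import email
from email.utils import collapse_rfc2231_value

raw = (
    "Content-Disposition: attachment; "
    "filename*=\"us-ascii'en'my%20file.txt\"\n"
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)

# For an RFC 2231 encoded parameter this is (charset, language, value).
value = msg.get_param("filename", header="content-disposition")

# Works for both plain strings and 3-tuples.
filename = collapse_rfc2231_value(value)
```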
Barry Warsaw 6f30a8ab62 __version__: Bump to 2.4
Move the imports of Parser and Message inside the
message_from_string() and message_from_file() functions.  This way
just "import email" won't suck in most of the submodules of the
package.

Note: this will break code that relied on "import email" giving you a
bunch of the submodules, but that was never documented and should not
have been relied on.
2002-09-25 22:07:50 +00:00
Barry Warsaw 40363b63f0 Open the test files in binary mode so the \r\n files won't cause
failures on Windows.  Closes SF bug # 609988.
2002-09-18 22:17:57 +00:00
Barry Warsaw 78170048f9 Bump to 2.3.1 to pick up the missing file. 2002-09-12 03:44:50 +00:00
Barry Warsaw fbcde75c70 get_payload(): Document that calling it with no arguments returns a
reference to the payload.
2002-09-11 14:11:35 +00:00
Barry Warsaw bc6edac8df test_utils_quote_unquote(): Test for unquote() properly
de-backslash-ifying.
2002-09-11 02:31:24 +00:00
Barry Warsaw 184d55a897 rfc822.unquote() doesn't properly de-backslash-ify in Python prior to
2.3.  This patch (adapted from Quinn Dunkan's SF patch #573204) fixes
the problem and should get ported to rfc822.py.
2002-09-11 02:22:48 +00:00
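
For illustration, assuming `email.utils.quote()` and `email.utils.unquote()` from a current Python 3 standard library: quoting adds backslashes, and unquoting a double-quoted string removes them again.

```python
from email.utils import quote, unquote

original = 'foo\\wacky"name'

# quote() escapes backslashes and double quotes.
quoted = quote(original)

# unquote() strips the surrounding quotes and the added backslashes.
round_tripped = unquote('"' + quoted + '"')
```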
Barry Warsaw 034b47acfe _parsebody(): Instead of raising a BoundaryError when no start
boundary could be found -- in a lax parser -- the entire body is
assigned to the message payload.
2002-09-10 16:14:56 +00:00
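
The lax handling can be observed with a current Python 3 standard library, which is an assumption: the modern parser also records a defect, something that postdates this commit. The message is a made-up example that declares a boundary and never uses it.

```python
import email

raw = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "no boundary lines here\n"
)
msg = email.message_from_string(raw)

# The whole body ends up as a string payload instead of a list of parts.
payload = msg.get_payload()
defect_names = [type(d).__name__ for d in msg.defects]
```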
Barry Warsaw b1c1de3805 Import _isstring() from the compatibility layer.
_handle_text(): Use _isstring() for stringiness test.

_handle_multipart(): Add a test before the ListType test, checking for
stringiness of the payload.  String payloads for multitypes means a
message with broken MIME chrome was parsed by a lax parser.  Instead
of raising a BoundaryError in those cases, the entire body is assigned
to the message payload (but since the content type is still
multipart/*, the Generator needs to be updated too).
2002-09-10 16:13:45 +00:00
Barry Warsaw 356afac41f _isstring(): Factor out "stringiness" test, e.g. for StringType or
UnicodeType, which is different between Python 2.1 and 2.2.
2002-09-10 16:09:06 +00:00
Barry Warsaw 45d9bde6c1 _ascii_split(): Don't lstrip continuation lines. Closes SF bug #601392. 2002-09-10 15:57:29 +00:00
Barry Warsaw 24d45df3f2 test_splitting_first_line_only_is_long(): New test for SF bug #601392,
broken wrapping of long ASCII headers.
2002-09-10 15:46:44 +00:00
Barry Warsaw dad90c202a A sample message with broken MIME boundaries. 2002-09-10 15:43:30 +00:00
Barry Warsaw e99e2f53e7 test_set_param(), test_del_param(): Test RFC 2231 encoding support by
Oleg Broytmann in SF patch #600096.  Whitespace normalized by Barry.
2002-09-06 03:56:26 +00:00
Barry Warsaw 3c25535dc8 _formatparam(), set_param(): RFC 2231 encoding support by Oleg
Broytmann in SF patch #600096.  Specifically, the former function now
encodes the triplets, while the latter adds optional charset and
language arguments.
2002-09-06 03:55:04 +00:00
Barry Warsaw 470288c54e test_mondo_message(): "binary" is not a legal content type, so with
the previous RFC 2045, $5.2 repair to get_content_type() this
subpart's type will now be text/plain.
2002-09-06 03:41:27 +00:00
Barry Warsaw 58fb61cce5 test_replace_header(): New test for Message.replace_header(). 2002-09-06 03:39:59 +00:00
Barry Warsaw 229727fa07 replace_header(): New method given by Skip Montanaro in SF patch
#601959.  Modified slightly by Barry (who added the KeyError in case
the header is missing).
2002-09-06 03:38:12 +00:00
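
A short usage sketch, assuming `Message.replace_header()` from a current Python 3 standard library.

```python
from email.message import Message

msg = Message()
msg["Subject"] = "old"

# Replaces the first matching header in place.
msg.replace_header("Subject", "new")
subject = msg["Subject"]

# A missing header raises KeyError rather than adding a new one.
try:
    msg.replace_header("X-Missing", "value")
    missing_raised = False
except KeyError:
    missing_raised = True
```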
Barry Warsaw a4ce1cf34c _structure(): Use .get_content_type() 2002-09-01 21:04:43 +00:00
Barry Warsaw 1a1607546c Whitespace normalization. 2002-08-27 22:38:50 +00:00
Barry Warsaw 48b0d36b4d Typo 2002-08-27 22:34:44 +00:00
Tim Peters 280488b9a3 Whitespace normalization. 2002-08-23 18:19:30 +00:00
Barry Warsaw 4d5ef6aed6 Bump version number to 2.3 2002-08-20 14:51:34 +00:00
Barry Warsaw 3328136e3c Added tests for SF patch #597593, syntactically invalid Content-Type: headers. 2002-08-20 14:51:10 +00:00
Barry Warsaw f36d804b3b get_content_type(), get_content_maintype(), get_content_subtype(): RFC
2045, section 5.2 states that if the Content-Type: header is
syntactically invalid, the default type should be text/plain.
Implement minimal sanity checking of the header -- it must have
exactly one slash in it.  This closes SF patch #597593 by Skip, but in
a different way.

Note that these methods used to raise ValueError for invalid ctypes,
but now they won't.
2002-08-20 14:50:09 +00:00
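
A quick illustration of the fallback, assuming a current Python 3 standard library and a made-up message whose Content-Type has no slash.

```python
import email

raw = "Content-Type: binary\n\ndata\n"
msg = email.message_from_string(raw)

# No ValueError; the RFC 2045 default is used instead.
content_type = msg.get_content_type()
maintype = msg.get_content_maintype()
subtype = msg.get_content_subtype()
```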
Barry Warsaw dfea3b3963 _dispatch(): Use get_content_maintype() and get_content_subtype() to
get the MIME main and sub types, instead of getting the whole ctype
and splitting it here.   The two more specific methods now correctly
implement RFC 2045, section 5.2.
2002-08-20 14:47:30 +00:00
Barry Warsaw b404bb7813 test_three_lines(): Test case reported by Andrew McNamara. Works in
email 2.2 but fails in email 1.0.
2002-08-20 12:54:07 +00:00
Barry Warsaw 9e4e050c59 Use full package paths in imports. 2002-07-23 20:35:58 +00:00
Barry Warsaw 10d0d595e0 Added a couple of more tests for Header charset handling. 2002-07-23 19:46:35 +00:00
Barry Warsaw 04f357cffe Get rid of relative imports in all unittests. Now anything that
imports e.g. test_support must do so using an absolute package name
such as "import test.test_support" or "from test import test_support".

This also updates the README in Lib/test, and gets rid of the
duplicate data directory in Lib/test/data (replaced by
Lib/email/test/data).

Now Tim and Jack can have at it. :)
2002-07-23 19:04:11 +00:00
Barry Warsaw 92825a9a52 append(): Bite the bullet and let charset be the string name of a
character set, which we'll convert to a Charset instance.  Sigh.
2002-07-23 06:08:10 +00:00
Barry Warsaw 15d3739446 make_header(): Watch out for charset is None, which decode_header()
will return as the charset if implicit us-ascii is used.
2002-07-23 04:29:54 +00:00
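
A brief example of the None charset case, assuming `email.header.decode_header()` and `make_header()` from a current Python 3 standard library.

```python
from email.header import decode_header, make_header

# A plain ASCII header decodes to a single chunk whose charset is None.
decoded = decode_header("hello")

# make_header() has to accept that None charset.
header_text = str(make_header(decoded))
```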
Tim Peters 53d019cf5a Changed import from
    from test.test_support import TestSkipped, run_unittest
to
    from test_support import TestSkipped, run_unittest

Otherwise, if the Japanese codecs aren't installed, regrtest doesn't
believe the TestSkipped exception raised by this test matches the

    except (ImportError, test_support.TestSkipped), msg:

it's looking for, and reports the skip as a crash failure instead of
as a skipped test.

I suppose this will make it harder to run this test outside of
regrtest, but under the assumption only Barry does that, better to
make it skip cleanly for everyone else.
2002-07-21 06:06:30 +00:00
Barry Warsaw 190390b026 The email package's tests live much better in a subpackage
(i.e. email.test), so move the guts of them here from Lib/test.  The
latter directory will retain stubs to run the email.test tests using
Python's standard regression test.

test_email_torture.py is a torture tester which will not run under
Python's test suite because I don't want to commit megs of data to
that project (it will fail cleanly there).  When run under the mimelib
project it'll stress test the package with megs of message samples
collected from various locations in the wild.
2002-07-19 22:31:10 +00:00
Barry Warsaw 629038093c The email package's tests live much better in a subpackage
(i.e. email.test), so move the guts of them here from Lib/test.  The
latter directory will retain stubs to run the email.test tests using
Python's standard regression test.

test_email_torture.py is a torture tester which will not run under
Python's test suite because I don't want to commit megs of data to
that project (it will fail cleanly there).  When run under the mimelib
project it'll stress test the package with megs of message samples
collected from various locations in the wild.

email/test/data is a copy of Lib/test/data.  The fate of the latter is
still undecided.
2002-07-19 22:29:49 +00:00
Barry Warsaw d8e8e54c2b message_from_string(), message_from_file(): The consensus on the
mimelib-devel list is that non-strict parsing should be the default.
Make it so.
2002-07-19 22:26:01 +00:00
Barry Warsaw bb26b4530b Parser.__init__(): The consensus on the mimelib-devel list is that
non-strict parsing should be the default.  Make it so.
2002-07-19 22:25:34 +00:00
Barry Warsaw c10686426e To better support default content types, fix an API wart, and preserve
backwards compatibility, we're silently deprecating get_type(),
get_subtype() and get_main_type().  We may eventually noisily
deprecate these.  For now, we'll just fix a bug in the splitting of
the main and subtypes.

get_content_type(), get_content_maintype(), get_content_subtype(): New
methods which replace the above.  These /always/ return a content type
string and do not take a failobj, because an email message always at
least has a default content type.

set_default_type(): Someday there may be additional default content
types, so don't hard code an assertion about the value of the ctype
argument.
2002-07-19 22:24:55 +00:00
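To illustrate the behavior described for get_content_type(), get_content_maintype() and get_content_subtype(), here is a minimal sketch. It assumes the present-day Python 3 standard-library email package rather than the 2002 code this commit touched, and the sample messages are invented for the example.

```python
from email import message_from_string

# No Content-Type header at all: the default type is reported
# instead of a failobj or None.
bare = message_from_string("Subject: hello\n\nbody\n")
print(bare.get_content_type())       # text/plain

# A well-formed header: the type is lowercased and can be split.
html = message_from_string(
    "Content-Type: TEXT/HTML; charset=us-ascii\n\n<p>hi</p>\n"
)
print(html.get_content_type())       # text/html
print(html.get_content_maintype())   # text
print(html.get_content_subtype())    # html

# A syntactically invalid header (no slash) falls back to text/plain
# rather than raising an exception.
bad = message_from_string("Content-Type: garbage\n\nbody\n")
print(bad.get_content_type())        # text/plain
```

Because these methods always return a string, callers do not need a failobj argument or a None check.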
Barry Warsaw d43857455e _structure(): Take an optional `fp' argument which would be the object
to print>> the structure to.  Defaults to sys.stdout.
2002-07-19 22:21:47 +00:00
Barry Warsaw 1cecdc6bcb _dispatch(): Use the new Message.get_content_type() method as hashed
out on the mimelib-devel list.
2002-07-19 22:21:02 +00:00
Barry Warsaw 7aeac9180e Anthony Baxter's cleanup patch. Python project SF patch # 583190,
quoting:

  in non-strict mode, messages don't require a blank line at the end
  with a missing end-terminator. A single newline is sufficient now.

  Handle trailing whitespace at the end of a boundary. Had to switch
  from using string.split() to re.split()

  Handle whitespace on the end of a parameter list for Content-type.

  Handle whitespace on the end of a plain content-type header.

Specifically,

get_type(): Strip the content type string.

_get_params_preserve(): Strip the parameter names and values on both
sides.

_parsebody(): Lots of changes as described above, with some stylistic
changes by Barry (who hopefully didn't screw things up ;).
2002-07-18 23:09:09 +00:00
Barry Warsaw 2d2fc229a0 Anthony Baxter's patch to expose the parser's `strict' flag in these
convenience functions.  Closes SF # 583188 (python project).
2002-07-18 21:29:17 +00:00
Barry Warsaw 4ef1c7d85b _structure(): Don't get the whole Content-Type: header, just get the
type with get_type().
2002-07-11 20:24:36 +00:00
Barry Warsaw f488b2c6d5 _dispatch(): Comment improvements. 2002-07-11 18:48:40 +00:00
Barry Warsaw 8da39aa56a make_header(): New function to take the output of decode_header() and
create a Header instance.  Closes feature request #539481.

Header.__init__(): Allow the initial string to be omitted.

__eq__(), __ne__(): Support rich comparisons for equality of Header
instances with Header instances or strings.

Also, update a bunch of docstrings.
2002-07-09 16:33:47 +00:00
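As a rough sketch of the round trip that make_header() enables, assuming the current Python 3 email.header module (where decode_header() returns bytes for encoded words) rather than the 2002 version; the sample header value is made up for the example:

```python
from email.header import Header, decode_header, make_header

raw = "=?iso-8859-1?q?p=F6stal?="

# decode_header() yields a list of (decoded_string, charset) pairs.
parts = decode_header(raw)

# make_header() turns that list back into a Header instance.
header = make_header(parts)
print(str(header))            # pöstal

# Rich comparison: a Header can be compared directly with a string.
print(header == "pöstal")     # True

# The initial string may be omitted when constructing a Header.
empty = Header()
print(str(empty) == "")       # True
```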
Barry Warsaw f6caeba03a Anthony Baxter's patch for non-strict parsing. This adds a `strict'
argument to the constructor -- defaulting to true -- which is
different than Anthony's approach of using global state.

parse(), parsestr(): Grow a `headersonly' argument which stops parsing
once the header block has been seen, i.e. it does /not/ parse or even
read the body of the message.  This is used for parsing message/rfc822
type messages.

We need test cases for the non-strict parsing.  Anthony will supply
these.

_parsebody(): We can get rid of the isdigest end-of-line kludges,
although we still need to know if we're parsing a multipart/digest so
we can set the default type accordingly.
2002-07-09 02:50:02 +00:00
Barry Warsaw a0c8b9d4d5 Add the concept of a "default type". Normally the default type is
text/plain but the RFCs state that inside a multipart/digest, the
default type is message/rfc822.  To preserve idempotency, we need a
separate place to define the default type than the Content-Type:
header.

get_default_type(), set_default_type(): Accessor and mutator methods
for the default type.
2002-07-09 02:46:12 +00:00
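A small sketch of the default-type accessors, assuming today's Python 3 email.message.Message (the method names match this commit, but the surrounding code has changed since 2002). It shows that the default type is stored separately from the Content-Type: header, which is what keeps parsing and regenerating a message idempotent.

```python
from email.message import Message

part = Message()

# With no Content-Type: header, the default type is what gets reported.
print(part.get_default_type())     # text/plain
print(part.get_content_type())     # text/plain

# Inside a multipart/digest the default would be message/rfc822.
part.set_default_type("message/rfc822")
print(part.get_content_type())     # message/rfc822

# Changing the default does not add a header to the message.
print("Content-Type" in part)      # False
```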
Barry Warsaw bb493a7039 __init__(): Don't attach the subparts if it's an empty tuple. If the
boundary was given in the arguments, call set_boundary().
2002-07-09 02:44:26 +00:00
Barry Warsaw 93c40f0c3a clone(): A new method for creating a clone of this generator (for
recursive generation).

_dispatch(): If the message object doesn't have a Content-Type:
header, check its default type instead of assuming it's text/plain.
This makes for correct generation of message/rfc822 containers.

_handle_multipart(): We can get rid of the isdigest kludge.  Just
print the message as normal and everything will work out correctly.

_handle_multipart_digest(): We don't need this anymore either.
2002-07-09 02:43:47 +00:00
Barry Warsaw ed53bdb02d __init__(): Be sure to set the default type to message/rfc822. 2002-07-09 02:40:35 +00:00
Barry Warsaw 8fa06b55f6 _structure(): A handy little debugging aid that I don't (yet) intend
to make public, but that others might still find useful.
2002-07-09 02:39:07 +00:00
Barry Warsaw 27b168ca7c With the addition of Oleg's support for RFC 2231, it's time to bump
the version number to 2.1.
2002-07-09 02:13:10 +00:00
Barry Warsaw 6ee7156996 append(): Clarify the expected type of charset. 2002-07-03 05:04:04 +00:00
Barry Warsaw 12566a8826 Oleg Broytmann's support for RFC 2231 encoded parameters, SF patch #549133
Specifically,

decode_rfc2231(), encode_rfc2231(): Functions to encode and decode RFC
2231 style parameters.

decode_params(): Function to decode a list of parameters.
2002-06-29 05:58:04 +00:00
Barry Warsaw 908dc4bea8 Oleg Broytmann's support for RFC 2231 encoded parameters, SF patch #549133
Specifically,

_formatparam(): Teach this about encoded `param' arguments, which are
a 3-tuple of items (charset, language, value).  language is ignored.

_unquotevalue(): Handle both 3-tuple RFC 2231 values and unencoded
values.

_get_params_preserve(): Decode the parameters before returning them.

get_params(), get_param(): Use _unquotevalue().

get_filename(), get_boundary(): Teach these about encoded (3-tuple)
parameters.
2002-06-29 05:56:15 +00:00
Barry Warsaw 8e69bdac33 __unicode__(): Patch # 541263 by Mikhail Zabaluev, implementation
modified by Barry.
2002-06-29 03:26:58 +00:00
Barry Warsaw ba2577b7f1 _max_append(): When adding the string `s' to its own line, it should
be lstrip'd so that old continuation whitespace is replaced by that
specified in Header's continuation_ws parameter.
2002-06-28 23:48:23 +00:00
Barry Warsaw 766125080f Teach this class about "highest-level syntactic breaks" but only for
headers with no charset or 'us-ascii' charsets.  Actually this is only
partially true: we know about semicolons (but not true parameters) and
we know about whitespace (but not technically folding whitespace).
Still it should be good enough for all practical purposes.

Other changes include:

__init__(): Add a continuation_ws argument, which defaults to a single
space.  Set this to change the whitespace used for continuation lines
when a header must be split.  Also, changed the way header line
lengths are calculated, so that they take into account continuation_ws
(when tabs-expanded) and any provided header_name parameter.  This
should do much better on returning split headers for which the first
and subsequent lines must fit into a specified width.

guess_maxlinelen(): Removed.  I don't think we need this method as
part of the public API.

encode_chunks() -> _encode_chunks(): I don't think we need this one as
part of the public API either.
2002-06-28 23:46:53 +00:00
Barry Warsaw 062749ac57 _split_header(): The code here was terminally broken because it didn't
know anything about RFC 2047 encoded headers.  Fortunately we have a
perfectly good header splitter in Header.encode().  So we just call
that to give us a properly formatted and split header.
Header.encode() didn't know about "highest-level syntactic breaks" but
that's been fixed now too.
2002-06-28 23:41:42 +00:00
Barry Warsaw 69e18af968 _parsebody(): Fix for the new message/rfc822 tree structure (the
parent is now a multipart with one element, the sub-message object).
2002-06-02 19:12:03 +00:00
Barry Warsaw d2b2e533c0 header_encode(), encode(): Use _floordiv() from the appropriate
compatibility module.
2002-06-02 19:08:31 +00:00
Barry Warsaw 21f77ac0bc Use absolute import paths for intrapackage imports. 2002-06-02 19:07:16 +00:00
Barry Warsaw 8ba76e8929 Use absolute import paths for intrapackage imports.
as_string(): Use Generator.flatten() for better performance.
2002-06-02 19:05:51 +00:00
Barry Warsaw 524af6f382 Use absolute import paths for intrapackage imports.
Use MIMENonMultipart as the base class so that you can't attach() to
these non-multipart message types.
2002-06-02 19:05:08 +00:00
Barry Warsaw 7dc865ad72 flatten(): Renamed from __call__() which is (silently) deprecated.
__call__() can be 2-3x slower than the equivalent normal method.

_handle_message(): The structure of message/rfc822 messages has
changed.  Now parent's payload is a list of length 1, and the zeroth
element is the Message sub-object.  Adjust the printing of such
message trees to reflect this change.
2002-06-02 19:02:37 +00:00
Barry Warsaw ff49279f7c _intdiv2() -> _floordiv(), merge of uncommitted changes. 2002-06-02 18:59:06 +00:00
Neal Norwitz 1fab9ee085 Get email test to pass. Barry, hope this is what you had in mind 2002-06-02 16:38:14 +00:00
Barry Warsaw 9d5e4aa414 Bump to version 2.0.5, and also use absolute import paths. 2002-06-01 06:03:09 +00:00
Barry Warsaw 2f514a806d These two classes provide bases for more specific content type
subclasses.

MIMENonMultipart: Base class for non-multipart/* content type subclass
specializations, e.g. image/gif.  This class overrides attach() which
raises an exception, since it makes no sense to attach a subpart to
e.g. an image/gif message.

MIMEMultipart: Base class for multipart/* content type subclass
specializations, e.g. multipart/mixed.  Does little more than provide
a useful constructor.
2002-06-01 05:59:12 +00:00
Barry Warsaw 1c30aa2292 The _compat modules now export _floordiv() instead of _intdiv2() for
better code reuse.

_split(): Use _floordiv().
2002-06-01 05:49:17 +00:00
Barry Warsaw c5d1c045ab Slightly better docstring 2002-06-01 05:45:37 +00:00
Barry Warsaw bb98c8cff0 _is_unicode(): Use UnicodeType instead of the unicode builtin for
Python 2.1 compatibility.
2002-06-01 03:56:07 +00:00
Guido van Rossum ca948b40b4 Use floor division where appropriate. 2002-05-29 20:38:21 +00:00
Guido van Rossum 1a7ac359a0 Importing Charset should not fail when Unicode is disabled. (XXX
Using Unicode-aware methods may still die with a NameError on unicode.
Maybe there's a more elegant solution but I doubt anybody cares.)
2002-05-28 18:49:03 +00:00