Commit Graph

166 Commits

Author SHA1 Message Date
Barry Warsaw da2525ed2a parse(), _parseheaders(), _parsebody(): A fix for SF bug #633527,
where in lax parsing, the first non-header line after a header block
(i.e. the first line not containing a colon, and not a continuation),
can be treated as the first body line, even without the RFC-mandated
blank line separator.

rfc822 had this behavior, and I vaguely remember problems with this,
but can't remember details.  In any event, all the tests still pass,
so I guess we'll find out. ;/

This patch works by returning the non-header, non-continuation line
from _parseheaders() and using it as the first body line, prepended
to fp.read() if given.  It's usually None.

We use this approach instead of trying to seek/tell the file-like
object.
2002-11-05 21:44:06 +00:00
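A minimal sketch of the lax-parsing approach described above, written in modern Python rather than taken from the 2002 code. The names parse_headers(), parse() and HEADER_RE are hypothetical, and headers are simplified to (name, value) pairs. The header reader hands back the first non-header, non-continuation line, and the caller prepends it to the rest of the input instead of seeking backwards in the file object:

```python
import io
import re

# Hypothetical names throughout; a sketch, not the email package's code.
# A header line starts with a field name of printable non-colon chars, then ':'.
HEADER_RE = re.compile(r'^[\041-\071\073-\176]+:')

def parse_headers(fp, strict=False):
    """Read the header block from fp.

    Returns (headers, firstbodyline), where firstbodyline is the first
    non-header, non-continuation line found in lax mode, or None when
    the header block ended normally (blank line or EOF).
    """
    headers = []
    while True:
        line = fp.readline()
        if not line or line in ('\n', '\r\n'):
            return headers, None
        if line[0] in ' \t' and headers:
            # Continuation line: unfold it into the previous header.
            name, value = headers[-1]
            headers[-1] = (name, value + line.rstrip('\r\n'))
        elif HEADER_RE.match(line):
            name, value = line.split(':', 1)
            headers.append((name, value.strip()))
        elif strict:
            raise ValueError('missing blank line after headers: %r' % line)
        else:
            # Lax mode: this line is really the first line of the body.
            return headers, line

def parse(text, strict=False):
    fp = io.StringIO(text)
    headers, firstbodyline = parse_headers(fp, strict)
    body = fp.read()
    if firstbodyline is not None:
        # Prepend the saved line rather than seek()ing back in fp.
        body = firstbodyline + body
    return headers, body
```

In strict mode the same input raises ValueError instead of being silently accepted.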
Barry Warsaw a0a00761a5 test_no_separating_blank_line(): A test for SF bug #633527, no
separating blank line between a header block and body text.

Tests both lax and strict parsing.
2002-11-05 21:36:17 +00:00
Barry Warsaw 847fdbbe71 A message with no separating blank line between the headers and the
body.  A test message for SF bug #633527.
2002-11-05 21:29:47 +00:00
Barry Warsaw 48b0a1c603 test_text_plain_in_a_multipart_digest(): A test of the fix for SF bug
#631350, where a subobject in a multipart/digest isn't a
message/rfc822.
2002-11-05 21:04:52 +00:00
Barry Warsaw 5c9130ec46 _parsebody(): A fix for SF bug #631350, where a subobject in a
multipart/digest isn't a message/rfc822.  This is legal, but counter
to recommended practice in RFC 2046, §5.1.5.

The fix is to look at the content type after setting the default
content type.  If the maintype is then message or multipart, attach
the parsed subobject, otherwise use set_payload() to set the data of
the other object.
2002-11-05 20:54:37 +00:00
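With today's stdlib email package (this assumes Python 3, not the 2002 code), the behavior this fix introduced can be observed directly: a digest subpart without a Content-Type header defaults to message/rfc822 and is parsed as a nested message, while a subpart that declares text/plain keeps a plain string payload. The message text below is made up for illustration:

```python
import email

# Made-up digest: one default (message/rfc822) part, one text/plain part.
DIGEST = """\
From: digest@example.com
MIME-Version: 1.0
Content-Type: multipart/digest; boundary="BOUNDARY"

--BOUNDARY

From: a@example.com
Subject: first

A message/rfc822 part (the digest default).

--BOUNDARY
Content-Type: text/plain

Not a message/rfc822 part.

--BOUNDARY--
"""

msg = email.message_from_string(DIGEST)
parts = msg.get_payload()
for part in parts:
    # The first part has no headers of its own, so it gets the digest
    # default type; the second declares text/plain explicitly.
    print(part.get_content_type(), part.is_multipart())
```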
Barry Warsaw 00e6a02ef8 Test case, distilled from SF bug #631350, where a subobject in a
multipart/digest isn't a message/rfc822.  This is legal, but counter
to recommended practice in RFC 2046, §5.1.5.
2002-11-05 20:53:18 +00:00
Barry Warsaw 8f4dcbd3f6 Bump __version__ (yes, to 2.5 "minus") 2002-11-05 19:56:47 +00:00
Barry Warsaw 030ddf794f Jason Mastaler's patch to break the dependence on rfc822.py for the
address parsing routines.  Closes SF patch #613434.
2002-11-05 19:54:52 +00:00
Barry Warsaw 4111804548 test_body_encoding(): a new test for Charset.body_encode(), especially
one that tests the obscure bug reported in SF bug #625509.
2002-10-21 05:43:58 +00:00
Barry Warsaw 34aa44538d test_body_encoding(): a new test 2002-10-21 05:31:08 +00:00
Barry Warsaw 3d57589f0f body_encode(): Fixed typo reported by Chris Lawrence, closing SF bug
#625509.  This isn't a huge problem because at the moment there are no
built-in charsets for which header_encoding is QP but body_encoding is
not.
2002-10-21 05:29:53 +00:00
Barry Warsaw 67f8f2fe2a append(): Fixing the test for convertibility after consultation with
Ben.  If s is a byte string, make sure it can be converted to unicode
with the input codec, and from unicode with the output codec, or raise
a UnicodeError exception early.  Skip this test (and the unicode->byte
string conversion) when the charset is our faux 8bit raw charset.
2002-10-14 16:52:41 +00:00
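The convertibility check described above can be sketched as a small standalone function. The names check_chunk() and RAW_8BIT are hypothetical stand-ins, not the email package's API; the point is that a byte string must decode with the input codec and the resulting text must encode with the output codec, so a bad chunk fails at append() time rather than later during encoding:

```python
# Hypothetical name standing in for the faux raw 8-bit charset.
RAW_8BIT = '8bit'

def check_chunk(s, input_codec, output_codec, charset_name=None):
    """Return s as text, raising UnicodeError early if it can't round-trip.

    Byte strings must decode with input_codec, and the resulting text must
    encode with output_codec. Raw 8-bit data is passed through untouched.
    """
    if charset_name == RAW_8BIT:
        return s    # skip both checks and any conversion
    text = s.decode(input_codec) if isinstance(s, bytes) else s
    text.encode(output_codec)   # raises UnicodeEncodeError on failure
    return text

# EUC-JP input, ISO-2022-JP output: both codecs can handle these kanji.
print(check_chunk('\u65e5\u672c'.encode('euc-jp'), 'euc-jp', 'iso-2022-jp'))
```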
Barry Warsaw a74771c0b9 Two new tests for splitting (or not splitting) 8-bit header data. 2002-10-14 15:26:17 +00:00
Barry Warsaw 1a6ea3398e Bump the __version__ 2002-10-14 15:24:18 +00:00
Barry Warsaw 5e3bcff651 __init__(): Fix an invariant, that the charset item in a chunk tuple
must be a Charset instance, not a string.  The bug here was that
self._charset wasn't being converted to a Charset instance so later
.append() calls which used the default charset would break.

_split(): If the charset of the chunk is '8bit', return the chunk
unchanged.  We can't safely split it, so this is the avenue of least
harm.
2002-10-14 15:13:17 +00:00
Barry Warsaw 6c2bc46355 _split_header(): If we have a header which is a byte string containing
8-bit data, we cannot split it safely, so return the original string
unchanged.

_is8bitstring(): Helper function which returns True when we have a
byte string that contains non-ASCII characters (i.e. mysterious 8-bit
data).
2002-10-14 15:09:30 +00:00
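A rough modern equivalent of the _is8bitstring() helper and the "don't split 8-bit data" guard, assuming Python 3 bytes/str types. The names is_8bit_string() and split_header() are hypothetical, and textwrap is only a stand-in for the real header-splitting logic:

```python
import textwrap

def is_8bit_string(s):
    """Return True if s is a byte string containing non-ASCII bytes."""
    if isinstance(s, bytes):
        try:
            s.decode('us-ascii')
        except UnicodeDecodeError:
            return True
    return False

def split_header(value, maxlinelen=76):
    """Split a header value into lines of at most maxlinelen characters.

    8-bit byte strings are returned whole, since they can't be split safely.
    """
    if is_8bit_string(value):
        return [value]
    if isinstance(value, bytes):
        value = value.decode('us-ascii')
    # textwrap.wrap() returns [] for an empty or all-whitespace string.
    return textwrap.wrap(value, maxlinelen) or [value]

print(split_header(b'caf\xe9 ' * 20, 20))   # one unsplit element
print(split_header('word ' * 10, 20))       # several short lines
```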
Barry Warsaw 7cd724049f CHARSETS: Add faux '8bit' encoding for representing raw 8-bit data for
which we know nothing else.
2002-10-14 15:06:55 +00:00
Barry Warsaw 0c358258c9 _encode_chunks(), encode(): Don't modify self._chunks. As Ben says:

    Also, it fixes a really egregious error in Header.encode() (really
    in Header._encode_chunks()) that could cause a header to grow and
    grow each time encode() was called if output_codec was different
    from input_codec.

Also, fix a typo.
2002-10-13 04:06:28 +00:00
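The bug class fixed here can be illustrated with a toy header class. ToyHeader is hypothetical and much simpler than the real Header (it only base64-encodes non-ASCII chunks), but it shows the design rule: encode() builds its output in a local list and never writes encoded text back into self._chunks, so calling it repeatedly gives the same result:

```python
import base64

class ToyHeader:
    """Hypothetical stand-in for a header made of (text, charset) chunks."""

    def __init__(self):
        self._chunks = []

    def append(self, s, charset='us-ascii'):
        self._chunks.append((s, charset))

    def encode(self):
        # Collect encoded pieces in a local list. Writing them back into
        # self._chunks would make each later encode() call re-encode
        # already-encoded text, so the header would keep changing.
        pieces = []
        for text, charset in self._chunks:
            if charset == 'us-ascii':
                pieces.append(text)
            else:
                data = base64.b64encode(text.encode(charset)).decode('ascii')
                pieces.append('=?%s?b?%s?=' % (charset, data))
        return ' '.join(pieces)

h = ToyHeader()
h.append('Hello')
h.append('w\u00f6rld', 'utf-8')
print(h.encode() == h.encode())   # repeated calls agree
```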
Barry Warsaw ab9439fdd4 Update the urls and other information about the add-on Japanese,
Korean, and Chinese codecs.
2002-10-13 04:00:45 +00:00
Barry Warsaw c986e54733 Bump version number to 2.4.2 to pick up the latest minor bug fixes. 2002-10-10 15:19:46 +00:00
Barry Warsaw dc8087b26e New tests to verify that charsets are case insensitive, and that by
default get_body_encoding() cannot be SHORTEST.
2002-10-10 15:14:22 +00:00
Barry Warsaw ee07cb1d70 get_content_charset(): RFC 2046 §4.1.2 says charsets are not case
sensitive.  Coerce the argument to lower case.
2002-10-10 15:13:26 +00:00
Barry Warsaw 14fc464ec9 __init__(): RFC 2046 §4.1.2 says charsets are not case sensitive.
Coerce the argument to lower case.  Also, since body encodings can't
be SHORTEST, default the CHARSETS failobj's second item to BASE64.
2002-10-10 15:11:20 +00:00
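The current stdlib email package (assumed Python 3) still shows the behavior from these three commits, so it can be checked directly: charset names are lower-cased, and a charset missing from the CHARSETS table gets SHORTEST for headers but BASE64 for bodies. "X-Unknown" is just an arbitrary unregistered name:

```python
import email
from email.charset import Charset, BASE64, SHORTEST

msg = email.message_from_string(
    'Content-Type: text/plain; charset="ISO-8859-1"\n\nhello\n')
# get_content_charset() lower-cases the parameter value.
print(msg.get_content_charset())        # iso-8859-1

# Charset() lower-cases its argument before looking it up.
print(Charset('UTF-8').input_charset)   # utf-8

# An unregistered charset falls back to SHORTEST for headers but
# BASE64 for bodies, since SHORTEST is not allowed for bodies.
unknown = Charset('X-Unknown')
print(unknown.header_encoding == SHORTEST, unknown.body_encoding == BASE64)
print(unknown.get_body_encoding())      # base64
```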
Barry Warsaw 08c82b8086 openfile(): Go back to opening the files in text mode. This undoes
the change in revision 1.11 (test_email.py) in response to SF bug
#609988.  We now think that was the wrong fix and that WinZip was the
real culprit there.
2002-10-07 17:27:55 +00:00
Barry Warsaw 487fe6ac39 _parsebody(): Use get_content_type() instead of the deprecated
get_type().  Also, one of the regular expressions is constant so might
as well make it a module global.  And, when splitting up digests,
handle lineseps that are longer than 1 character in length
(e.g. \r\n).
2002-10-07 17:27:35 +00:00
Barry Warsaw 1d475d3452 Bump the version to 2.4.1 (not 2.5 as previously mentioned) to sync it
with the standalone mimelib package.
2002-10-07 17:20:25 +00:00
Barry Warsaw 0ac885e821 test__all__(): Fix the import list. 2002-10-01 17:57:06 +00:00
Barry Warsaw 2d7fab1a45 Docstring consistency with the updated .tex files. 2002-10-01 00:52:27 +00:00
Barry Warsaw 1f84ff1d40 _structure(): Swap fp and level arguments. 2002-10-01 00:51:47 +00:00
Barry Warsaw 0ebc5c96c5 Docstring consistency with the updated .tex files. 2002-10-01 00:44:13 +00:00
Barry Warsaw 12272a2f22 Docstring consistency with the updated .tex files. 2002-10-01 00:05:24 +00:00
Barry Warsaw 48330687f3 Docstring consistency with the updated .tex files. 2002-09-30 23:07:35 +00:00
Barry Warsaw 0031982c21 Docstring consistency with the updated .tex files. 2002-09-30 22:15:00 +00:00
Barry Warsaw 03a7559654 Docstring consistency with the updated .tex files. 2002-09-30 21:29:10 +00:00
Barry Warsaw fd2e8f7ea6 Docstring consistency with the updated .tex files. 2002-09-30 21:24:00 +00:00
Barry Warsaw 419b284b7c __all__: Updated 2002-09-30 20:41:33 +00:00
Barry Warsaw 057b8428d0 Docstring consistency with the updated .tex files. 2002-09-30 20:07:22 +00:00
Barry Warsaw 42d1d3edc0 __contains__(): Change the second argument to `name' for consistency.
I seriously doubt this will break any deployed code.

Docstring consistency with the updated .tex files.
2002-09-30 18:17:35 +00:00
Barry Warsaw 174aa49a88 With help from Martin v. Loewis, clarification is added for the
semantics of header chunks using byte and Unicode strings.
Specifically,

append(): When the given string is a byte string, charset (whether
specified explicitly in the argument list or implicitly via the
constructor default) is the encoding of the byte string, and a
UnicodeError will be raised if the string cannot be decoded with that
charset.  If s is a Unicode string, then charset is a hint specifying
the character set of the characters in the string.  In this case, when
producing an RFC 2822 compliant header using RFC 2047 rules, the
Unicode string will be encoded using the following charsets in order:
us-ascii, the charset hint, utf-8.

__init__(): Use the global USASCII Charset instance when the charset
argument is None.  Also, clarification in the docstring.

Also, use True/False where appropriate.
2002-09-30 15:51:31 +00:00
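The append() semantics spelled out above can be sketched as one small function. header_chunk() is a hypothetical helper, not the Header API: for a byte string the charset is its encoding and bad bytes raise UnicodeError immediately, while for a str the charset is only a hint and the first of us-ascii, the hint, utf-8 that can encode the text is chosen:

```python
def header_chunk(s, charset='us-ascii'):
    """Return (text, charset) for one header chunk (hypothetical helper).

    For a byte string, charset is its encoding and must decode it, or a
    UnicodeError is raised. For a str, charset is only a hint: use the
    first of us-ascii, the hint, utf-8 that can encode the text.
    """
    if isinstance(s, bytes):
        return s.decode(charset), charset
    for candidate in ('us-ascii', charset):
        try:
            s.encode(candidate)
        except UnicodeEncodeError:
            continue
        return s, candidate
    s.encode('utf-8')   # only fails for unpaired surrogates
    return s, 'utf-8'

print(header_chunk('hello', 'iso-8859-1'))          # plain ASCII wins
print(header_chunk('caf\u00e9', 'iso-8859-1'))      # the hint works
print(header_chunk('\u65e5\u672c', 'iso-8859-1'))   # falls back to utf-8
```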
Barry Warsaw d20b66537c The ansi_x3.4_1968 encoding is an alias for ascii, but isn't known in
Python 2.1.3.  However, it's required by the email test suite, so poke
it into the encodings aliases if it's missing.  This is apparently the
approved API for doing so.

Now we can remove the hexversion shortcircuits in the test suite.
2002-09-30 15:23:17 +00:00
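In a current Python 3 the same alias trick still works through the encodings.aliases.aliases dictionary, although modern versions already ship this alias, so the branch below normally does nothing. This sketch checks the aliases table directly rather than calling codecs.lookup() first because, as far as I know, CPython's encodings search function caches failed lookups, and a cached miss would survive the later fix-up:

```python
import codecs
from encodings import aliases

# Add the alias only if this Python doesn't already know the name.
if 'ansi_x3.4_1968' not in aliases.aliases:
    aliases.aliases['ansi_x3.4_1968'] = 'ascii'

print(codecs.lookup('ansi_x3.4_1968').name)
```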
Barry Warsaw d63071b05f Make the tests pass under Python 2.1 but only by cheating. Python 2.1
doesn't know about the ansi-x3.4-1968 charset so skip two tests that
rely on that (msg_32.txt and msg_33.txt).
2002-09-28 21:22:52 +00:00
Barry Warsaw eecdc742f5 Add a test for SHORTEST encoding of utf-8 headers, and also update
some of the test values which change because of this.
2002-09-28 21:04:19 +00:00
Barry Warsaw c202d93e0e Use True/False everywhere, and other code cleanups. 2002-09-28 21:02:51 +00:00
Barry Warsaw f776e6922c Code cleanup and add docstrings. 2002-09-28 20:52:26 +00:00
Barry Warsaw 5bdb2bee37 Use True/False everywhere, and other code cleanups. 2002-09-28 20:49:57 +00:00
Barry Warsaw e03e8f09eb Use True/False everywhere. 2002-09-28 20:44:58 +00:00
Barry Warsaw 4ece778bbc is_multipart(): Use isinstance() instead of type equality. 2002-09-28 20:41:39 +00:00
Barry Warsaw c494549566 Docstring and code cleanups, e.g. use True/False everywhere. 2002-09-28 20:40:25 +00:00
Barry Warsaw bba6b0243e __init__(): Minor code cleanup. 2002-09-28 20:27:28 +00:00
Barry Warsaw 5f253279d6 Add a pychecker suppression. 2002-09-28 20:25:15 +00:00