16 Commits

Author SHA1 Message Date
Barry Warsaw 67f8f2fe2a append(): Fixing the test for convertibility after consultation with
Ben.  If s is a byte string, make sure it can be converted to unicode
with the input codec, and from unicode with the output codec, or raise
a UnicodeError exception early.  Skip this test (and the unicode->byte
string conversion) when the charset is our faux 8bit raw charset.
2002-10-14 16:52:41 +00:00
Barry Warsaw 5e3bcff651 __init__(): Fix an invariant, that the charset item in a chunk tuple
must be a Charset instance, not a string.  The bug here was that
self._charset wasn't being converted to a Charset instance so later
.append() calls which used the default charset would break.

_split(): If the charset of the chunk is '8bit', return the chunk
unchanged.  We can't safely split it, so this is the avenue of least
harm.
2002-10-14 15:13:17 +00:00
Barry Warsaw 0c358258c9 _encode_chunks(), encode(): Don't modify self._chunks. As Ben says:
Also, it fixes a really egregious error in Header.encode() (really
    in Header._encode_chunks()) that could cause a header to grow and
    grow each time encode() was called if output_codec was different
    from input_codec.

Also, fix a typo.
2002-10-13 04:06:28 +00:00
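
The growth bug described in the commit above lends itself to a regression-style check: encoding the same header twice should give identical output. A minimal sketch, assuming the modern Python 3 `email.header` module rather than the 2002 code this commit touched; the sample text and the choice of `euc-jp` (whose output codec in the stdlib charset table is `iso-2022-jp`) are illustrative assumptions.

```python
from email.header import Header

# euc-jp is an input charset whose output codec differs (iso-2022-jp),
# which is the situation the commit message describes.
h = Header("\u3053\u3093\u306b\u3061\u306f", "euc-jp")

first = h.encode()
second = h.encode()
third = h.encode()

# encode() must not mutate the stored chunks, so repeated calls agree.
print(first == second == third)
print(first)
```

If `encode()` modified the header's internal chunk list, the later calls would return a different (longer) string than the first.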
Barry Warsaw 48330687f3 Docstring consistency with the updated .tex files. 2002-09-30 23:07:35 +00:00
Barry Warsaw 174aa49a88 With help from Martin v. Loewis, clarification is added for the
semantics of header chunks using byte and Unicode strings.
Specifically,

append(): When the given string is a byte string, charset (whether
specified explicitly in the argument list or implicitly via the
constructor default) is the encoding of the byte string, and a
UnicodeError will be raised if the string cannot be decoded with that
charset.  If s is a Unicode string, then charset is a hint specifying
the character set of the characters in the string.  In this case, when
producing an RFC 2822 compliant header using RFC 2047 rules, the
Unicode string will be encoded using the following charsets in order:
us-ascii, the charset hint, utf-8.

__init__(): Use the global USASCII Charset instance when the charset
argument is None.  Also, clarification in the docstring.

Also, use True/False where appropriate.
2002-09-30 15:51:31 +00:00
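
The append() semantics described in the commit above can be exercised roughly as follows. This is a sketch under the assumption that the modern Python 3 `email.header.Header` still follows the same rules (byte strings are decoded with the given charset, and a Unicode string that does not fit us-ascii falls back to utf-8); the sample strings are made up for illustration.

```python
from email.header import Header

h = Header()

# Byte string: charset names the encoding of the bytes.
h.append(b"caf\xe9", "iso-8859-1")
decoded = str(h)

# Byte string that cannot be decoded with the stated charset:
# the error is raised early, at append() time.
try:
    h.append(b"caf\xe9", "utf-8")
    raised = False
except UnicodeError:
    raised = True

# Unicode string with the default us-ascii charset: the text does not
# fit us-ascii, so the header is encoded as utf-8 instead.
fallback = Header("caf\u00e9").encode()

print(decoded, raised, fallback)
```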
Barry Warsaw 45d9bde6c1 _ascii_split(): Don't lstrip continuation lines. Closes SF bug #601392. 2002-09-10 15:57:29 +00:00
Barry Warsaw 92825a9a52 append(): Bite the bullet and let charset be the string name of a
character set, which we'll convert to a Charset instance.  Sigh.
2002-07-23 06:08:10 +00:00
Barry Warsaw 15d3739446 make_header(): Watch out for charset is None, which decode_header()
will return as the charset if implicit us-ascii is used.
2002-07-23 04:29:54 +00:00
Barry Warsaw 8da39aa56a make_header(): New function to take the output of decode_header() and
create a Header instance.  Closes feature request #539481.

Header.__init__(): Allow the initial string to be omitted.

__eq__(), __ne__(): Support rich comparisons for equality of Header
instances with Header instances or strings.

Also, update a bunch of docstrings.
2002-07-09 16:33:47 +00:00
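
A short illustration of the make_header() round trip and the equality support introduced in the commit above. It assumes the modern Python 3 `email.header` module, where `decode_header()` returns `(value, charset)` pairs and reports `None` as the charset for plain us-ascii text; the header values are illustrative.

```python
from email.header import Header, decode_header, make_header

# An RFC 2047 encoded word, decoded into (bytes, charset) pairs.
pairs = decode_header("=?iso-8859-1?q?p=F6stal?=")
h = make_header(pairs)

# Plain ASCII input: decode_header() reports the charset as None,
# which make_header() has to tolerate.
plain_pairs = decode_header("hello")
plain = make_header(plain_pairs)

# Header instances compare equal to their string value.
print(h == "p\u00f6stal", plain == "hello")

# The initial string may be omitted and chunks appended later.
empty = Header()
empty.append("hello")
print(empty == plain)
```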
Barry Warsaw 6ee7156996 append(): Clarify the expected type of charset. 2002-07-03 05:04:04 +00:00
Barry Warsaw 8e69bdac33 __unicode__(): Patch #541263 by Mikhail Zabaluev, implementation
modified by Barry.
2002-06-29 03:26:58 +00:00
Barry Warsaw 766125080f Teach this class about "highest-level syntactic breaks" but only for
headers with no charset or 'us-ascii' charsets.  Actually this is only
partially true: we know about semicolons (but not true parameters) and
we know about whitespace (but not technically folding whitespace).
Still it should be good enough for all practical purposes.

Other changes include:

__init__(): Add a continuation_ws argument, which defaults to a single
space.  Set this to change the whitespace used for continuation lines
when a header must be split.  Also, changed the way header line
lengths are calculated, so that they take into account continuation_ws
(when tabs-expanded) and any provided header_name parameter.  This
should do much better on returning split headers for which the first
and subsequent lines must fit into a specified width.

guess_maxlinelen(): Removed.  I don't think we need this method as
part of the public API.

encode_chunks() -> _encode_chunks(): I don't think we need this one as
part of the public API either.
2002-06-28 23:46:53 +00:00
Barry Warsaw 1c30aa2292 The _compat modules now export _floordiv() instead of _intdiv2() for
better code reuse.

_split(): Use _floordiv().
2002-06-01 05:49:17 +00:00
Tim Peters 8ac1495a6a Whitespace normalization. 2002-05-23 15:15:30 +00:00
Barry Warsaw 812031b955 Fixed a bug in the splitting of lines, and improved the splitting for
single byte character sets.  Also fixed a semantic problem with the
constructor's default arguments.  Specifically,

__init__(): Change the maxlinelen argument default to None instead of
MAXLINELEN.  The semantics should have been (and now are) that if
maxlinelen is given it is always honored.  If it isn't given, but
header_name is given, then the maximum line length is calculated.  If
neither is given then the default 76 characters is used.

_split(): If the character set is a single byte character set then we
can split the line at the maxlinelen because we know that encoding the
header won't increase its length.  If the charset isn't a single byte
charset then we use the quicker divide-and-conquer line splitting
algorithm as before.
2002-05-19 23:47:53 +00:00
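
The maxlinelen rule from the commit above (an explicit maxlinelen is always honored, and header_name reserves room on the first line) can be checked with a sketch like the one below. It assumes the modern Python 3 `email.header.Header` constructor, whose folding code has been rewritten since 2002; the sample text and the width of 40 are arbitrary choices for illustration.

```python
from email.header import Header

text = "the quick brown fox jumps over the lazy dog and keeps running far away"

# An explicit maxlinelen is honored; header_name tells the class how much
# of the first line is already taken up by "Subject: ".
h = Header(text, maxlinelen=40, header_name="Subject")
lines = h.encode().split("\n")

for line in lines:
    print(repr(line))

# Every folded line fits the requested width, including the first line
# once the header name and ": " are counted.
print(all(len(line) <= 40 for line in lines))
print(len("Subject: " + lines[0]) <= 40)
```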
Barry Warsaw 409a4c08b5 Sync'ing with standalone email package 2.0.1. This adds support for
non-us-ascii character sets in headers and bodies.  Some API changes
(with DeprecationWarnings for the old APIs).  Better RFC-compliant
implementations of base64 and quoted-printable.

Updated test cases.  Documentation updates to follow (after I finish
writing them ;).
2002-04-10 21:01:31 +00:00