\declaremodule{standard}{email.header}
\modulesynopsis{Representing non-ASCII headers}

\rfc{2822} is the base standard that describes the format of email
messages. It derives from the older \rfc{822} standard which came
into widespread use at a time when most email was composed of \ASCII{}
characters only. \rfc{2822} is a specification written assuming email
contains only 7-bit \ASCII{} characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now
be used in email messages. The base standard still requires email
messages to be transferred using only 7-bit \ASCII{} characters, so a
slew of RFCs have been written describing how to encode email
containing non-\ASCII{} characters into \rfc{2822}-compliant format.
These RFCs include \rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
The \module{email} package supports these standards in its
\module{email.header} and \module{email.charset} modules.

If you want to include non-\ASCII{} characters in your email headers,
say in the \mailheader{Subject} or \mailheader{To} fields, you should
use the \class{Header} class and assign the field in the
\class{Message} object to an instance of \class{Header} instead of
using a string for the header value. Import the \class{Header} class from the
\module{email.header} module. For example:

\begin{verbatim}
>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print msg.as_string()
Subject: =?iso-8859-1?q?p=F6stal?=


\end{verbatim}

In this example, the \mailheader{Subject} field needed to contain a
non-\ASCII{} character. We did this by creating a \class{Header}
instance and passing in the character set that the byte string was
encoded in. When the subsequent \class{Message} instance was
flattened, the \mailheader{Subject} field was properly \rfc{2047}
encoded. MIME-aware mail readers would show this header using the
embedded ISO-8859-1 character.

\versionadded{2.2.2}

Here is the \class{Header} class description:

\begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
maxlinelen\optional{, header_name\optional{, continuation_ws\optional{,
errors}}}}}}}
Create a MIME-compliant header that can contain strings in different
character sets.

Optional \var{s} is the initial header value. If \code{None} (the
default), the initial header value is not set. You can later append
to the header with \method{append()} method calls. \var{s} may be a
byte string or a Unicode string, but see the \method{append()}
documentation for semantics.

Optional \var{charset} serves two purposes: it has the same meaning as
the \var{charset} argument to the \method{append()} method, and it also
sets the default character set for all subsequent \method{append()}
calls that omit the \var{charset} argument. If \var{charset} is not
provided in the constructor (the default), the \code{us-ascii}
character set is used both as \var{s}'s initial charset and as the
default for subsequent \method{append()} calls.

The maximum line length can be specified explicitly via
\var{maxlinelen}. To split the first line at a shorter length (to
account for the field name, e.g. \mailheader{Subject}, which isn't
included in \var{s}), pass in the name of the field in
\var{header_name}. The default \var{maxlinelen} is 76, and the
default value for \var{header_name} is \code{None}, meaning it is not
taken into account for the first line of a long, split header.

Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
whitespace, and is usually either a space or a hard tab character.
This character will be prepended to continuation lines.

Optional \var{errors} is passed straight through to the
\method{append()} method.
\end{classdesc}

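As an illustration of how \var{maxlinelen} and \var{header_name}
interact, the following sketch folds a long \ASCII{} value. The
variable names and the sample text are made up for this example, the
\code{u} string-literal prefix assumes an interpreter that accepts it,
and the exact positions of the line breaks may differ between versions
of the package:

\begin{verbatim}
from email.header import Header

# Hypothetical sample value: thirty short words separated by spaces.
long_value = u' '.join([u'word%d' % i for i in range(30)])

# Ask for lines of at most 40 characters; naming the field lets the
# first line leave room for the 'Subject: ' prefix.
h = Header(long_value, 'us-ascii', maxlinelen=40, header_name='Subject')

folded = h.encode()
lines = folded.split('\n')

print(folded)
\end{verbatim}

Each continuation line starts with the folding whitespace, so the
value can be unfolded again by removing the line breaks.
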
\begin{methoddesc}[Header]{append}{s\optional{, charset\optional{, errors}}}
Append the string \var{s} to the MIME header.

Optional \var{charset}, if given, should be a \class{Charset} instance
(see \refmodule{email.charset}) or the name of a character set, which
will be converted to a \class{Charset} instance. A value of
\code{None} (the default) means that the \var{charset} given in the
constructor is used.

\var{s} may be a byte string or a Unicode string. If it is a byte
string (i.e. \code{isinstance(s, str)} is true), then
\var{charset} is the encoding of that byte string, and a
\exception{UnicodeError} will be raised if the string cannot be
decoded with that character set.

If \var{s} is a Unicode string, then \var{charset} is a hint
specifying the character set of the characters in the string. In this
case, when producing an \rfc{2822}-compliant header using \rfc{2047}
rules, the Unicode string will be encoded using the following charsets
in order: \code{us-ascii}, the \var{charset} hint, \code{utf-8}. The
first character set that does not raise a \exception{UnicodeError} is used.

Optional \var{errors} is passed through to any \function{unicode()} or
\function{ustr.encode()} call, and defaults to ``strict''.
\end{methoddesc}

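A minimal sketch of the two cases described for \method{append()},
using made-up sample strings: Unicode strings are appended with
different \var{charset} hints, and a byte string that cannot be decoded
with the given character set raises \exception{UnicodeError}. The
\code{u} and \code{b} string-literal prefixes assume an interpreter
that accepts them.

\begin{verbatim}
from email.header import Header, decode_header

# Start with a Latin-1 part, then append a second part with a UTF-8 hint.
h = Header(u'p\xf6stal', 'iso-8859-1')
h.append(u'\u03b1\u03b2\u03b3', 'utf-8')

# Each part is encoded separately and keeps its own charset label.
encoded = h.encode()
charsets = [charset for _, charset in decode_header(encoded)]

# A byte string that is not valid in the named charset is rejected.
try:
    Header().append(b'p\xf6stal', 'us-ascii')
    rejected = False
except UnicodeError:
    rejected = True
\end{verbatim}
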
\begin{methoddesc}[Header]{encode}{\optional{splitchars}}
Encode a message header into an RFC-compliant format, possibly
wrapping long lines and encapsulating non-\ASCII{} parts in base64 or
quoted-printable encodings. Optional \var{splitchars} is a string
containing characters to split long \ASCII{} lines on, in rough support
of \rfc{2822}'s \emph{highest level syntactic breaks}. This doesn't
affect \rfc{2047} encoded lines.
\end{methoddesc}

The \class{Header} class also provides a number of methods to support
standard operators and built-in functions.

\begin{methoddesc}[Header]{__str__}{}
A synonym for \method{Header.encode()}. Useful for
\code{str(aHeader)}.
\end{methoddesc}

\begin{methoddesc}[Header]{__unicode__}{}
A helper for the built-in \function{unicode()} function. Returns the
header as a Unicode string.
\end{methoddesc}

\begin{methoddesc}[Header]{__eq__}{other}
This method allows you to compare two \class{Header} instances for equality.
\end{methoddesc}

\begin{methoddesc}[Header]{__ne__}{other}
This method allows you to compare two \class{Header} instances for inequality.
\end{methoddesc}

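For instance, two headers built from the same text compare equal.
This is a small sketch with made-up sample values; it sticks to
\ASCII{} text so that the result does not depend on how a particular
version renders encoded parts:

\begin{verbatim}
from email.header import Header

first = Header(u'Hello World')
second = Header(u'Hello World')
other = Header(u'Goodbye')

same = (first == second)        # uses __eq__
different = (first != other)    # uses __ne__
text = str(first)               # uses __str__
\end{verbatim}
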
The \module{email.header} module also provides the following
convenience functions.

\begin{funcdesc}{decode_header}{header}
Decode a message header value without converting the character set.
The header value is in \var{header}.

This function returns a list of \code{(decoded_string, charset)} pairs
containing each of the decoded parts of the header. \var{charset} is
\code{None} for non-encoded parts of the header, otherwise a lower-case
string containing the name of the character set specified in the
encoded string.

Here's an example:

\begin{verbatim}
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[('p\xf6stal', 'iso-8859-1')]
\end{verbatim}
\end{funcdesc}

\begin{funcdesc}{make_header}{decoded_seq\optional{, maxlinelen\optional{,
header_name\optional{, continuation_ws}}}}
Create a \class{Header} instance from a sequence of pairs as returned
by \function{decode_header()}.

\function{decode_header()} takes a header value string and returns a
sequence of pairs of the format \code{(decoded_string, charset)} where
\var{charset} is the name of the character set.

This function takes one of those sequences of pairs and returns a
\class{Header} instance. Optional \var{maxlinelen},
\var{header_name}, and \var{continuation_ws} are as in the
\class{Header} constructor.
\end{funcdesc}
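
The two functions can be combined to round-trip a header value. The
following sketch reuses the sample value from the
\function{decode_header()} example (the variable names are
illustrative only): it decodes an encoded value and rebuilds a
\class{Header} from the resulting pairs.

\begin{verbatim}
from email.header import decode_header, make_header

raw = '=?iso-8859-1?q?p=F6stal?='

# decode_header() yields (decoded_string, charset) pairs ...
pairs = decode_header(raw)

# ... which make_header() turns back into a Header instance.
rebuilt = make_header(pairs, header_name='Subject')

print(rebuilt.encode())
\end{verbatim}
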