cpython/Doc/lib/librfc822.tex

330 lines
14 KiB
TeX
Raw Normal View History

\section{\module{rfc822} ---
Parse RFC 2822 mail headers}
1995-02-27 13:53:25 -04:00
\declaremodule{standard}{rfc822}
\modulesynopsis{Parse \rfc{2822} style mail messages.}
This module defines a class, \class{Message}, which represents an
``email message'' as defined by the Internet standard
\rfc{2822}.\footnote{This module originally conformed to \rfc{822},
hence the name. Since then, \rfc{2822} has been released as an
update to \rfc{822}. This module should be considered
\rfc{2822}-conformant, especially in cases where the
syntax or semantics have changed since \rfc{822}.} Such messages
consist of a collection of message headers, and a message body. This
module also defines a helper class
\class{AddressList} for parsing \rfc{2822} addresses. Please refer to
the RFC for information on the specific syntax of \rfc{2822} messages.
1995-02-27 13:53:25 -04:00
The \refmodule{mailbox}\refstmodindex{mailbox} module provides classes
to read mailboxes produced by various end-user mail programs.
\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter. Message relies only on the input object having a
1998-08-10 14:46:22 -03:00
\method{readline()} method; in particular, ordinary file objects
qualify. Instantiation reads headers from the input object up to a
delimiter line (normally a blank line) and stores them in the
instance. The message body, following the headers, is not consumed.
1998-08-10 14:46:22 -03:00
This class can work with any input object that supports a
\method{readline()} method. If the input object has seek and tell
capability, the \method{rewindbody()} method will work; also, illegal
lines will be pushed back onto the input stream. If the input object
lacks seek but has an \method{unread()} method that can push back a
line of input, \class{Message} will use that to push back illegal
lines. Thus this class can be used to parse messages coming from a
buffered stream.
The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \cfunction{tell()} discards buffered
data before discovering that the \cfunction{lseek()} system call
doesn't work. For maximum portability, you should set the seekable
argument to zero to prevent that initial \method{tell()} when passing
in an unseekable object such as a a file object created from a socket
object.
1995-02-27 13:53:25 -04:00
Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.
All header matching is done independent of upper or lower case;
1998-08-10 14:46:22 -03:00
e.g.\ \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
1995-02-27 13:53:25 -04:00
\begin{classdesc}{AddressList}{field}
You may instantiate the \class{AddressList} helper class using a single
string parameter, a comma-separated list of \rfc{2822} addresses to be
1998-08-10 14:46:22 -03:00
parsed. (The parameter \code{None} yields an empty list.)
\end{classdesc}
\begin{funcdesc}{quote}{str}
Return a new string with backslashes in \var{str} replaced by two
backslashes and double quotes replaced by backslash-double quote.
\end{funcdesc}
\begin{funcdesc}{unquote}{str}
Return a new string which is an \emph{unquoted} version of \var{str}.
If \var{str} ends and begins with double quotes, they are stripped
off. Likewise if \var{str} ends and begins with angle brackets, they
are stripped off.
\end{funcdesc}
\begin{funcdesc}{parseaddr}{address}
Parse \var{address}, which should be the value of some
address-containing field such as \mailheader{To} or \mailheader{Cc},
into its constituent ``realname'' and ``email address'' parts.
Returns a tuple of that information, unless the parse fails, in which
case a 2-tuple \code{(None, None)} is returned.
\end{funcdesc}
\begin{funcdesc}{dump_address_pair}{pair}
The inverse of \method{parseaddr()}, this takes a 2-tuple of the form
\code{(\var{realname}, \var{email_address})} and returns the string
value suitable for a \mailheader{To} or \mailheader{Cc} header. If
the first element of \var{pair} is false, then the second element is
returned unmodified.
\end{funcdesc}
\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{2822}.
however, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{2822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}. If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned. Note that fields 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}
\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time). (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{2822}.) If the input
string has no timezone, the last element of the tuple returned is
\code{None}. Note that fields 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}
\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp. If the timezone item in the tuple is \code{None}, assume
local time. Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight savings time
switch dates. Not enough to worry about for common use.
\end{funcdesc}
\begin{seealso}
\seemodule{mailbox}{Classes to read various mailbox formats produced
by end-user mail programs.}
\seemodule{mimetools}{Subclass of rfc.Message that handles MIME encoded
messages.}
\end{seealso}
\subsection{Message Objects \label{message-objects}}
A \class{Message} instance has the following methods:
1995-02-27 13:53:25 -04:00
\begin{methoddesc}{rewindbody}{}
1995-02-27 13:53:25 -04:00
Seek to the start of the message body. This only works if the file
object is seekable.
\end{methoddesc}
1995-02-27 13:53:25 -04:00
\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{2822} header; otherwise
returns \code{None} (implying that parsing should stop here and the
line be pushed back on the input stream). It is sometimes useful to
override this method in a subclass.
\end{methoddesc}
\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which Message should
stop. The delimiter line is consumed, and the file object's read
location positioned immediately after it. By default this method just
checks that the line is blank, but you can override it in a subclass.
\end{methoddesc}
\begin{methoddesc}{iscomment}{line}
Return true if the given line should be ignored entirely, just skipped.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}
\begin{methoddesc}{getallmatchingheaders}{name}
1995-03-07 06:14:09 -04:00
Return a list of lines consisting of all headers matching
1995-02-27 13:53:25 -04:00
\var{name}, if any. Each physical line, whether it is a continuation
line or not, is a separate list item. Return the empty list if no
header matches \var{name}.
\end{methoddesc}
1995-02-27 13:53:25 -04:00
\begin{methoddesc}{getfirstmatchingheader}{name}
1995-02-27 13:53:25 -04:00
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any. Return
\code{None} if there is no header matching \var{name}.
\end{methoddesc}
1995-02-27 13:53:25 -04:00
\begin{methoddesc}{getrawheader}{name}
1995-02-27 13:53:25 -04:00
Return a single string consisting of the text after the colon in the
first header matching \var{name}. This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if there
any continuation line(s) were present. Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}
1995-02-27 13:53:25 -04:00
\begin{methoddesc}{getheader}{name\optional{, default}}
1995-02-27 13:53:25 -04:00
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace. Internal whitespace is not stripped. The optional
\var{default} argument can be used to specify a different default to
be returned when there is no header matching \var{name}.
\end{methoddesc}
\begin{methoddesc}{get}{name\optional{, default}}
1998-08-10 14:46:22 -03:00
An alias for \method{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}
1995-02-27 13:53:25 -04:00
\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}. If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.
1995-02-27 13:53:25 -04:00
Example: If \var{m}'s first \mailheader{From} header contains the
string \code{'jack@cwi.nl (Jack Jansen)'}, then
1995-02-27 13:53:25 -04:00
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
1995-02-27 13:53:25 -04:00
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
1995-02-27 13:53:25 -04:00
exact same result.
\end{methoddesc}
1995-02-27 13:53:25 -04:00
\begin{methoddesc}{getaddrlist}{name}
1995-02-27 13:53:25 -04:00
This is similar to \code{getaddr(\var{list})}, but parses a header
containing a list of email addresses (e.g.\ a \mailheader{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header). If there is no
header matching \var{name}, return an empty list.
1995-02-27 13:53:25 -04:00
If multiple headers exist that match the named header (e.g. if there
are several \mailheader{Cc} headers), all are parsed for addresses.
Any continuation lines the named headers contain are also parsed.
\end{methoddesc}
1995-02-27 13:53:25 -04:00
\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}; note that fields 6, 7, and 8
are not usable. If there is no header matching
1995-02-27 13:53:25 -04:00
\var{name}, or it is unparsable, return \code{None}.
Date parsing appears to be a black art, and not all mailers adhere to
the standard. While it has been tested and found correct on a large
collection of email from many sources, it is still possible that this
function may occasionally yield an incorrect result.
\end{methoddesc}
1995-02-27 13:53:25 -04:00
\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC. Note that fields 6, 7, and 8
are not usable. Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}
\class{Message} instances also support a limited mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.get(name\optional{, deafult})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} \code{\var{m}.items()}, and
\code{\var{m}.setdefault(name\optional{, default})} act as expected,
with the one difference that \method{get()} and \method{setdefault()}
use an empty string as the default value. \class{Message} instances
also support the mapping writable interface \code{\var{m}[name] =
value} and \code{del \var{m}[name]}. \class{Message} objects do not
support the \method{clear()}, \method{copy()}, \method{popitem()}, or
\method{update()} methods of the mapping interface. (Support for
\method{get()} and \method{setdefault()} was only added in Python
2.2.)
1995-02-27 13:53:25 -04:00
Finally, \class{Message} instances have some public instance variables:
1995-02-27 13:53:25 -04:00
\begin{memberdesc}{headers}
1995-02-27 13:53:25 -04:00
A list containing the entire set of header lines, in the order in
which they were read (except that setitem calls may disturb this
order). Each line contains a trailing newline. The
1995-02-27 13:53:25 -04:00
blank line terminating the headers is not contained in the list.
\end{memberdesc}
1995-02-27 13:53:25 -04:00
\begin{memberdesc}{fp}
The file or file-like object passed at instantiation time. This can
be used to read the message content.
\end{memberdesc}
\begin{memberdesc}{unixfrom}
The \UNIX{} \samp{From~} line, if the message had one, or an empty
string. This is needed to regenerate the message in some contexts,
such as an \code{mbox}-style mailbox file.
\end{memberdesc}
\subsection{AddressList Objects \label{addresslist-objects}}
An \class{AddressList} instance has the following methods:
\begin{methoddesc}{__len__}{}
Return the number of addresses in the address list.
\end{methoddesc}
\begin{methoddesc}{__str__}{}
Return a canonicalized string representation of the address list.
Addresses are rendered in "name" <host@domain> form, comma-separated.
\end{methoddesc}
\begin{methoddesc}{__add__}{alist}
Return a new \class{AddressList} instance that contains all addresses
in both \class{AddressList} operands, with duplicates removed (set
union).
\end{methoddesc}
\begin{methoddesc}{__iadd__}{alist}
In-place version of \method{__add__()}; turns this \class{AddressList}
instance into the union of itself and the right-hand instance,
\var{alist}.
\end{methoddesc}
\begin{methoddesc}{__sub__}{alist}
Return a new \class{AddressList} instance that contains every address
in the left-hand \class{AddressList} operand that is not present in
the right-hand address operand (set difference).
\end{methoddesc}
\begin{methoddesc}{__isub__}{alist}
In-place version of \method{__sub__()}, removing addresses in this
list which are also in \var{alist}.
\end{methoddesc}
Finally, \class{AddressList} instances have one public instance variable:
\begin{memberdesc}{addresslist}
A list of tuple string pairs, one per address. In each member, the
first is the canonicalized name part, the second is the
actual route-address (\character{@}-separated username-host.domain
pair).
\end{memberdesc}