:mod:`rfc822` --- Parse RFC 2822 mail headers
=============================================

.. module:: rfc822
   :synopsis: Parse 2822 style mail messages.
   :deprecated:


.. deprecated:: 2.3
   The :mod:`email` package should be used in preference to the :mod:`rfc822`
   module. This module is present only to maintain backward compatibility, and
   has been removed in 3.0.

This module defines a class, :class:`Message`, which represents an "email
message" as defined by the Internet standard :rfc:`2822`. [#]_ Such messages
consist of a collection of message headers, and a message body. This module
also defines a helper class :class:`AddressList` for parsing :rfc:`2822`
addresses. Please refer to the RFC for information on the specific syntax of
:rfc:`2822` messages.

.. index:: module: mailbox

The :mod:`mailbox` module provides classes to read mailboxes produced by
various end-user mail programs.


.. class:: Message(file[, seekable])

   A :class:`Message` instance is instantiated with an input object as parameter.
   Message relies only on the input object having a :meth:`readline` method; in
   particular, ordinary file objects qualify. Instantiation reads headers from the
   input object up to a delimiter line (normally a blank line) and stores them in
   the instance. The message body, following the headers, is not consumed.

   This class can work with any input object that supports a :meth:`readline`
   method. If the input object has seek and tell capability, the
   :meth:`rewindbody` method will work; also, illegal lines will be pushed back
   onto the input stream. If the input object lacks seek but has an :meth:`unread`
   method that can push back a line of input, :class:`Message` will use that to
   push back illegal lines. Thus this class can be used to parse messages coming
   from a buffered stream.

   The optional *seekable* argument is provided as a workaround for certain stdio
   libraries in which :cfunc:`tell` discards buffered data before discovering that
   the :cfunc:`lseek` system call doesn't work. For maximum portability, you
   should set the *seekable* argument to zero to prevent that initial :meth:`tell`
   when passing in an unseekable object such as a file object created from a socket
   object.

   Input lines as read from the file may be terminated either by CR-LF or by a
   single linefeed; a terminating CR-LF is replaced by a single linefeed before the
   line is stored.

   All header matching is done independent of upper or lower case; e.g.
   ``m['From']``, ``m['from']`` and ``m['FROM']`` all yield the same result.
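
The reading behaviour described above can be illustrated with a minimal sketch. This is not the module's implementation: ``read_headers`` is a hypothetical helper written for this example, it runs on Python 3 with :mod:`io`, and it ignores the handling of illegal lines. It only shows that headers are read with ``readline`` up to the blank delimiter line, that CR-LF and bare linefeed endings are treated alike, and that the body is left unread.

```python
import io


def read_headers(fp):
    """Read header lines from fp until the blank delimiter line.

    Hypothetical helper for illustration; it is not part of rfc822.
    """
    headers = {}
    last = None
    while True:
        line = fp.readline()
        if not line or line in ("\n", "\r\n"):
            break  # end of input, or the blank line that ends the headers
        line = line.rstrip("\r\n")  # CR-LF and bare LF are treated alike
        if line[:1] in (" ", "\t") and last is not None:
            headers[last] += "\n" + line  # continuation of the previous header
            continue
        name, _, value = line.partition(":")
        last = name.lower()  # lower-cased keys give case-insensitive lookup
        headers[last] = value.strip()
    return headers


raw = "From: jack@cwi.nl (Jack Jansen)\r\nSubject: hello\r\n world\r\n\r\nBody text\r\n"
fp = io.StringIO(raw)
headers = read_headers(fp)
body = fp.read()  # the body is still available after the headers are read
```

Callers of this sketch look up ``headers[name.lower()]``, which is one simple way to obtain the case-insensitive matching described above.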


.. class:: AddressList(field)

   You may instantiate the :class:`AddressList` helper class using a single string
   parameter, a comma-separated list of :rfc:`2822` addresses to be parsed. (The
   parameter ``None`` yields an empty list.)


.. function:: quote(str)

   Return a new string with backslashes in *str* replaced by two backslashes and
   double quotes replaced by backslash-double quote.


.. function:: unquote(str)

   Return a new string which is an *unquoted* version of *str*. If *str* ends and
   begins with double quotes, they are stripped off. Likewise if *str* ends and
   begins with angle brackets, they are stripped off.
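
A short sketch of the quoting rules follows. Because :mod:`rfc822` is not available in Python 3, it uses the same-named functions from :mod:`email.utils`, on the assumption that they behave the same way for these inputs.

```python
from email.utils import quote, unquote

# Backslashes are doubled and double quotes gain a leading backslash.
quoted = quote('say "hi" \\ bye')

# Matching surrounding double quotes or angle brackets are stripped.
stripped_quotes = unquote('"Jack Jansen"')
stripped_angles = unquote('<jack@cwi.nl>')
```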


.. function:: parseaddr(address)

   Parse *address*, which should be the value of some address-containing field such
   as :mailheader:`To` or :mailheader:`Cc`, into its constituent "realname" and
   "email address" parts. Returns a tuple of that information, unless the parse
   fails, in which case a 2-tuple ``(None, None)`` is returned.
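
As an illustration, the sketch below parses the two common address spellings. It uses :func:`email.utils.parseaddr` as a stand-in, since :mod:`rfc822` cannot be imported on Python 3; the return value on a failed parse may differ between the two modules, so only successful parses are shown.

```python
from email.utils import parseaddr

# "Real Name <address>" form
pair_angle = parseaddr('Jack Jansen <jack@cwi.nl>')

# "address (Real Name)" form, where the name is given as a comment
pair_comment = parseaddr('jack@cwi.nl (Jack Jansen)')

realname, address = pair_angle
```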


.. function:: dump_address_pair(pair)

   The inverse of :func:`parseaddr`, this takes a 2-tuple of the form ``(realname,
   email_address)`` and returns the string value suitable for a :mailheader:`To` or
   :mailheader:`Cc` header. If the first element of *pair* is false, then the
   second element is returned unmodified.
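
The round trip between a pair and a header value can be sketched as follows. The example uses :func:`email.utils.formataddr`, the closest counterpart in the :mod:`email` package, as an assumption; its exact quoting of the real name may differ from :func:`dump_address_pair`.

```python
from email.utils import formataddr, parseaddr

pair = ('Jack Jansen', 'jack@cwi.nl')
header_value = formataddr(pair)

# With an empty real name, only the address is returned.
bare = formataddr(('', 'jack@cwi.nl'))

# Parsing the formatted value gives back the original pair.
round_trip = parseaddr(header_value)
```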


.. function:: parsedate(date)

   Attempts to parse a date according to the rules in :rfc:`2822`. However, some
   mailers don't follow that format as specified, so :func:`parsedate` tries to
   guess correctly in such cases. *date* is a string containing an :rfc:`2822`
   date, such as ``'Mon, 20 Nov 1995 19:12:08 -0500'``. If it succeeds in parsing
   the date, :func:`parsedate` returns a 9-tuple that can be passed directly to
   :func:`time.mktime`; otherwise ``None`` will be returned. Note that indexes 6,
   7, and 8 of the result tuple are not usable.
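
A minimal sketch of the call, using :func:`email.utils.parsedate` as a stand-in on the assumption that it matches :func:`parsedate` for these inputs. Note that the 9-tuple carries no timezone, so :func:`time.mktime` interprets it as local time.

```python
import time
from email.utils import parsedate

fields = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
year, month, day, hour, minute, second = fields[:6]

# The "-0500" offset is not part of the 9-tuple; mktime assumes local time.
local_timestamp = time.mktime(fields)

# A string that cannot be parsed yields None rather than raising.
unparsable = parsedate('not a date')
```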


.. function:: parsedate_tz(date)

   Performs the same function as :func:`parsedate`, but returns either ``None`` or
   a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
   :func:`time.mktime`, and the tenth is the offset of the date's timezone from UTC
   (which is the official term for Greenwich Mean Time). (Note that the sign of
   the timezone offset is the opposite of the sign of the ``time.timezone``
   variable for the same timezone; the latter variable follows the POSIX standard
   while this module follows :rfc:`2822`.) If the input string has no timezone,
   the last element of the tuple returned is ``None``. Note that indexes 6, 7, and
   8 of the result tuple are not usable.


.. function:: mktime_tz(tuple)

   Turn a 10-tuple as returned by :func:`parsedate_tz` into a UTC timestamp. If
   the timezone item in the tuple is ``None``, assume local time. Minor
   deficiency: this first interprets the first 8 elements as a local time and then
   compensates for the timezone difference; this may yield a slight error around
   daylight saving time switch dates, which is not enough to worry about for
   common use.


.. seealso::

   Module :mod:`email`
      Comprehensive email handling package; supersedes the :mod:`rfc822` module.

   Module :mod:`mailbox`
      Classes to read various mailbox formats produced by end-user mail programs.

   Module :mod:`mimetools`
      Subclass of :class:`rfc822.Message` that handles MIME encoded messages.


.. _message-objects:

Message Objects
---------------

A :class:`Message` instance has the following methods:


.. method:: Message.rewindbody()

   Seek to the start of the message body. This only works if the file object is
   seekable.


.. method:: Message.isheader(line)

   Returns a line's canonicalized fieldname (the dictionary key that will be used
   to index it) if the line is a legal :rfc:`2822` header; otherwise returns
   ``None`` (implying that parsing should stop here and the line be pushed back on
   the input stream). It is sometimes useful to override this method in a
   subclass.


.. method:: Message.islast(line)

   Return true if the given line is a delimiter on which Message should stop. The
   delimiter line is consumed, and the file object's read location positioned
   immediately after it. By default this method just checks that the line is
   blank, but you can override it in a subclass.


.. method:: Message.iscomment(line)

   Return ``True`` if the given line should be ignored entirely, just skipped. By
   default this is a stub that always returns ``False``, but you can override it in
   a subclass.


.. method:: Message.getallmatchingheaders(name)

   Return a list of lines consisting of all headers matching *name*, if any. Each
   physical line, whether it is a continuation line or not, is a separate list
   item. Return the empty list if no header matches *name*.


.. method:: Message.getfirstmatchingheader(name)

   Return a list of lines comprising the first header matching *name*, and its
   continuation line(s), if any. Return ``None`` if there is no header matching
   *name*.


.. method:: Message.getrawheader(name)

   Return a single string consisting of the text after the colon in the first
   header matching *name*. This includes leading whitespace, the trailing
   linefeed, and internal linefeeds and whitespace if any continuation
   line(s) were present. Return ``None`` if there is no header matching *name*.


.. method:: Message.getheader(name[, default])

   Return a single string consisting of the last header matching *name*,
   but strip leading and trailing whitespace. Internal whitespace is not
   stripped. The optional *default* argument can be used to specify a different
   default to be returned when there is no header matching *name*; it defaults to
   ``None``. This is the preferred way to get parsed headers.


.. method:: Message.get(name[, default])

   An alias for :meth:`getheader`, to make the interface more compatible with
   regular dictionaries.


.. method:: Message.getaddr(name)

   Return a pair ``(full name, email address)`` parsed from the string returned by
   ``getheader(name)``. If no header matching *name* exists, return ``(None,
   None)``; otherwise both the full name and the address are (possibly empty)
   strings.

   Example: If *m*'s first :mailheader:`From` header contains the string
   ``'jack@cwi.nl (Jack Jansen)'``, then ``m.getaddr('From')`` will yield the pair
   ``('Jack Jansen', 'jack@cwi.nl')``. If the header contained ``'Jack Jansen
   <jack@cwi.nl>'`` instead, it would yield the exact same result.


.. method:: Message.getaddrlist(name)

   This is similar to :meth:`getaddr`, but parses a header containing a list of
   email addresses (e.g. a :mailheader:`To` header) and returns a list of ``(full
   name, email address)`` pairs (even if there was only one address in the header).
   If there is no header matching *name*, return an empty list.

   If multiple headers exist that match the named header (e.g. if there are several
   :mailheader:`Cc` headers), all are parsed for addresses. Any continuation lines
   the named headers contain are also parsed.
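
The same result can be sketched with the :mod:`email` package, which this page recommends in place of :mod:`rfc822`. The combination of ``get_all`` and :func:`email.utils.getaddresses` below is an assumed equivalent, not the implementation of :meth:`getaddrlist`; it shows several :mailheader:`Cc` headers, one of them continued over two lines, being merged into one list of pairs.

```python
from email import message_from_string
from email.utils import getaddresses

raw = (
    "To: Jack Jansen <jack@cwi.nl>\n"
    "Cc: a@example.org,\n"
    " Bee <b@example.org>\n"
    "Cc: c@example.org\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)

# Collect every Cc header and parse all addresses they contain.
cc_pairs = getaddresses(msg.get_all('Cc', []))

# A header that is absent yields an empty list.
bcc_pairs = getaddresses(msg.get_all('Bcc', []))
```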


.. method:: Message.getdate(name)

   Retrieve a header using :meth:`getheader` and parse it into a 9-tuple compatible
   with :func:`time.mktime`; note that fields 6, 7, and 8 are not usable. If
   there is no header matching *name*, or it is unparsable, return ``None``.

   Date parsing appears to be a black art, and not all mailers adhere to the
   standard. While this function has been tested and found correct on a large
   collection of email from many sources, it is still possible that it may
   occasionally yield an incorrect result.


.. method:: Message.getdate_tz(name)

   Retrieve a header using :meth:`getheader` and parse it into a 10-tuple; the
   first 9 elements will make a tuple compatible with :func:`time.mktime`, and the
   10th is a number giving the offset of the date's timezone from UTC. Note that
   fields 6, 7, and 8 are not usable. Similarly to :meth:`getdate`, if there is
   no header matching *name*, or it is unparsable, return ``None``.
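
A rough equivalent of this lookup can be sketched with the :mod:`email` package. ``header_date_tz`` is a hypothetical helper introduced only for this example; it returns ``None`` for a missing or unparsable header, mirroring the behaviour described above.

```python
from email import message_from_string
from email.utils import mktime_tz, parsedate_tz


def header_date_tz(msg, name):
    """Return the parsed 10-tuple for header name, or None (hypothetical helper)."""
    value = msg.get(name)
    return parsedate_tz(value) if value else None


msg = message_from_string("Date: Mon, 20 Nov 1995 19:12:08 -0500\n\nbody\n")
date_fields = header_date_tz(msg, 'Date')
missing = header_date_tz(msg, 'Resent-Date')

# Only convert to a timestamp when the header was present and parsable.
timestamp = mktime_tz(date_fields) if date_fields else None
```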

:class:`Message` instances also support a limited mapping interface. In
particular: ``m[name]`` is like ``m.getheader(name)`` but raises :exc:`KeyError`
if there is no matching header; and ``len(m)``, ``m.get(name[, default])``,
``name in m``, ``m.keys()``, ``m.values()``, ``m.items()``, and
``m.setdefault(name[, default])`` act as expected, with the one difference
that :meth:`setdefault` uses an empty string as the default value.
:class:`Message` instances also support the mapping writable interface ``m[name]
= value`` and ``del m[name]``. :class:`Message` objects do not support the
:meth:`clear`, :meth:`copy`, :meth:`popitem`, or :meth:`update` methods of the
mapping interface. (Support for :meth:`get` and :meth:`setdefault` was only
added in Python 2.2.)

Finally, :class:`Message` instances have some public instance variables:


.. attribute:: Message.headers

   A list containing the entire set of header lines, in the order in which they
   were read (except that setitem calls may disturb this order). Each line contains
   a trailing newline. The blank line terminating the headers is not contained in
   the list.


.. attribute:: Message.fp

   The file or file-like object passed at instantiation time. This can be used to
   read the message content.


.. attribute:: Message.unixfrom

   The Unix ``From`` line, if the message had one, or an empty string. This is
   needed to regenerate the message in some contexts, such as an ``mbox``\ -style
   mailbox file.


.. _addresslist-objects:

AddressList Objects
-------------------

An :class:`AddressList` instance has the following methods:


.. method:: AddressList.__len__()

   Return the number of addresses in the address list.


.. method:: AddressList.__str__()

   Return a canonicalized string representation of the address list. Addresses are
   rendered in ``"name" <host@domain>`` form, comma-separated.


.. method:: AddressList.__add__(alist)

   Return a new :class:`AddressList` instance that contains all addresses in both
   :class:`AddressList` operands, with duplicates removed (set union).


.. method:: AddressList.__iadd__(alist)

   In-place version of :meth:`__add__`; turns this :class:`AddressList` instance
   into the union of itself and the right-hand instance, *alist*.


.. method:: AddressList.__sub__(alist)

   Return a new :class:`AddressList` instance that contains every address in the
   left-hand :class:`AddressList` operand that is not present in the right-hand
   address operand (set difference).


.. method:: AddressList.__isub__(alist)

   In-place version of :meth:`__sub__`, removing addresses in this list which are
   also in *alist*.
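
The set-like semantics of these operators can be illustrated on plain lists of ``(name, address)`` pairs. The ``union`` and ``difference`` functions below are hypothetical helpers written for this sketch, not the module's code; they assume that two addresses count as equal when their pairs compare equal.

```python
def union(left, right):
    """Pairs from left, followed by pairs from right not already present."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result


def difference(left, right):
    """Pairs from left that do not appear in right."""
    return [pair for pair in left if pair not in right]


team = [('Jack Jansen', 'jack@cwi.nl'), ('', 'a@example.org')]
others = [('', 'a@example.org'), ('Bee', 'b@example.org')]

combined = union(team, others)        # analogous to team + others
only_team = difference(team, others)  # analogous to team - others
```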

Finally, :class:`AddressList` instances have one public instance variable:


.. attribute:: AddressList.addresslist

   A list of 2-tuples of strings, one per address. In each tuple, the first item
   is the canonicalized name part and the second is the actual route-address
   (``'@'``\ -separated username-host.domain pair).

.. rubric:: Footnotes

.. [#] This module originally conformed to :rfc:`822`, hence the name. Since then,
   :rfc:`2822` has been released as an update to :rfc:`822`. This module should be
   considered :rfc:`2822`\ -conformant, especially in cases where the syntax or
   semantics have changed since :rfc:`822`.