:mod:`email.header`: Internationalized headers
----------------------------------------------

.. module:: email.header
   :synopsis: Representing non-ASCII headers


:rfc:`2822` is the base standard that describes the format of email messages.
It derives from the older :rfc:`822` standard which came into widespread use at
a time when most email was composed of ASCII characters only. :rfc:`2822` is a
specification written assuming email contains only 7-bit ASCII characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now be used in
email messages. The base standard still requires email messages to be
transferred using only 7-bit ASCII characters, so a slew of RFCs have been
written describing how to encode email containing non-ASCII characters into
:rfc:`2822`\ -compliant format. These RFCs include :rfc:`2045`, :rfc:`2046`,
:rfc:`2047`, and :rfc:`2231`. The :mod:`email` package supports these standards
in its :mod:`email.header` and :mod:`email.charset` modules.

If you want to include non-ASCII characters in your email headers, say in the
:mailheader:`Subject` or :mailheader:`To` fields, you should use the
:class:`Header` class and assign the field in the :class:`~email.message.Message`
object to an instance of :class:`Header` instead of using a string for the header
value. Import the :class:`Header` class from the :mod:`email.header` module.
For example::

   >>> from email.message import Message
   >>> from email.header import Header
   >>> msg = Message()
   >>> h = Header('p\xf6stal', 'iso-8859-1')
   >>> msg['Subject'] = h
   >>> print msg.as_string()
   Subject: =?iso-8859-1?q?p=F6stal?=


Here the :mailheader:`Subject` field needed to contain a non-ASCII character,
so we created a :class:`Header` instance and passed in the character set that
the byte string was encoded in. When the :class:`~email.message.Message`
instance was subsequently flattened, the :mailheader:`Subject` field was
properly :rfc:`2047` encoded. MIME-aware mail readers would show this header
using the embedded ISO-8859-1 character.

.. versionadded:: 2.2.2

Here is the :class:`Header` class description:


.. class:: Header([s[, charset[, maxlinelen[, header_name[, continuation_ws[, errors]]]]]])

   Create a MIME-compliant header that can contain strings in different character
   sets.

   Optional *s* is the initial header value. If ``None`` (the default), the
   initial header value is not set. You can later append to the header with
   :meth:`append` method calls. *s* may be a byte string or a Unicode string, but
   see the :meth:`append` documentation for semantics.

   Optional *charset* serves two purposes: it has the same meaning as the *charset*
   argument to the :meth:`append` method, and it sets the default character set
   for all subsequent :meth:`append` calls that omit the *charset* argument. If
   *charset* is not provided in the constructor (the default), the ``us-ascii``
   character set is used both as *s*'s initial charset and as the default for
   subsequent :meth:`append` calls.

   The maximum line length can be specified explicitly via *maxlinelen*. For
   splitting the first line to a shorter value (to account for the field header
   which isn't included in *s*, e.g. :mailheader:`Subject`) pass in the name of the
   field in *header_name*. The default *maxlinelen* is 76, and the default value
   for *header_name* is ``None``, meaning it is not taken into account for the
   first line of a long, split header.

   Optional *continuation_ws* must be :rfc:`2822`\ -compliant folding whitespace,
   and is usually either a space or a hard tab character. This character will be
   prepended to continuation lines. *continuation_ws* defaults to a single
   space character (" ").

   Optional *errors* is passed straight through to the :meth:`append` method.

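   As a rough illustration of *maxlinelen* and *header_name*, the sketch below
   folds a long ASCII-only value. The subject text and the limit of 40 are
   made up for the example, and ``u''`` literals are used only so that the
   snippet should run on both Python 2.7 and Python 3.3 or later.

   .. code-block:: python

      from email.header import Header

      # Hypothetical long subject made of 30 short ASCII words.
      long_subject = u' '.join(u'word%d' % i for i in range(30))

      # header_name lets the first line leave room for the "Subject: " prefix.
      h = Header(long_subject, maxlinelen=40, header_name='Subject')
      folded = h.encode()

      # The encoded value is split into several lines; continuation lines
      # begin with the folding whitespace.
      lines = folded.split('\n')
      for line in lines:
          print(repr(line))

   Each printed line should be no longer than the requested 40 characters,
   with the first one shorter still because of the field name.
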
   .. method:: append(s[, charset[, errors]])

      Append the string *s* to the MIME header.

      Optional *charset*, if given, should be a :class:`~email.charset.Charset`
      instance (see :mod:`email.charset`) or the name of a character set, which
      will be converted to a :class:`~email.charset.Charset` instance. A value
      of ``None`` (the default) means that the *charset* given in the constructor
      is used.

      *s* may be a byte string or a Unicode string. If it is a byte string
      (i.e. ``isinstance(s, str)`` is true), then *charset* is the encoding of
      that byte string, and a :exc:`UnicodeError` will be raised if the string
      cannot be decoded with that character set.

      If *s* is a Unicode string, then *charset* is a hint specifying the
      character set of the characters in the string. In this case, when
      producing an :rfc:`2822`\ -compliant header using :rfc:`2047` rules, the
      Unicode string will be encoded using the following charsets in order:
      ``us-ascii``, the *charset* hint, ``utf-8``. The first character set that
      does not raise a :exc:`UnicodeError` is used.

      Optional *errors* is passed through to any :func:`unicode` or
      :meth:`unicode.encode` call, and defaults to "strict".

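      A minimal sketch of building one header from chunks in different
      character sets follows. The strings are illustrative only, and Unicode
      literals are used so that the snippet should behave the same on
      Python 2.7 and Python 3.3 or later.

      .. code-block:: python

         from email.header import Header

         # No charset given, so us-ascii is the default for this header.
         h = Header(u'Hello')

         # Append a Unicode chunk, hinting that ISO-8859-1 can represent it.
         h.append(u'p\xf6stal', 'iso-8859-1')
         mixed = h.encode()
         print(mixed)

         # Without a hint, non-ASCII Unicode text cannot be encoded as
         # us-ascii, so the utf-8 fallback is used instead.
         fallback = Header(u'p\xf6stal').encode()
         print(fallback)

      The ASCII chunk is left as plain text, while the second chunk becomes an
      :rfc:`2047` encoded word labelled with the charset that was used.
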
   .. method:: encode([splitchars])

      Encode a message header into an RFC-compliant format, possibly wrapping
      long lines and encapsulating non-ASCII parts in base64 or quoted-printable
      encodings. Optional *splitchars* is a string containing characters to
      split long ASCII lines on, in rough support of :rfc:`2822`'s *highest
      level syntactic breaks*. This doesn't affect :rfc:`2047` encoded lines.

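      The choice between quoted-printable and base64 depends on the character
      set (see :mod:`email.charset`). The sketch below, using two made-up
      values, shows one header that should come out quoted-printable and one
      that should come out base64. It is written to run on both Python 2.7
      and Python 3.3 or later.

      .. code-block:: python

         from email.header import Header

         # ISO-8859-1 headers are encoded with quoted-printable ("q").
         latin = Header(u'p\xf6stal', 'iso-8859-1').encode()
         print(latin)

         # For utf-8 the shorter of the two encodings is chosen; for text
         # with no ASCII characters that is base64 ("b").
         japanese = Header(u'\u65e5\u672c\u8a9e', 'utf-8').encode()
         print(japanese)
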
   The :class:`Header` class also provides a number of methods to support
   standard operators and built-in functions.

   .. method:: __str__()

      A synonym for :meth:`Header.encode`. Useful for ``str(aHeader)``.

   .. method:: __unicode__()

      A helper for the built-in :func:`unicode` function. Returns the header as
      a Unicode string.

   .. method:: __eq__(other)

      This method allows you to compare two :class:`Header` instances for
      equality.

   .. method:: __ne__(other)

      This method allows you to compare two :class:`Header` instances for
      inequality.

The :mod:`email.header` module also provides the following convenience functions.

.. function:: decode_header(header)

   Decode a message header value without converting the character set. The header
   value is in *header*.

   This function returns a list of ``(decoded_string, charset)`` pairs containing
   each of the decoded parts of the header. *charset* is ``None`` for non-encoded
   parts of the header, otherwise a lower case string containing the name of the
   character set specified in the encoded string.

   Here's an example::

      >>> from email.header import decode_header
      >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
      [('p\xf6stal', 'iso-8859-1')]

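   Because the character set is not converted, turning the result into text is
   left to the caller. One possible way to do that is sketched below. The
   header value and the ``'ascii'`` fallback for parts without a charset are
   assumptions made for the example, and the ``isinstance`` check is there so
   that the snippet should also run on Python 3, where unencoded headers can
   come back as text rather than bytes.

   .. code-block:: python

      from email.header import decode_header

      raw = 'Hello =?iso-8859-1?q?p=F6stal?='

      pieces = []
      for part, charset in decode_header(raw):
          # Byte strings still need decoding with the reported charset.
          if isinstance(part, bytes):
              part = part.decode(charset or 'ascii')
          pieces.append(part.strip())

      text = u' '.join(pieces)
      print(repr(text))
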
.. function:: make_header(decoded_seq[, maxlinelen[, header_name[, continuation_ws]]])

   Create a :class:`Header` instance from a sequence of pairs as returned by
   :func:`decode_header`.

   :func:`decode_header` takes a header value string and returns a sequence of
   pairs of the format ``(decoded_string, charset)`` where *charset* is the name of
   the character set.

   This function takes such a sequence of pairs and returns a :class:`Header`
   instance. Optional *maxlinelen*, *header_name*, and *continuation_ws* are as in
   the :class:`Header` constructor.

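   A small round-trip sketch, assuming the same illustrative header value as
   above: the decoded pairs are fed to :func:`make_header`, and the resulting
   :class:`Header` can then be converted to text or encoded again. The
   ``text_type`` name is a helper introduced here so that the snippet should
   run on both Python 2.7 and Python 3.3 or later.

   .. code-block:: python

      from email.header import decode_header, make_header

      raw = 'Hello =?iso-8859-1?q?p=F6stal?='
      h = make_header(decode_header(raw))

      # unicode on Python 2, str on Python 3.
      text_type = type(u'')
      text = text_type(h)
      print(repr(text))

      # Encoding again yields an RFC 2047 encoded word for the non-ASCII part.
      reencoded = h.encode()
      print(reencoded)
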