:mod:`multifile` --- Support for files containing distinct parts
================================================================

.. module:: multifile
   :synopsis: Support for reading files which contain distinct parts, such as some MIME data.
   :deprecated:

.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>

.. deprecated:: 2.5
   The :mod:`email` package should be used in preference to the :mod:`multifile`
   module. This module is present only to maintain backward compatibility.

The :class:`MultiFile` object enables you to treat sections of a text file as
file-like input objects, with ``''`` being returned by :meth:`readline` when a
given delimiter pattern is encountered. The defaults of this class are designed
to make it useful for parsing MIME multipart messages, but by subclassing it and
overriding methods it can be easily adapted for more general use.

.. class:: MultiFile(fp[, seekable])

   Create a multi-file. You must instantiate this class with an input object
   argument for the :class:`MultiFile` instance to get lines from, such as a file
   object returned by :func:`open`.

   :class:`MultiFile` only ever looks at the input object's :meth:`readline`,
   :meth:`seek` and :meth:`tell` methods, and the latter two are only needed if you
   want random access to the individual MIME parts. To use :class:`MultiFile` on a
   non-seekable stream object, set the optional *seekable* argument to false; this
   will prevent using the input object's :meth:`seek` and :meth:`tell` methods.

   It is useful to know that in :class:`MultiFile`'s view of the world, text
   is composed of three kinds of lines: data, section-dividers, and end-markers.
   :class:`MultiFile` is designed to support parsing of messages that may have
   multiple nested message parts, each with its own pattern for section-divider
   and end-marker lines.

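The three kinds of lines can be illustrated with a small standalone sketch. It
does not use :mod:`multifile` itself; ``classify_line`` is a hypothetical helper,
and it assumes the MIME-style decoration described for
:meth:`section_divider` and :meth:`end_marker` (``'--'`` before the boundary,
plus a trailing ``'--'`` for the end-marker).

```python
def classify_line(line, boundary):
    """Classify a line as 'data', 'section-divider', or 'end-marker'.

    Hypothetical helper: assumes '--' + boundary marks a section divider
    and '--' + boundary + '--' marks the end marker.
    """
    # Trailing whitespace (LF or CR-LF) is ignored when comparing markers.
    marker = line.rstrip()
    if marker == "--" + boundary + "--":
        return "end-marker"
    if marker == "--" + boundary:
        return "section-divider"
    return "data"


lines = [
    "--frontier\n",
    "first part\n",
    "--frontier\r\n",
    "second part\n",
    "--frontier--\n",
]
kinds = [classify_line(line, "frontier") for line in lines]
print(kinds)
```

A line that starts with ``'--'`` but does not match the given boundary is
treated as ordinary data in this sketch.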
.. seealso::

   Module :mod:`email`
      Comprehensive email handling package; supersedes the :mod:`multifile` module.

.. _multifile-objects:

MultiFile Objects
-----------------

A :class:`MultiFile` instance has the following methods:

.. method:: MultiFile.readline()

   Read a line. If the line is data (not a section-divider or end-marker or real
   EOF) return it. If the line matches the most-recently-stacked boundary, return
   ``''`` and set ``self.last`` to 1 if the match is an end-marker and to 0
   otherwise. If the line matches any other stacked boundary, raise an error. On
   encountering end-of-file on the underlying stream object, the method raises
   :exc:`Error` unless all boundaries have been popped.

.. method:: MultiFile.readlines()

   Return all lines remaining in this part as a list of strings.

.. method:: MultiFile.read()

   Read all lines, up to the next section. Return them as a single (multiline)
   string. Note that, unlike a file object's :meth:`read`, this method does not
   take a size argument.

.. method:: MultiFile.seek(pos[, whence])

   Seek. Seek indices are relative to the start of the current section. The *pos*
   and *whence* arguments are interpreted as for a file seek.

.. method:: MultiFile.tell()

   Return the file position relative to the start of the current section.

.. method:: MultiFile.next()

   Skip lines to the next section (that is, read lines until a section-divider or
   end-marker has been consumed). Return true if there is such a section, false if
   an end-marker is seen. Re-enable the most-recently-pushed boundary.

.. method:: MultiFile.is_data(str)

   Return true if *str* is data and false if it might be a section boundary. As
   written, it tests for a prefix other than ``'--'`` at start of line (which
   all MIME boundaries have) but it is declared so it can be overridden in derived
   classes.

   Note that this test is intended as a fast guard for the real boundary
   tests; if it always returns false it will merely slow processing, not cause it
   to fail.

.. method:: MultiFile.push(str)

   Push a boundary string. When a decorated version of this boundary is found as
   an input line, it will be interpreted as a section-divider or end-marker
   (depending on the decoration, see :rfc:`2045`). All subsequent reads will
   return the empty string to indicate end-of-file, until a call to :meth:`pop`
   removes the boundary or a :meth:`next` call reenables it.

   It is possible to push more than one boundary. Encountering the
   most-recently-pushed boundary will return EOF; encountering any other
   boundary will raise an error.

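The stacking rule can be sketched with a toy reader. This is not the module's
implementation: ``BoundaryStack`` and ``BoundaryError`` are hypothetical names,
the MIME-style ``'--'`` decoration is assumed, and the sketch leaves out both the
"sticky" end-of-file state that :meth:`next` clears and the error raised on a
real end-of-file with boundaries still pushed.

```python
class BoundaryError(Exception):
    """Hypothetical stand-in for the module's Error exception."""


class BoundaryStack:
    """Toy reader that treats the innermost pushed boundary as end-of-file."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._stack = []
        self.last = False

    def push(self, boundary):
        self._stack.append(boundary)

    def pop(self):
        self._stack.pop()

    def readline(self):
        line = next(self._lines, "")
        marker = line.rstrip()
        # Check the most recently pushed boundary first (depth 0).
        for depth, boundary in enumerate(reversed(self._stack)):
            divider = "--" + boundary
            end = divider + "--"
            if marker in (divider, end):
                if depth > 0:
                    # An outer boundary appeared before the inner part ended.
                    raise BoundaryError("outer boundary %r reached too early" % boundary)
                self.last = marker == end
                return ""
        return line


reader = BoundaryStack(["--outer\n", "--inner\n", "body\n", "--inner--\n", "--outer--\n"])
reader.push("outer")
first = reader.readline()        # the "--outer" divider reads as end-of-file
reader.push("inner")
reads = [reader.readline() for _ in range(3)]
inner_done = reader.last         # set by the "--inner--" end-marker
reader.pop()
final = reader.readline()        # "--outer--" now matches the innermost boundary
```

Once ``inner`` has been popped, the ``outer`` boundary becomes the innermost one
again, so its end-marker reads as end-of-file rather than as an error.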
.. method:: MultiFile.pop()

   Pop a section boundary. This boundary will no longer be interpreted as EOF.

.. method:: MultiFile.section_divider(str)

   Turn a boundary into a section-divider line. By default, this method
   prepends ``'--'`` (which MIME section boundaries have) but it is declared so
   it can be overridden in derived classes. This method need not append LF or
   CR-LF, as comparison with the result ignores trailing whitespace.

.. method:: MultiFile.end_marker(str)

   Turn a boundary string into an end-marker line. By default, this method
   prepends ``'--'`` and appends ``'--'`` (like a MIME-multipart end-of-message
   marker) but it is declared so it can be overridden in derived classes. This
   method need not append LF or CR-LF, as comparison with the result ignores
   trailing whitespace.

Finally, :class:`MultiFile` instances have two public instance variables:

.. attribute:: MultiFile.level

   Nesting depth of the current part.

.. attribute:: MultiFile.last

   True if the last end-of-file was for an end-of-message marker.

.. _multifile-example:

:class:`MultiFile` Example
--------------------------

.. sectionauthor:: Skip Montanaro <skip@mojam.com>

::

   import mimetools
   import multifile
   import StringIO

   def extract_mime_part_matching(stream, mimetype):
       """Return the first element in a multipart MIME message on stream
       matching mimetype."""

       # Parse the top-level headers to find the content type and boundary.
       msg = mimetools.Message(stream)
       msgtype = msg.gettype()
       params = msg.getplist()

       data = StringIO.StringIO()
       if msgtype[:10] == "multipart/":

           file = multifile.MultiFile(stream)
           file.push(msg.getparam("boundary"))
           # Each call to next() advances to the following part.
           while file.next():
               submsg = mimetools.Message(file)
               try:
                   data = StringIO.StringIO()
                   mimetools.decode(file, data, submsg.getencoding())
               except ValueError:
                   # Unknown transfer encoding: skip this part.
                   continue
               if submsg.gettype() == mimetype:
                   break
           file.pop()
       return data.getvalue()

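Since the :mod:`email` package is the recommended replacement, a rough
equivalent of the function above might look like the following sketch. It
assumes a Python version where :mod:`email` is available, takes the message as a
string rather than a stream, and returns ``None`` when no part matches (a choice
made here, not the behaviour of the example above). ``extract_part_matching``
and ``SAMPLE`` are illustrative names.

```python
import email


def extract_part_matching(text, mimetype):
    """Return the decoded payload of the first part whose type is mimetype.

    Returns None when no part matches.
    """
    msg = email.message_from_string(text)
    # walk() visits the message itself and then every nested part in order.
    for part in msg.walk():
        if part.get_content_type() == mimetype:
            # decode=True undoes any Content-Transfer-Encoding.
            return part.get_payload(decode=True)
    return None


SAMPLE = (
    'MIME-Version: 1.0\n'
    'Content-Type: multipart/mixed; boundary="frontier"\n'
    '\n'
    'This is the preamble.\n'
    '--frontier\n'
    'Content-Type: text/plain\n'
    '\n'
    'Hello, world.\n'
    '--frontier\n'
    'Content-Type: text/html\n'
    '\n'
    '<p>Hello, world.</p>\n'
    '--frontier--\n'
)

html = extract_part_matching(SAMPLE, "text/html")
print(html)
```

The parser handles boundary pushing and popping internally, so nested multipart
messages need no extra bookkeeping in the caller.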