
% Documentation by ESR
\section{Standard Module \module{multifile}}
\stmodindex{multifile}
\label{module-multifile}

The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \code{''} (the EOF indication)
being returned by \method{readline} when a given delimiter pattern is
encountered.  The defaults of this class are designed to make it
useful for parsing MIME multipart messages, but by subclassing it and
overriding methods it can easily be adapted for more general use.

\begin{classdesc}{MultiFile}{fp[, seekable=1]}
Create a multi-file.  You must instantiate this class with an input
object from which MultiFile reads lines, such as a file object
returned by \code{open}.

MultiFile only ever looks at the input object's \code{readline},
\code{seek} and \code{tell} methods, and the latter two are only
needed if you want to random-access the multifile sections. To use
MultiFile on a non-seekable stream object, set the optional seekable
argument to 0; this will avoid using the input object's \code{seek}
and \code{tell} at all.
\end{classdesc}

It will be useful to know that in MultiFile's view of the world, text
is composed of three kinds of lines: data, section-dividers, and
end-markers.  MultiFile is designed to support parsing of
messages that may have multiple nested message parts, each with its
own pattern for section-divider and end-marker lines.
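
As a rough illustration of the three kinds of lines, the following
sketch classifies lines using the default MIME-style decoration
described in this section.  It is standalone Python and does not use
the \module{multifile} module; the \code{classify} helper and the
sample boundary \code{'XYZ'} are hypothetical.

```python
def classify(line, boundary):
    """Classify one input line against a single boundary string."""
    marker = line.rstrip()              # trailing whitespace is ignored
    if marker == '--' + boundary + '--':
        return 'end-marker'
    if marker == '--' + boundary:
        return 'section-divider'
    return 'data'


lines = ['--XYZ\n', 'hello\n', '--XYZ--\r\n']
kinds = [classify(line, 'XYZ') for line in lines]
```

The end-marker test comes first only for readability; the two
decorated forms never compare equal for the same boundary.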

\subsection{MultiFile Objects}
\label{MultiFile-objects}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}{push}{str}
Push a boundary string.  When an appropriately decorated version of
this boundary is found as an input line, it will be interpreted as a
section-divider or end-marker and passed back as EOF.  All subsequent
reads will also be passed back as EOF, until a \method{pop} removes
the boundary or a \method{next} call re-enables it.

It is possible to push more than one boundary.  Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise an error.
\end{methoddesc}

\begin{methoddesc}{readline}{}
Read a line.  If the line is data (not a section-divider, end-marker,
or real EOF), return it.  If the line matches the most-recently-stacked
boundary, return \code{''} and set \code{self.last} to 1 if the match is
an end-marker and to 0 otherwise.  If the line matches any other
stacked boundary, raise an error.  If the line is a real EOF, raise an
error unless all boundaries have been popped.
\end{methoddesc}
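
The boundary handling described for \method{readline} can be sketched
as follows.  This is a standalone approximation under the default
MIME-style decoration, not the module's code; \code{check\_line} and
\code{BoundaryError} are hypothetical names, and the stack is modelled
as a plain list with the most recently pushed boundary last.

```python
class BoundaryError(Exception):
    """Hypothetical stand-in for the error raised on a bad boundary."""


def check_line(line, stack):
    """Return the line if it is data, '' if it hits the innermost boundary."""
    marker = line.rstrip()
    # Walk the stack from the most recently pushed boundary outwards.
    for depth, sep in enumerate(reversed(stack)):
        if marker in ('--' + sep, '--' + sep + '--'):
            if depth > 0:
                raise BoundaryError('%r is not the innermost boundary' % sep)
            return ''                   # reported to the caller as EOF
    return line


stack = ['outer', 'inner']
results = [check_line('text\n', stack), check_line('--inner--\n', stack)]
```

Passing \code{'-{}-outer'} with the same stack raises
\code{BoundaryError}, mirroring the rule that only the most recently
pushed boundary may be encountered.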

\begin{methoddesc}{readlines}{}
Read all lines, up to the next section.  Return them as a list of strings.
\end{methoddesc}

\begin{methoddesc}{read}{}
Read all lines, up to the next section.  Return them as a single
(multiline) string.  Note that this doesn't take a size argument!
\end{methoddesc}

\begin{methoddesc}{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed).  Return 1 if there
is such a section, 0 if an end-marker is seen.  Re-enable the
most-recently-pushed boundary.
\end{methoddesc}

\begin{methoddesc}{pop}{}
Pop a section boundary.  This boundary will no longer be interpreted as EOF.
\end{methoddesc}

\begin{methoddesc}{seek}{pos[, whence=0]}
Seek.  Seek indices are relative to the start of the current section.
The \var{pos} and \var{whence} arguments are interpreted as for a file seek.
\end{methoddesc}

\begin{methoddesc}{tell}{}
Tell.  Tell indices are relative to the start of the current section.
\end{methoddesc}
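
To illustrate section-relative indices, the sketch below computes a
position as the offset in the underlying input object minus the
offset at which the current section starts.  It is standalone Python
that uses \code{io.BytesIO} as a stand-in input object rather than a
\class{MultiFile}; the variable names and sample data are
illustrative assumptions.

```python
import io

fp = io.BytesIO(b'header\n--XYZ\nbody line\n')
fp.readline()                   # 'header' line, before the section
fp.readline()                   # the section-divider line itself
start = fp.tell()               # the section begins right after the divider

fp.readline()                   # consume one data line of the section
relative = fp.tell() - start    # section-relative position
```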

\begin{methoddesc}{is_data}{str}
Return 1 if \var{str} is certainly data and 0 if it might be a section
boundary.  As written, it tests for a prefix other than \code{'-{}-'} at
the start of the line (which all MIME boundaries have), but it is
declared so it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns 0 it will merely slow processing,
not cause it to fail.
\end{methoddesc}

\begin{methoddesc}{section_divider}{str}
Turn a boundary into a section-divider line.  By default, this
method prepends \code{'-{}-'} (which MIME section boundaries have), but
it is declared so it can be overridden in derived classes.  This method
need not append LF or CR-LF, as comparison with the result ignores
trailing whitespace.
\end{methoddesc}

\begin{methoddesc}{end_marker}{str}
Turn a boundary string into an end-marker line.  By default, this
method prepends \code{'-{}-'} and appends \code{'-{}-'} (like a
MIME-multipart end-of-message marker), but it is declared so it can be
overridden in derived classes.  This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}
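
The three overridable hooks can be pictured with the following
standalone sketch.  \code{Hooks} only mimics the default behaviour
described above and is not the real \class{MultiFile} class;
\code{HashHooks} and its \code{\#\#} convention are hypothetical and
show how a derived class might adapt the decoration to a non-MIME
format.

```python
class Hooks:
    """Standalone mimic of the default hooks (not the real class)."""

    def is_data(self, line):
        # Fast guard: anything not starting with '--' is certainly data.
        return line[:2] != '--'

    def section_divider(self, boundary):
        return '--' + boundary

    def end_marker(self, boundary):
        return '--' + boundary + '--'


class HashHooks(Hooks):
    """Hypothetical variant for files that mark sections with '##'."""

    def is_data(self, line):
        return line[:2] != '##'

    def section_divider(self, boundary):
        return '## ' + boundary

    def end_marker(self, boundary):
        return '## end ' + boundary


hooks = HashHooks()
divider = hooks.section_divider('part')
marker = hooks.end_marker('part')
```

Overriding all three together keeps the fast guard consistent with the
decorated forms it is meant to screen for.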

Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}{last}
1 if the last EOF passed back was for an end-of-message marker, 0 otherwise. 
\end{memberdesc}

Example:

\begin{verbatim}
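    import sys
    from multifile import MultiFile

    # outer_boundary and inner_boundary are assumed to hold the
    # boundary strings of the message being parsed.
    fp = MultiFile(sys.stdin, 0)
    fp.push(outer_boundary)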
    message1 = fp.readlines()
    # We should now be either at real EOF or stopped on a message
    # boundary. Re-enable the outer boundary.
    fp.next()
    # Read another message with the same delimiter
    message2 = fp.readlines()
    # Re-enable that delimiter again
    fp.next()
    # Now look for a message subpart with a different boundary
    fp.push(inner_boundary)
    sub_header = fp.readlines()
    # If no exception has been thrown, we're looking at the start of
    # the message subpart.  Reset and grab the subpart
    fp.next()
    sub_body = fp.readlines()
    # Got it.  Now pop the inner boundary to re-enable the outer one.
    fp.pop()
    # Read to next outer boundary
    message3 = fp.readlines()
\end{verbatim}
% Contributions by Eric Raymond: documentation for modules cmd,
% multifile and smtplib (1998-06-28).
% Note that readline returns '' on EOF, not "EOF" (1998-06-30).