2000-09-23 01:47:56 -03:00
|
|
|
\section{\module{xml.parsers.expat} ---
|
2000-10-29 01:10:30 -04:00
|
|
|
Fast XML parsing using Expat}
|
2000-06-10 23:42:07 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
% Markup notes:
|
|
|
|
%
|
|
|
|
% Many of the attributes of the XMLParser objects are callbacks.
|
|
|
|
% Since signature information must be presented, these are described
|
|
|
|
% using the methoddesc environment. Since they are attributes which
|
|
|
|
% are set by client code, in-text references to these attributes
|
|
|
|
% should be marked using the \member macro and should not include the
|
|
|
|
% parentheses used when marking functions and methods.
|
|
|
|
|
2000-09-23 01:47:56 -03:00
|
|
|
\declaremodule{standard}{xml.parsers.expat}
|
|
|
|
\modulesynopsis{An interface to the Expat non-validating XML parser.}
|
2000-06-10 23:42:07 -03:00
|
|
|
\moduleauthor{Paul Prescod}{paul@prescod.net}
|
|
|
|
\sectionauthor{A.M. Kuchling}{amk1@bigfoot.com}
|
|
|
|
|
2000-09-23 01:47:56 -03:00
|
|
|
\versionadded{2.0}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
The \module{xml.parsers.expat} module is a Python interface to the
|
|
|
|
Expat\index{Expat} non-validating XML parser.
|
2000-06-10 23:42:07 -03:00
|
|
|
The module provides a single extension type, \class{xmlparser}, that
|
|
|
|
represents the current state of an XML parser. After an
|
|
|
|
\class{xmlparser} object has been created, various attributes of the object
|
|
|
|
can be set to handler functions. When an XML document is then fed to
|
|
|
|
the parser, the handler functions are called for the character data
|
|
|
|
and markup in the XML document.
|
2000-09-23 01:47:56 -03:00
|
|
|
|
|
|
|
This module uses the \module{pyexpat}\refbimodindex{pyexpat} module to
|
|
|
|
provide access to the Expat parser. Direct use of the
|
|
|
|
\module{pyexpat} module is deprecated.
|
2000-10-29 01:10:30 -04:00
|
|
|
|
|
|
|
This module provides one exception and one type object:
|
|
|
|
|
2001-02-14 14:54:32 -04:00
|
|
|
\begin{excdesc}{ExpatError}
|
2001-09-20 17:43:28 -03:00
|
|
|
The exception raised when Expat reports an error. See section
|
|
|
|
\ref{expaterror-objects}, ``ExpatError Exceptions,'' for more
|
|
|
|
information on interpreting Expat errors.
|
2000-10-29 01:10:30 -04:00
|
|
|
\end{excdesc}
|
|
|
|
|
2001-02-14 14:54:32 -04:00
|
|
|
\begin{excdesc}{error}
|
|
|
|
Alias for \exception{ExpatError}.
|
|
|
|
\end{excdesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{datadesc}{XMLParserType}
|
|
|
|
The type of the return values from the \function{ParserCreate()}
|
|
|
|
function.
|
|
|
|
\end{datadesc}
|
|
|
|
|
|
|
|
|
2000-09-23 01:47:56 -03:00
|
|
|
The \module{xml.parsers.expat} module contains two functions:
|
2000-06-10 23:42:07 -03:00
|
|
|
|
|
|
|
\begin{funcdesc}{ErrorString}{errno}
|
|
|
|
Returns an explanatory string for a given error number \var{errno}.
|
|
|
|
\end{funcdesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{funcdesc}{ParserCreate}{\optional{encoding\optional{,
|
|
|
|
namespace_separator}}}
|
2000-06-10 23:42:07 -03:00
|
|
|
Creates and returns a new \class{xmlparser} object.
|
|
|
|
\var{encoding}, if specified, must be a string naming the encoding
|
|
|
|
used by the XML data. Expat doesn't support as many encodings as
|
|
|
|
Python does, and its repertoire of encodings can't be extended; it
|
2001-02-08 11:40:33 -04:00
|
|
|
supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If
|
|
|
|
\var{encoding} is given it will override the implicit or explicit
|
|
|
|
encoding of the document.
|
2000-06-10 23:42:07 -03:00
|
|
|
|
|
|
|
Expat can optionally do XML namespace processing for you, enabled by
|
2000-10-29 01:10:30 -04:00
|
|
|
providing a value for \var{namespace_separator}. The value must be a
|
|
|
|
one-character string; a \exception{ValueError} will be raised if the
|
|
|
|
string has an illegal length (\code{None} is considered the same as
|
|
|
|
omission). When namespace processing is enabled, element type names
|
|
|
|
and attribute names that belong to a namespace will be expanded. The
|
|
|
|
element name passed to the element handlers
|
2001-02-08 11:40:33 -04:00
|
|
|
\member{StartElementHandler} and \member{EndElementHandler}
|
2000-06-10 23:42:07 -03:00
|
|
|
will be the concatenation of the namespace URI, the namespace
|
|
|
|
separator character, and the local part of the name. If the namespace
|
2000-10-29 01:10:30 -04:00
|
|
|
separator is a zero byte (\code{chr(0)}) then the namespace URI and
|
2001-02-08 11:40:33 -04:00
|
|
|
the local part will be concatenated without any separator.
|
2000-06-10 23:42:07 -03:00
|
|
|
|
2000-11-28 02:38:22 -04:00
|
|
|
For example, if \var{namespace_separator} is set to a space character
|
|
|
|
(\character{ }) and the following document is parsed:
|
2000-06-10 23:42:07 -03:00
|
|
|
|
|
|
|
\begin{verbatim}
|
|
|
|
<?xml version="1.0"?>
|
|
|
|
<root xmlns = "http://default-namespace.org/"
|
|
|
|
xmlns:py = "http://www.python.org/ns/">
|
|
|
|
<py:elem1 />
|
|
|
|
<elem2 xmlns="" />
|
|
|
|
</root>
|
|
|
|
\end{verbatim}
|
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\member{StartElementHandler} will receive the following strings
|
2000-09-25 11:14:30 -03:00
|
|
|
for each element:
|
2000-06-10 23:42:07 -03:00
|
|
|
|
|
|
|
\begin{verbatim}
|
|
|
|
http://default-namespace.org/ root
|
|
|
|
http://www.python.org/ns/ elem1
|
|
|
|
elem2
|
|
|
|
\end{verbatim}
|
|
|
|
\end{funcdesc}
|
|
|
|
|
2000-12-23 18:19:05 -04:00
|
|
|
|
|
|
|
\subsection{XMLParser Objects \label{xmlparser-objects}}
|
|
|
|
|
2000-06-10 23:42:07 -03:00
|
|
|
\class{xmlparser} objects have the following methods:
|
|
|
|
|
2000-11-28 02:38:22 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{Parse}{data\optional{, isfinal}}
|
2000-06-10 23:42:07 -03:00
|
|
|
Parses the contents of the string \var{data}, calling the appropriate
|
|
|
|
handler functions to process the parsed data. \var{isfinal} must be
|
2000-12-23 18:19:05 -04:00
|
|
|
true on the final call to this method. \var{data} can be the empty
|
2000-07-04 23:03:34 -03:00
|
|
|
string at any time.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{ParseFile}{file}
|
2000-06-10 23:42:07 -03:00
|
|
|
Parse XML data reading from the object \var{file}. \var{file} only
|
|
|
|
needs to provide the \method{read(\var{nbytes})} method, returning the
|
|
|
|
empty string when there's no more data.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{SetBase}{base}
|
2001-02-08 11:40:33 -04:00
|
|
|
Sets the base to be used for resolving relative URIs in system
|
|
|
|
identifiers in declarations. Resolving relative identifiers is left
|
|
|
|
to the application: this value will be passed through as the
|
|
|
|
\var{base} argument to the \function{ExternalEntityRefHandler},
|
|
|
|
\function{NotationDeclHandler}, and
|
|
|
|
\function{UnparsedEntityDeclHandler} functions.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{GetBase}{}
|
2000-06-10 23:42:07 -03:00
|
|
|
Returns a string containing the base set by a previous call to
|
|
|
|
\method{SetBase()}, or \code{None} if
|
|
|
|
\method{SetBase()} hasn't been called.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
2001-02-14 14:54:32 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{GetInputContext}{}
|
|
|
|
Returns the input data that generated the current event as a string.
|
|
|
|
The data is in the encoding of the entity which contains the text.
|
|
|
|
When called while an event handler is not active, the return value is
|
|
|
|
\code{None}.
|
|
|
|
\versionadded{2.1}
|
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-12-23 18:19:05 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{ExternalEntityParserCreate}{context\optional{,
|
|
|
|
encoding}}
|
|
|
|
Create a ``child'' parser which can be used to parse an external
|
|
|
|
parsed entity referred to by content parsed by the parent parser. The
|
2001-01-04 01:48:08 -04:00
|
|
|
\var{context} parameter should be the string passed to the
|
2000-12-23 18:19:05 -04:00
|
|
|
\method{ExternalEntityRefHandler()} handler function, described below.
|
2001-02-08 11:40:33 -04:00
|
|
|
The child parser is created with the \member{ordered_attributes},
|
|
|
|
\member{returns_unicode} and \member{specified_attributes} set to the
|
|
|
|
values of this parser.
|
2000-12-23 18:19:05 -04:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
|
2000-09-25 11:14:30 -03:00
|
|
|
\class{xmlparser} objects have the following attributes:
|
2000-08-17 20:15:21 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{memberdesc}[xmlparser]{ordered_attributes}
|
|
|
|
Setting this attribute to a non-zero integer causes the attributes to
|
|
|
|
be reported as a list rather than a dictionary. The attributes are
|
|
|
|
presented in the order found in the document text. For each
|
|
|
|
attribute, two list entries are presented: the attribute name and the
|
|
|
|
attribute value. (Older versions of this module also used this
|
|
|
|
format.) By default, this attribute is false; it may be changed at
|
|
|
|
any time.
|
|
|
|
\versionadded{2.1}
|
|
|
|
\end{memberdesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{memberdesc}[xmlparser]{returns_unicode}
|
2001-02-08 11:40:33 -04:00
|
|
|
If this attribute is set to a non-zero integer, the handler functions
|
|
|
|
will be passed Unicode strings. If \member{returns_unicode} is 0,
|
|
|
|
8-bit strings containing UTF-8 encoded data will be passed to the
|
|
|
|
handlers.
|
2000-12-06 20:00:21 -04:00
|
|
|
\versionchanged[Can be changed at any time to affect the result
|
2001-09-20 17:43:28 -03:00
|
|
|
type]{1.6}
|
2000-10-29 01:10:30 -04:00
|
|
|
\end{memberdesc}
|
2000-08-17 20:15:21 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{memberdesc}[xmlparser]{specified_attributes}
|
|
|
|
If set to a non-zero integer, the parser will report only those
|
|
|
|
attributes which were specified in the document instance and not those
|
|
|
|
which were derived from attribute declarations. Applications which
|
|
|
|
set this need to be especially careful to use what additional
|
|
|
|
information is available from the declarations as needed to comply
|
|
|
|
with the standards for the behavior of XML processors. By default,
|
|
|
|
this attribute is false; it may be changed at any time.
|
|
|
|
\versionadded{2.1}
|
|
|
|
\end{memberdesc}
|
|
|
|
|
2000-08-17 20:15:21 -03:00
|
|
|
The following attributes contain values relating to the most recent
|
|
|
|
error encountered by an \class{xmlparser} object, and will only have
|
|
|
|
correct values once a call to \method{Parse()} or \method{ParseFile()}
|
2001-02-15 01:37:51 -04:00
|
|
|
has raised a \exception{xml.parsers.expat.ExpatError} exception.
|
2000-06-10 23:42:07 -03:00
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{memberdesc}[xmlparser]{ErrorByteIndex}
|
2000-06-10 23:42:07 -03:00
|
|
|
Byte index at which an error occurred.
|
2000-10-29 01:10:30 -04:00
|
|
|
\end{memberdesc}
|
2000-06-10 23:42:07 -03:00
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{memberdesc}[xmlparser]{ErrorCode}
|
2000-06-10 23:42:07 -03:00
|
|
|
Numeric code specifying the problem. This value can be passed to the
|
|
|
|
\function{ErrorString()} function, or compared to one of the constants
|
2001-02-15 01:37:51 -04:00
|
|
|
defined in the \code{errors} object.
|
2000-10-29 01:10:30 -04:00
|
|
|
\end{memberdesc}
|
2000-06-10 23:42:07 -03:00
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{memberdesc}[xmlparser]{ErrorColumnNumber}
|
2000-06-10 23:42:07 -03:00
|
|
|
Column number at which an error occurred.
|
2000-10-29 01:10:30 -04:00
|
|
|
\end{memberdesc}
|
2000-06-10 23:42:07 -03:00
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{memberdesc}[xmlparser]{ErrorLineNumber}
|
2000-06-10 23:42:07 -03:00
|
|
|
Line number at which an error occurred.
|
2000-10-29 01:10:30 -04:00
|
|
|
\end{memberdesc}
|
2000-06-10 23:42:07 -03:00
|
|
|
|
|
|
|
Here is the list of handlers that can be set. To set a handler on an
|
2000-07-04 23:03:34 -03:00
|
|
|
\class{xmlparser} object \var{o}, use
|
|
|
|
\code{\var{o}.\var{handlername} = \var{func}}. \var{handlername} must
|
|
|
|
be taken from the following list, and \var{func} must be a callable
|
|
|
|
object accepting the correct number of arguments. The arguments are
|
|
|
|
all strings, unless otherwise stated.
|
2000-06-10 23:42:07 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{XmlDeclHandler}{version, encoding, standalone}
|
|
|
|
Called when the XML declaration is parsed. The XML declaration is the
|
|
|
|
(optional) declaration of the applicable version of the XML
|
|
|
|
recommendation, the encoding of the document text, and an optional
|
|
|
|
``standalone'' declaration. \var{version} and \var{encoding} will be
|
|
|
|
strings of the type dictated by the \member{returns_unicode}
|
|
|
|
attribute, and \var{standalone} will be \code{1} if the document is
|
|
|
|
declared standalone, \code{0} if it is declared not to be standalone,
|
|
|
|
or \code{-1} if the standalone clause was omitted.
|
|
|
|
This is only available with Expat version 1.95.0 or newer.
|
|
|
|
\versionadded{2.1}
|
|
|
|
\end{methoddesc}
|
|
|
|
|
|
|
|
\begin{methoddesc}[xmlparser]{StartDoctypeDeclHandler}{doctypeName,
|
|
|
|
systemId, publicId,
|
|
|
|
has_internal_subset}
|
|
|
|
Called when Expat begins parsing the document type declaration
|
|
|
|
(\code{<!DOCTYPE \ldots}). The \var{doctypeName} is provided exactly
|
|
|
|
as presented. The \var{systemId} and \var{publicId} parameters give
|
|
|
|
the system and public identifiers if specified, or \code{None} if
|
|
|
|
omitted. \var{has_internal_subset} will be true if the document
|
|
|
|
contains and internal document declaration subset.
|
|
|
|
This requires Expat version 1.2 or newer.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
|
|
|
\begin{methoddesc}[xmlparser]{EndDoctypeDeclHandler}{}
|
|
|
|
Called when Expat is done parsing the document type delaration.
|
|
|
|
This requires Expat version 1.2 or newer.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
|
|
|
\begin{methoddesc}[xmlparser]{ElementDeclHandler}{name, model}
|
|
|
|
Called once for each element type declaration. \var{name} is the name
|
|
|
|
of the element type, and \var{model} is a representation of the
|
|
|
|
content model.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
|
|
|
\begin{methoddesc}[xmlparser]{AttlistDeclHandler}{elname, attname,
|
|
|
|
type, default, required}
|
|
|
|
Called for each declared attribute for an element type. If an
|
|
|
|
attribute list declaration declares three attributes, this handler is
|
|
|
|
called three times, once for each attribute. \var{elname} is the name
|
|
|
|
of the element to which the declaration applies and \var{attname} is
|
|
|
|
the name of the attribute declared. The attribute type is a string
|
|
|
|
passed as \var{type}; the possible values are \code{'CDATA'},
|
|
|
|
\code{'ID'}, \code{'IDREF'}, ...
|
|
|
|
\var{default} gives the default value for the attribute used when the
|
|
|
|
attribute is not specified by the document instance, or \code{None} if
|
|
|
|
there is no default value (\code{\#IMPLIED} values). If the attribute
|
|
|
|
is required to be given in the document instance, \var{required} will
|
|
|
|
be true.
|
|
|
|
This requires Expat version 1.95.0 or newer.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{StartElementHandler}{name, attributes}
|
2000-06-10 23:42:07 -03:00
|
|
|
Called for the start of every element. \var{name} is a string
|
|
|
|
containing the element name, and \var{attributes} is a dictionary
|
|
|
|
mapping attribute names to their values.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{EndElementHandler}{name}
|
2000-06-10 23:42:07 -03:00
|
|
|
Called for the end of every element.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{ProcessingInstructionHandler}{target, data}
|
2001-02-08 11:40:33 -04:00
|
|
|
Called for every processing instruction.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{CharacterDataHandler}{data}
|
2001-02-08 11:40:33 -04:00
|
|
|
Called for character data. This will be called for normal character
|
|
|
|
data, CDATA marked content, and ignorable whitespace. Applications
|
|
|
|
which must distinguish these cases can use the
|
|
|
|
\member{StartCdataSectionHandler}, \member{EndCdataSectionHandler},
|
|
|
|
and \member{ElementDeclHandler} callbacks to collect the required
|
|
|
|
information.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{UnparsedEntityDeclHandler}{entityName, base,
|
|
|
|
systemId, publicId,
|
|
|
|
notationName}
|
2001-02-08 11:40:33 -04:00
|
|
|
Called for unparsed (NDATA) entity declarations. This is only present
|
|
|
|
for version 1.2 of the Expat library; for more recent versions, use
|
|
|
|
\member{EntityDeclHandler} instead. (The underlying function in the
|
|
|
|
Expat library has been declared obsolete.)
|
|
|
|
\end{methoddesc}
|
|
|
|
|
|
|
|
\begin{methoddesc}[xmlparser]{EntityDeclHandler}{entityName,
|
|
|
|
is_parameter_entity, value,
|
|
|
|
base, systemId,
|
|
|
|
publicId,
|
|
|
|
notationName}
|
|
|
|
Called for all entity declarations. For parameter and internal
|
|
|
|
entities, \var{value} will be a string giving the declared contents
|
|
|
|
of the entity; this will be \code{None} for external entities. The
|
|
|
|
\var{notationName} parameter will be \code{None} for parsed entities,
|
|
|
|
and the name of the notation for unparsed entities.
|
|
|
|
\var{is_parameter_entity} will be true if the entity is a paremeter
|
|
|
|
entity or false for general entities (most applications only need to
|
|
|
|
be concerned with general entities).
|
|
|
|
This is only available starting with version 1.95.0 of the Expat
|
|
|
|
library.
|
|
|
|
\versionadded{2.1}
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{NotationDeclHandler}{notationName, base,
|
|
|
|
systemId, publicId}
|
2001-02-08 11:40:33 -04:00
|
|
|
Called for notation declarations. \var{notationName}, \var{base}, and
|
|
|
|
\var{systemId}, and \var{publicId} are strings if given. If the
|
|
|
|
public identifier is omitted, \var{publicId} will be \code{None}.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{StartNamespaceDeclHandler}{prefix, uri}
|
2001-02-08 11:40:33 -04:00
|
|
|
Called when an element contains a namespace declaration. Namespace
|
|
|
|
declarations are processed before the \member{StartElementHandler} is
|
|
|
|
called for the element on which declarations are placed.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{EndNamespaceDeclHandler}{prefix}
|
2000-06-10 23:42:07 -03:00
|
|
|
Called when the closing tag is reached for an element
|
2001-02-08 11:40:33 -04:00
|
|
|
that contained a namespace declaration. This is called once for each
|
|
|
|
namespace declaration on the element in the reverse of the order for
|
|
|
|
which the \member{StartNamespaceDeclHandler} was called to indicate
|
|
|
|
the start of each namespace declaration's scope. Calls to this
|
|
|
|
handler are made after the corresponding \member{EndElementHandler}
|
|
|
|
for the end of the element.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{CommentHandler}{data}
|
2001-02-08 11:40:33 -04:00
|
|
|
Called for comments. \var{data} is the text of the comment, excluding
|
2001-02-15 01:37:51 -04:00
|
|
|
the leading `\code{<!-}\code{-}' and trailing `\code{-}\code{->}'.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{StartCdataSectionHandler}{}
|
2001-02-08 11:40:33 -04:00
|
|
|
Called at the start of a CDATA section. This and
|
|
|
|
\member{StartCdataSectionHandler} are needed to be able to identify
|
|
|
|
the syntactical start and end for CDATA sections.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{EndCdataSectionHandler}{}
|
2000-06-10 23:42:07 -03:00
|
|
|
Called at the end of a CDATA section.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{DefaultHandler}{data}
|
2000-06-10 23:42:07 -03:00
|
|
|
Called for any characters in the XML document for
|
|
|
|
which no applicable handler has been specified. This means
|
|
|
|
characters that are part of a construct which could be reported, but
|
|
|
|
for which no handler has been supplied.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{DefaultHandlerExpand}{data}
|
2000-06-10 23:42:07 -03:00
|
|
|
This is the same as the \function{DefaultHandler},
|
|
|
|
but doesn't inhibit expansion of internal entities.
|
|
|
|
The entity reference will not be passed to the default handler.
|
|
|
|
\end{methoddesc}
|
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{NotStandaloneHandler}{} Called if the
|
|
|
|
XML document hasn't been declared as being a standalone document.
|
|
|
|
This happens when there is an external subset or a reference to a
|
|
|
|
parameter entity, but the XML declaration does not set standalone to
|
|
|
|
\code{yes} in an XML declaration. If this handler returns \code{0},
|
|
|
|
then the parser will throw an \constant{XML_ERROR_NOT_STANDALONE}
|
|
|
|
error. If this handler is not set, no exception is raised by the
|
|
|
|
parser for this condition.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
2000-10-29 01:10:30 -04:00
|
|
|
\begin{methoddesc}[xmlparser]{ExternalEntityRefHandler}{context, base,
|
|
|
|
systemId, publicId}
|
2001-02-08 11:40:33 -04:00
|
|
|
Called for references to external entities. \var{base} is the current
|
|
|
|
base, as set by a previous call to \method{SetBase()}. The public and
|
|
|
|
system identifiers, \var{systemId} and \var{publicId}, are strings if
|
|
|
|
given; if the public identifier is not given, \var{publicId} will be
|
2001-02-15 01:37:51 -04:00
|
|
|
\code{None}. The \var{context} value is opaque and should only be
|
|
|
|
used as described below.
|
2001-02-08 11:40:33 -04:00
|
|
|
|
|
|
|
For external entities to be parsed, this handler must be implemented.
|
|
|
|
It is responsible for creating the sub-parser using
|
2001-02-15 01:37:51 -04:00
|
|
|
\code{ExternalEntityParserCreate(\var{context})}, initializing it with
|
|
|
|
the appropriate callbacks, and parsing the entity. This handler
|
|
|
|
should return an integer; if it returns \code{0}, the parser will
|
|
|
|
throw an \constant{XML_ERROR_EXTERNAL_ENTITY_HANDLING} error,
|
|
|
|
otherwise parsing will continue.
|
2001-02-08 11:40:33 -04:00
|
|
|
|
|
|
|
If this handler is not provided, external entities are reported by the
|
|
|
|
\member{DefaultHandler} callback, if provided.
|
2000-06-10 23:42:07 -03:00
|
|
|
\end{methoddesc}
|
|
|
|
|
|
|
|
|
2001-02-14 14:54:32 -04:00
|
|
|
\subsection{ExpatError Exceptions \label{expaterror-objects}}
|
|
|
|
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
|
|
|
|
|
|
|
|
\exception{ExpatError} exceptions have a number of interesting
|
|
|
|
attributes:
|
|
|
|
|
|
|
|
\begin{memberdesc}[ExpatError]{code}
|
|
|
|
Expat's internal error number for the specific error. This will
|
|
|
|
match one of the constants defined in the \code{errors} object from
|
|
|
|
this module.
|
|
|
|
\versionadded{2.1}
|
|
|
|
\end{memberdesc}
|
|
|
|
|
|
|
|
\begin{memberdesc}[ExpatError]{lineno}
|
|
|
|
Line number on which the error was detected. The first line is
|
|
|
|
numbered \code{1}.
|
|
|
|
\versionadded{2.1}
|
|
|
|
\end{memberdesc}
|
|
|
|
|
|
|
|
\begin{memberdesc}[ExpatError]{offset}
|
|
|
|
Character offset into the line where the error occurred. The first
|
|
|
|
column is numbered \code{0}.
|
|
|
|
\versionadded{2.1}
|
|
|
|
\end{memberdesc}
|
|
|
|
|
|
|
|
|
2000-09-23 01:47:56 -03:00
|
|
|
\subsection{Example \label{expat-example}}
|
2000-06-10 23:42:07 -03:00
|
|
|
|
2000-07-04 23:03:34 -03:00
|
|
|
The following program defines three handlers that just print out their
|
2000-06-10 23:42:07 -03:00
|
|
|
arguments.
|
|
|
|
|
|
|
|
\begin{verbatim}
|
2000-09-23 01:47:56 -03:00
|
|
|
import xml.parsers.expat
|
2000-06-10 23:42:07 -03:00
|
|
|
|
|
|
|
# 3 handler functions
|
|
|
|
def start_element(name, attrs):
|
|
|
|
print 'Start element:', name, attrs
|
|
|
|
def end_element(name):
|
|
|
|
print 'End element:', name
|
|
|
|
def char_data(data):
|
|
|
|
print 'Character data:', repr(data)
|
|
|
|
|
2000-09-23 01:47:56 -03:00
|
|
|
p = xml.parsers.expat.ParserCreate()
|
2000-06-10 23:42:07 -03:00
|
|
|
|
|
|
|
p.StartElementHandler = start_element
|
2000-09-23 01:47:56 -03:00
|
|
|
p.EndElementHandler = end_element
|
|
|
|
p.CharacterDataHandler = char_data
|
2000-06-10 23:42:07 -03:00
|
|
|
|
|
|
|
p.Parse("""<?xml version="1.0"?>
|
|
|
|
<parent id="top"><child1 name="paul">Text goes here</child1>
|
|
|
|
<child2 name="fred">More text</child2>
|
|
|
|
</parent>""")
|
|
|
|
\end{verbatim}
|
|
|
|
|
|
|
|
The output from this program is:
|
|
|
|
|
|
|
|
\begin{verbatim}
|
|
|
|
Start element: parent {'id': 'top'}
|
|
|
|
Start element: child1 {'name': 'paul'}
|
|
|
|
Character data: 'Text goes here'
|
|
|
|
End element: child1
|
2001-01-24 13:19:08 -04:00
|
|
|
Character data: '\n'
|
2000-06-10 23:42:07 -03:00
|
|
|
Start element: child2 {'name': 'fred'}
|
|
|
|
Character data: 'More text'
|
|
|
|
End element: child2
|
2001-01-24 13:19:08 -04:00
|
|
|
Character data: '\n'
|
2000-06-10 23:42:07 -03:00
|
|
|
End element: parent
|
|
|
|
\end{verbatim}
|
2000-07-04 23:03:34 -03:00
|
|
|
|
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\subsection{Content Model Descriptions \label{expat-content-models}}
|
|
|
|
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
|
|
|
|
|
|
|
|
Content modules are described using nested tuples. Each tuple
|
|
|
|
contains four values: the type, the quantifier, the name, and a tuple
|
|
|
|
of children. Children are simply additional content module
|
|
|
|
descriptions.
|
|
|
|
|
|
|
|
The values of the first two fields are constants defined in the
|
|
|
|
\code{model} object of the \module{xml.parsers.expat} module. These
|
|
|
|
constants can be collected in two groups: the model type group and the
|
|
|
|
quantifier group.
|
|
|
|
|
|
|
|
The constants in the model type group are:
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CTYPE_ANY}
|
|
|
|
The element named by the model name was declared to have a content
|
|
|
|
model of \code{ANY}.
|
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CTYPE_CHOICE}
|
|
|
|
The named element allows a choice from a number of options; this is
|
|
|
|
used for content models such as \code{(A | B | C)}.
|
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CTYPE_EMPTY}
|
|
|
|
Elements which are declared to be \code{EMPTY} have this model type.
|
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CTYPE_MIXED}
|
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CTYPE_NAME}
|
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CTYPE_SEQ}
|
|
|
|
Models which represent a series of models which follow one after the
|
|
|
|
other are indicated with this model type. This is used for models
|
|
|
|
such as \code{(A, B, C)}.
|
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
|
|
|
|
The constants in the quantifier group are:
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CQUANT_NONE}
|
2001-09-20 17:43:28 -03:00
|
|
|
No modifier is given, so it can appear exactly once, as for \code{A}.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CQUANT_OPT}
|
2001-09-20 17:43:28 -03:00
|
|
|
The model is optional: it can appear once or not at all, as for
|
2001-02-08 11:40:33 -04:00
|
|
|
\code{A?}.
|
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CQUANT_PLUS}
|
2001-09-20 17:43:28 -03:00
|
|
|
The model must occur one or more times (like \code{A+}).
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
\begin{datadescni}{XML_CQUANT_REP}
|
|
|
|
The model must occur zero or more times, as for \code{A*}.
|
|
|
|
\end{datadescni}
|
|
|
|
|
|
|
|
|
2000-09-23 01:47:56 -03:00
|
|
|
\subsection{Expat error constants \label{expat-errors}}
|
2000-07-04 23:03:34 -03:00
|
|
|
\sectionauthor{A.M. Kuchling}{amk1@bigfoot.com}
|
|
|
|
|
2001-02-14 14:54:32 -04:00
|
|
|
The following constants are provided in the \code{errors} object of
|
|
|
|
the \refmodule{xml.parsers.expat} module. These constants are useful
|
|
|
|
in interpreting some of the attributes of the \exception{ExpatError}
|
|
|
|
exception objects raised when an error has occurred.
|
2000-07-04 23:03:34 -03:00
|
|
|
|
2000-09-23 01:47:56 -03:00
|
|
|
The \code{errors} object has the following attributes:
|
2000-07-04 23:03:34 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_ASYNC_ENTITY}
|
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF}
|
|
|
|
An entity reference in an attribute value referred to an external
|
|
|
|
entity instead of an internal entity.
|
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_BAD_CHAR_REF}
|
2001-09-20 17:43:28 -03:00
|
|
|
A character reference referred to a character which is illegal in XML
|
|
|
|
(for example, character \code{0}, or `\code{\&\#0;}'.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_BINARY_ENTITY_REF}
|
2001-09-20 17:43:28 -03:00
|
|
|
An entity reference referred to an entity which was declared with a
|
|
|
|
notation, so cannot be parsed.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_DUPLICATE_ATTRIBUTE}
|
2000-07-11 13:30:30 -03:00
|
|
|
An attribute was used more than once in a start tag.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_INCORRECT_ENCODING}
|
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_INVALID_TOKEN}
|
2001-09-20 17:43:28 -03:00
|
|
|
Raised when an input byte could not properly be assigned to a
|
|
|
|
character; for example, a NUL byte (value \code{0}) in a UTF-8 input
|
|
|
|
stream.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_JUNK_AFTER_DOC_ELEMENT}
|
2000-07-11 13:30:30 -03:00
|
|
|
Something other than whitespace occurred after the document element.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_MISPLACED_XML_PI}
|
2001-09-20 17:43:28 -03:00
|
|
|
An XML declaration was found somewhere other than the start of the
|
|
|
|
input data.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_NO_ELEMENTS}
|
2001-09-20 17:43:28 -03:00
|
|
|
The document contains no elements (XML requires all documents to
|
|
|
|
contain exactly one top-level element)..
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_NO_MEMORY}
|
2000-07-11 13:30:30 -03:00
|
|
|
Expat was not able to allocate memory internally.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_PARAM_ENTITY_REF}
|
2001-09-20 17:43:28 -03:00
|
|
|
A parameter entity reference was found where it was not allowed.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_PARTIAL_CHAR}
|
2001-09-20 17:43:28 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_RECURSIVE_ENTITY_REF}
|
2001-09-20 17:43:28 -03:00
|
|
|
An entity reference contained another reference to the same entity;
|
|
|
|
possibly via a different name, and possibly indirectly.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_SYNTAX}
|
2000-07-11 13:30:30 -03:00
|
|
|
Some unspecified syntax error was encountered.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_TAG_MISMATCH}
|
2000-07-11 13:30:30 -03:00
|
|
|
An end tag did not match the innermost open start tag.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_UNCLOSED_TOKEN}
|
2001-09-20 17:43:28 -03:00
|
|
|
Some token (such as a start tag) was not closed before the end of the
|
|
|
|
stream or the next token was encountered.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_UNDEFINED_ENTITY}
|
2000-07-11 13:30:30 -03:00
|
|
|
A reference was made to a entity which was not defined.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|
2000-07-11 13:30:30 -03:00
|
|
|
|
2001-02-08 11:40:33 -04:00
|
|
|
\begin{datadescni}{XML_ERROR_UNKNOWN_ENCODING}
|
2000-07-11 13:30:30 -03:00
|
|
|
The document encoding is not supported by Expat.
|
2001-02-08 11:40:33 -04:00
|
|
|
\end{datadescni}
|