\section{\module{xml.dom} ---
         The Document Object Model API}

\declaremodule{standard}{xml.dom}
\modulesynopsis{Document Object Model API for Python.}
\sectionauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{Martin v. L\"owis}{loewis@informatik.hu-berlin.de}

\versionadded{2.0}

The Document Object Model, or ``DOM,'' is a cross-language API from
the World Wide Web Consortium (W3C) for accessing and modifying XML
documents. A DOM implementation presents an XML document as a tree
structure, or allows client code to build such a structure from
scratch. It then gives access to the structure through a set of
objects which provide well-known interfaces.

The DOM is extremely useful for random-access applications. SAX only
allows you a view of one bit of the document at a time. If you are
looking at one SAX element, you have no access to another. If you are
looking at a text node, you have no access to a containing element.
When you write a SAX application, you need to keep track of your
program's position in the document somewhere in your own code. SAX
does not do it for you. Also, if you need to look ahead in the XML
document, you are just out of luck.

Some applications are simply impossible in an event-driven model with
no access to a tree. Of course you could build some sort of tree
yourself in SAX events, but the DOM allows you to avoid writing that
code. The DOM is a standard tree representation for XML data.
%What if your needs are somewhere between SAX and the DOM? Perhaps
%you cannot afford to load the entire tree in memory but you find the
%SAX model somewhat cumbersome and low-level. There is also a module
%called xml.dom.pulldom that allows you to build trees of only the
%parts of a document that you need structured access to. It also has
%features that allow you to find your way around the DOM.
% See http://www.prescod.net/python/pulldom

The Document Object Model is being defined by the W3C in stages, or
``levels'' in their terminology. The Python mapping of the API is
substantially based on the DOM Level~2 recommendation. The mapping of
the Level~3 specification, currently only available in draft form, is
being developed by the \ulink{Python XML Special Interest
Group}{http://www.python.org/sigs/xml-sig/} as part of the
\ulink{PyXML package}{http://pyxml.sourceforge.net/}. Refer to the
documentation bundled with that package for information on the current
state of DOM Level~3 support.

DOM applications typically start by parsing some XML into a DOM. How
this is accomplished is not covered at all by DOM Level~1, and Level~2
provides only limited improvements: There is a
\class{DOMImplementation} object class which provides access to
\class{Document} creation methods, but no way to access an XML
reader/parser/Document builder in an implementation-independent way.
There is also no well-defined way to access these methods without an
existing \class{Document} object. In Python, each DOM implementation
will provide a function \function{getDOMImplementation()}. DOM Level~3
adds a Load/Store specification, which defines an interface to the
reader, but this is not yet available in the Python standard library.

Once you have a DOM document object, you can access the parts of your
XML document through its properties and methods. These properties are
defined in the DOM specification; this portion of the reference manual
describes the interpretation of the specification in Python.

The specification provided by the W3C defines the DOM API for Java,
ECMAScript, and OMG IDL. The Python mapping defined here is based in
large part on the IDL version of the specification, but strict
compliance is not required (though implementations are free to support
the strict mapping from IDL). See section \ref{dom-conformance},
``Conformance,'' for a detailed discussion of mapping requirements.
\begin{seealso}
  \seetitle[http://www.w3.org/TR/DOM-Level-2-Core/]{Document Object
            Model (DOM) Level~2 Specification}
           {The W3C recommendation upon which the Python DOM API is
            based.}
  \seetitle[http://www.w3.org/TR/REC-DOM-Level-1/]{Document Object
            Model (DOM) Level~1 Specification}
           {The W3C recommendation for the
            DOM supported by \module{xml.dom.minidom}.}
  \seetitle[http://pyxml.sourceforge.net]{PyXML}{Users that require a
            full-featured implementation of DOM should use the PyXML
            package.}
  \seetitle[http://cgi.omg.org/cgi-bin/doc?orbos/99-08-02.pdf]{CORBA
            Scripting with Python}
           {This specifies the mapping from OMG IDL to Python.}
\end{seealso}
\subsection{Module Contents}

The \module{xml.dom} module contains the following functions:

\begin{funcdesc}{registerDOMImplementation}{name, factory}
Register the \var{factory} function with the name \var{name}. The
factory function should return an object which implements the
\class{DOMImplementation} interface. The factory function can return
the same object every time, or a new one for each call, as appropriate
for the specific implementation (e.g. if that implementation supports
some customization).
\end{funcdesc}
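As a rough illustration of registration, the sketch below registers a
factory under a made-up name and then looks it up again. It assumes
that \module{xml.dom.minidom} is available to supply the
\class{DOMImplementation} object; the name \code{example-dom} and the
helper \code{minidom_factory} are hypothetical and exist only for this
example.

```python
import xml.dom
import xml.dom.minidom


def minidom_factory():
    # Hypothetical factory: return an object implementing the
    # DOMImplementation interface.  minidom hands back the same shared
    # object on every call, which the description above permits.
    return xml.dom.minidom.getDOMImplementation()


# "example-dom" is an illustrative name, not a well-known one.
xml.dom.registerDOMImplementation("example-dom", minidom_factory)

# Looking the name up calls the registered factory.
impl = xml.dom.getDOMImplementation("example-dom")
doc = impl.createDocument(None, "root", None)
root_name = doc.documentElement.tagName
```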
\begin{funcdesc}{getDOMImplementation}{\optional{name\optional{, features}}}
Return a suitable DOM implementation. The \var{name} is either
well-known, the module name of a DOM implementation, or
\code{None}. If it is not \code{None}, this function imports the
corresponding module and returns a \class{DOMImplementation} object if
the import succeeds. If no name is given, and if the environment
variable \envvar{PYTHON_DOM} is set, this variable is used to find the
implementation.

If \var{name} is not given, this examines the available implementations
to find one with the required feature set. If no implementation can be
found, raise an \exception{ImportError}. The features list must be a
sequence of \code{(\var{feature}, \var{version})} pairs which are
passed to the \method{hasFeature()} method on available
\class{DOMImplementation} objects.
\end{funcdesc}
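The two lookup styles can be sketched as follows. This is a minimal
example that assumes the well-known name \code{minidom} resolves to
\module{xml.dom.minidom} and that \envvar{PYTHON_DOM} is not set to
something else in the environment; the element name \code{greeting} is
arbitrary.

```python
import xml.dom

# Ask for an implementation by its well-known name.
impl = xml.dom.getDOMImplementation("minidom")

# Ask for any implementation offering the listed features; each
# (feature, version) pair is passed to hasFeature() on the candidates.
featured = xml.dom.getDOMImplementation(features=[("core", "2.0")])
supports_core = featured.hasFeature("core", "2.0")

# The returned object can create a document without involving a parser.
doc = impl.createDocument(None, "greeting", None)
root_tag = doc.documentElement.tagName
```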
Some convenience constants are also provided:

\begin{datadesc}{EMPTY_NAMESPACE}
  The value used to indicate that no namespace is associated with a
  node in the DOM. This is typically found as the
  \member{namespaceURI} of a node, or used as the \var{namespaceURI}
  parameter to a namespace-specific method.
  \versionadded{2.2}
\end{datadesc}

\begin{datadesc}{XML_NAMESPACE}
  The namespace URI associated with the reserved prefix \code{xml}, as
  defined by
  \citetitle[http://www.w3.org/TR/REC-xml-names/]{Namespaces in XML}
  (section~4).
  \versionadded{2.2}
\end{datadesc}

\begin{datadesc}{XMLNS_NAMESPACE}
  The namespace URI for namespace declarations, as defined by
  \citetitle[http://www.w3.org/TR/DOM-Level-2-Core/core.html]{Document
  Object Model (DOM) Level~2 Core Specification} (section~1.1.8).
  \versionadded{2.2}
\end{datadesc}

\begin{datadesc}{XHTML_NAMESPACE}
  The URI of the XHTML namespace as defined by
  \citetitle[http://www.w3.org/TR/xhtml1/]{XHTML 1.0: The Extensible
  HyperText Markup Language} (section~3.1.1).
  \versionadded{2.2}
\end{datadesc}
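A short sketch of where these constants tend to show up, assuming
\module{xml.dom.minidom} is used for parsing; the sample markup and the
\code{urn:example} URI are invented for the illustration.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.parseString(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><body/></html>'
)
root = doc.documentElement

# Compare a node's namespace against the XHTML constant.
is_xhtml = root.namespaceURI == xml.dom.XHTML_NAMESPACE

# Attributes with the reserved "xml" prefix live in XML_NAMESPACE.
lang = root.getAttributeNS(xml.dom.XML_NAMESPACE, "lang")

# An element parsed without any namespace reports EMPTY_NAMESPACE.
plain = minidom.parseString("<plain/>").documentElement
has_no_namespace = plain.namespaceURI == xml.dom.EMPTY_NAMESPACE

# Namespace declarations are attributes in XMLNS_NAMESPACE.
plain.setAttributeNS(xml.dom.XMLNS_NAMESPACE, "xmlns:x", "urn:example")
declared = plain.toxml()  # toxml() is a minidom extension
```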
% Should the Node documentation go here?

In addition, \module{xml.dom} contains a base \class{Node} class and
the DOM exception classes. The \class{Node} class provided by this
module does not implement any of the methods or attributes defined by
the DOM specification; concrete DOM implementations must provide
those. The \class{Node} class provided as part of this module does
provide the constants used for the \member{nodeType} attribute on
concrete \class{Node} objects; they are located within the class
rather than at the module level to conform with the DOM
specifications.
\subsection{Objects in the DOM \label{dom-objects}}

The definitive documentation for the DOM is the DOM specification from
the W3C.

Note that DOM attributes may also be manipulated as nodes instead of
as simple strings. It is fairly rare that you must do this, however,
so this usage is not yet documented.

\begin{tableiii}{l|l|l}{class}{Interface}{Section}{Purpose}
  \lineiii{DOMImplementation}{\ref{dom-implementation-objects}}
          {Interface to the underlying implementation.}
  \lineiii{Node}{\ref{dom-node-objects}}
          {Base interface for most objects in a document.}
  \lineiii{NodeList}{\ref{dom-nodelist-objects}}
          {Interface for a sequence of nodes.}
  \lineiii{DocumentType}{\ref{dom-documenttype-objects}}
          {Information about the declarations needed to process a document.}
  \lineiii{Document}{\ref{dom-document-objects}}
          {Object which represents an entire document.}
  \lineiii{Element}{\ref{dom-element-objects}}
          {Element nodes in the document hierarchy.}
  \lineiii{Attr}{\ref{dom-attr-objects}}
          {Attribute value nodes on element nodes.}
  \lineiii{Comment}{\ref{dom-comment-objects}}
          {Representation of comments in the source document.}
  \lineiii{Text}{\ref{dom-text-objects}}
          {Nodes containing textual content from the document.}
  \lineiii{ProcessingInstruction}{\ref{dom-pi-objects}}
          {Processing instruction representation.}
\end{tableiii}

An additional section describes the exceptions defined for working
with the DOM in Python.
\subsubsection{DOMImplementation Objects
               \label{dom-implementation-objects}}

The \class{DOMImplementation} interface provides a way for
applications to determine the availability of particular features in
the DOM they are using. DOM Level~2 added the ability to create new
\class{Document} and \class{DocumentType} objects using the
\class{DOMImplementation} as well.

\begin{methoddesc}[DOMImplementation]{hasFeature}{feature, version}
Return true if the feature identified by the pair of strings
\var{feature} and \var{version} is implemented.
\end{methoddesc}
\subsubsection{Node Objects \label{dom-node-objects}}

All of the components of an XML document are subclasses of
\class{Node}.

\begin{memberdesc}[Node]{nodeType}
An integer representing the node type. Symbolic constants for the
types are on the \class{Node} object:
\constant{ELEMENT_NODE}, \constant{ATTRIBUTE_NODE},
\constant{TEXT_NODE}, \constant{CDATA_SECTION_NODE},
\constant{ENTITY_NODE}, \constant{PROCESSING_INSTRUCTION_NODE},
\constant{COMMENT_NODE}, \constant{DOCUMENT_NODE},
\constant{DOCUMENT_TYPE_NODE}, \constant{NOTATION_NODE}.
This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Node]{parentNode}
The parent of the current node, or \code{None} for the document node.
The value is always a \class{Node} object or \code{None}. For
\class{Element} nodes, this will be the parent element, except for the
root element, in which case it will be the \class{Document} object.
For \class{Attr} nodes, this is always \code{None}.
This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Node]{attributes}
A \class{NamedNodeMap} of attribute objects. Only elements have
actual values for this; others provide \code{None} for this attribute.
This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Node]{previousSibling}
The node that immediately precedes this one with the same parent. For
instance, the element with an end-tag that comes just before the
\var{self} element's start-tag. Of course, XML documents are made
up of more than just elements so the previous sibling could be text, a
comment, or something else. If this node is the first child of the
parent, this attribute will be \code{None}.
This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Node]{nextSibling}
The node that immediately follows this one with the same parent. See
also \member{previousSibling}. If this is the last child of the
parent, this attribute will be \code{None}.
This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Node]{childNodes}
A list of nodes contained within this node.
This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Node]{firstChild}
The first child of the node, if there are any, or \code{None}.
This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Node]{lastChild}
The last child of the node, if there are any, or \code{None}.
This is a read-only attribute.
\end{memberdesc}
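The navigation attributes above can be tried out with a small tree.
This sketch assumes \module{xml.dom.minidom} as the concrete
implementation; the sample document is invented.

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<root><a/>text<b/></root>")
root = doc.documentElement

# The root element has three children: <a/>, a text node, and <b/>.
a, text, b = root.childNodes
kinds = [child.nodeType for child in root.childNodes]

# Parent links: elements point upward, the document itself has no parent.
root_parent_is_doc = root.parentNode is doc
doc_has_no_parent = doc.parentNode is None

# Sibling links are None at either end of the child list.
first_has_no_previous = a.previousSibling is None
middle_follows_a = a.nextSibling is text
last_has_no_next = b.nextSibling is None
```

The node type constants are read from the \class{Node} class rather
than from the module, as described earlier in this section.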
\begin{memberdesc}[Node]{localName}
The part of the \member{tagName} following the colon if there is one,
else the entire \member{tagName}. The value is a string.
\end{memberdesc}

\begin{memberdesc}[Node]{prefix}
The part of the \member{tagName} preceding the colon if there is one,
else the empty string. The value is a string, or \code{None}.
\end{memberdesc}

\begin{memberdesc}[Node]{namespaceURI}
The namespace associated with the element name. This will be a
string or \code{None}. This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Node]{nodeName}
This has a different meaning for each node type; see the DOM
specification for details. You can always get the information you
would get here from another property such as the \member{tagName}
property for elements or the \member{name} property for attributes.
For all node types, the value of this attribute will be either a
string or \code{None}. This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Node]{nodeValue}
This has a different meaning for each node type; see the DOM
specification for details. The situation is similar to that with
\member{nodeName}. The value is a string or \code{None}.
\end{memberdesc}
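To make the naming attributes concrete, the following sketch parses a
namespace-using document with \module{xml.dom.minidom} (an assumption;
other implementations may differ in detail) and reads the attributes
back. The prefix \code{x} and the URI \code{urn:example} are made up.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<x:doc xmlns:x="urn:example"><x:item>hi</x:item></x:doc>'
)
item = doc.documentElement.firstChild
text = item.firstChild

# For an element, nodeName mirrors tagName and nodeValue is None.
element_names = (item.nodeName, item.prefix, item.localName,
                 item.namespaceURI)
element_value = item.nodeValue

# For a text node, nodeValue carries the character data.
text_name = text.nodeName
text_value = text.nodeValue
```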
\begin{methoddesc}[Node]{hasAttributes}{}
Returns true if the node has any attributes.
\end{methoddesc}

\begin{methoddesc}[Node]{hasChildNodes}{}
Returns true if the node has any child nodes.
\end{methoddesc}

\begin{methoddesc}[Node]{isSameNode}{other}
Returns true if \var{other} refers to the same node as this node.
This is especially useful for DOM implementations which use any sort
of proxy architecture (because more than one object can refer to the
same node).

\begin{notice}
  This is based on a proposed DOM Level~3 API which is still in the
  ``working draft'' stage, but this particular interface appears
  uncontroversial. Changes from the W3C will not necessarily affect
  this method in the Python DOM interface (though any new W3C API for
  this would also be supported).
\end{notice}
\end{methoddesc}
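A brief sketch of the three predicates, again assuming
\module{xml.dom.minidom}; \method{hasAttributes()} is only called on
element nodes here.

```python
from xml.dom import minidom

doc = minidom.parseString('<root id="r"><empty/></root>')
root = doc.documentElement
empty = root.firstChild

root_has_attrs = root.hasAttributes()
empty_has_attrs = empty.hasAttributes()

root_has_children = root.hasChildNodes()
empty_has_children = empty.hasChildNodes()

# Two references obtained by different routes name the same node.
same = root.isSameNode(doc.documentElement)
different = root.isSameNode(empty)
```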
\begin{methoddesc}[Node]{appendChild}{newChild}
Add a new child node to this node at the end of the list of children,
returning \var{newChild}.
\end{methoddesc}

\begin{methoddesc}[Node]{insertBefore}{newChild, refChild}
Insert a new child node before an existing child. It must be the case
that \var{refChild} is a child of this node; if not,
\exception{ValueError} is raised. \var{newChild} is returned.
\end{methoddesc}

\begin{methoddesc}[Node]{removeChild}{oldChild}
Remove a child node. \var{oldChild} must be a child of this node; if
not, \exception{ValueError} is raised. \var{oldChild} is returned on
success. If \var{oldChild} will not be used further, its
\method{unlink()} method should be called.
\end{methoddesc}

\begin{methoddesc}[Node]{replaceChild}{newChild, oldChild}
Replace an existing node with a new node. It must be the case that
\var{oldChild} is a child of this node; if not,
\exception{ValueError} is raised.
\end{methoddesc}

\begin{methoddesc}[Node]{normalize}{}
Join adjacent text nodes so that all stretches of text are stored as
single \class{Text} instances. This simplifies processing text from a
DOM tree for many applications.
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}[Node]{cloneNode}{deep}
Clone this node. Setting \var{deep} means to clone all child nodes as
well. This returns the clone.
\end{methoddesc}
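The tree-editing methods can be exercised together as in the sketch
below. It assumes \module{xml.dom.minidom}; the \method{toxml()} calls
are a minidom convenience used only to show the resulting markup, not
part of the interface described here.

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "list", None)
root = doc.documentElement

# Build <list><a/><b/><c/></list>, inserting <b/> before <c/>.
a = root.appendChild(doc.createElement("a"))
c = root.appendChild(doc.createElement("c"))
b = root.insertBefore(doc.createElement("b"), c)
after_insert = root.toxml()

# Swap <b/> for <d/> and drop <a/>; removeChild returns the old child.
d = doc.createElement("d")
root.replaceChild(d, b)
removed = root.removeChild(a)
after_edit = root.toxml()

# Two adjacent text nodes are merged into one by normalize().
root.appendChild(doc.createTextNode("x"))
root.appendChild(doc.createTextNode("y"))
count_before = len(root.childNodes)
root.normalize()
count_after = len(root.childNodes)

# A deep clone copies the children too and is not attached anywhere.
copy = root.cloneNode(True)
```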
\subsubsection{NodeList Objects \label{dom-nodelist-objects}}

A \class{NodeList} represents a sequence of nodes. These objects are
used in two ways in the DOM Core recommendation: the
\class{Element} object provides one as its list of child nodes, and
the \method{getElementsByTagName()} and
\method{getElementsByTagNameNS()} methods of \class{Node} return
objects with this interface to represent query results.

The DOM Level~2 recommendation defines one method and one attribute
for these objects:

\begin{methoddesc}[NodeList]{item}{i}
  Return the \var{i}'th item from the sequence, if there is one, or
  \code{None}. The index \var{i} is not allowed to be less than zero
  or greater than or equal to the length of the sequence.
\end{methoddesc}

\begin{memberdesc}[NodeList]{length}
  The number of nodes in the sequence.
\end{memberdesc}
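A small sketch of the \class{NodeList} interface, assuming
\module{xml.dom.minidom}. It uses both the DOM-defined
\method{item()} and \member{length} and the ordinary Python sequence
operations; the sample document is invented.

```python
from xml.dom import minidom

doc = minidom.parseString("<r><i>1</i><i>2</i><i>3</i></r>")
items = doc.getElementsByTagName("i")

# DOM-style access.
count = items.length
second = items.item(1)
missing = items.item(10)  # minidom returns None for an out-of-range index

# Python-style access to the same sequence.
via_len = len(items)
texts = [node.firstChild.data for node in items]
```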
In addition, the Python DOM interface requires that some additional
support is provided to allow \class{NodeList} objects to be used as
Python sequences. All \class{NodeList} implementations must include
support for \method{__len__()} and \method{__getitem__()}; this allows
iteration over the \class{NodeList} in \keyword{for} statements and
proper support for the \function{len()} built-in function.

If a DOM implementation supports modification of the document, the
\class{NodeList} implementation must also support the
\method{__setitem__()} and \method{__delitem__()} methods.
\subsubsection{DocumentType Objects \label{dom-documenttype-objects}}

Information about the notations and entities declared by a document
(including the external subset if the parser uses it and can provide
the information) is available from a \class{DocumentType} object. The
\class{DocumentType} for a document is available from the
\class{Document} object's \member{doctype} attribute; if there is no
\code{DOCTYPE} declaration for the document, the document's
\member{doctype} attribute will be set to \code{None} instead of an
instance of this interface.

\class{DocumentType} is a specialization of \class{Node}, and adds the
following attributes:

\begin{memberdesc}[DocumentType]{publicId}
  The public identifier for the external subset of the document type
  definition. This will be a string or \code{None}.
\end{memberdesc}

\begin{memberdesc}[DocumentType]{systemId}
  The system identifier for the external subset of the document type
  definition. This will be a URI as a string, or \code{None}.
\end{memberdesc}

\begin{memberdesc}[DocumentType]{internalSubset}
  A string giving the complete internal subset from the document.
  This does not include the brackets which enclose the subset. If the
  document has no internal subset, this should be \code{None}.
\end{memberdesc}

\begin{memberdesc}[DocumentType]{name}
  The name of the root element as given in the \code{DOCTYPE}
  declaration, if present.
\end{memberdesc}

\begin{memberdesc}[DocumentType]{entities}
  This is a \class{NamedNodeMap} giving the definitions of external
  entities. For entity names defined more than once, only the first
  definition is provided (others are ignored as required by the XML
  recommendation). This may be \code{None} if the information is not
  provided by the parser, or if no entities are defined.
\end{memberdesc}

\begin{memberdesc}[DocumentType]{notations}
  This is a \class{NamedNodeMap} giving the definitions of notations.
  For notation names defined more than once, only the first definition
  is provided (others are ignored as required by the XML
  recommendation). This may be \code{None} if the information is not
  provided by the parser, or if no notations are defined.
\end{memberdesc}
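The sketch below builds a \class{DocumentType} through the
\class{DOMImplementation} instead of parsing one, which keeps the
example independent of any DTD file. It assumes
\module{xml.dom.minidom} and its \method{createDocumentType()} method;
the identifiers are the XHTML 1.0 Strict ones, used purely as sample
data.

```python
from xml.dom import Node, minidom

impl = minidom.getDOMImplementation()
doctype = impl.createDocumentType(
    "html",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
)
doc = impl.createDocument(None, "html", doctype)

# The document exposes the DocumentType through its doctype attribute.
name = doc.doctype.name
public_id = doc.doctype.publicId
system_id = doc.doctype.systemId
internal_subset = doc.doctype.internalSubset  # no internal subset here

# A document parsed without a DOCTYPE declaration reports None.
no_doctype = minidom.parseString("<r/>").doctype
```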
\subsubsection{Document Objects \label{dom-document-objects}}

A \class{Document} represents an entire XML document, including its
constituent elements, attributes, processing instructions, comments
etc. Remember that it inherits properties from \class{Node}.

\begin{memberdesc}[Document]{documentElement}
The one and only root element of the document.
\end{memberdesc}

\begin{methoddesc}[Document]{createElement}{tagName}
Create and return a new element node. The element is not inserted
into the document when it is created. You need to explicitly insert
it with one of the other methods such as \method{insertBefore()} or
\method{appendChild()}.
\end{methoddesc}

\begin{methoddesc}[Document]{createElementNS}{namespaceURI, tagName}
Create and return a new element with a namespace. The
\var{tagName} may have a prefix. The element is not inserted into the
document when it is created. You need to explicitly insert it with
one of the other methods such as \method{insertBefore()} or
\method{appendChild()}.
\end{methoddesc}

\begin{methoddesc}[Document]{createTextNode}{data}
Create and return a text node containing the data passed as a
parameter. As with the other creation methods, this one does not
insert the node into the tree.
\end{methoddesc}

\begin{methoddesc}[Document]{createComment}{data}
Create and return a comment node containing the data passed as a
parameter. As with the other creation methods, this one does not
insert the node into the tree.
\end{methoddesc}

\begin{methoddesc}[Document]{createProcessingInstruction}{target, data}
Create and return a processing instruction node containing the
\var{target} and \var{data} passed as parameters. As with the other
creation methods, this one does not insert the node into the tree.
\end{methoddesc}

\begin{methoddesc}[Document]{createAttribute}{name}
Create and return an attribute node. This method does not associate
the attribute node with any particular element. You must use
\method{setAttributeNode()} on the appropriate \class{Element} object
to use the newly created attribute instance.
\end{methoddesc}

\begin{methoddesc}[Document]{createAttributeNS}{namespaceURI, qualifiedName}
Create and return an attribute node with a namespace. The
\var{qualifiedName} may have a prefix. This method does not associate
the attribute node with any particular element. You must use
\method{setAttributeNode()} on the appropriate \class{Element} object
to use the newly created attribute instance.
\end{methoddesc}

\begin{methoddesc}[Document]{getElementsByTagName}{tagName}
Search for all descendants (direct children, children's children,
etc.) with a particular element type name.
\end{methoddesc}

\begin{methoddesc}[Document]{getElementsByTagNameNS}{namespaceURI, localName}
Search for all descendants (direct children, children's children,
etc.) with a particular namespace URI and localname. The localname is
the part of the qualified name after the prefix.
\end{methoddesc}
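The creation and search methods might be combined as in this sketch,
which assumes \module{xml.dom.minidom}. The element names and the
\code{urn:example} namespace are invented, and \method{toxml()} is a
minidom extension used only to display the result.

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "root", None)
root = doc.documentElement

# Newly created nodes are detached until explicitly inserted.
item = doc.createElement("item")
detached = item.parentNode is None
item.appendChild(doc.createTextNode("hello"))

root.appendChild(doc.createComment("generated"))
root.appendChild(item)

# A namespaced element; the tag name passed in carries the prefix.
ns_item = doc.createElementNS("urn:example", "x:item")
root.appendChild(ns_item)

# Searching by tag name versus by namespace URI and local name.
by_name = doc.getElementsByTagName("item")
by_ns = doc.getElementsByTagNameNS("urn:example", "item")

markup = root.toxml()
```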
\subsubsection{Element Objects \label{dom-element-objects}}

\class{Element} is a subclass of \class{Node}, so inherits all the
attributes of that class.

\begin{memberdesc}[Element]{tagName}
The element type name. In a namespace-using document it may have
colons in it. The value is a string.
\end{memberdesc}

\begin{methoddesc}[Element]{getElementsByTagName}{tagName}
Same as equivalent method in the \class{Document} class.
\end{methoddesc}

\begin{methoddesc}[Element]{getElementsByTagNameNS}{namespaceURI, localName}
Same as equivalent method in the \class{Document} class.
\end{methoddesc}

\begin{methoddesc}[Element]{getAttribute}{attname}
Return an attribute value as a string.
\end{methoddesc}

\begin{methoddesc}[Element]{getAttributeNode}{attrname}
Return the \class{Attr} node for the attribute named by
\var{attrname}.
\end{methoddesc}

\begin{methoddesc}[Element]{getAttributeNS}{namespaceURI, localName}
Return an attribute value as a string, given a \var{namespaceURI} and
\var{localName}.
\end{methoddesc}

\begin{methoddesc}[Element]{getAttributeNodeNS}{namespaceURI, localName}
Return an attribute value as a node, given a \var{namespaceURI} and
\var{localName}.
\end{methoddesc}

\begin{methoddesc}[Element]{removeAttribute}{attname}
Remove an attribute by name. No exception is raised if there is no
matching attribute.
\end{methoddesc}

\begin{methoddesc}[Element]{removeAttributeNode}{oldAttr}
Remove and return \var{oldAttr} from the attribute list, if present.
If \var{oldAttr} is not present, \exception{NotFoundErr} is raised.
\end{methoddesc}

\begin{methoddesc}[Element]{removeAttributeNS}{namespaceURI, localName}
Remove an attribute by name. Note that it uses a localName, not a
qname. No exception is raised if there is no matching attribute.
\end{methoddesc}

\begin{methoddesc}[Element]{setAttribute}{attname, value}
Set an attribute value from a string.
\end{methoddesc}

\begin{methoddesc}[Element]{setAttributeNode}{newAttr}
Add a new attribute node to the element, replacing an existing
attribute if its \member{name} attribute matches. If a
replacement occurs, the old attribute node will be returned. If
\var{newAttr} is already in use, \exception{InuseAttributeErr} will be
raised.
\end{methoddesc}

\begin{methoddesc}[Element]{setAttributeNodeNS}{newAttr}
Add a new attribute node to the element, replacing an existing
attribute if its \member{namespaceURI} and
\member{localName} attributes match. If a replacement occurs, the old
attribute node will be returned. If \var{newAttr} is already in use,
\exception{InuseAttributeErr} will be raised.
\end{methoddesc}

\begin{methoddesc}[Element]{setAttributeNS}{namespaceURI, qname, value}
Set an attribute value from a string, given a \var{namespaceURI} and a
\var{qname}. Note that a qname is the whole attribute name, prefix
included. This differs from the other namespace-aware attribute
methods above, which take a local name.
\end{methoddesc}
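The difference between qualified and local names in the attribute
methods is easiest to see side by side. This sketch assumes
\module{xml.dom.minidom}; the attribute names and the
\code{urn:example} namespace are illustrative only.

```python
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "e", None)
el = doc.documentElement

# Plain attributes: set, read as a string, read as an Attr node.
el.setAttribute("id", "42")
value = el.getAttribute("id")
node = el.getAttributeNode("id")

# Namespaced attributes: set with the qualified name ("x:kind"),
# then read and remove with the local name ("kind").
el.setAttributeNS("urn:example", "x:kind", "demo")
ns_value = el.getAttributeNS("urn:example", "kind")
ns_node = el.getAttributeNodeNS("urn:example", "kind")
ns_node_name = ns_node.name

el.removeAttribute("id")
el.removeAttributeNS("urn:example", "kind")
remaining = el.attributes.length
```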
\subsubsection{Attr Objects \label{dom-attr-objects}}

\class{Attr} inherits from \class{Node}, so inherits all its
attributes.

\begin{memberdesc}[Attr]{name}
The attribute name. In a namespace-using document it may have colons
in it.
\end{memberdesc}

\begin{memberdesc}[Attr]{localName}
The part of the name following the colon if there is one, else the
entire name. This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[Attr]{prefix}
The part of the name preceding the colon if there is one, else the
empty string.
\end{memberdesc}
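A minimal look at \class{Attr} nodes obtained from a parsed element,
assuming \module{xml.dom.minidom}; the attribute names and namespace
URI are made up for the example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<e xmlns:x="urn:example" x:id="7" plain="yes"/>'
)
el = doc.documentElement

# A namespaced attribute is looked up by namespace URI and local name.
qualified = el.getAttributeNodeNS("urn:example", "id")
qualified_parts = (qualified.name, qualified.prefix,
                   qualified.localName, qualified.value)

# An attribute without a prefix: localName equals the whole name.
plain = el.getAttributeNode("plain")
plain_parts = (plain.name, plain.localName, plain.value)
```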
\subsubsection{NamedNodeMap Objects \label{dom-attributelist-objects}}

\class{NamedNodeMap} does \emph{not} inherit from \class{Node}.

\begin{memberdesc}[NamedNodeMap]{length}
The length of the attribute list.
\end{memberdesc}

\begin{methoddesc}[NamedNodeMap]{item}{index}
Return an attribute with a particular index. The order you get the
attributes in is arbitrary but will be consistent for the life of a
DOM. Each item is an attribute node. Get its value with the
\member{value} attribute.
\end{methoddesc}
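Because the order of items is arbitrary, the sketch below sorts the
name and value pairs before using them. It assumes
\module{xml.dom.minidom}; the sample element is invented.

```python
from xml.dom import minidom

doc = minidom.parseString('<e a="1" b="2"/>')
attrs = doc.documentElement.attributes

# Walk the map by index; each item is an attribute node.
pairs = sorted(
    (attrs.item(i).name, attrs.item(i).value)
    for i in range(attrs.length)
)
```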
There are also experimental methods that give this class more mapping
behavior. You can use them or you can use the standardized
\method{getAttribute*()} family of methods on the \class{Element}
objects.
\subsubsection{Comment Objects \label{dom-comment-objects}}

\class{Comment} represents a comment in the XML document. It is a
subclass of \class{Node}, but cannot have child nodes.

\begin{memberdesc}[Comment]{data}
The content of the comment as a string. The attribute contains all
characters between the leading \code{<!-}\code{-} and trailing
\code{-}\code{->}, but does not include them.
\end{memberdesc}
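For instance, assuming \module{xml.dom.minidom} as the parser, the
whitespace inside the comment delimiters is part of \member{data}:

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<r><!-- note to self --></r>")
comment = doc.documentElement.firstChild

is_comment = comment.nodeType == Node.COMMENT_NODE
content = comment.data  # delimiters excluded, inner spaces kept
```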
\subsubsection{Text and CDATASection Objects \label{dom-text-objects}}

The \class{Text} interface represents text in the XML document. If
the parser and DOM implementation support the DOM's XML extension,
portions of the text enclosed in CDATA marked sections are stored in
\class{CDATASection} objects. These two interfaces are identical, but
provide different values for the \member{nodeType} attribute.

These interfaces extend the \class{Node} interface. They cannot have
child nodes.

\begin{memberdesc}[Text]{data}
The content of the text node as a string.
\end{memberdesc}

\begin{notice}
  The use of a \class{CDATASection} node does not indicate that the
  node represents a complete CDATA marked section, only that the
  content of the node was part of a CDATA section. A single CDATA
  section may be represented by more than one node in the document
  tree. There is no way to determine whether two adjacent
  \class{CDATASection} nodes represent different CDATA marked
  sections.
\end{notice}
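Since text may be split across \class{Text} and \class{CDATASection}
nodes, code that wants the character content usually joins the
\member{data} of every child. The sketch assumes
\module{xml.dom.minidom}; how the content is divided among nodes is
implementation-dependent, so the example does not rely on it.

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<r>a<![CDATA[<b>]]>c</r>")
children = doc.documentElement.childNodes

# Each child is either a Text or a CDATASection node; with minidom the
# CDATA content is expected to arrive as a separate CDATASection node.
kinds = [node.nodeType for node in children]

# Joining the data recovers the character content regardless of the split.
content = "".join(node.data for node in children)
```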
\subsubsection{ProcessingInstruction Objects \label{dom-pi-objects}}

Represents a processing instruction in the XML document; this inherits
from the \class{Node} interface and cannot have child nodes.

\begin{memberdesc}[ProcessingInstruction]{target}
The content of the processing instruction up to the first whitespace
character. This is a read-only attribute.
\end{memberdesc}

\begin{memberdesc}[ProcessingInstruction]{data}
The content of the processing instruction following the first
whitespace character.
\end{memberdesc}
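As an illustration, assuming \module{xml.dom.minidom}, a stylesheet
processing instruction placed before the root element splits into
\member{target} and \member{data} like this (the file name is
invented):

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml-stylesheet href="style.css" type="text/css"?><r/>'
)
pi = doc.firstChild  # the instruction precedes the root element

is_pi = pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE
target = pi.target
data = pi.data
```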
\subsubsection{Exceptions \label{dom-exceptions}}

\versionadded{2.1}

The DOM Level~2 recommendation defines a single exception,
\exception{DOMException}, and a number of constants that allow
applications to determine what sort of error occurred.
\exception{DOMException} instances carry a \member{code} attribute
that provides the appropriate value for the specific exception.

The Python DOM interface provides the constants, but also expands the
set of exceptions so that a specific exception exists for each of the
exception codes defined by the DOM. The implementations must raise
the appropriate specific exception, each of which carries the
appropriate value for the \member{code} attribute.

\begin{excdesc}{DOMException}
  Base exception class used for all specific DOM exceptions. This
  exception class cannot be directly instantiated.
\end{excdesc}
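Catching the base class and inspecting \member{code} might look like
the sketch below. It assumes \module{xml.dom.minidom}, which is
expected to raise \exception{InuseAttributeErr} when an attribute node
that already belongs to one element is attached to another; the
sample document is invented.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.parseString("<r><a id='1'/><b/></r>")
a, b = doc.documentElement.childNodes
attr = a.getAttributeNode("id")

caught = None
try:
    # The Attr node is already owned by <a>, so <b> may not take it.
    b.setAttributeNode(attr)
except xml.dom.DOMException as exc:
    caught = exc

# The specific subclass and the numeric code describe the same error.
is_specific = isinstance(caught, xml.dom.InuseAttributeErr)
code_matches = caught.code == xml.dom.INUSE_ATTRIBUTE_ERR
```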
\begin{excdesc}{DomstringSizeErr}
|
|
Raised when a specified range of text does not fit into a string.
|
|
This is not known to be used in the Python DOM implementations, but
|
|
may be received from DOM implementations not written in Python.
|
|
\end{excdesc}
|
|
|
|
\begin{excdesc}{HierarchyRequestErr}
|
|
Raised when an attempt is made to insert a node where the node type
|
|
is not allowed.
|
|
\end{excdesc}
|
|
|
|
\begin{excdesc}{IndexSizeErr}
|
|
Raised when an index or size parameter to a method is negative or
|
|
exceeds the allowed values.
|
|
\end{excdesc}
|
|
|
|
\begin{excdesc}{InuseAttributeErr}
|
|
Raised when an attempt is made to insert an \class{Attr} node that
|
|
is already present elsewhere in the document.
|
|
\end{excdesc}
|
|
|
|
\begin{excdesc}{InvalidAccessErr}
|
|
Raised if a parameter or an operation is not supported on the
|
|
underlying object.
|
|
\end{excdesc}
|
|
|
|
\begin{excdesc}{InvalidCharacterErr}
|
|
This exception is raised when a string parameter contains a
|
|
character that is not permitted in the context it's being used in by
|
|
the XML 1.0 recommendation. For example, attempting to create an
|
|
\class{Element} node with a space in the element type name will
|
|
cause this error to be raised.
|
|
\end{excdesc}
|
|
|
|
\begin{excdesc}{InvalidModificationErr}
|
|
Raised when an attempt is made to modify the type of a node.
|
|
\end{excdesc}
|
|
|
|
\begin{excdesc}{InvalidStateErr}
|
|
Raised when an attempt is made to use an object that is not or is no
|
|
longer usable.
|
|
\end{excdesc}

\begin{excdesc}{NamespaceErr}
Raised when an attempt is made to change any object in a way that is
not permitted with regard to the
\citetitle[http://www.w3.org/TR/REC-xml-names/]{Namespaces in XML}
recommendation.
\end{excdesc}

\begin{excdesc}{NotFoundErr}
Raised when a node does not exist in the referenced context.  For
example, \method{NamedNodeMap.removeNamedItem()} will raise this if
the node passed in does not exist in the map.
\end{excdesc}

\begin{excdesc}{NotSupportedErr}
Raised when the implementation does not support the requested type
of object or operation.
\end{excdesc}

\begin{excdesc}{NoDataAllowedErr}
Raised if data is specified for a node that does not
support data.
% XXX a better explanation is needed!
\end{excdesc}

\begin{excdesc}{NoModificationAllowedErr}
Raised on attempts to modify an object where modifications are not
allowed (such as for read-only nodes).
\end{excdesc}

\begin{excdesc}{SyntaxErr}
Raised when an invalid or illegal string is specified.
% XXX how is this different from InvalidCharacterErr ???
\end{excdesc}

\begin{excdesc}{WrongDocumentErr}
Raised when a node is inserted in a different document than it
currently belongs to, and the implementation does not support
migrating the node from one document to the other.
\end{excdesc}

The exception codes defined in the DOM recommendation map to the
exceptions described above according to this table:

\begin{tableii}{l|l}{constant}{Constant}{Exception}
\lineii{DOMSTRING_SIZE_ERR}{\exception{DomstringSizeErr}}
\lineii{HIERARCHY_REQUEST_ERR}{\exception{HierarchyRequestErr}}
\lineii{INDEX_SIZE_ERR}{\exception{IndexSizeErr}}
\lineii{INUSE_ATTRIBUTE_ERR}{\exception{InuseAttributeErr}}
\lineii{INVALID_ACCESS_ERR}{\exception{InvalidAccessErr}}
\lineii{INVALID_CHARACTER_ERR}{\exception{InvalidCharacterErr}}
\lineii{INVALID_MODIFICATION_ERR}{\exception{InvalidModificationErr}}
\lineii{INVALID_STATE_ERR}{\exception{InvalidStateErr}}
\lineii{NAMESPACE_ERR}{\exception{NamespaceErr}}
\lineii{NOT_FOUND_ERR}{\exception{NotFoundErr}}
\lineii{NOT_SUPPORTED_ERR}{\exception{NotSupportedErr}}
\lineii{NO_DATA_ALLOWED_ERR}{\exception{NoDataAllowedErr}}
\lineii{NO_MODIFICATION_ALLOWED_ERR}{\exception{NoModificationAllowedErr}}
\lineii{SYNTAX_ERR}{\exception{SyntaxErr}}
\lineii{WRONG_DOCUMENT_ERR}{\exception{WrongDocumentErr}}
\end{tableii}
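
The correspondence can also be checked programmatically.  This is a
sketch against the \module{xml.dom} module of the standard library; it
spot-checks a few rows of the table rather than all of them.

\begin{verbatim}
import xml.dom

# (constant name, exception class name) pairs taken from the table.
pairs = [
    ("HIERARCHY_REQUEST_ERR", "HierarchyRequestErr"),
    ("INDEX_SIZE_ERR", "IndexSizeErr"),
    ("NOT_FOUND_ERR", "NotFoundErr"),
    ("WRONG_DOCUMENT_ERR", "WrongDocumentErr"),
]

checked = []
for constant_name, class_name in pairs:
    constant = getattr(xml.dom, constant_name)
    exc_class = getattr(xml.dom, class_name)
    # Each specific exception class carries its own code value and
    # derives from the common base class.
    assert exc_class.code == constant
    assert issubclass(exc_class, xml.dom.DOMException)
    checked.append(class_name)

print(checked)
\end{verbatim}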

\subsection{Conformance \label{dom-conformance}}

This section describes the conformance requirements and relationships
between the Python DOM API, the W3C DOM recommendations, and the OMG
IDL mapping for Python.

\subsubsection{Type Mapping \label{dom-type-mapping}}

The primitive IDL types used in the DOM specification are mapped to
Python types according to the following table.

\begin{tableii}{l|l}{code}{IDL Type}{Python Type}
\lineii{boolean}{\code{IntegerType} (with a value of \code{0} or \code{1})}
\lineii{int}{\code{IntegerType}}
\lineii{long int}{\code{IntegerType}}
\lineii{unsigned int}{\code{IntegerType}}
\end{tableii}

Additionally, the \class{DOMString} defined in the recommendation is
mapped to a Python string or Unicode string.  Applications should
be able to handle Unicode whenever a string is returned from the DOM.

The IDL \keyword{null} value is mapped to \code{None}, which may be
accepted or provided by the implementation whenever \keyword{null} is
allowed by the API.
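
A short sketch of what this mapping looks like in practice, assuming
\module{xml.dom.minidom} under a current Python, where boolean results
are \class{bool} values (a subclass of the integer type) and all strings
are Unicode:

\begin{verbatim}
from xml.dom import minidom

# Hypothetical one-element document used only for illustration.
doc = minidom.parseString('<root id="r1"/>')
root = doc.documentElement

# IDL boolean: usable directly in a truth test.
print(root.hasAttribute("id"), root.hasChildNodes())

# IDL integer types: a plain Python integer.
print(root.attributes.length)

# DOMString: a Python string.
print(root.getAttribute("id"))

# IDL null: None, here because the root element has no children
# and no siblings.
print(root.firstChild, root.nextSibling)
\end{verbatim}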

\subsubsection{Accessor Methods \label{dom-accessor-methods}}

The mapping from OMG IDL to Python defines accessor functions for IDL
\keyword{attribute} declarations in much the way the Java mapping
does.  Mapping the IDL declarations

\begin{verbatim}
readonly attribute string someValue;
attribute string anotherValue;
\end{verbatim}

yields three accessor functions: a ``get'' method for
\member{someValue} (\method{_get_someValue()}), and ``get'' and
``set'' methods for
\member{anotherValue} (\method{_get_anotherValue()} and
\method{_set_anotherValue()}).  The mapping, in particular, does not
require that the IDL attributes are accessible as normal Python
attributes: \code{\var{object}.someValue} is \emph{not} required to
work, and may raise an \exception{AttributeError}.

The Python DOM API, however, \emph{does} require that normal attribute
access work.  This means that the typical surrogates generated by
Python IDL compilers are not likely to work, and wrapper objects may
be needed on the client if the DOM objects are accessed via CORBA.
While this does require some additional consideration for CORBA DOM
clients, the implementers with experience using DOM over CORBA from
Python do not consider this a problem.  Attributes that are declared
\keyword{readonly} may not restrict write access in all DOM
implementations.

Additionally, the accessor functions are not required.  If provided,
they should take the form defined by the Python IDL mapping, but
these methods are considered unnecessary since the attributes are
accessible directly from Python.  ``Set'' accessors should never be
provided for \keyword{readonly} attributes.
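
One way an implementation could satisfy both conventions is sketched
below.  The class \class{Example} and its internal storage are
hypothetical; only the accessor and attribute names come from the IDL
declarations above.

\begin{verbatim}
class Example:
    """Hypothetical object exposing the two IDL attributes."""

    def __init__(self, some_value, another_value):
        self._some = some_value
        self._another = another_value

    # Optional accessors in the form defined by the Python IDL mapping.
    def _get_someValue(self):
        return self._some

    def _get_anotherValue(self):
        return self._another

    def _set_anotherValue(self, value):
        self._another = value

    # Normal attribute access, as the Python DOM API requires.
    # someValue is readonly, so it gets no "set" accessor.
    someValue = property(_get_someValue)
    anotherValue = property(_get_anotherValue, _set_anotherValue)


obj = Example("fixed", "initial")
obj.anotherValue = "changed"        # plain attribute assignment
print(obj.someValue, obj._get_anotherValue())
\end{verbatim}

In this sketch, assigning to \code{obj.someValue} raises
\exception{AttributeError} because the property has no setter.  That is
a choice made for the example; as noted above, not every DOM
implementation restricts write access to \keyword{readonly} attributes.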