Add new section on the XML package. (This was the only major new 2.0 feature
left that wasn't covered. The article is therefore now essentially complete.) A few minor changes
parent
0be483fd4d
commit
6032c48b47
|
@ -156,8 +156,8 @@ type implementation by Fredrik Lundh. A detailed explanation of the
|
||||||
interface is in the file \file{Misc/unicode.txt} in the Python source
|
interface is in the file \file{Misc/unicode.txt} in the Python source
|
||||||
distribution; it's also available on the Web at
|
distribution; it's also available on the Web at
|
||||||
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
|
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
|
||||||
This article will simply cover the most significant points from the
|
This article will simply cover the most significant points about the Unicode
|
||||||
full interface.
|
interfaces.
|
||||||
|
|
||||||
In Python source code, Unicode strings are written as
|
In Python source code, Unicode strings are written as
|
||||||
\code{u"string"}. Arbitrary Unicode characters can be written using a
|
\code{u"string"}. Arbitrary Unicode characters can be written using a
|
||||||
|
@ -615,12 +615,12 @@ b.append(b)
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
|
|
||||||
The comparison \code{a==b} returns true, because the two recursive
|
The comparison \code{a==b} returns true, because the two recursive
|
||||||
data structures are isomorphic. \footnote{See the thread ``trashcan
|
data structures are isomorphic. See the thread ``trashcan
|
||||||
and PR\#7'' in the April 2000 archives of the python-dev mailing list
|
and PR\#7'' in the April 2000 archives of the python-dev mailing list
|
||||||
for the discussion leading up to this implementation, and some useful
|
for the discussion leading up to this implementation, and some useful
|
||||||
relevant links.
|
relevant links.
|
||||||
%http://www.python.org/pipermail/python-dev/2000-April/004834.html
|
% Starting URL:
|
||||||
}
|
% http://www.python.org/pipermail/python-dev/2000-April/004834.html
|
||||||
|
|
||||||
Work has been done on porting Python to 64-bit Windows on the Itanium
|
Work has been done on porting Python to 64-bit Windows on the Itanium
|
||||||
processor, mostly by Trent Mick of ActiveState. (Confusingly,
|
processor, mostly by Trent Mick of ActiveState. (Confusingly,
|
||||||
|
@ -950,7 +950,6 @@ expat_extension = Extension('xml.parsers.pyexpat',
|
||||||
)
|
)
|
||||||
setup (name = "PyXML", version = "0.5.4",
|
setup (name = "PyXML", version = "0.5.4",
|
||||||
ext_modules =[ expat_extension ] )
|
ext_modules =[ expat_extension ] )
|
||||||
|
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
|
|
||||||
The Distutils can also take care of creating source and binary
|
The Distutils can also take care of creating source and binary
|
||||||
|
@ -966,10 +965,165 @@ development.
|
||||||
All this is documented in a new manual, \textit{Distributing Python
|
All this is documented in a new manual, \textit{Distributing Python
|
||||||
Modules}, that joins the basic set of Python documentation.
|
Modules}, that joins the basic set of Python documentation.
|
||||||
|
|
||||||
% ======================================================================
|
% ======================================================================
|
||||||
%\section{New XML Code}
|
\section{XML Modules}
|
||||||
|
|
||||||
%XXX write this section...
|
Python 1.5.2 included a simple XML parser in the form of the
|
||||||
|
\module{xmllib} module, contributed by Sjoerd Mullender. Since
|
||||||
|
1.5.2's release, two different interfaces for processing XML have
|
||||||
|
become common: SAX2 (version 2 of the Simple API for XML) provides an
|
||||||
|
event-driven interface with some similarities to \module{xmllib}, and
|
||||||
|
the DOM (Document Object Model) provides a tree-based interface,
|
||||||
|
transforming an XML document into a tree of nodes that can be
|
||||||
|
traversed and modified. Python 2.0 includes a SAX2 interface and a
|
||||||
|
stripped-down DOM interface as part of the \module{xml} package.
|
||||||
|
Here we will give a brief overview of these new interfaces; consult
|
||||||
|
the Python documentation or the source code for complete details.
|
||||||
|
The Python XML SIG is also working on improved documentation.
|
||||||
|
|
||||||
|
\subsection{SAX2 Support}
|
||||||
|
|
||||||
|
SAX defines an event-driven interface for parsing XML. To use SAX,
|
||||||
|
you must write a SAX handler class. Handler classes inherit from
|
||||||
|
various classes provided by SAX, and override various methods that
|
||||||
|
will then be called by the XML parser. For example, the
|
||||||
|
\method{startElement} and \method{endElement} methods are called for
|
||||||
|
every starting and end tag encountered by the parser, the
|
||||||
|
\method{characters()} method is called for every chunk of character
|
||||||
|
data, and so forth.
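
As a rough illustration of how these callbacks fit together, the following sketch collects the text found inside each \code{LINE} element. The \class{TextCollector} class, the \code{LINE} tag, and the sample document are assumptions made up for this example; it is written for a current Python interpreter and uses \function{xml.sax.parseString()} so that no external file is needed.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: gathers the text inside each <LINE> element."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.in_line = False   # True while we are inside a <LINE> element
        self.chunks = []       # pieces of text seen for the current <LINE>
        self.lines = []        # completed lines

    def startElement(self, name, attrs):
        if name == 'LINE':
            self.in_line = True
            self.chunks = []

    def characters(self, content):
        # The parser may deliver one run of text in several chunks,
        # so accumulate them instead of assuming a single call.
        if self.in_line:
            self.chunks.append(content)

    def endElement(self, name):
        if name == 'LINE':
            self.lines.append(''.join(self.chunks))
            self.in_line = False


# Made-up sample document, not taken from hamlet.xml
SAMPLE = b"<SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>To be, or not to be</LINE></SPEECH>"

handler = TextCollector()
xml.sax.parseString(SAMPLE, handler)
print(handler.lines)
```

Note that the handler has to keep its own state (here, the \code{in_line} flag) between callbacks, because the parser itself does not remember where it is in the document on the handler's behalf.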
|
||||||
|
|
||||||
|
The advantage of the event-driven approach is that the whole
|
||||||
|
document doesn't have to be resident in memory at any one time, which
|
||||||
|
matters if you are processing really huge documents. However, writing
|
||||||
|
the SAX handler class can get very complicated if you're trying to
|
||||||
|
modify the document structure in some elaborate way.
|
||||||
|
|
||||||
|
For example, this little program defines a handler that prints
|
||||||
|
a message for every starting and ending tag, and then parses the file
|
||||||
|
\file{hamlet.xml} using it:
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
from xml import sax

class SimpleHandler(sax.ContentHandler):
    def startElement(self, name, attrs):
        print 'Start of element:', name, attrs.keys()

    def endElement(self, name):
        print 'End of element:', name

# Create a parser object
parser = sax.make_parser()

# Tell it what handler to use
handler = SimpleHandler()
parser.setContentHandler( handler )

# Parse a file!
parser.parse( 'hamlet.xml' )
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
|
For more information, consult the Python documentation, or the XML
|
||||||
|
HOWTO at \url{http://www.python.org/doc/howto/xml/}.
|
||||||
|
|
||||||
|
\subsection{DOM Support}
|
||||||
|
|
||||||
|
The Document Object Model is a tree-based representation for an XML
|
||||||
|
document. A top-level \class{Document} instance is the root of the
|
||||||
|
tree, and has a single child which is the top-level \class{Element}
|
||||||
|
instance. This \class{Element} has children nodes representing
|
||||||
|
character data and any sub-elements, which may have further children
|
||||||
|
of their own, and so forth. Using the DOM you can traverse the
|
||||||
|
resulting tree any way you like, access element and attribute values,
|
||||||
|
insert and delete nodes, and convert the tree back into XML.
|
||||||
|
|
||||||
|
The DOM is useful for modifying XML documents, because you can create
|
||||||
|
a DOM tree, modify it by adding new nodes or rearranging subtrees, and
|
||||||
|
then produce a new XML document as output. You can also construct a
|
||||||
|
DOM tree manually and convert it to XML, which can be a more flexible
|
||||||
|
way of producing XML output than simply writing
|
||||||
|
\code{<tag1>}...\code{</tag1>} to a file.
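
A minimal sketch of building a tree by hand might look like the following. The element names and text are invented for illustration, and the code is written for a current Python interpreter. One practical benefit is visible in the output: special characters such as \code{\&} are escaped for you.

```python
from xml.dom import minidom

# Start from an empty document and add a root element
doc = minidom.Document()
root = doc.createElement('PLAY')
doc.appendChild(root)

# Add <TITLE>Hamlet</TITLE> under the root
title = doc.createElement('TITLE')
title.appendChild(doc.createTextNode('Hamlet'))
root.appendChild(title)

# Text containing '&' is escaped when the tree is serialized
persona = doc.createElement('PERSONA')
persona.appendChild(doc.createTextNode('ROSENCRANTZ & GUILDENSTERN'))
root.appendChild(persona)

# Serialize just the root element and its children
xml_text = root.toxml()
print(xml_text)
```

Calling \method{toxml()} on the root element rather than on the document leaves out the XML declaration, which keeps the output short here.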
|
||||||
|
|
||||||
|
The DOM implementation included with Python lives in the
|
||||||
|
\module{xml.dom.minidom} module. It's a lightweight implementation of
|
||||||
|
the Level 1 DOM with support for XML namespaces. The
|
||||||
|
\function{parse()} and \function{parseString()} convenience
|
||||||
|
functions are provided for generating a DOM tree:
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
from xml.dom import minidom
|
||||||
|
doc = minidom.parse('hamlet.xml')
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
|
\code{doc} is a \class{Document} instance. \class{Document}, like all
|
||||||
|
the other DOM classes such as \class{Element} and \class{Text}, is a
|
||||||
|
subclass of the \class{Node} base class. All the nodes in a DOM tree
|
||||||
|
therefore support certain common methods, such as \method{toxml()}
|
||||||
|
which returns a string containing the XML representation of the node
|
||||||
|
and its children. Each class also has special methods of its own; for
|
||||||
|
example, \class{Element} and \class{Document} instances have a method
|
||||||
|
to find all descendant elements with a given tag name.  Continuing from the
|
||||||
|
previous 2-line example:
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
perslist = doc.getElementsByTagName( 'PERSONA' )
|
||||||
|
print perslist[0].toxml()
|
||||||
|
print perslist[1].toxml()
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
|
For the \textit{Hamlet} XML file, the above few lines output:
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
<PERSONA>CLAUDIUS, king of Denmark. </PERSONA>
|
||||||
|
<PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA>
|
||||||
|
\end{verbatim}
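
If you don't have the \textit{Hamlet} file at hand, the same lookup can be tried on a small in-memory document using \function{parseString()}. The sample markup and the variable names below are assumptions made up for this sketch, which is written for a current Python interpreter.

```python
from xml.dom import minidom

# Made-up document; the PERSONA elements are nested two levels down
SAMPLE = ("<PLAY><PERSONAE>"
          "<PERSONA>CLAUDIUS</PERSONA>"
          "<PERSONA>HAMLET</PERSONA>"
          "</PERSONAE></PLAY>")

small_doc = minidom.parseString(SAMPLE)

# The search covers the whole subtree, not just direct children
personae = small_doc.getElementsByTagName('PERSONA')

# Each PERSONA has a single Text child; .data holds its string
names = [node.firstChild.data for node in personae]
print(names)
```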
|
||||||
|
|
||||||
|
The root element of the document is available as
|
||||||
|
\code{doc.documentElement}, and its children can be easily modified
|
||||||
|
by deleting, adding, or moving nodes:
|
||||||
|
|
||||||
|
\begin{verbatim}
|
||||||
|
root = doc.documentElement
|
||||||
|
|
||||||
|
# Remove the first child
|
||||||
|
root.removeChild( root.childNodes[0] )
|
||||||
|
|
||||||
|
# Move the new first child to the end
|
||||||
|
root.appendChild( root.childNodes[0] )
|
||||||
|
|
||||||
|
# Insert the new first child (originally,
|
||||||
|
# the third child) before the child at index 20.
|
||||||
|
root.insertBefore( root.childNodes[0], root.childNodes[20] )
|
||||||
|
\end{verbatim}
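
To see what these three calls do to the order of the children, here is the same sequence applied to a tiny document whose element names are invented for the purpose. It is only a sketch, written for a current Python interpreter; the helper function \function{tags()} is not part of the DOM API.

```python
from xml.dom import minidom

doc = minidom.parseString("<r><a/><b/><c/><d/><e/></r>")
root = doc.documentElement


def tags(node):
    """Helper for this example: list the tag names of node's children."""
    return [child.tagName for child in node.childNodes]


# Remove the first child: a is dropped, leaving b c d e
root.removeChild(root.childNodes[0])
after_remove = tags(root)

# Appending a node that is already in the tree moves it:
# b goes to the end, giving c d e b
root.appendChild(root.childNodes[0])
after_append = tags(root)

# Move the new first child (c) so that it sits just before e
root.insertBefore(root.childNodes[0], root.childNodes[2])
after_insert = tags(root)

print(after_remove)
print(after_append)
print(after_insert)
```

A node can only have one parent, so \method{appendChild()} and \method{insertBefore()} detach a node from its old position before placing it in the new one; no explicit \method{removeChild()} call is needed when moving nodes around.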
|
||||||
|
|
||||||
|
Again, I will refer you to the Python documentation for a complete
|
||||||
|
listing of the different \class{Node} classes and their various methods.
|
||||||
|
|
||||||
|
\subsection{Relationship to PyXML}
|
||||||
|
|
||||||
|
The XML Special Interest Group has been working on XML-related Python
|
||||||
|
code for a while. Its code distribution, called PyXML, is available
|
||||||
|
from the SIG's Web pages at \url{http://www.python.org/sigs/xml-sig/}.
|
||||||
|
The PyXML distribution also used the package name \samp{xml}. If
|
||||||
|
you've written programs that used PyXML, you're probably wondering
|
||||||
|
about its compatibility with the 2.0 \module{xml} package.
|
||||||
|
|
||||||
|
The answer is that Python 2.0's \module{xml} package isn't compatible
|
||||||
|
with PyXML, but can be made compatible by installing a recent version
|
||||||
|
of PyXML.  Many applications can get by with the XML support that is
|
||||||
|
included with Python 2.0, but more complicated applications will
|
||||||
|
require that the full PyXML package be installed.  When
|
||||||
|
installed, PyXML versions 0.6.0 or greater will replace the
|
||||||
|
\module{xml} package shipped with Python, and will be a strict
|
||||||
|
superset of the standard package, adding a bunch of additional
|
||||||
|
features. Some of the additional features in PyXML include:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\item 4DOM, a full DOM implementation
|
||||||
|
from FourThought LLC.
|
||||||
|
\item The xmlproc validating parser, written by Lars Marius Garshol.
|
||||||
|
\item The \module{sgmlop} parser accelerator module, written by Fredrik Lundh.
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
% ======================================================================
|
% ======================================================================
|
||||||
\section{Module changes}
|
\section{Module changes}
|
||||||
|
@ -982,6 +1136,8 @@ standard library; some of the affected modules include
|
||||||
and \module{nntplib}. Consult the CVS logs for the exact
|
and \module{nntplib}. Consult the CVS logs for the exact
|
||||||
patch-by-patch details.
|
patch-by-patch details.
|
||||||
|
|
||||||
|
% XXX gettext support
|
||||||
|
|
||||||
Brian Gallew contributed OpenSSL support for the \module{socket}
|
Brian Gallew contributed OpenSSL support for the \module{socket}
|
||||||
module. OpenSSL is an implementation of the Secure Socket Layer,
|
module. OpenSSL is an implementation of the Secure Socket Layer,
|
||||||
which encrypts the data being sent over a socket. When compiling
|
which encrypts the data being sent over a socket. When compiling
|
||||||
|
|