From 6032c48b475c53263dcb2c9e5955e033fdf599d8 Mon Sep 17 00:00:00 2001 From: "Andrew M. Kuchling" Date: Thu, 12 Oct 2000 02:37:14 +0000 Subject: [PATCH] Add new section on the XML package. (This was the only major new 2.0 feature left that wasn't covered. The article is therefore now essentially complete.) A few minor changes --- Doc/whatsnew/whatsnew20.tex | 174 ++++++++++++++++++++++++++++++++++-- 1 file changed, 165 insertions(+), 9 deletions(-) diff --git a/Doc/whatsnew/whatsnew20.tex b/Doc/whatsnew/whatsnew20.tex index 7440b92dd10..e1cd14a692e 100644 --- a/Doc/whatsnew/whatsnew20.tex +++ b/Doc/whatsnew/whatsnew20.tex @@ -156,8 +156,8 @@ type implementation by Fredrik Lundh. A detailed explanation of the interface is in the file \file{Misc/unicode.txt} in the Python source distribution; it's also available on the Web at \url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}. -This article will simply cover the most significant points from the -full interface. +This article will simply cover the most significant points about the Unicode +interfaces. In Python source code, Unicode strings are written as \code{u"string"}. Arbitrary Unicode characters can be written using a @@ -615,12 +615,12 @@ b.append(b) \end{verbatim} The comparison \code{a==b} returns true, because the two recursive -data structures are isomorphic. \footnote{See the thread ``trashcan +data structures are isomorphic. See the thread ``trashcan and PR\#7'' in the April 2000 archives of the python-dev mailing list for the discussion leading up to this implementation, and some useful relevant links. -%http://www.python.org/pipermail/python-dev/2000-April/004834.html -} +% Starting URL: +% http://www.python.org/pipermail/python-dev/2000-April/004834.html Work has been done on porting Python to 64-bit Windows on the Itanium processor, mostly by Trent Mick of ActiveState. (Confusingly, @@ -950,7 +950,6 @@ expat_extension = Extension('xml.parsers.pyexpat', ) setup (name = "PyXML", version = "0.5.4", ext_modules =[ expat_extension ] ) - \end{verbatim} The Distutils can also take care of creating source and binary @@ -966,10 +965,165 @@ development. All this is documented in a new manual, \textit{Distributing Python Modules}, that joins the basic set of Python documentation. -% ====================================================================== -%\section{New XML Code} +====================================================================== +\section{XML Modules} -%XXX write this section... +Python 1.5.2 included a simple XML parser in the form of the +\module{xmllib} module, contributed by Sjoerd Mullender. Since +1.5.2's release, two different interfaces for processing XML have +become common: SAX2 (version 2 of the Simple API for XML) provides an +event-driven interface with some similarities to \module{xmllib}, and +the DOM (Document Object Model) provides a tree-based interface, +transforming an XML document into a tree of nodes that can be +traversed and modified. Python 2.0 includes a SAX2 interface and a +stripped-down DOM interface as part of the \module{xml} package. +Here we will give a brief overview of these new interfaces; consult +the Python documentation or the source code for complete details. +The Python XML SIG is also working on improved documentation. + +\subsection{SAX2 Support} + +SAX defines an event-driven interface for parsing XML. To use SAX, +you must write a SAX handler class. Handler classes inherit from +various classes provided by SAX, and override various methods that +will then be called by the XML parser. For example, the +\method{startElement} and \method{endElement} methods are called for +every starting and end tag encountered by the parser, the +\method{characters()} method is called for every chunk of character +data, and so forth. + +The advantage of the event-driven approach is that that the whole +document doesn't have to be resident in memory at any one time, which +matters if you are processing really huge documents. However, writing +the SAX handler class can get very complicated if you're trying to +modify the document structure in some elaborate way. + +For example, this little example program defines a handler that prints +a message for every starting and ending tag, and then parses the file +\file{hamlet.xml} using it: + +\begin{verbatim} +from xml import sax + +class SimpleHandler(sax.ContentHandler): + def startElement(self, name, attrs): + print 'Start of element:', name, attrs.keys() + + def endElement(self, name): + print 'End of element:', name + +# Create a parser object +parser = sax.make_parser() + +# Tell it what handler to use +handler = SimpleHandler() +parser.setContentHandler( handler ) + +# Parse a file! +parser.parse( 'hamlet.xml' ) +\end{verbatim} + +For more information, consult the Python documentation, or the XML +HOWTO at \url{http://www.python.org/doc/howto/xml/}. + +\subsection{DOM Support} + +The Document Object Model is a tree-based representation for an XML +document. A top-level \class{Document} instance is the root of the +tree, and has a single child which is the top-level \class{Element} +instance. This \class{Element} has children nodes representing +character data and any sub-elements, which may have further children +of their own, and so forth. Using the DOM you can traverse the +resulting tree any way you like, access element and attribute values, +insert and delete nodes, and convert the tree back into XML. + +The DOM is useful for modifying XML documents, because you can create +a DOM tree, modify it by adding new nodes or rearranging subtrees, and +then produce a new XML document as output. You can also construct a +DOM tree manually and convert it to XML, which can be a more flexible +way of producing XML output than simply writing +\code{}...\code{} to a file. + +The DOM implementation included with Python lives in the +\module{xml.dom.minidom} module. It's a lightweight implementation of +the Level 1 DOM with support for XML namespaces. The +\function{parse()} and \function{parseString()} convenience +functions are provided for generating a DOM tree: + +\begin{verbatim} +from xml.dom import minidom +doc = minidom.parse('hamlet.xml') +\end{verbatim} + +\code{doc} is a \class{Document} instance. \class{Document}, like all +the other DOM classes such as \class{Element} and \class{Text}, is a +subclass of the \class{Node} base class. All the nodes in a DOM tree +therefore support certain common methods, such as \method{toxml()} +which returns a string containing the XML representation of the node +and its children. Each class also has special methods of its own; for +example, \class{Element} and \class{Document} instances have a method +to find all child elements with a given tag name. Continuing from the +previous 2-line example: + +\begin{verbatim} +perslist = doc.getElementsByTagName( 'PERSONA' ) +print perslist[0].toxml() +print perslist[1].toxml() +\end{verbatim} + +For the \textit{Hamlet} XML file, the above few lines output: + +\begin{verbatim} +CLAUDIUS, king of Denmark. +HAMLET, son to the late, and nephew to the present king. +\end{verbatim} + +The root element of the document is available as +\code{doc.documentElement}, and its children can be easily modified +by deleting, adding, or removing nodes: + +\begin{verbatim} +root = doc.documentElement + +# Remove the first child +root.removeChild( root.childNodes[0] ) + +# Move the new first child to the end +root.appendChild( root.childNodes[0] ) + +# Insert the new first child (originally, +# the third child) before the 20th child. +root.insertBefore( root.childNodes[0], root.childNodes[20] ) +\end{verbatim} + +Again, I will refer you to the Python documentation for a complete +listing of the different \class{Node} classes and their various methods. + +\subsection{Relationship to PyXML} + +The XML Special Interest Group has been working on XML-related Python +code for a while. Its code distribution, called PyXML, is available +from the SIG's Web pages at \url{http://www.python.org/sigs/xml-sig/}. +The PyXML distribution also used the package name \samp{xml}. If +you've written programs that used PyXML, you're probably wondering +about its compatibility with the 2.0 \module{xml} package. + +The answer is that Python 2.0's \module{xml} package isn't compatible +with PyXML, but can be made compatible by installing a recent version +PyXML. Many applications can get by with the XML support that is +included with Python 2.0, but more complicated applications will +require that the full PyXML package will be installed. When +installed, PyXML versions 0.6.0 or greater will replace the +\module{xml} package shipped with Python, and will be a strict +superset of the standard package, adding a bunch of additional +features. Some of the additional features in PyXML include: + +\begin{itemize} +\item 4DOM, a full DOM implementation +from FourThought LLC. +\item The xmlproc validating parser, written by Lars Marius Garshol. +\item The \module{sgmlop} parser accelerator module, written by Fredrik Lundh. +\end{itemize} % ====================================================================== \section{Module changes} @@ -982,6 +1136,8 @@ standard library; some of the affected modules include and \module{nntplib}. Consult the CVS logs for the exact patch-by-patch details. +% XXX gettext support + Brian Gallew contributed OpenSSL support for the \module{socket} module. OpenSSL is an implementation of the Secure Socket Layer, which encrypts the data being sent over a socket. When compiling