Add new section on the XML package. (This was the only major new 2.0 feature

left that wasn't covered. The article is therefore now essentially complete.) A few minor changes
2000-10-12 02:37:14 +00:00 · 2000-10-12 02:37:14 +00:00 · 6032c48b47
parent 0be483fd4d
commit 6032c48b47
1 changed files with 165 additions and 9 deletions
--- a/Doc/whatsnew/whatsnew20.tex
+++ b/Doc/whatsnew/whatsnew20.tex
@ -156,8 +156,8 @@ type implementation by Fredrik Lundh.  A detailed explanation of the
 interface is in the file \file{Misc/unicode.txt} in the Python source
 distribution; it's also available on the Web at
 \url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
-This article will simply cover the most significant points from the
-full interface.
+This article will simply cover the most significant points about the Unicode 
+interfaces.

 In Python source code, Unicode strings are written as
 \code{u"string"}.  Arbitrary Unicode characters can be written using a
@ -615,12 +615,12 @@ b.append(b)
 \end{verbatim}

 The comparison \code{a==b} returns true, because the two recursive
-data structures are isomorphic. \footnote{See the thread ``trashcan
+data structures are isomorphic. See the thread ``trashcan
 and PR\#7'' in the April 2000 archives of the python-dev mailing list
 for the discussion leading up to this implementation, and some useful
 relevant links.
-%http://www.python.org/pipermail/python-dev/2000-April/004834.html
-}
+% Starting URL:
+% http://www.python.org/pipermail/python-dev/2000-April/004834.html

 Work has been done on porting Python to 64-bit Windows on the Itanium
 processor, mostly by Trent Mick of ActiveState.  (Confusingly,
@ -950,7 +950,6 @@ expat_extension = Extension('xml.parsers.pyexpat',
       )
 setup (name = "PyXML", version = "0.5.4", 
       ext_modules =[ expat_extension ] )
-	        
 \end{verbatim}

 The Distutils can also take care of creating source and binary
@ -966,10 +965,165 @@ development.
 All this is documented in a new manual, \textit{Distributing Python
 Modules}, that joins the basic set of Python documentation.

-% ======================================================================
-%\section{New XML Code}
+======================================================================
+\section{XML Modules}

-%XXX write this section...
+Python 1.5.2 included a simple XML parser in the form of the
+\module{xmllib} module, contributed by Sjoerd Mullender.  Since
+1.5.2's release, two different interfaces for processing XML have
+become common: SAX2 (version 2 of the Simple API for XML) provides an
+event-driven interface with some similarities to \module{xmllib}, and
+the DOM (Document Object Model) provides a tree-based interface,
+transforming an XML document into a tree of nodes that can be
+traversed and modified.  Python 2.0 includes a SAX2 interface and a
+stripped-down DOM interface as part of the \module{xml} package.
+Here we will give a brief overview of these new interfaces; consult
+the Python documentation or the source code for complete details.
+The Python XML SIG is also working on improved documentation.
+
+\subsection{SAX2 Support}
+
+SAX defines an event-driven interface for parsing XML.  To use SAX,
+you must write a SAX handler class.  Handler classes inherit from
+various classes provided by SAX, and override various methods that
+will then be called by the XML parser.  For example, the
+\method{startElement} and \method{endElement} methods are called for
+every starting and end tag encountered by the parser, the
+\method{characters()} method is called for every chunk of character
+data, and so forth.
+
+The advantage of the event-driven approach is that that the whole
+document doesn't have to be resident in memory at any one time, which
+matters if you are processing really huge documents.  However, writing
+the SAX handler class can get very complicated if you're trying to
+modify the document structure in some elaborate way.
+
+For example, this little example program defines a handler that prints
+a message for every starting and ending tag, and then parses the file
+\file{hamlet.xml} using it:
+
+\begin{verbatim}
+from xml import sax
+
+class SimpleHandler(sax.ContentHandler):
+    def startElement(self, name, attrs):
+        print 'Start of element:', name, attrs.keys()
+
+    def endElement(self, name):
+        print 'End of element:', name
+
+# Create a parser object
+parser = sax.make_parser()
+
+# Tell it what handler to use
+handler = SimpleHandler()
+parser.setContentHandler( handler )
+
+# Parse a file!
+parser.parse( 'hamlet.xml' )
+\end{verbatim}
+
+For more information, consult the Python documentation, or the XML
+HOWTO at \url{http://www.python.org/doc/howto/xml/}.
+
+\subsection{DOM Support}
+
+The Document Object Model is a tree-based representation for an XML
+document.  A top-level \class{Document} instance is the root of the
+tree, and has a single child which is the top-level \class{Element}
+instance. This \class{Element} has children nodes representing
+character data and any sub-elements, which may have further children
+of their own, and so forth.  Using the DOM you can traverse the
+resulting tree any way you like, access element and attribute values,
+insert and delete nodes, and convert the tree back into XML.
+
+The DOM is useful for modifying XML documents, because you can create
+a DOM tree, modify it by adding new nodes or rearranging subtrees, and
+then produce a new XML document as output.  You can also construct a
+DOM tree manually and convert it to XML, which can be a more flexible
+way of producing XML output than simply writing
+\code{<tag1>}...\code{</tag1>} to a file.
+
+The DOM implementation included with Python lives in the
+\module{xml.dom.minidom} module.  It's a lightweight implementation of
+the Level 1 DOM with support for XML namespaces.  The 
+\function{parse()} and \function{parseString()} convenience
+functions are provided for generating a DOM tree:
+
+\begin{verbatim}
+from xml.dom import minidom
+doc = minidom.parse('hamlet.xml')
+\end{verbatim}
+
+\code{doc} is a \class{Document} instance.  \class{Document}, like all
+the other DOM classes such as \class{Element} and \class{Text}, is a
+subclass of the \class{Node} base class.  All the nodes in a DOM tree
+therefore support certain common methods, such as \method{toxml()}
+which returns a string containing the XML representation of the node
+and its children.  Each class also has special methods of its own; for
+example, \class{Element} and \class{Document} instances have a method
+to find all child elements with a given tag name.  Continuing from the
+previous 2-line example:
+
+\begin{verbatim}
+perslist = doc.getElementsByTagName( 'PERSONA' )
+print perslist[0].toxml()
+print perslist[1].toxml()
+\end{verbatim}
+
+For the \textit{Hamlet} XML file, the above few lines output:
+
+\begin{verbatim}
+<PERSONA>CLAUDIUS, king of Denmark. </PERSONA>
+<PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA>
+\end{verbatim}
+
+The root element of the document is available as
+\code{doc.documentElement}, and its children can be easily modified
+by deleting, adding, or removing nodes:
+
+\begin{verbatim}
+root = doc.documentElement
+
+# Remove the first child
+root.removeChild( root.childNodes[0] )
+
+# Move the new first child to the end
+root.appendChild( root.childNodes[0] )
+
+# Insert the new first child (originally,
+# the third child) before the 20th child.
+root.insertBefore( root.childNodes[0], root.childNodes[20] )
+\end{verbatim}
+
+Again, I will refer you to the Python documentation for a complete
+listing of the different \class{Node} classes and their various methods.
+
+\subsection{Relationship to PyXML}
+
+The XML Special Interest Group has been working on XML-related Python
+code for a while.  Its code distribution, called PyXML, is available
+from the SIG's Web pages at \url{http://www.python.org/sigs/xml-sig/}.
+The PyXML distribution also used the package name \samp{xml}.  If
+you've written programs that used PyXML, you're probably wondering
+about its compatibility with the 2.0 \module{xml} package.
+
+The answer is that Python 2.0's \module{xml} package isn't compatible
+with PyXML, but can be made compatible by installing a recent version
+PyXML.  Many applications can get by with the XML support that is
+included with Python 2.0, but more complicated applications will
+require that the full PyXML package will be installed.  When
+installed, PyXML versions 0.6.0 or greater will replace the
+\module{xml} package shipped with Python, and will be a strict
+superset of the standard package, adding a bunch of additional
+features.  Some of the additional features in PyXML include:
+
+\begin{itemize}
+\item 4DOM, a full DOM implementation
+from FourThought LLC.
+\item The xmlproc validating parser, written by Lars Marius Garshol.
+\item The \module{sgmlop} parser accelerator module, written by Fredrik Lundh.
+\end{itemize}

 % ======================================================================
 \section{Module changes}
@ -982,6 +1136,8 @@ standard library; some of the affected modules include
 and \module{nntplib}.  Consult the CVS logs for the exact
 patch-by-patch details.  

+% XXX gettext support
+
 Brian Gallew contributed OpenSSL support for the \module{socket}
 module.  OpenSSL is an implementation of the Secure Socket Layer,
 which encrypts the data being sent over a socket.  When compiling