Add new section on the XML package. (This was the only major new 2.0 feature left that wasn't covered. The article is therefore now essentially complete.)
A few minor changes.
2000-10-12 02:37:14 +00:00 · 2000-10-12 02:37:14 +00:00 · 6032c48b47
parent 0be483fd4d
commit 6032c48b47
1 changed files with 165 additions and 9 deletions
--- a/Doc/whatsnew/whatsnew20.tex
+++ b/Doc/whatsnew/whatsnew20.tex
@@ -156,8 +156,8 @@ type implementation by Fredrik Lundh.  A detailed explanation of the
 interface is in the file \file{Misc/unicode.txt} in the Python source
 distribution; it's also available on the Web at
 \url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
-This article will simply cover the most significant points from the
+This article will simply cover the most significant points about the Unicode 
-full interface.
+interfaces.
 In Python source code, Unicode strings are written as
 \code{u"string"}.  Arbitrary Unicode characters can be written using a
@@ -615,12 +615,12 @@ b.append(b)
 \end{verbatim}
 The comparison \code{a==b} returns true, because the two recursive
-data structures are isomorphic. \footnote{See the thread ``trashcan
+data structures are isomorphic. See the thread ``trashcan
 and PR\#7'' in the April 2000 archives of the python-dev mailing list
 for the discussion leading up to this implementation, and some useful
 relevant links.
-%http://www.python.org/pipermail/python-dev/2000-April/004834.html
+% Starting URL:
-}
+% http://www.python.org/pipermail/python-dev/2000-April/004834.html
 Work has been done on porting Python to 64-bit Windows on the Itanium
 processor, mostly by Trent Mick of ActiveState.  (Confusingly,
@@ -950,7 +950,6 @@ expat_extension = Extension('xml.parsers.pyexpat',
       )
 setup (name = "PyXML", version = "0.5.4", 
       ext_modules =[ expat_extension ] )
 \end{verbatim}
 The Distutils can also take care of creating source and binary
@@ -966,10 +965,165 @@ development.
 All this is documented in a new manual, \textit{Distributing Python
 Modules}, that joins the basic set of Python documentation.
-% ======================================================================
+======================================================================
-%\section{New XML Code}
+\section{XML Modules}
-%XXX write this section...
+Python 1.5.2 included a simple XML parser in the form of the
 \module{xmllib} module, contributed by Sjoerd Mullender.  Since
 1.5.2's release, two different interfaces for processing XML have
 become common: SAX2 (version 2 of the Simple API for XML) provides an
 event-driven interface with some similarities to \module{xmllib}, and
 the DOM (Document Object Model) provides a tree-based interface,
 transforming an XML document into a tree of nodes that can be
 traversed and modified.  Python 2.0 includes a SAX2 interface and a
 stripped-down DOM interface as part of the \module{xml} package.
 Here we will give a brief overview of these new interfaces; consult
 the Python documentation or the source code for complete details.
 The Python XML SIG is also working on improved documentation.
 \subsection{SAX2 Support}
 SAX defines an event-driven interface for parsing XML.  To use SAX,
 you must write a SAX handler class.  Handler classes inherit from
 various classes provided by SAX, and override various methods that
 will then be called by the XML parser.  For example, the
 \method{startElement()} and \method{endElement()} methods are called for
 every start and end tag encountered by the parser, the
 \method{characters()} method is called for every chunk of character
 data, and so forth.
 The advantage of the event-driven approach is that the whole
 document doesn't have to be resident in memory at any one time, which
 matters if you are processing really huge documents.  However, writing
 the SAX handler class can get very complicated if you're trying to
 modify the document structure in some elaborate way.
 For example, this short program defines a handler that prints
 a message for every starting and ending tag, and then parses the file
 \file{hamlet.xml} using it:
 \begin{verbatim}
 from xml import sax
 class SimpleHandler(sax.ContentHandler):
    def startElement(self, name, attrs):
        print 'Start of element:', name, attrs.keys()
    def endElement(self, name):
        print 'End of element:', name
 # Create a parser object
 parser = sax.make_parser()
 # Tell it what handler to use
 handler = SimpleHandler()
 parser.setContentHandler( handler )
 # Parse a file!
 parser.parse( 'hamlet.xml' )
 \end{verbatim}
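The handler above ignores character data. As a minimal sketch of the \method{characters()} callback described earlier, the following variant counts start tags and collects the text chunks. The \class{TextCollector} class and the small inline sample document are invented for illustration, and the code is written to run on a current Python interpreter rather than on Python 2.0 specifically.

```python
import xml.sax

# Hypothetical handler for illustration: counts start tags and
# gathers every chunk of character data reported by the parser.
class TextCollector(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.counts = {}   # tag name -> number of start tags seen
        self.chunks = []   # character data, in document order

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

    def characters(self, content):
        # The parser may deliver text in several pieces, so keep them all
        self.chunks.append(content)

# Invented sample standing in for hamlet.xml
SAMPLE = (b"<PLAY><TITLE>Hamlet</TITLE>"
          b"<PERSONA>CLAUDIUS</PERSONA><PERSONA>HAMLET</PERSONA></PLAY>")

handler = TextCollector()
xml.sax.parseString(SAMPLE, handler)
text = "".join(handler.chunks)
print(handler.counts)
print(text)
```

Because the parser is free to split text into several chunks, the sketch joins them only after parsing has finished rather than assuming one call per element.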
 For more information, consult the Python documentation, or the XML
 HOWTO at \url{http://www.python.org/doc/howto/xml/}.
 \subsection{DOM Support}
 The Document Object Model is a tree-based representation for an XML
 document.  A top-level \class{Document} instance is the root of the
 tree, and has a single child which is the top-level \class{Element}
 instance. This \class{Element} has child nodes representing
 character data and any sub-elements, which may have further children
 of their own, and so forth.  Using the DOM you can traverse the
 resulting tree any way you like, access element and attribute values,
 insert and delete nodes, and convert the tree back into XML.
 The DOM is useful for modifying XML documents, because you can create
 a DOM tree, modify it by adding new nodes or rearranging subtrees, and
 then produce a new XML document as output.  You can also construct a
 DOM tree manually and convert it to XML, which can be a more flexible
 way of producing XML output than simply writing
 \code{<tag1>}...\code{</tag1>} to a file.
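Building a tree by hand might look like the sketch below. It uses the \module{xml.dom.minidom} module; the element names, attribute, and text are made up for illustration, and the code targets a current Python interpreter.

```python
from xml.dom import minidom

# Start with an empty document and attach a root element.
doc = minidom.Document()
root = doc.createElement("tag1")
doc.appendChild(root)

# Add a child element with an attribute and some text.
# The names "item" and "id" are illustrative only.
child = doc.createElement("item")
child.setAttribute("id", "1")
child.appendChild(doc.createTextNode("a < b"))
root.appendChild(child)

# Serialize the root element; special characters are escaped for us.
xml_text = root.toxml()
print(xml_text)
```

One practical benefit over writing tags to a file directly is visible here: the \code{<} in the text is escaped as \code{\&lt;} automatically when the tree is serialized.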
 The DOM implementation included with Python lives in the
 \module{xml.dom.minidom} module.  It's a lightweight implementation of
 the Level 1 DOM with support for XML namespaces.  The 
 \function{parse()} and \function{parseString()} convenience
 functions are provided for generating a DOM tree:
 \begin{verbatim}
 from xml.dom import minidom
 doc = minidom.parse('hamlet.xml')
 \end{verbatim}
 \code{doc} is a \class{Document} instance.  \class{Document}, like all
 the other DOM classes such as \class{Element} and \class{Text}, is a
 subclass of the \class{Node} base class.  All the nodes in a DOM tree
 therefore support certain common methods, such as \method{toxml()}
 which returns a string containing the XML representation of the node
 and its children.  Each class also has special methods of its own; for
 example, \class{Element} and \class{Document} instances have a method
 to find all child elements with a given tag name.  Continuing from the
 previous 2-line example:
 \begin{verbatim}
 perslist = doc.getElementsByTagName( 'PERSONA' )
 print perslist[0].toxml()
 print perslist[1].toxml()
 \end{verbatim}
 For the \textit{Hamlet} XML file, the above few lines output:
 \begin{verbatim}
 <PERSONA>CLAUDIUS, king of Denmark. </PERSONA>
 <PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA>
 \end{verbatim}
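The same lookup can be tried without the \textit{Hamlet} file by using \function{parseString()} on a small inline document. This is only a sketch: the sample string is invented, and the code is written for a current Python interpreter.

```python
from xml.dom import minidom

# Invented sample standing in for hamlet.xml
SAMPLE = "<PLAY><PERSONA>CLAUDIUS</PERSONA><PERSONA>HAMLET</PERSONA></PLAY>"

doc = minidom.parseString(SAMPLE)

# Each PERSONA element has a single Text child holding the name.
names = [node.firstChild.data
         for node in doc.getElementsByTagName("PERSONA")]
print(names)
```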
 The root element of the document is available as
 \code{doc.documentElement}, and its children can be easily modified
 by removing, adding, or moving nodes:
 \begin{verbatim}
 root = doc.documentElement
 # Remove the first child
 root.removeChild( root.childNodes[0] )
 # Move the new first child to the end
 root.appendChild( root.childNodes[0] )
 # Insert the new first child (originally,
 # the third child) before the child at index 20.
 root.insertBefore( root.childNodes[0], root.childNodes[20] )
 \end{verbatim}
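To see the effect of these calls on something small enough to check by eye, here is a sketch that applies the same three operations to a made-up four-element document (the element names are illustrative, and the code is written for a current Python interpreter). Note that \method{appendChild()} and \method{insertBefore()} move a node that already has a parent rather than copying it.

```python
from xml.dom import minidom

# Invented document with four empty child elements: a, b, c, d
doc = minidom.parseString("<r><a/><b/><c/><d/></r>")
root = doc.documentElement

# Remove the first child: a is dropped, leaving b c d
root.removeChild(root.childNodes[0])

# Move the new first child to the end: c d b
root.appendChild(root.childNodes[0])

# Move the new first child before the last child: d c b
root.insertBefore(root.childNodes[0], root.childNodes[2])

order = [node.tagName for node in root.childNodes]
print(order)
print(root.toxml())
```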
 Again, I will refer you to the Python documentation for a complete
 listing of the different \class{Node} classes and their various methods.
 \subsection{Relationship to PyXML}
 The XML Special Interest Group has been working on XML-related Python
 code for a while.  Its code distribution, called PyXML, is available
 from the SIG's Web pages at \url{http://www.python.org/sigs/xml-sig/}.
 The PyXML distribution also used the package name \samp{xml}.  If
 you've written programs that used PyXML, you're probably wondering
 about its compatibility with the 2.0 \module{xml} package.
 The answer is that Python 2.0's \module{xml} package isn't compatible
 with PyXML, but can be made compatible by installing a recent version
 of PyXML.  Many applications can get by with the XML support that is
 included with Python 2.0, but more complicated applications will
 require that the full PyXML package be installed.  When
 installed, PyXML versions 0.6.0 or greater will replace the
 \module{xml} package shipped with Python, and will be a strict
 superset of the standard package, adding a number of additional
 features.  Some of the additional features in PyXML include:
 \begin{itemize}
 \item 4DOM, a full DOM implementation from FourThought LLC.
 \item The xmlproc validating parser, written by Lars Marius Garshol.
 \item The \module{sgmlop} parser accelerator module, written by Fredrik Lundh.
 \end{itemize}
 % ======================================================================
 \section{Module changes}
@ -982,6 +1136,8 @@ standard library; some of the affected modules include
 and \module{nntplib}.  Consult the CVS logs for the exact
 patch-by-patch details.  
 % XXX gettext support
 Brian Gallew contributed OpenSSL support for the \module{socket}
 module.  OpenSSL is an implementation of the Secure Socket Layer,
 which encrypts the data being sent over a socket.  When compiling