Issue #15586: porting ET's new documentation bits to 2.7. Patch by Daniel Ellis
parent
85307b46d1
commit
6ee2187cdc
|
@ -46,11 +46,313 @@ the xml.etree.ElementTree.
|
||||||
`Introducing ElementTree 1.3
|
`Introducing ElementTree 1.3
|
||||||
<http://effbot.org/zone/elementtree-13-intro.htm>`_.
|
<http://effbot.org/zone/elementtree-13-intro.htm>`_.
|
||||||
|
|
||||||
|
Tutorial
|
||||||
|
--------
|
||||||
|
|
||||||
|
This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
|
||||||
|
short). The goal is to demonstrate some of the building blocks and basic
|
||||||
|
concepts of the module.
|
||||||
|
|
||||||
|
XML tree and elements
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
XML is an inherently hierarchical data format, and the most natural way to
|
||||||
|
represent it is with a tree. ``ET`` has two classes for this purpose -
|
||||||
|
:class:`ElementTree` represents the whole XML document as a tree, and
|
||||||
|
:class:`Element` represents a single node in this tree. Interactions with
|
||||||
|
the whole document (reading and writing to/from files) are usually done
|
||||||
|
on the :class:`ElementTree` level. Interactions with a single XML element
|
||||||
|
and its sub-elements are done on the :class:`Element` level.
|
||||||
|
|
||||||
|
.. _elementtree-parsing-xml:
|
||||||
|
|
||||||
|
Parsing XML
|
||||||
|
^^^^^^^^^^^
|
||||||
|
|
||||||
|
We'll be using the following XML document as the sample data for this section:
|
||||||
|
|
||||||
|
.. code-block:: xml
|
||||||
|
|
||||||
|
<?xml version="1.0"?>
|
||||||
|
<data>
|
||||||
|
<country name="Liechtenstein">
|
||||||
|
<rank>1</rank>
|
||||||
|
<year>2008</year>
|
||||||
|
<gdppc>141100</gdppc>
|
||||||
|
<neighbor name="Austria" direction="E"/>
|
||||||
|
<neighbor name="Switzerland" direction="W"/>
|
||||||
|
</country>
|
||||||
|
<country name="Singapore">
|
||||||
|
<rank>4</rank>
|
||||||
|
<year>2011</year>
|
||||||
|
<gdppc>59900</gdppc>
|
||||||
|
<neighbor name="Malaysia" direction="N"/>
|
||||||
|
</country>
|
||||||
|
<country name="Panama">
|
||||||
|
<rank>68</rank>
|
||||||
|
<year>2011</year>
|
||||||
|
<gdppc>13600</gdppc>
|
||||||
|
<neighbor name="Costa Rica" direction="W"/>
|
||||||
|
<neighbor name="Colombia" direction="E"/>
|
||||||
|
</country>
|
||||||
|
</data>
|
||||||
|
|
||||||
|
We have a number of ways to import the data. Reading the file from disk::
|
||||||
|
|
||||||
|
import xml.etree.ElementTree as ET
|
||||||
|
tree = ET.parse('country_data.xml')
|
||||||
|
root = tree.getroot()
|
||||||
|
|
||||||
|
Reading the data from a string::
|
||||||
|
|
||||||
|
root = ET.fromstring(country_data_as_string)
|
||||||
|
|
||||||
|
:func:`fromstring` parses XML from a string directly into an :class:`Element`,
|
||||||
|
which is the root element of the parsed tree. Other parsing functions may
|
||||||
|
create an :class:`ElementTree`. Check the documentation to be sure.
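The difference between the two return types can be sketched as follows. This is a minimal illustration, not part of the original patch; the shortened ``xml_text`` document and the use of an in-memory ``io.BytesIO`` object in place of a file on disk are assumptions made to keep the example self-contained.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the sample country data.
xml_text = '<data><country name="Liechtenstein"><rank>1</rank></country></data>'

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

# parse() accepts a filename or a file object and returns an ElementTree;
# getroot() then gives access to the root Element.
tree = ET.parse(io.BytesIO(xml_text.encode('utf-8')))
root_from_file = tree.getroot()

is_tree = isinstance(tree, ET.ElementTree)          # True
is_element = ET.iselement(root_from_string)         # True
same_tag = root_from_string.tag == root_from_file.tag  # True
```

Keeping the ``tree`` object around is only necessary when the document will later be written back out with :meth:`ElementTree.write`.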
|
||||||
|
|
||||||
|
As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::
|
||||||
|
|
||||||
|
>>> root.tag
|
||||||
|
'data'
|
||||||
|
>>> root.attrib
|
||||||
|
{}
|
||||||
|
|
||||||
|
It also has child nodes over which we can iterate::
|
||||||
|
|
||||||
|
>>> for child in root:
|
||||||
|
... print child.tag, child.attrib
|
||||||
|
...
|
||||||
|
country {'name': 'Liechtenstein'}
|
||||||
|
country {'name': 'Singapore'}
|
||||||
|
country {'name': 'Panama'}
|
||||||
|
|
||||||
|
Children are nested, and we can access specific child nodes by index::
|
||||||
|
|
||||||
|
>>> root[0][1].text
|
||||||
|
'2008'
|
||||||
|
|
||||||
|
Finding interesting elements
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
:class:`Element` has some useful methods that help iterate recursively over all
|
||||||
|
the sub-tree below it (its children, their children, and so on). For example,
|
||||||
|
:meth:`Element.iter`::
|
||||||
|
|
||||||
|
>>> for neighbor in root.iter('neighbor'):
|
||||||
|
... print neighbor.attrib
|
||||||
|
...
|
||||||
|
{'name': 'Austria', 'direction': 'E'}
|
||||||
|
{'name': 'Switzerland', 'direction': 'W'}
|
||||||
|
{'name': 'Malaysia', 'direction': 'N'}
|
||||||
|
{'name': 'Costa Rica', 'direction': 'W'}
|
||||||
|
{'name': 'Colombia', 'direction': 'E'}
|
||||||
|
|
||||||
|
:meth:`Element.findall` finds only elements with the given tag that are direct
|
||||||
|
children of the current element. :meth:`Element.find` finds the *first* child
|
||||||
|
with a particular tag, and :attr:`Element.text` accesses the element's text
|
||||||
|
content. :meth:`Element.get` accesses the element's attributes::
|
||||||
|
|
||||||
|
>>> for country in root.findall('country'):
|
||||||
|
... rank = country.find('rank').text
|
||||||
|
... name = country.get('name')
|
||||||
|
... print name, rank
|
||||||
|
...
|
||||||
|
Liechtenstein 1
|
||||||
|
Singapore 4
|
||||||
|
Panama 68
|
||||||
|
|
||||||
|
More sophisticated specification of which elements to look for is possible by
|
||||||
|
using :ref:`XPath <elementtree-xpath>`.
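The contrast between the direct-children search and the recursive search can be seen in a small sketch. The abbreviated document and the ``population`` and ``code`` names below are hypothetical and chosen only to show what happens when nothing matches.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the sample document (an assumption for brevity).
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank>'
    '<neighbor name="Austria" direction="E"/></country>'
    '<country name="Singapore"><rank>4</rank>'
    '<neighbor name="Malaysia" direction="N"/></country>'
    '</data>'
)

# findall() with a plain tag only looks at direct children, so the
# 'neighbor' grandchildren are not found from the root.
direct = root.findall('neighbor')                        # []

# iter() walks the whole subtree below the element.
nested = [n.get('name') for n in root.iter('neighbor')]  # ['Austria', 'Malaysia']

# find() returns None when nothing matches, so check before using .text.
missing = root.find('population')                        # None

# get() accepts a default value for absent attributes.
code = root[0].get('code', 'n/a')                        # 'n/a'
```

Because :meth:`Element.find` returns ``None`` for a missing child, code such as ``country.find('rank').text`` is only safe when every ``country`` is known to contain a ``rank`` element.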
|
||||||
|
|
||||||
|
Modifying an XML File
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
:class:`ElementTree` provides a simple way to build XML documents and write them to files.
|
||||||
|
The :meth:`ElementTree.write` method serves this purpose.
|
||||||
|
|
||||||
|
Once created, an :class:`Element` object may be manipulated by directly changing
|
||||||
|
its fields (such as :attr:`Element.text`), adding and modifying attributes
|
||||||
|
(:meth:`Element.set` method), as well as adding new children (for example
|
||||||
|
with :meth:`Element.append`).
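The three kinds of manipulation just listed might look like this in practice. It is a sketch on a single hypothetical ``country`` element; the ``continent`` attribute is an invented name used only for illustration.

```python
import xml.etree.ElementTree as ET

country = ET.fromstring('<country name="Singapore"><rank>4</rank></country>')

# Change a field directly.
country.find('rank').text = '5'

# Add (or overwrite) an attribute; 'continent' is a made-up attribute.
country.set('continent', 'Asia')

# Append a new child element.
neighbor = ET.Element('neighbor', {'name': 'Malaysia', 'direction': 'N'})
country.append(neighbor)

children = [child.tag for child in country]  # ['rank', 'neighbor']
```

All three changes happen in memory only; nothing is written to disk until :meth:`ElementTree.write` is called on a tree that contains the element.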
|
||||||
|
|
||||||
|
Let's say we want to add one to each country's rank, and add an ``updated``
|
||||||
|
attribute to the rank element::
|
||||||
|
|
||||||
|
>>> for rank in root.iter('rank'):
|
||||||
|
... new_rank = int(rank.text) + 1
|
||||||
|
... rank.text = str(new_rank)
|
||||||
|
... rank.set('updated', 'yes')
|
||||||
|
...
|
||||||
|
>>> tree.write('output.xml')
|
||||||
|
|
||||||
|
Our XML now looks like this:
|
||||||
|
|
||||||
|
.. code-block:: xml
|
||||||
|
|
||||||
|
<?xml version="1.0"?>
|
||||||
|
<data>
|
||||||
|
<country name="Liechtenstein">
|
||||||
|
<rank updated="yes">2</rank>
|
||||||
|
<year>2008</year>
|
||||||
|
<gdppc>141100</gdppc>
|
||||||
|
<neighbor name="Austria" direction="E"/>
|
||||||
|
<neighbor name="Switzerland" direction="W"/>
|
||||||
|
</country>
|
||||||
|
<country name="Singapore">
|
||||||
|
<rank updated="yes">5</rank>
|
||||||
|
<year>2011</year>
|
||||||
|
<gdppc>59900</gdppc>
|
||||||
|
<neighbor name="Malaysia" direction="N"/>
|
||||||
|
</country>
|
||||||
|
<country name="Panama">
|
||||||
|
<rank updated="yes">69</rank>
|
||||||
|
<year>2011</year>
|
||||||
|
<gdppc>13600</gdppc>
|
||||||
|
<neighbor name="Costa Rica" direction="W"/>
|
||||||
|
<neighbor name="Colombia" direction="E"/>
|
||||||
|
</country>
|
||||||
|
</data>
|
||||||
|
|
||||||
|
We can remove elements using :meth:`Element.remove`. Let's say we want to
|
||||||
|
remove all countries with a rank higher than 50::
|
||||||
|
|
||||||
|
>>> for country in root.findall('country'):
|
||||||
|
... rank = int(country.find('rank').text)
|
||||||
|
... if rank > 50:
|
||||||
|
... root.remove(country)
|
||||||
|
...
|
||||||
|
>>> tree.write('output.xml')
|
||||||
|
|
||||||
|
Our XML now looks like this:
|
||||||
|
|
||||||
|
.. code-block:: xml
|
||||||
|
|
||||||
|
<?xml version="1.0"?>
|
||||||
|
<data>
|
||||||
|
<country name="Liechtenstein">
|
||||||
|
<rank updated="yes">2</rank>
|
||||||
|
<year>2008</year>
|
||||||
|
<gdppc>141100</gdppc>
|
||||||
|
<neighbor name="Austria" direction="E"/>
|
||||||
|
<neighbor name="Switzerland" direction="W"/>
|
||||||
|
</country>
|
||||||
|
<country name="Singapore">
|
||||||
|
<rank updated="yes">5</rank>
|
||||||
|
<year>2011</year>
|
||||||
|
<gdppc>59900</gdppc>
|
||||||
|
<neighbor name="Malaysia" direction="N"/>
|
||||||
|
</country>
|
||||||
|
</data>
|
||||||
|
|
||||||
|
Building XML documents
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
The :func:`SubElement` function also provides a convenient way to create new
|
||||||
|
sub-elements for a given element::
|
||||||
|
|
||||||
|
>>> a = ET.Element('a')
|
||||||
|
>>> b = ET.SubElement(a, 'b')
|
||||||
|
>>> c = ET.SubElement(a, 'c')
|
||||||
|
>>> d = ET.SubElement(c, 'd')
|
||||||
|
>>> ET.dump(a)
|
||||||
|
<a><b /><c><d /></c></a>
|
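As a further sketch, :func:`SubElement` also accepts attributes as keyword arguments, and the resulting tree can be serialized with :func:`tostring` rather than dumped to standard output. The element names below are borrowed from the sample data; the round trip through :func:`fromstring` is included only to check the result.

```python
import xml.etree.ElementTree as ET

# Build a small document from scratch.
data = ET.Element('data')
country = ET.SubElement(data, 'country', name='Panama')
rank = ET.SubElement(country, 'rank')
rank.text = '68'

# Serialize the tree, then parse it again to confirm the structure.
serialized = ET.tostring(data)
reparsed = ET.fromstring(serialized)

country_name = reparsed.find('country').get('name')  # 'Panama'
rank_text = reparsed.find('country/rank').text       # '68'
```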
||||||
|
|
||||||
|
Additional resources
|
||||||
|
^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
See http://effbot.org/zone/element-index.htm for tutorials and links to other
|
||||||
|
docs.
|
||||||
|
|
||||||
|
.. _elementtree-xpath:
|
||||||
|
|
||||||
|
XPath support
|
||||||
|
-------------
|
||||||
|
|
||||||
|
This module provides limited support for
|
||||||
|
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
|
||||||
|
tree. The goal is to support a small subset of the abbreviated syntax; a full
|
||||||
|
XPath engine is outside the scope of the module.
|
||||||
|
|
||||||
|
Example
|
||||||
|
^^^^^^^
|
||||||
|
|
||||||
|
Here's an example that demonstrates some of the XPath capabilities of the
|
||||||
|
module. We'll be using the ``countrydata`` XML document from the
|
||||||
|
:ref:`Parsing XML <elementtree-parsing-xml>` section::
|
||||||
|
|
||||||
|
import xml.etree.ElementTree as ET
|
||||||
|
|
||||||
|
root = ET.fromstring(countrydata)
|
||||||
|
|
||||||
|
# Top-level elements
|
||||||
|
root.findall(".")
|
||||||
|
|
||||||
|
# All 'neighbor' grand-children of 'country' children of the top-level
|
||||||
|
# elements
|
||||||
|
root.findall("./country/neighbor")
|
||||||
|
|
||||||
|
# Nodes with name='Singapore' that have a 'year' child
|
||||||
|
root.findall(".//year/..[@name='Singapore']")
|
||||||
|
|
||||||
|
# 'year' nodes that are children of nodes with name='Singapore'
|
||||||
|
root.findall(".//*[@name='Singapore']/year")
|
||||||
|
|
||||||
|
# All 'neighbor' nodes that are the second 'neighbor' child of their parent
|
||||||
|
root.findall(".//neighbor[2]")
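To make the effect of these expressions concrete, the following sketch runs them against a condensed copy of the sample document and collects the matching names. The ``countrydata`` string here is an assumption: it reproduces the structure of the document from the parsing section with the ``gdppc`` elements left out.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the sample data (gdppc elements omitted).
countrydata = """<data>
  <country name="Liechtenstein"><rank>1</rank><year>2008</year>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/></country>
  <country name="Singapore"><rank>4</rank><year>2011</year>
    <neighbor name="Malaysia" direction="N"/></country>
  <country name="Panama"><rank>68</rank><year>2011</year>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/></country>
</data>"""

root = ET.fromstring(countrydata)

# Every 'neighbor' below a 'country' child of the root.
neighbors = [n.get('name') for n in root.findall("./country/neighbor")]
# ['Austria', 'Switzerland', 'Malaysia', 'Costa Rica', 'Colombia']

# Parents of a 'year' element whose name attribute is 'Singapore'.
parents = [c.tag for c in root.findall(".//year/..[@name='Singapore']")]
# ['country']

# 'year' children of the node named 'Singapore'.
years = [y.text for y in root.findall(".//*[@name='Singapore']/year")]
# ['2011']

# The second 'neighbor' under each parent; Singapore has only one.
second = [n.get('name') for n in root.findall(".//neighbor[2]")]
# ['Switzerland', 'Colombia']
```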
|
||||||
|
|
||||||
|
Supported XPath syntax
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
| Syntax | Meaning |
|
||||||
|
+=======================+======================================================+
|
||||||
|
| ``tag`` | Selects all child elements with the given tag. |
|
||||||
|
| | For example, ``spam`` selects all child elements |
|
||||||
|
| | named ``spam``, ``spam/egg`` selects all |
|
||||||
|
| | grandchildren named ``egg`` in all children named |
|
||||||
|
| | ``spam``. |
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
| ``*`` | Selects all child elements. For example, ``*/egg`` |
|
||||||
|
| | selects all grandchildren named ``egg``. |
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
| ``.`` | Selects the current node. This is mostly useful |
|
||||||
|
| | at the beginning of the path, to indicate that it's |
|
||||||
|
| | a relative path. |
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
| ``//`` | Selects all subelements, on all levels beneath the |
|
||||||
|
| | current element. For example, ``.//egg`` selects |
|
||||||
|
| | all ``egg`` elements in the entire tree. |
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
| ``..`` | Selects the parent element. |
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
| ``[@attrib]`` | Selects all elements that have the given attribute. |
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
| ``[@attrib='value']`` | Selects all elements for which the given attribute |
|
||||||
|
| | has the given value. The value cannot contain |
|
||||||
|
| | quotes. |
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
| ``[tag]`` | Selects all elements that have a child named |
|
||||||
|
| | ``tag``. Only immediate children are supported. |
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
| ``[position]`` | Selects all elements that are located at the given |
|
||||||
|
| | position. The position can be either an integer |
|
||||||
|
| | (1 is the first position), the expression ``last()`` |
|
||||||
|
| | (for the last position), or a position relative to |
|
||||||
|
| | the last position (e.g. ``last()-1``). |
|
||||||
|
+-----------------------+------------------------------------------------------+
|
||||||
|
|
||||||
|
Predicates (expressions within square brackets) must be preceded by a tag
|
||||||
|
name, an asterisk, or another predicate. ``position`` predicates must be
|
||||||
|
preceded by a tag name.
|
||||||
|
|
||||||
|
Reference
|
||||||
|
---------
|
||||||
|
|
||||||
.. _elementtree-functions:
|
.. _elementtree-functions:
|
||||||
|
|
||||||
Functions
|
Functions
|
||||||
---------
|
^^^^^^^^^
|
||||||
|
|
||||||
|
|
||||||
.. function:: Comment(text=None)
|
.. function:: Comment(text=None)
|
||||||
|
@ -196,8 +498,7 @@ Functions
|
||||||
.. _elementtree-element-objects:
|
.. _elementtree-element-objects:
|
||||||
|
|
||||||
Element Objects
|
Element Objects
|
||||||
---------------
|
^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
|
||||||
.. class:: Element(tag, attrib={}, **extra)
|
.. class:: Element(tag, attrib={}, **extra)
|
||||||
|
|
||||||
|
@ -387,7 +688,7 @@ Element Objects
|
||||||
.. _elementtree-elementtree-objects:
|
.. _elementtree-elementtree-objects:
|
||||||
|
|
||||||
ElementTree Objects
|
ElementTree Objects
|
||||||
-------------------
|
^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
|
||||||
.. class:: ElementTree(element=None, file=None)
|
.. class:: ElementTree(element=None, file=None)
|
||||||
|
@ -507,7 +808,7 @@ Example of changing the attribute "target" of every link in first paragraph::
|
||||||
.. _elementtree-qname-objects:
|
.. _elementtree-qname-objects:
|
||||||
|
|
||||||
QName Objects
|
QName Objects
|
||||||
-------------
|
^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
|
||||||
.. class:: QName(text_or_uri, tag=None)
|
.. class:: QName(text_or_uri, tag=None)
|
||||||
|
@ -523,7 +824,7 @@ QName Objects
|
||||||
.. _elementtree-treebuilder-objects:
|
.. _elementtree-treebuilder-objects:
|
||||||
|
|
||||||
TreeBuilder Objects
|
TreeBuilder Objects
|
||||||
-------------------
|
^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
|
||||||
.. class:: TreeBuilder(element_factory=None)
|
.. class:: TreeBuilder(element_factory=None)
|
||||||
|
@ -574,7 +875,7 @@ TreeBuilder Objects
|
||||||
.. _elementtree-xmlparser-objects:
|
.. _elementtree-xmlparser-objects:
|
||||||
|
|
||||||
XMLParser Objects
|
XMLParser Objects
|
||||||
-----------------
|
^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
|
||||||
.. class:: XMLParser(html=0, target=None, encoding=None)
|
.. class:: XMLParser(html=0, target=None, encoding=None)
|
||||||
|
|