Issue 23729: Document ElementTree namespace handling and fix an omission in the XPATH predicate table.

Raymond Hettinger 2015-03-22 15:29:09 -07:00
parent 936da2a796
commit f6e31b79a8
1 changed files with 68 additions and 0 deletions


@ -284,6 +284,71 @@ sub-elements for a given element::
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<http://www.w3.org/TR/2006/REC-xml-names-20060816/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

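The expansion described above can be observed directly on the parsed tree. The following is a minimal sketch rather than part of the original patch: it parses a shortened copy of the example and lists every tag in document order. The variable names ``xml_text`` and ``expanded_tags`` are chosen here for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example document (one actor, one character).
xml_text = """<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>"""

root = ET.fromstring(xml_text)

# Element.iter() walks the tree in document order, starting with root.
expanded_tags = [elem.tag for elem in root.iter()]
for tag in expanded_tags:
    print(tag)
```

Non-prefixed tags such as ``actor`` pick up the default namespace URI, while ``fictional:character`` picks up the URI bound to the ``fictional`` prefix.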
One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the XPath expression passed to
:meth:`~Element.find` or :meth:`~Element.findall`::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print(name.text)
       for char in actor.findall('{http://characters.example.com}character'):
           print(' |-->', char.text)

Another way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print(name.text)
       for char in actor.findall('role:character', ns):
           print(' |-->', char.text)

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement

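To check that the two approaches really select the same elements, the searches can be wrapped in functions and their results compared. This is a self-contained sketch, not code from the patch: the helper names ``cast_by_uri`` and ``cast_by_prefix`` are hypothetical, and the results are collected into dictionaries instead of being printed.

```python
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>"""

PEOPLE = 'http://people.example.com'
CHARACTERS = 'http://characters.example.com'


def cast_by_uri(root):
    """Map actor name to characters using fully expanded {uri}tag names."""
    cast = {}
    for actor in root.findall('{%s}actor' % PEOPLE):
        name = actor.find('{%s}name' % PEOPLE).text
        cast[name] = [c.text
                      for c in actor.findall('{%s}character' % CHARACTERS)]
    return cast


def cast_by_prefix(root, ns):
    """Map actor name to characters using a caller-supplied prefix mapping."""
    cast = {}
    for actor in root.findall('real_person:actor', ns):
        name = actor.find('real_person:name', ns).text
        cast[name] = [c.text for c in actor.findall('role:character', ns)]
    return cast


root = ET.fromstring(xml_text)
ns = {'real_person': PEOPLE, 'role': CHARACTERS}

# Both searches should select exactly the same elements.
assert cast_by_uri(root) == cast_by_prefix(root, ns)
print(cast_by_prefix(root, ns))
```

The prefixes in ``ns`` do not need to match the prefixes used in the document itself; only the URIs they map to are compared.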
Additional resources
^^^^^^^^^^^^^^^^^^^^
@ -366,6 +431,9 @@ Supported XPath syntax
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |