Issue #6472: The xml.etree package is updated to ElementTree 1.3. The cElementTree module is updated too.

Florent Xicluna 2010-03-11 14:36:19 +00:00
parent 4478662f83
commit 3e8c189faa
11 changed files with 3323 additions and 1207 deletions


@ -26,7 +26,8 @@ Each element has a number of properties associated with it:
* a number of child elements, stored in a Python sequence
To create an element instance, use the Element or SubElement factory functions.
To create an element instance, use the :class:`Element` constructor or the
:func:`SubElement` factory function.
The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.
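The sketch below shows the two factories and the wrapper class working together. It targets Python 3's `xml.etree.ElementTree`, where `tostring()` returns bytes. The tag and attribute names are arbitrary.

```python
import xml.etree.ElementTree as ET

# Element() creates a standalone element; SubElement() creates one and
# appends it to the given parent in a single step.
root = ET.Element("root")
child = ET.SubElement(root, "child", name="x")
child.text = "hi"

# Wrap the structure in an ElementTree so it can be handled as a document.
tree = ET.ElementTree(root)

# Convert to XML (bytes in Python 3) and back again.
xml_bytes = ET.tostring(tree.getroot())
parsed = ET.fromstring(xml_bytes)
print(xml_bytes)                       # b'<root><child name="x">hi</child></root>'
print(parsed.find("child").get("name"))  # x
```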
@ -46,9 +47,10 @@ Functions
.. function:: Comment([text])
Comment element factory. This factory function creates a special element that
will be serialized as an XML comment. The comment string can be either an 8-bit
ASCII string or a Unicode string. *text* is a string containing the comment
string. Returns an element instance representing a comment.
will be serialized as an XML comment by the standard serializer. The comment
string can be either an 8-bit ASCII string or a Unicode string. *text* is a
string containing the comment string. Returns an element instance representing
a comment.
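Here is an illustrative sketch of a comment element inside a small tree, written for Python 3. The comment text and tag names are made up.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
# Comment() returns a special element; the standard serializer writes it
# out as <!--...--> instead of as a normal tag.
root.append(ET.Comment(" generated "))
ET.SubElement(root, "item").text = "1"

out = ET.tostring(root)
print(out)  # b'<root><!-- generated --><item>1</item></root>'
```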
.. function:: dump(elem)
@ -62,37 +64,36 @@ Functions
*elem* is an element tree or an individual element.
.. function:: Element(tag[, attrib][, **extra])
Element factory. This function returns an object implementing the standard
Element interface. The exact class or type of that object is implementation
dependent, but it will always be compatible with the _ElementInterface class in
this module.
The element name, attribute names, and attribute values can be either 8-bit
ASCII strings or Unicode strings. *tag* is the element name. *attrib* is an
optional dictionary, containing element attributes. *extra* contains additional
attributes, given as keyword arguments. Returns an element instance.
.. function:: fromstring(text)
Parses an XML section from a string constant. Same as XML. *text* is a string
containing XML data. Returns an Element instance.
.. function:: fromstringlist(sequence[, parser])
Parses an XML document from a sequence of string fragments. *sequence* is a list
or other sequence containing XML data fragments. *parser* is an optional parser
instance. If not given, the standard :class:`XMLParser` parser is used.
Returns an Element instance.
.. versionadded:: 2.7
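This sketch contrasts `fromstring()` with `fromstringlist()` under Python 3. The fragment boundaries are chosen arbitrarily, and a split may even fall inside text content.

```python
import xml.etree.ElementTree as ET

# fromstring() parses one complete string...
whole = ET.fromstring("<doc><p>hello</p></doc>")

# ...while fromstringlist() accepts the same data split into fragments,
# e.g. chunks received over a network (here: a hard-coded list).
fragments = ["<doc><p>hel", "lo</p>", "</doc>"]
pieces = ET.fromstringlist(fragments)

print(ET.tostring(whole) == ET.tostring(pieces))  # True
```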
.. function:: iselement(element)
Checks if an object appears to be a valid element object. *element* is an
element instance. Returns a true value if this is an element object.
.. function:: iterparse(source[, events])
.. function:: iterparse(source[, events[, parser]])
Parses an XML section into an element tree incrementally, and reports what's
going on to the user. *source* is a filename or file object containing XML data.
*events* is a list of events to report back. If omitted, only "end" events are
reported. Returns an :term:`iterator` providing ``(event, elem)`` pairs.
reported. *parser* is an optional parser instance. If not given, the standard
:class:`XMLParser` parser is used. Returns an :term:`iterator`
providing ``(event, elem)`` pairs.
.. note::
@ -109,8 +110,8 @@ Functions
Parses an XML section into an element tree. *source* is a filename or file
object containing XML data. *parser* is an optional parser instance. If not
given, the standard XMLTreeBuilder parser is used. Returns an ElementTree
instance.
given, the standard :class:`XMLParser` parser is used. Returns an
:class:`ElementTree` instance.
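The following sketch calls `parse()` with a file object and an explicit parser instance, using Python 3. An in-memory `io.BytesIO` stands in for a real file, and the configuration document is invented.

```python
import io
import xml.etree.ElementTree as ET

# parse() accepts a filename or an open file object; BytesIO plays the file.
source = io.BytesIO(b"<config><option name='debug'>on</option></config>")

# The parser argument is optional; omitting it uses a default XMLParser.
tree = ET.parse(source, parser=ET.XMLParser())
root = tree.getroot()
print(root.tag, root.find("option").get("name"), root.findtext("option"))
# config debug on
```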
.. function:: ProcessingInstruction(target[, text])
@ -121,6 +122,16 @@ Functions
an element instance, representing a processing instruction.
.. function:: register_namespace(prefix, uri)
Registers a namespace prefix. The registry is global, and any existing mapping
for either the given prefix or the namespace URI will be removed. *prefix* is a
namespace prefix. *uri* is a namespace URI. Tags and attributes in this namespace
will be serialized with the given prefix, if at all possible.
.. versionadded:: 2.7
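The sketch below shows how registering a prefix changes serialization, assuming Python 3. The prefix `bk` and the URI are hypothetical. Because the registry is global, the call affects every later serialization in the process.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("{urn:example:books}book")

# Without a registered prefix the serializer invents one (ns0, ns1, ...).
before = ET.tostring(elem)

# Register a preferred prefix for the namespace URI.
ET.register_namespace("bk", "urn:example:books")
after = ET.tostring(elem)

print(before)
print(after)
```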
.. function:: SubElement(parent, tag[, attrib[, **extra]])
Subelement factory. This function creates an element instance, and appends it
@ -140,155 +151,193 @@ Functions
US-ASCII). Returns an encoded string containing the XML data.
.. function:: XML(text)
.. function:: tostringlist(element[, encoding])
Generates a string representation of an XML element, including all subelements.
*element* is an Element instance. *encoding* is the output encoding (default is
US-ASCII). Returns a sequence object containing the XML data.
.. versionadded:: 2.7
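This sketch, written for Python 3 with an arbitrary sample element, shows how `tostringlist()` relates to `tostring()`. Joining the chunks should reproduce the single serialized string.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b>1</b><c/></a>")

# tostringlist() returns the serialization as a list of chunks;
# concatenating them yields the same bytes that tostring() returns.
chunks = ET.tostringlist(root)
joined = b"".join(chunks)
print(joined == ET.tostring(root))  # True
```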
.. function:: XML(text[, parser])
Parses an XML section from a string constant. This function can be used to
embed "XML literals" in Python code. *text* is a string containing XML data.
Returns an Element instance.
*parser* is an optional parser instance. If not given, the standard
:class:`XMLParser` parser is used. Returns an Element instance.
.. function:: XMLID(text)
.. function:: XMLID(text[, parser])
Parses an XML section from a string constant, and also returns a dictionary
which maps from element id:s to elements. *text* is a string containing XML
data. Returns a tuple containing an Element instance and a dictionary.
data. *parser* is an optional parser instance. If not given, the standard
:class:`XMLParser` parser is used. Returns a tuple containing an Element
instance and a dictionary.
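Here is a sketch of what `XMLID()` returns, run under Python 3 on an invented document. Elements without an `id` attribute do not appear in the dictionary.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    "<html><body id='main'><p id='intro'>hi</p><p>no id</p></body></html>"
)
# ids maps each id attribute value to the element that carries it.
print(sorted(ids))        # ['intro', 'main']
print(ids["intro"].text)  # hi
```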
.. _elementtree-element-interface:
.. _elementtree-element-objects:
The Element Interface
---------------------
Element objects returned by Element or SubElement have the following methods
and attributes.
Element Objects
---------------
.. attribute:: Element.tag
.. class:: Element(tag[, attrib[, **extra]])
Element class. This class defines the Element interface, and provides a
reference implementation of this interface.
The element name, attribute names, and attribute values can be either 8-bit
ASCII strings or Unicode strings. *tag* is the element name. *attrib* is an
optional dictionary, containing element attributes. *extra* contains additional
attributes, given as keyword arguments.
.. attribute:: tag
A string identifying what kind of data this element represents (the element
type, in other words).
.. attribute:: Element.text
.. attribute:: text
The *text* attribute can be used to hold additional data associated with the
element. As the name implies this attribute is usually a string but may be any
application-specific object. If the element is created from an XML file the
attribute will contain any text found between the element tags.
element. As the name implies this attribute is usually a string but may be
any application-specific object. If the element is created from an XML file
the attribute will contain any text found between the element tags.
.. attribute:: Element.tail
.. attribute:: tail
The *tail* attribute can be used to hold additional data associated with the
element. This attribute is usually a string but may be any application-specific
object. If the element is created from an XML file the attribute will contain
any text found after the element's end tag and before the next tag.
element. This attribute is usually a string but may be any
application-specific object. If the element is created from an XML file the
attribute will contain any text found after the element's end tag and before
the next tag.
.. attribute:: Element.attrib
.. attribute:: attrib
A dictionary containing the element's attributes. Note that while the *attrib*
value is always a real mutable Python dictionary, an ElementTree implementation
may choose to use another internal representation, and create the dictionary
only if someone asks for it. To take advantage of such implementations, use the
dictionary methods below whenever possible.
A dictionary containing the element's attributes. Note that while the
*attrib* value is always a real mutable Python dictionary, an ElementTree
implementation may choose to use another internal representation, and create
the dictionary only if someone asks for it. To take advantage of such
implementations, use the dictionary methods below whenever possible.
The following dictionary-like methods work on the element attributes.
The following dictionary-like methods work on the element attributes.
.. method:: Element.clear()
.. method:: clear()
Resets an element. This function removes all subelements, clears all
attributes, and sets the text and tail attributes to None.
.. method:: Element.get(key[, default=None])
.. method:: get(key[, default])
Gets the element attribute named *key*.
Returns the attribute value, or *default* if the attribute was not found.
.. method:: Element.items()
.. method:: items()
Returns the element attributes as a sequence of (name, value) pairs. The
attributes are returned in an arbitrary order.
.. method:: Element.keys()
.. method:: keys()
Returns the element's attribute names as a list. The names are returned in an
arbitrary order.
.. method:: Element.set(key, value)
.. method:: set(key, value)
Set the attribute *key* on the element to *value*.
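The dictionary-like attribute methods above can be exercised as follows, using Python 3. Because the docs promise no ordering for `keys()` and `items()`, the sketch sorts their output.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("tag", key="value")
elem.set("other", "x")                  # add or overwrite one attribute

print(elem.get("key"))                  # value
print(elem.get("missing"))              # None
print(elem.get("missing", "fallback"))  # fallback
print(sorted(elem.keys()))              # ['key', 'other']
print(sorted(elem.items()))             # [('key', 'value'), ('other', 'x')]
```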
The following methods work on the element's children (subelements).
The following methods work on the element's children (subelements).
.. method:: Element.append(subelement)
.. method:: append(subelement)
Adds the element *subelement* to the end of this element's internal list of
subelements.
.. method:: Element.find(match)
.. method:: extend(subelements)
Appends *subelements* from a sequence object with zero or more elements.
Raises :exc:`AssertionError` if a subelement is not a valid object.
.. versionadded:: 2.7
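This sketch of `extend()` uses Python 3, where passing a non-element raises `TypeError` rather than the `AssertionError` documented for 2.7. It therefore only passes valid elements.

```python
import xml.etree.ElementTree as ET

parent = ET.Element("list")
# extend() appends every element of the sequence, in order;
# an empty sequence is allowed and changes nothing.
parent.extend([ET.Element("a"), ET.Element("b")])
parent.extend([])
print([child.tag for child in parent])  # ['a', 'b']
```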
.. method:: find(match)
Finds the first subelement matching *match*. *match* may be a tag name or path.
Returns an element instance or ``None``.
.. method:: Element.findall(match)
.. method:: findall(match)
Finds all subelements matching *match*. *match* may be a tag name or path.
Returns an iterable yielding all matching elements in document order.
.. method:: Element.findtext(condition[, default=None])
.. method:: findtext(condition[, default])
Finds text for the first subelement matching *condition*. *condition* may be a
tag name or path. Returns the text content of the first matching element, or
*default* if no element was found. Note that if the matching element has no
text content an empty string is returned.
Finds text for the first subelement matching *condition*. *condition* may be
a tag name or path. Returns the text content of the first matching element,
or *default* if no element was found. Note that if the matching element has
no text content an empty string is returned.
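The sketch below separates the three outcomes of `findtext()`: a match with text, a match without text, and no match. It is written for Python 3 and uses a made-up document.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<r><name>Ada</name><empty/><sub><name>Bob</name></sub></r>")

print(root.findtext("name"))            # Ada   (first direct child that matches)
print(repr(root.findtext("empty")))     # ''    (element exists but has no text)
print(root.findtext("missing"))         # None  (no match, default is None)
print(root.findtext("missing", "n/a"))  # n/a
print(root.findtext("sub/name"))        # Bob   (path expression)
```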
.. method:: Element.getchildren()
.. method:: getchildren()
Returns all subelements. The elements are returned in document order.
.. deprecated:: 2.7
Use ``list(elem)`` or iteration.
.. method:: Element.getiterator([tag=None])
.. method:: getiterator([tag])
Creates a tree iterator with the current element as the root. The iterator
iterates over this element and all elements below it, in document (depth first)
order. If *tag* is not ``None`` or ``'*'``, only elements whose tag equals
*tag* are returned from the iterator.
.. deprecated:: 2.7
Use method :meth:`Element.iter` instead.
.. method:: Element.insert(index, element)
.. method:: insert(index, element)
Inserts a subelement at the given position in this element.
.. method:: Element.makeelement(tag, attrib)
.. method:: iter([tag])
Creates a new element object of the same type as this element. Do not call this
method, use the SubElement factory function instead.
Creates a tree iterator with the current element as the root. The iterator
iterates over this element and all elements below it, in document (depth
first) order. If *tag* is not ``None`` or ``'*'``, only elements whose tag
equals *tag* are returned from the iterator. If the tree structure is
modified during iteration, the result is undefined.
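This Python 3 sketch contrasts `iter()` with `findall()`, using the same sample layout as the test file. It also collects elements into a list before modifying them, which avoids the undefined case mentioned above.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<body><tag>text</tag><tag/><section><tag>subtext</tag></section></body>"
)

# iter() yields the element itself plus all descendants, depth first.
print([e.tag for e in root.iter()])
# ['body', 'tag', 'tag', 'section', 'tag']

# With a tag argument, matches come from any depth; findall("tag")
# only looks at direct children.
print(len(list(root.iter("tag"))), len(root.findall("tag")))  # 3 2

# Materialize the iterator first, then modify the elements.
for e in list(root.iter("tag")):
    e.set("seen", "1")
```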
.. method:: Element.remove(subelement)
.. method:: makeelement(tag, attrib)
Removes *subelement* from the element. Unlike the findXYZ methods this method
compares elements based on the instance identity, not on tag value or contents.
Creates a new element object of the same type as this element. Do not call
this method; use the :func:`SubElement` factory function instead.
Element objects also support the following sequence type methods for working
with subelements: :meth:`__delitem__`, :meth:`__getitem__`, :meth:`__setitem__`,
:meth:`__len__`.
Caution: Because Element objects do not define a :meth:`__nonzero__` method,
elements with no subelements will test as ``False``. ::
.. method:: remove(subelement)
Removes *subelement* from the element. Unlike the findXYZ methods, this
method compares elements based on instance identity, not on tag value
or contents.
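The sketch below demonstrates the identity rule with two look-alike children, using Python 3. The element names are arbitrary.

```python
import xml.etree.ElementTree as ET

parent = ET.Element("parent")
first = ET.SubElement(parent, "item")
second = ET.SubElement(parent, "item")  # same tag, same (empty) content

# remove() matches by identity, so this removes `second` even though
# `first` looks identical.
parent.remove(second)
print(len(parent), parent[0] is first)  # 1 True

# Removing an element that is not a child raises ValueError.
try:
    parent.remove(ET.Element("item"))
except ValueError:
    print("not a child")
```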
Element objects also support the following sequence type methods for working
with subelements: :meth:`__delitem__`, :meth:`__getitem__`, :meth:`__setitem__`,
:meth:`__len__`.
Caution: Because Element objects do not define a :meth:`__nonzero__` method,
elements with no subelements will test as ``False``. ::
element = root.find('foo')
@ -348,9 +397,8 @@ ElementTree Objects
.. method:: getiterator([tag])
Creates and returns a tree iterator for the root element. The iterator
loops over all elements in this tree, in section order. *tag* is the tag
to look for (default is to return all elements).
.. deprecated:: 2.7
Use method :meth:`ElementTree.iter` instead.
.. method:: getroot()
@ -358,19 +406,28 @@ ElementTree Objects
Returns the root element for this tree.
.. method:: iter([tag])
Creates and returns a tree iterator for the root element. The iterator
loops over all elements in this tree, in section order. *tag* is the tag
to look for (default is to return all elements).
.. method:: parse(source[, parser])
Loads an external XML section into this element tree. *source* is a file
name or file object. *parser* is an optional parser instance. If not
given, the standard XMLTreeBuilder parser is used. Returns the section
given, the standard XMLParser parser is used. Returns the section
root element.
.. method:: write(file[, encoding])
.. method:: write(file[, encoding[, xml_declaration]])
Writes the element tree to a file, as XML. *file* is a file name, or a
file object opened for writing. *encoding* [1]_ is the output encoding
(default is US-ASCII).
(default is US-ASCII). *xml_declaration* controls whether an XML declaration
should be added to the file. Use ``False`` for never, ``True`` for always, and
``None`` (the default) for only if the encoding is not US-ASCII or UTF-8.
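This sketch, assuming Python 3's keyword form of `write()`, shows the three *xml_declaration* cases. It writes to an in-memory binary buffer; the `render` helper is a hypothetical convenience, not part of the library.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.Element("doc"))

def render(**kwargs):
    """Hypothetical helper: serialize `tree` into bytes with write()."""
    buf = io.BytesIO()  # write() needs a binary file when an encoding is given
    tree.write(buf, **kwargs)
    return buf.getvalue()

forced = render(encoding="utf-8", xml_declaration=True)  # True: always
omitted = render(encoding="utf-8")                       # None + UTF-8: none
latin1 = render(encoding="iso-8859-1")                   # None + other: added

for data in (forced, omitted, latin1):
    print(data)
```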
This is the XML file that is going to be manipulated::
@ -389,13 +446,13 @@ Example of changing the attribute "target" of every link in first paragraph::
>>> from xml.etree.ElementTree import ElementTree
>>> tree = ElementTree()
>>> tree.parse("index.xhtml")
<Element html at b7d3f1ec>
<Element 'html' at b7d3f1ec>
>>> p = tree.find("body/p") # Finds first occurrence of tag p in body
>>> p
<Element p at 8416e0c>
>>> links = p.getiterator("a") # Returns list of all links
<Element 'p' at 8416e0c>
>>> links = list(p.iter("a")) # Returns list of all links
>>> links
[<Element a at b7d4f9ec>, <Element a at b7d4fb0c>]
[<Element 'a' at b7d4f9ec>, <Element 'a' at b7d4fb0c>]
>>> for i in links: # Iterates through all found links
... i.attrib["target"] = "blank"
>>> tree.write("output.xhtml")
@ -433,7 +490,7 @@ TreeBuilder Objects
.. method:: close()
Flushes the parser buffers, and returns the toplevel document
Flushes the builder buffers, and returns the toplevel document
element. Returns an Element instance.
@ -455,18 +512,31 @@ TreeBuilder Objects
containing element attributes. Returns the opened element.
.. _elementtree-xmltreebuilder-objects:
In addition, a custom :class:`TreeBuilder` object can provide the
following method:
XMLTreeBuilder Objects
----------------------
.. method:: doctype(name, pubid, system)
Handles a doctype declaration. *name* is the doctype name. *pubid* is the
public identifier. *system* is the system identifier. This method does not
exist on the default :class:`TreeBuilder` class.
.. versionadded:: 2.7
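The doctype hook can be sketched with a plain target object under Python 3, where `XMLParser` calls the target's `doctype()` method when one is defined. `DoctypeRecorder` is a hypothetical class name. Expat does not fetch the external DTD here.

```python
import xml.etree.ElementTree as ET

class DoctypeRecorder:
    """Hypothetical parser target that only records the doctype."""

    def __init__(self):
        self.info = None

    def doctype(self, name, pubid, system):
        # Called by the parser when it sees a <!DOCTYPE ...> declaration.
        self.info = (name, pubid, system)

    def close(self):
        # XMLParser.close() returns whatever the target's close() returns.
        return self.info

SAMPLE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    "<html><body/></html>"
)

parser = ET.XMLParser(target=DoctypeRecorder())
parser.feed(SAMPLE)
result = parser.close()
print(result)
```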
.. class:: XMLTreeBuilder([html,] [target])
.. _elementtree-xmlparser-objects:
XMLParser Objects
-----------------
.. class:: XMLParser([html [, target[, encoding]]])
Element structure builder for XML source data, based on the expat parser. *html*
are predefined HTML entities. This flag is not supported by the current
implementation. *target* is the target object. If omitted, the builder uses an
instance of the standard TreeBuilder class.
instance of the standard TreeBuilder class. *encoding* [1]_ is optional.
If given, the value overrides the encoding specified in the XML file.
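Here is a sketch of the *encoding* override, assuming Python 3's keyword-only `XMLParser(encoding=...)`. The input deliberately declares UTF-8 but actually contains Latin-1 bytes, which would fail to parse without the override.

```python
import xml.etree.ElementTree as ET

# The declaration claims UTF-8, but byte 0xE9 ('é' in Latin-1) followed
# by '<' is not valid UTF-8.
data = "<?xml version='1.0' encoding='utf-8'?><name>Ren\xe9</name>".encode("latin-1")

# encoding= makes the parser ignore the declared encoding.
parser = ET.XMLParser(encoding="iso-8859-1")
parser.feed(data)
root = parser.close()
print(root.text)  # René
```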
.. method:: close()
@ -476,22 +546,23 @@ XMLTreeBuilder Objects
.. method:: doctype(name, pubid, system)
Handles a doctype declaration. *name* is the doctype name. *pubid* is the
public identifier. *system* is the system identifier.
.. deprecated:: 2.7
Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
target.
.. method:: feed(data)
Feeds data to the parser. *data* is encoded data.
:meth:`XMLTreeBuilder.feed` calls *target*\'s :meth:`start` method
:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
for each opening tag, its :meth:`end` method for each closing tag,
and data is processed by method :meth:`data`. :meth:`XMLTreeBuilder.close`
and data is processed by method :meth:`data`. :meth:`XMLParser.close`
calls *target*\'s method :meth:`close`.
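The feed/close protocol can be sketched with the default target under Python 3. The chunk boundaries are arbitrary and may split text content. Here the standard TreeBuilder receives the start/end/data calls described above.

```python
import xml.etree.ElementTree as ET

# With no target, XMLParser builds a tree using the standard TreeBuilder.
parser = ET.XMLParser()
for chunk in (b"<log><entry level='info'>sta", b"rted</entry>", b"<entry/></log>"):
    parser.feed(chunk)

# close() flushes the parser and returns the target's result: the root.
root = parser.close()
print(root.tag, len(root), root[0].text)  # log 2 started
```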
:class:`XMLTreeBuilder` can be used not only for building a tree structure.
:class:`XMLParser` can be used not only for building a tree structure.
This is an example of counting the maximum depth of an XML file::
>>> from xml.etree.ElementTree import XMLTreeBuilder
>>> from xml.etree.ElementTree import XMLParser
>>> class MaxDepth: # The target object of the parser
... maxDepth = 0
... depth = 0
@ -507,7 +578,7 @@ This is an example of counting the maximum depth of an XML file::
... return self.maxDepth
...
>>> target = MaxDepth()
>>> parser = XMLTreeBuilder(target=target)
>>> parser = XMLParser(target=target)
>>> exampleXml = """
... <a>
... <b>
@ -530,4 +601,3 @@ This is an example of counting the maximum depth of an XML file::
appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
and http://www.iana.org/assignments/character-sets.


@ -0,0 +1,7 @@
<?pi data?>
<!-- comment -->
<root xmlns='namespace'>
<element key='value'>text</element>
<element>text</element>tail
<empty-element/>
</root>


@ -0,0 +1,6 @@
<!-- comment -->
<root>
<element key='value'>text</element>
<element>text</element>tail
<empty-element/>
</root>

File diff suppressed because it is too large.


@ -1,30 +1,11 @@
# xml.etree test for cElementTree
import sys
from test import test_support
ET = test_support.import_module('xml.etree.cElementTree')
cET = test_support.import_module('xml.etree.cElementTree')
SAMPLE_XML = """
<body>
<tag>text</tag>
<tag />
<section>
<tag>subtext</tag>
</section>
</body>
"""
SAMPLE_XML_NS = """
<body xmlns="http://effbot.org/ns">
<tag>text</tag>
<tag />
<section>
<tag>subtext</tag>
</section>
</body>
"""
# cElementTree specific tests
def sanity():
"""
@ -33,191 +14,21 @@ def sanity():
>>> from xml.etree import cElementTree
"""
def check_method(method):
if not hasattr(method, '__call__'):
print method, "not callable"
def serialize(ET, elem, encoding=None):
import StringIO
file = StringIO.StringIO()
tree = ET.ElementTree(elem)
if encoding:
tree.write(file, encoding)
else:
tree.write(file)
return file.getvalue()
def summarize(elem):
return elem.tag
def summarize_list(seq):
return map(summarize, seq)
def interface():
"""
Test element tree interface.
>>> element = ET.Element("tag", key="value")
>>> tree = ET.ElementTree(element)
Make sure all standard element methods exist.
>>> check_method(element.append)
>>> check_method(element.insert)
>>> check_method(element.remove)
>>> check_method(element.getchildren)
>>> check_method(element.find)
>>> check_method(element.findall)
>>> check_method(element.findtext)
>>> check_method(element.clear)
>>> check_method(element.get)
>>> check_method(element.set)
>>> check_method(element.keys)
>>> check_method(element.items)
>>> check_method(element.getiterator)
Basic method sanity checks.
>>> serialize(ET, element) # 1
'<tag key="value" />'
>>> subelement = ET.Element("subtag")
>>> element.append(subelement)
>>> serialize(ET, element) # 2
'<tag key="value"><subtag /></tag>'
>>> element.insert(0, subelement)
>>> serialize(ET, element) # 3
'<tag key="value"><subtag /><subtag /></tag>'
>>> element.remove(subelement)
>>> serialize(ET, element) # 4
'<tag key="value"><subtag /></tag>'
>>> element.remove(subelement)
>>> serialize(ET, element) # 5
'<tag key="value" />'
>>> element.remove(subelement)
Traceback (most recent call last):
ValueError: list.remove(x): x not in list
>>> serialize(ET, element) # 6
'<tag key="value" />'
"""
def find():
"""
Test find methods (including xpath syntax).
>>> elem = ET.XML(SAMPLE_XML)
>>> elem.find("tag").tag
'tag'
>>> ET.ElementTree(elem).find("tag").tag
'tag'
>>> elem.find("section/tag").tag
'tag'
>>> ET.ElementTree(elem).find("section/tag").tag
'tag'
>>> elem.findtext("tag")
'text'
>>> elem.findtext("tog")
>>> elem.findtext("tog", "default")
'default'
>>> ET.ElementTree(elem).findtext("tag")
'text'
>>> elem.findtext("section/tag")
'subtext'
>>> ET.ElementTree(elem).findtext("section/tag")
'subtext'
>>> summarize_list(elem.findall("tag"))
['tag', 'tag']
>>> summarize_list(elem.findall("*"))
['tag', 'tag', 'section']
>>> summarize_list(elem.findall(".//tag"))
['tag', 'tag', 'tag']
>>> summarize_list(elem.findall("section/tag"))
['tag']
>>> summarize_list(elem.findall("section//tag"))
['tag']
>>> summarize_list(elem.findall("section/*"))
['tag']
>>> summarize_list(elem.findall("section//*"))
['tag']
>>> summarize_list(elem.findall("section/.//*"))
['tag']
>>> summarize_list(elem.findall("*/*"))
['tag']
>>> summarize_list(elem.findall("*//*"))
['tag']
>>> summarize_list(elem.findall("*/tag"))
['tag']
>>> summarize_list(elem.findall("*/./tag"))
['tag']
>>> summarize_list(elem.findall("./tag"))
['tag', 'tag']
>>> summarize_list(elem.findall(".//tag"))
['tag', 'tag', 'tag']
>>> summarize_list(elem.findall("././tag"))
['tag', 'tag']
>>> summarize_list(ET.ElementTree(elem).findall("/tag"))
['tag', 'tag']
>>> summarize_list(ET.ElementTree(elem).findall("./tag"))
['tag', 'tag']
>>> elem = ET.XML(SAMPLE_XML_NS)
>>> summarize_list(elem.findall("tag"))
[]
>>> summarize_list(elem.findall("{http://effbot.org/ns}tag"))
['{http://effbot.org/ns}tag', '{http://effbot.org/ns}tag']
>>> summarize_list(elem.findall(".//{http://effbot.org/ns}tag"))
['{http://effbot.org/ns}tag', '{http://effbot.org/ns}tag', '{http://effbot.org/ns}tag']
"""
def parseliteral():
r"""
>>> element = ET.XML("<html><body>text</body></html>")
>>> ET.ElementTree(element).write(sys.stdout)
<html><body>text</body></html>
>>> element = ET.fromstring("<html><body>text</body></html>")
>>> ET.ElementTree(element).write(sys.stdout)
<html><body>text</body></html>
>>> print ET.tostring(element)
<html><body>text</body></html>
>>> print ET.tostring(element, "ascii")
<?xml version='1.0' encoding='ascii'?>
<html><body>text</body></html>
>>> _, ids = ET.XMLID("<html><body>text</body></html>")
>>> len(ids)
0
>>> _, ids = ET.XMLID("<html><body id='body'>text</body></html>")
>>> len(ids)
1
>>> ids["body"].tag
'body'
"""
def check_encoding(encoding):
"""
>>> check_encoding("ascii")
>>> check_encoding("us-ascii")
>>> check_encoding("iso-8859-1")
>>> check_encoding("iso-8859-15")
>>> check_encoding("cp437")
>>> check_encoding("mac-roman")
"""
ET.XML(
"<?xml version='1.0' encoding='%s'?><xml />" % encoding
)
def bug_1534630():
"""
>>> bob = ET.TreeBuilder()
>>> e = bob.data("data")
>>> e = bob.start("tag", {})
>>> e = bob.end("tag")
>>> e = bob.close()
>>> serialize(ET, e)
'<tag />'
"""
def test_main():
from test import test_xml_etree_c
from test import test_xml_etree, test_xml_etree_c
# Run the tests specific to the C implementation
test_support.run_doctest(test_xml_etree_c, verbosity=True)
# Assign the C implementation before running the doctests
pyET = test_xml_etree.ET
test_xml_etree.ET = cET
try:
# Run the same test suite as xml.etree.ElementTree
test_xml_etree.test_main(module_name='xml.etree.cElementTree')
finally:
test_xml_etree.ET = pyET
if __name__ == '__main__':
test_main()


@ -1,6 +1,6 @@
#
# ElementTree
# $Id: ElementInclude.py 1862 2004-06-18 07:31:02Z Fredrik $
# $Id: ElementInclude.py 3375 2008-02-13 08:05:08Z fredrik $
#
# limited xinclude support for element trees
#
@ -16,7 +16,7 @@
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2004 by Fredrik Lundh
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
@ -42,14 +42,14 @@
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/2.4/license for licensing details.
# See http://www.python.org/psf/license for licensing details.
##
# Limited XInclude support for the ElementTree package.
##
import copy
import ElementTree
from . import ElementTree
XINCLUDE = "{http://www.w3.org/2001/XInclude}"


@ -1,6 +1,6 @@
#
# ElementTree
# $Id: ElementPath.py 1858 2004-06-17 21:31:41Z Fredrik $
# $Id: ElementPath.py 3375 2008-02-13 08:05:08Z fredrik $
#
# limited xpath support for element trees
#
@ -8,8 +8,13 @@
# 2003-05-23 fl created
# 2003-05-28 fl added support for // etc
# 2003-08-27 fl fixed parsing of periods in element names
# 2007-09-10 fl new selection engine
# 2007-09-12 fl fixed parent selector
# 2007-09-13 fl added iterfind; changed findall to return a list
# 2007-11-30 fl added namespaces support
# 2009-10-30 fl added child element value filter
#
# Copyright (c) 2003-2004 by Fredrik Lundh. All rights reserved.
# Copyright (c) 2003-2009 by Fredrik Lundh. All rights reserved.
#
# fredrik@pythonware.com
# http://www.pythonware.com
@ -17,7 +22,7 @@
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2004 by Fredrik Lundh
# Copyright (c) 1999-2009 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
@ -43,7 +48,7 @@
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/2.4/license for licensing details.
# See http://www.python.org/psf/license for licensing details.
##
# Implementation module for XPath support. There's usually no reason
@ -53,146 +58,246 @@
import re
xpath_tokenizer = re.compile(
"(::|\.\.|\(\)|[/.*:\[\]\(\)@=])|((?:\{[^}]+\})?[^/:\[\]\(\)@=\s]+)|\s+"
).findall
class xpath_descendant_or_self:
pass
##
# Wrapper for a compiled XPath.
class Path:
##
# Create an Path instance from an XPath expression.
def __init__(self, path):
tokens = xpath_tokenizer(path)
# the current version supports 'path/path'-style expressions only
self.path = []
self.tag = None
if tokens and tokens[0][0] == "/":
raise SyntaxError("cannot use absolute path on element")
while tokens:
op, tag = tokens.pop(0)
if tag or op == "*":
self.path.append(tag or op)
elif op == ".":
pass
elif op == "/":
self.path.append(xpath_descendant_or_self())
continue
else:
raise SyntaxError("unsupported path syntax (%s)" % op)
if tokens:
op, tag = tokens.pop(0)
if op != "/":
raise SyntaxError(
"expected path separator (%s)" % (op or tag)
xpath_tokenizer_re = re.compile(
"("
"'[^']*'|\"[^\"]*\"|"
"::|"
"//?|"
"\.\.|"
"\(\)|"
"[/.*:\[\]\(\)@=])|"
"((?:\{[^}]+\})?[^/\[\]\(\)@=\s]+)|"
"\s+"
)
if self.path and isinstance(self.path[-1], xpath_descendant_or_self):
raise SyntaxError("path cannot end with //")
if len(self.path) == 1 and isinstance(self.path[0], type("")):
self.tag = self.path[0]
##
# Find first matching object.
def xpath_tokenizer(pattern, namespaces=None):
for token in xpath_tokenizer_re.findall(pattern):
tag = token[1]
if tag and tag[0] != "{" and ":" in tag:
try:
prefix, uri = tag.split(":", 1)
if not namespaces:
raise KeyError
yield token[0], "{%s}%s" % (namespaces[prefix], uri)
except KeyError:
raise SyntaxError("prefix %r not found in prefix map" % prefix)
else:
yield token
def find(self, element):
tag = self.tag
if tag is None:
nodeset = self.findall(element)
if not nodeset:
return None
return nodeset[0]
for elem in element:
if elem.tag == tag:
return elem
return None
def get_parent_map(context):
parent_map = context.parent_map
if parent_map is None:
context.parent_map = parent_map = {}
for p in context.root.iter():
for e in p:
parent_map[e] = p
return parent_map
##
# Find text for first matching object.
def prepare_child(next, token):
tag = token[1]
def select(context, result):
for elem in result:
for e in elem:
if e.tag == tag:
yield e
return select
def findtext(self, element, default=None):
tag = self.tag
if tag is None:
nodeset = self.findall(element)
if not nodeset:
return default
return nodeset[0].text or ""
for elem in element:
if elem.tag == tag:
return elem.text or ""
return default
def prepare_star(next, token):
def select(context, result):
for elem in result:
for e in elem:
yield e
return select
##
# Find all matching objects.
def prepare_self(next, token):
def select(context, result):
for elem in result:
yield elem
return select
def findall(self, element):
nodeset = [element]
index = 0
def prepare_descendant(next, token):
token = next()
if token[0] == "*":
tag = "*"
elif not token[0]:
tag = token[1]
else:
raise SyntaxError("invalid descendant")
def select(context, result):
for elem in result:
for e in elem.iter(tag):
if e is not elem:
yield e
return select
def prepare_parent(next, token):
def select(context, result):
# FIXME: raise error if .. is applied at toplevel?
parent_map = get_parent_map(context)
result_map = {}
for elem in result:
if elem in parent_map:
parent = parent_map[elem]
if parent not in result_map:
result_map[parent] = None
yield parent
return select
def prepare_predicate(next, token):
# FIXME: replace with real parser!!! refs:
# http://effbot.org/zone/simple-iterator-parser.htm
# http://javascript.crockford.com/tdop/tdop.html
signature = []
predicate = []
while 1:
token = next()
if token[0] == "]":
break
if token[0] and token[0][:1] in "'\"":
token = "'", token[0][1:-1]
signature.append(token[0] or "-")
predicate.append(token[1])
signature = "".join(signature)
# use signature to determine predicate type
if signature == "@-":
# [@attribute] predicate
key = predicate[1]
def select(context, result):
for elem in result:
if elem.get(key) is not None:
yield elem
return select
if signature == "@-='":
# [@attribute='value']
key = predicate[1]
value = predicate[-1]
def select(context, result):
for elem in result:
if elem.get(key) == value:
yield elem
return select
if signature == "-" and not re.match("\d+$", predicate[0]):
# [tag]
tag = predicate[0]
def select(context, result):
for elem in result:
if elem.find(tag) is not None:
yield elem
return select
if signature == "-='" and not re.match("\d+$", predicate[0]):
# [tag='value']
tag = predicate[0]
value = predicate[-1]
def select(context, result):
for elem in result:
for e in elem.findall(tag):
if "".join(e.itertext()) == value:
yield elem
break
return select
if signature == "-" or signature == "-()" or signature == "-()-":
# [index] or [last()] or [last()-index]
if signature == "-":
index = int(predicate[0]) - 1
else:
if predicate[0] != "last":
raise SyntaxError("unsupported function")
if signature == "-()-":
try:
path = self.path[index]
index = index + 1
except IndexError:
return nodeset
set = []
if isinstance(path, xpath_descendant_or_self):
index = int(predicate[2]) - 1
except ValueError:
raise SyntaxError("unsupported expression")
else:
index = -1
def select(context, result):
parent_map = get_parent_map(context)
for elem in result:
try:
tag = self.path[index]
if not isinstance(tag, type("")):
tag = None
else:
index = index + 1
except IndexError:
tag = None # invalid path
for node in nodeset:
new = list(node.getiterator(tag))
if new and new[0] is node:
set.extend(new[1:])
else:
set.extend(new)
else:
for node in nodeset:
for node in node:
if path == "*" or node.tag == path:
set.append(node)
if not set:
return []
nodeset = set
parent = parent_map[elem]
# FIXME: what if the selector is "*" ?
elems = list(parent.findall(elem.tag))
if elems[index] is elem:
yield elem
except (IndexError, KeyError):
pass
return select
raise SyntaxError("invalid predicate")
ops = {
"": prepare_child,
"*": prepare_star,
".": prepare_self,
"..": prepare_parent,
"//": prepare_descendant,
"[": prepare_predicate,
}
_cache = {}
##
# (Internal) Compile path.
class _SelectorContext:
parent_map = None
def __init__(self, root):
self.root = root
def _compile(path):
p = _cache.get(path)
if p is not None:
return p
p = Path(path)
if len(_cache) >= 100:
# --------------------------------------------------------------------
##
# Generate all matching objects.
def iterfind(elem, path, namespaces=None):
# compile selector pattern
if path[-1:] == "/":
path = path + "*" # implicit all (FIXME: keep this?)
try:
selector = _cache[path]
except KeyError:
if len(_cache) > 100:
_cache.clear()
_cache[path] = p
return p
if path[:1] == "/":
raise SyntaxError("cannot use absolute path on element")
next = iter(xpath_tokenizer(path, namespaces)).next
token = next()
selector = []
while 1:
try:
selector.append(ops[token[0]](next, token))
except StopIteration:
raise SyntaxError("invalid path")
try:
token = next()
if token[0] == "/":
token = next()
except StopIteration:
break
_cache[path] = selector
# execute selector pattern
result = [elem]
context = _SelectorContext(elem)
for select in selector:
result = select(context, result)
return result
##
# Find first matching object.
def find(element, path):
return _compile(path).find(element)
##
# Find text for first matching object.
def findtext(element, path, default=None):
return _compile(path).findtext(element, default)
def find(elem, path, namespaces=None):
try:
return iterfind(elem, path, namespaces).next()
except StopIteration:
return None
##
# Find all matching objects.
def findall(element, path):
return _compile(path).findall(element)
def findall(elem, path, namespaces=None):
return list(iterfind(elem, path, namespaces))
##
# Find text for first matching object.
def findtext(elem, path, default=None, namespaces=None):
try:
elem = iterfind(elem, path, namespaces).next()
return elem.text or ""
except StopIteration:
return default
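The selectors added by the new ElementPath engine are listed in its change log: predicates, `last()`, child-value filters and prefix maps. Below is a quick tour, run against Python 3's `xml.etree.ElementTree`, whose path code descends from this module. The document, the `names` helper and the `urn:example` namespace are made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<root>"
    "<item kind='a'><name>first</name></item>"
    "<item kind='b'><name>second</name></item>"
    "<item><name>third</name></item>"
    "</root>"
)

def names(path):
    """Hypothetical helper: <name> text of every element matched by path."""
    return [e.findtext("name") for e in doc.findall(path)]

print(names("item[@kind]"))         # ['first', 'second']  attribute present
print(names("item[@kind='b']"))     # ['second']           attribute value
print(names("item[name='third']"))  # ['third']            child text value
print(names("item[1]"))             # ['first']            1-based position
print(names("item[last()]"))        # ['third']
print(names("item[last()-1]"))      # ['second']

# Prefixes in a path are resolved through an explicit namespaces mapping.
ns_doc = ET.fromstring("<r xmlns:x='urn:example'><x:a>1</x:a></r>")
print(ns_doc.findtext("x:a", namespaces={"x": "urn:example"}))  # 1
```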

File diff suppressed because it is too large.


@ -1,10 +1,10 @@
# $Id: __init__.py 1821 2004-06-03 16:57:49Z fredrik $
# $Id: __init__.py 3375 2008-02-13 08:05:08Z fredrik $
# elementtree package
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2004 by Fredrik Lundh
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
@ -30,4 +30,4 @@
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/2.4/license for licensing details.
# See http://www.python.org/psf/license for licensing details.


@ -24,6 +24,9 @@ Core and Builtins
Library
-------
- Issue #6472: The xml.etree package is updated to ElementTree 1.3. The
cElementTree module is updated too.
- Issue #7880: Fix sysconfig when the python executable is a symbolic link.
- Issue #7624: Fix isinstance(foo(), collections.Callable) for old-style

File diff suppressed because it is too large.