This fixes XML unittest fallout from the https://github.com/python/cpython/issues/115398 security fix. When configured using `--with-system-expat` on systems with older pre 2.6.0 versions of libexpat, our unittests were failing.
* sax|etree: Simplify Expat version guard where simplifiable
Idea by Matěj Cepl
* sax|etree: Fix reparse deferral tests for vanilla Expat <2.6.0
This *does not fix* the case of distros with an older version of libexpat with the 2.6.0 feature backported as a security fix. (Ubuntu is a known example of this with its libexpat1 2.5.0-2ubunutu0.1 package)
Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:
- `xml.etree.ElementTree.XMLParser.flush`
- `xml.etree.ElementTree.XMLPullParser.flush`
- `xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
- `xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
- `xml.sax.expatreader.ExpatParser.flush`
Based on the "flush" idea from https://github.com/python/cpython/pull/115138#issuecomment-1932444270 .
### Notes
- Please treat as a security fix related to CVE-2023-52425.
Includes code suggested-by: Snild Dolkow <snild@sony.com>
and by core dev Serhiy Storchaka.
Prior to gh-114269, the iterator returned by ElementTree.iterparse was
initialized with the root attribute as None. This restores the previous
behavior.
test_netrc, test_pep646_syntax and test_xml_etree now return results
in the test_main() function.
Changes:
* Rewrite TestResult as a dataclass with a new State class.
* Add test.support.TestStats class and Regrtest.stats_dict attribute.
* libregrtest.runtest functions now modify a TestResult instance
in-place.
* libregrtest summary lists the number of run tests and skipped
tests, and denied resources.
* Add TestResult.has_meaningful_duration() method.
* Compute TestResult duration in the upper function.
* Use time.perf_counter() instead of time.monotonic().
* Regrtest: rename 'resource_denieds' attribute to 'resource_denied'.
* Rename CHILD_ERROR to MULTIPROCESSING_ERROR.
* Use match/case syntadx to have different code depending on the
test state.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
When testing element truth values, emit a DeprecationWarning in all implementations.
This had emitted a FutureWarning in the rarely used python-only implementation since ~2.7 and has always been documented as a behavior not to rely on.
Matching an element in a tree search but having it test False can be unexpected. Raising the warning enables making the choice to finally raise an exception for this ambiguous behavior in the future.
The API documentation for [findtext](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.findtext) states that this function gives back an empty string on "no text content." With the previous implementation, this would give back a empty string even on text content values such as 0 or False. This patch attempts to resolve that by only giving back an empty string if the text attribute is set to `None`. Resolves#91447.
Automerge-Triggered-By: GH:gvanrossum
xml.etree: Remove the ElementTree.Element.copy() method of the pure
Python implementation, deprecated in Python 3.10, use the copy.copy()
function instead. The C implementation of xml.etree has no copy()
method, only a __copy__() method.
Suppress writing an XML declaration in open files in ElementTree.write()
with encoding='unicode' and xml_declaration=None.
If file patch is passed to ElementTree.write() with encoding='unicode',
always open a new file in UTF-8.
ElementTree method write() and function tostring() now use the text file's
encoding ("UTF-8" if not available) instead of locale encoding in XML
declaration when encoding="unicode" is specified.
Copying and pickling instances of subclasses of builtin types
bytearray, set, frozenset, collections.OrderedDict, collections.deque,
weakref.WeakSet, and datetime.tzinfo now copies and pickles instance attributes
implemented as slots.
Curly brackets were never allowed in namespace URIs
according to RFC 3986, and so-called namespace-validating
XML parsers have the right to reject them a invalid URIs.
libexpat >=2.4.5 has become strcter in that regard due to
related security issues; with ET.XML instantiating a
namespace-aware parser under the hood, this test has no
future in CPython.
References:
- https://datatracker.ietf.org/doc/html/rfc3968
- https://www.w3.org/TR/xml-names/
Also, test_minidom.py: Support Expat >=2.4.5
* Work correctly if an additional fresh module imports other
additional fresh module which imports a blocked module.
* Raises ImportError if the specified module cannot be imported
while all additional fresh modules are successfully imported.
* Support blocking packages.
* Always restore the import state of fresh and blocked modules
and their submodules.
* Fix test_decimal and test_xml_etree which depended on an undesired
side effect of import_fresh_module().
* bpo-39011: Preserve line endings within attributes
Line endings within attributes were previously normalized to "\n" in Py3.7/3.8.
This patch removes that normalization, as line endings which were
replaced by entity numbers should be preserved in original form.
* bpo-20928: bring elementtree's XInclude support en-par with the implementation in lxml by adding support for recursive includes and a base-URL.
* bpo-20928: Support xincluding the same file multiple times, just not recursively.
* bpo-20928: Add 'max_depth' parameter to xinclude that limits the maximum recursion depth to 6 by default.
* Add news entry for updated ElementInclude support
* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
* Implement C14N 2.0 as a new canonicalize() function in ElementTree.
Missing features:
- prefix renaming in XPath expressions (tag and attribute text is supported)
- preservation of original prefixes given redundant namespace declarations
* bpo-36673: Implement comment/PI parsing support for the TreeBuilder in ElementTree.
* bpo-36673: Rewrite the comment/PI factory handling for the TreeBuilder in "_elementtree" to make it use the same factories as the ElementTree module, and to make it explicit when the comments/PIs are inserted into the tree and when they are not (which is the default).