bpo-39011: Preserve line endings within ElementTree attributes (GH-18468)

* bpo-39011: Preserve line endings within attributes

Line endings within attribute values were previously normalized to "\n"
during serialization in Py3.7/3.8. This patch removes that normalization:
line endings in attributes are written out as numeric character references,
so they should be preserved in their original form.
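
A minimal sketch of the visible effect, assuming an interpreter that already includes this change (Python 3.9 or later). The attribute name and value are illustrative only; the point is that a CR LF pair inside an attribute should survive a serialize/parse round trip.

```python
import xml.etree.ElementTree as ET

# Illustrative element; "a" and its value are not taken from the patch.
elem = ET.Element("test")
elem.set("a", "line1\r\nline2")

# With this change, CR and LF are each written as their own
# numeric character reference instead of being folded into "&#10;".
serialized = ET.tostring(elem)
print(serialized)

# Character references are not subject to line-end normalization on
# input, so parsing the output should give back the original value.
restored = ET.fromstring(serialized).get("a")
print(repr(restored))
```

On versions without the patch, the same code is expected to emit a single `&#10;` for the CR LF pair, so the round trip would lose the carriage return.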
mefistotelis 2020-04-12 14:51:58 +02:00 committed by GitHub
parent 8f87eefe7f
commit 5fd8123dfd
4 changed files with 22 additions and 9 deletions

@@ -412,6 +412,15 @@ customization consistently by always using the value specified by
 case), and one used ``__VENV_NAME__`` instead.
 (Contributed by Brett Cannon in :issue:`37663`.)
+xml
+---
+White space characters within attributes are now preserved when serializing
+:mod:`xml.etree.ElementTree` to XML file. EOLNs are no longer normalized
+to "\n". This is the result of discussion about how to interpret
+section 2.11 of XML spec.
+(Contributed by Mefistotelis in :issue:`39011`.)
 Optimizations
 =============

@@ -430,13 +430,14 @@ class ElementTreeTest(unittest.TestCase):
         self.assertEqual(ET.tostring(elem),
                 b'<test testa="testval" testb="test1" testc="test2">aa</test>')
+        # Test preserving white space chars in attributes
         elem = ET.Element('test')
         elem.set('a', '\r')
         elem.set('b', '\r\n')
         elem.set('c', '\t\n\r ')
-        elem.set('d', '\n\n')
+        elem.set('d', '\n\n\r\r\t\t ')
         self.assertEqual(ET.tostring(elem),
-                b'<test a="&#10;" b="&#10;" c="&#09;&#10;&#10; " d="&#10;&#10;" />')
+                b'<test a="&#13;" b="&#13;&#10;" c="&#09;&#10;&#13; " d="&#10;&#10;&#13;&#13;&#09;&#09; " />')
     def test_makeelement(self):
         # Test makeelement handling.

@@ -1057,15 +1057,15 @@ def _escape_attrib(text):
             text = text.replace(">", "&gt;")
         if "\"" in text:
             text = text.replace("\"", "&quot;")
-        # The following business with carriage returns is to satisfy
-        # Section 2.11 of the XML specification, stating that
-        # CR or CR LN should be replaced with just LN
+        # Although section 2.11 of the XML specification states that CR or
+        # CR LN should be replaced with just LN, it applies only to EOLNs
+        # which take part of organizing file into lines. Within attributes,
+        # we are replacing these with entity numbers, so they do not count.
         # http://www.w3.org/TR/REC-xml/#sec-line-ends
-        if "\r\n" in text:
-            text = text.replace("\r\n", "\n")
+        # The current solution, contained in following six lines, was
+        # discussed in issue 17582 and 39011.
         if "\r" in text:
-            text = text.replace("\r", "\n")
-        #The following four lines are issue 17582
+            text = text.replace("\r", "&#13;")
         if "\n" in text:
             text = text.replace("\n", "&#10;")
         if "\t" in text:

@@ -0,0 +1,3 @@
+Normalization of line endings in ElementTree attributes was removed, as line
+endings which were replaced by entity numbers should be preserved in
+original form.
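
The behavioural difference can be sketched outside the library. The two helpers below are hypothetical stand-ins written for illustration, not functions from `xml.etree.ElementTree`; they reproduce only the white space handling of `_escape_attrib` before and after this commit, and leave out the escaping of `&`, `<`, `>` and `"`.

```python
def escape_whitespace_old(text: str) -> str:
    """Hypothetical model of the pre-patch logic: fold CR LF and CR into LF first."""
    text = text.replace("\r\n", "\n")
    text = text.replace("\r", "\n")
    text = text.replace("\n", "&#10;")
    return text.replace("\t", "&#09;")


def escape_whitespace_new(text: str) -> str:
    """Hypothetical model of the post-patch logic: escape CR, LF and TAB individually."""
    text = text.replace("\r", "&#13;")
    text = text.replace("\n", "&#10;")
    return text.replace("\t", "&#09;")


# The same attribute values the updated unit test uses.
for value in ("\r", "\r\n", "\t\n\r "):
    print(repr(value), escape_whitespace_old(value), escape_whitespace_new(value))
```

The old variant maps both `"\r"` and `"\r\n"` to the same output, which is why the original line ending could not be recovered after parsing; the new variant keeps the two distinct.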