an attribute value was not escaped, you could get two syntax errors:
one about a missing semicolon and one about an unknown entity. Now
you get only one about a bogus ampersand.
- Actually count the linefeeds in a the CDATA content.
- Don't call the endtag handler for an unmatched endtag (this makes
the base class simpler since it doesn't have to deal with unopened
endtags).
- If the __init__ method is called with keyword argument
translate_attribute_references=0, don't attempt to translate
character and entity references in attribute values.
These two fixes were approved by me.
Peter Kropf:
There's a problem with the xmllib module when used with JPython. Specifically,
the JPython re module has trouble with the () characters in strings passed into
re.compile.
Spiros Papadimitriou:
I just downloaded xmllib.py ver. 0.3 from python.org and there
seems to be a slight typo: Line 654 ("tag = self.stack[-1][0]"
in parse_endtag), is indented one level more than it should be.
I just thought I'd let you know...
*this* set of patches is Ka-Ping's final sweep:
The attached patches update the standard library so that all modules
have docstrings beginning with one-line summaries.
A new docstring was added to formatter. The docstring for os.py
was updated to mention nt, os2, ce in addition to posix, dos, mac.
"""
Added some optional arguments to the XMLParser __init__ method to
specify that selected non-standard constructs are to be accepted.
Also removed the documentation for handle_entityrefs since it isn't
used.
"""
The version is incremented to 0.3.
all processing instruction target names containing 'xml' were
rejected, instead (as the standard rejects) only the name 'xml' itself
(or case variants thereof).
Fix leaking of instances by removing the elements variable that we
created on closing the parser. The elements variable is now created
in the reset() method, so that the sequence close(); reset();
... works.
Also, add the name of the entity reference that wasn't found to the
error message.
from Python 1.5.1:
If after __init__ finishes no new elements variable was created, this
patch will search the instance's namespace for all attributes whose
name start with start_ or end_ and put their value in a new elements
instance variable.
- Fixed a bug where a syntax error was reported when a document
started with white space. (White space at the start of a document
is valid if there is no XML declaration.)
- Improved the speed quite a bit for documents that don't make use of
namespaces.
Here is my current version of xmllib.py and the documentation. This
version has some API changes with respect to the version currently in
Python (also the one in 1.5.2a).
This version supports XML namespaces.
When literal mode is entered it should exit automatically when the
matching close tag of the last unclosed open tag is encountered. This
patch fixes this.
The main incompatibility is that the error reporting method is now
called as
parser.syntax_error(msg)
instead of
parser.syntax_error(lineno, msg)
This new version also has some code to deal with the <?xml?> and
<!DOCTYPE> tags at the start of an XML document.
The documentation has been updated, and a small test module has been
created.