- Handle <? processing instructions >.
- Allow . and - in entity names.
Also fixed an oversight in the previous fix (in one place, [ \t\r\n]
was used instead of string.whitespace).
<leonard@dstc.edu.au>; allows hyphen and period in the middle
of attribute names. Still not allowed as first character;
as first character these are illegal in the Reference Concrete
Syntax, and we've not identified any use of these characters as
the first char in an attribute name in deployment on the web.
Allow '=' and '~' in unquoted attribute values.
Added overridable methods handle_starttag(tag, method, attrs) and
handle_endtag(tag, method) so subclasses can decide whether they
really want to call the method (e.g. when suppressing some portion of
the document).
Added support for a number of SGML shortcuts:
shorthand full notation
<tag>...<>... <tag>...<tag>...
<tag>...</> <tag>...</tag>
<tag/.../ <tag>...</tag>
<tag1<tag2> <tag1><tag2>
</tag1</tag2> </tag1></tag2>
</tag1<tag2> </tag1><tag2>
This required factoring out some common actions and rationalizing the
interface to parse_endtag(), so as to make the code more readable.
Fixed syntax for &entity and &#char references so the trailing
semicolon is optional; removed explicit support for trailing period
(which was a TBL mistake in HTML 0.0).
Generalized the test program.
Tried to speed things up a little. (More to come after the profile
results are in.)
Fix error recovery: call the end methods popped from the stack instead
of the one that triggers. (Plus some complications because of the way
HTML extensions are handled in Grail.)