10 Commits

Author SHA1 Message Date
Guido van Rossum 48766512a0 Reformatted with 4-space tab stops.
Allow '=' and '~' in unquoted attribute values.

Added overridable methods handle_starttag(tag, method, attrs) and
handle_endtag(tag, method) so subclasses can decide whether they
really want to call the method (e.g. when suppressing some portion of
the document).

Added support for a number of SGML shortcuts:

        shorthand               full notation
        <tag>...<>...           <tag>...<tag>...
        <tag>...</>             <tag>...</tag>
        <tag/.../               <tag>...</tag>
        <tag1<tag2>             <tag1><tag2>
        </tag1</tag2>           </tag1></tag2>
        </tag1<tag2>            </tag1><tag2>

This required factoring out some common actions and rationalizing the
interface to parse_endtag(), so as to make the code more readable.

Fixed syntax for &entity and &#char references so the trailing
semicolon is optional; removed explicit support for trailing period
(which was a TBL mistake in HTML 0.0).

Generalized the test program.

Tried to speed things up a little.  (More to come after the profile
results are in.)

Fix error recovery: call the end methods popped from the stack instead
of the one that triggers.  (Plus some complications because of the way
HTML extensions are handled in Grail.)
1996-03-28 18:45:04 +00:00
Guido van Rossum 650ba37e1d typos in attrfind regex 1995-10-06 15:30:28 +00:00
Guido van Rossum e3d9320fc5 allow _ in attr names (Netscape!) 1995-09-30 16:49:36 +00:00
Guido van Rossum 3c0bfd0dee fix <!...!> parsing; added verbose option; don't lowercase entityrefs 1995-09-22 00:54:32 +00:00
Guido van Rossum cf9e27c72e support value-less attributes, using regex.group() 1995-09-01 20:34:29 +00:00
Guido van Rossum eae892d232 added note about missing features 1995-08-10 19:43:53 +00:00
Guido van Rossum 145b2e0168 changed comment parsing 1995-08-04 04:22:39 +00:00
Guido van Rossum efe5ac404f make reporting unbalanced tags an overridable method 1995-06-22 18:56:36 +00:00
Guido van Rossum 1dba24eeca remove redundant backslashes; some cosmetics 1995-03-04 22:28:49 +00:00
Guido van Rossum 7c750e1e09 added html parser and supporting cast 1995-02-27 13:16:55 +00:00
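The 1996-03-28 entry (48766512a0) describes two related mechanisms. The first is a pair of overridable hooks, handle_starttag(tag, method, attrs) and handle_endtag(tag, method), that decide whether a tag's method is actually called. The second is error recovery that calls the end methods of every tag popped off the open-tag stack. The following is a minimal, self-contained Python sketch of that pattern, not the original parser code. The class names MiniParser and Recorder, the feed_starttag/feed_endtag entry points, the start_<tag>/end_<tag> method naming, and the report_unbalanced hook are all hypothetical names chosen for illustration.

```python
class MiniParser:
    """Sketch of hook-based tag dispatch with stack-based error recovery."""

    def __init__(self):
        self.stack = []  # open tags that had a start_<tag> method

    def feed_starttag(self, tag, attrs):
        method = getattr(self, "start_" + tag, None)
        if method is None:
            return  # no handler: ignore the tag in this sketch
        self.stack.append(tag)
        self.handle_starttag(tag, method, attrs)

    def feed_endtag(self, tag):
        if tag not in self.stack:
            self.report_unbalanced(tag)
            return
        # Recovery: pop every tag opened after `tag`, plus `tag` itself,
        # and call the end method of each popped tag, innermost first.
        while self.stack:
            popped = self.stack.pop()
            method = getattr(self, "end_" + popped, None)
            if method is not None:
                self.handle_endtag(popped, method)
            if popped == tag:
                break

    # Overridable hooks: a subclass may choose not to call `method`,
    # e.g. while suppressing part of the document.
    def handle_starttag(self, tag, method, attrs):
        method(attrs)

    def handle_endtag(self, tag, method):
        method()

    def report_unbalanced(self, tag):
        pass  # hypothetical hook for stray end tags


class Recorder(MiniParser):
    """Records handler calls so the dispatch order is visible."""

    def __init__(self):
        super().__init__()
        self.events = []

    def start_b(self, attrs):
        self.events.append("start b")

    def end_b(self):
        self.events.append("end b")

    def start_i(self, attrs):
        self.events.append("start i")

    def end_i(self):
        self.events.append("end i")


p = Recorder()
p.feed_starttag("b", [])
p.feed_starttag("i", [])
p.feed_endtag("b")  # </b> arrives while <i> is still open
print(p.events)  # ['start b', 'start i', 'end i', 'end b']
```

A subclass that overrides both hooks to return without calling `method` would skip those handlers while the stack is still maintained, so recovery from unbalanced end tags behaves the same either way.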
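The 1996-03-28 entry also makes the trailing semicolon of &entity and &#char references optional, and the 1995-09-22 entry stops lowercasing entity references. A regular expression expresses the first change with an optional `;?`. The sketch below uses today's `re` module (the 1995 code used the older `regex` module, per the 1995-09-01 entry); the pattern, the tiny ENTITIES table, and the expand() helper are assumptions for illustration, not the parser's actual definitions. Both reference kinds are matched in a single pass so that an expanded `&#38;` cannot be re-read as the start of a new entity reference.

```python
import re

# Assumed pattern: '&#' + decimal digits, or '&' + a name; ';' is optional.
REF = re.compile(r"&(?:#([0-9]+)|([a-zA-Z][a-zA-Z0-9]*));?")

# Illustrative table; lookups are case-sensitive (names are not lowercased).
ENTITIES = {"amp": "&", "lt": "<", "gt": ">"}


def expand(text):
    """Replace character and entity references, with or without ';'."""

    def repl(m):
        if m.group(1) is not None:  # &#NNN character reference
            try:
                return chr(int(m.group(1)))
            except (ValueError, OverflowError):
                return m.group(0)  # out of range: leave it untouched
        # Unknown entity names are left as written.
        return ENTITIES.get(m.group(2), m.group(0))

    return REF.sub(repl, text)


print(expand("a &lt; b &amp c &#65;"))  # a < b & c A
```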
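Several entries change how attributes are recognized: '=' and '~' are allowed in unquoted values (48766512a0), '_' is allowed in attribute names (e3d9320fc5), and value-less attributes are supported (cf9e27c72e). The following Python sketch shows one way a single regular expression can cover all three; the exact character classes, the parse_attrs() helper, and the convention of using the attribute's name as the value of a value-less attribute are assumptions, not the original attrfind pattern.

```python
import re

# Assumed grammar: a name (letters, digits, '.', '-', '_'), optionally
# followed by '=' and a single-quoted, double-quoted, or unquoted value.
ATTR = re.compile(
    r"\s*([a-zA-Z_][-.a-zA-Z0-9_]*)"  # attribute name, '_' allowed
    r"(?:\s*=\s*"
    r"('[^']*'|\"[^\"]*\"|[-a-zA-Z0-9./:;+*%?!&$()_#=~]*))?"  # optional value
)


def parse_attrs(s):
    """Return a list of (name, value) pairs found in `s`."""
    attrs = []
    pos = 0
    while True:
        m = ATTR.match(s, pos)
        if m is None:
            break  # nothing but whitespace (or junk) left
        name, value = m.group(1), m.group(2)
        if value is None:
            value = name  # value-less attribute such as <dl compact>
        elif value[:1] in ("'", '"'):
            value = value[1:-1]  # strip the surrounding quotes
        attrs.append((name, value))
        pos = m.end()
    return attrs


print(parse_attrs('href=a=b~c compact data_x="1 2"'))
# [('href', 'a=b~c'), ('compact', 'compact'), ('data_x', '1 2')]
```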