14 Commits

Author SHA1 Message Date
Fred Drake de2f708299 Fix regexp for attrfind; bug reported by Lars Marius Garshol
<larsga@ifi.uio.no>.
1998-04-16 21:04:26 +00:00
Guido van Rossum 45e2fbc2e7 Mass check-in after untabifying all files that need it. 1998-03-26 21:13:24 +00:00
Guido van Rossum 1fef181183 Although it's hard to be sure, I *think* this is a working conversion
from regex to re style regular expressions.  This should make sgmllib
and htmllib threadsafe, so I can now create a threaded version of
webchecker...
1997-10-23 19:09:21 +00:00
Fred Drake 09bcf8c031 (sgmllib.py): Partial acceptance of patch from David Leonard
<leonard@dstc.edu.au>; allows hyphen and period in the middle
	of attribute names.  Still not allowed as first character;
	as first character these are illegal in the Reference Concrete
	Syntax, and we've not identified any use of these characters as
	the first char in an attribute name in deployment on the web.
1996-12-16 21:56:27 +00:00
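
The attribute-name rule described in the commit above (hyphen and period allowed in the middle of a name, but not as its first character) can be sketched as a regular expression. This is a minimal illustration using the modern `re` module; the pattern and the names `ATTR_NAME` and `is_attr_name` are assumptions made for the example, not the code from sgmllib.py.

```python
import re

# Assumed pattern for illustration; not copied from sgmllib.py.
# First character: a letter or underscore.
# Later characters: letters, digits, underscore, '-' or '.'.
ATTR_NAME = re.compile(r"[A-Za-z_][-.A-Za-z0-9_]*")


def is_attr_name(text):
    """Return True if the whole string is an acceptable attribute name."""
    return ATTR_NAME.fullmatch(text) is not None


for candidate in ("http-equiv", "xml.lang", "-bad", ".bad"):
    print(candidate, is_attr_name(candidate))
```

Placing `-` first inside the character class keeps it literal, so no escaping is needed.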
Guido van Rossum 48766512a0 Reformatted with 4-space tab stops.
Allow '=' and '~' in unquoted attribute values.

Added overridable methods handle_starttag(tag, method, attrs) and
handle_endtag(tag, method) so subclasses can decide whether they
really want to call the method (e.g. when suppressing some portion of
the document).

Added support for a number of SGML shortcuts:

        shorthand               full notation
        <tag>...<>...           <tag>...<tag>...
        <tag>...</>             <tag>...</tag>
        <tag/.../               <tag>...</tag>
        <tag1<tag2>             <tag1><tag2>
        </tag1</tag2>           </tag1></tag2>
        </tag1<tag2>            </tag1><tag2>

This required factoring out some common actions and rationalizing the
interface to parse_endtag(), so as to make the code more readable.

Fixed syntax for &entity and &#char references so the trailing
semicolon is optional; removed explicit support for trailing period
(which was a TBL mistake in HTML 0.0).

Generalized the test program.

Tried to speed things up a little.  (More to come after the profile
results are in.)

Fix error recovery: call the end methods popped from the stack instead
of the one that triggers.  (Plus some complications because of the way
HTML extensions are handled in Grail.)
1996-03-28 18:45:04 +00:00
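
The reference syntax change in the commit above (trailing semicolon optional for both &entity and &#char references, no special handling of a trailing period) might look roughly like the sketch below. It uses the modern `re` module rather than the `regex` module of the time; the pattern and the names `REFERENCE` and `find_references` are hypothetical and chosen only for this illustration.

```python
import re

# Assumed pattern for illustration; not copied from sgmllib.py.
# Group 1 captures a numeric character reference, group 2 an entity name.
# The trailing ';' is consumed when present but is not required.
REFERENCE = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9]*));?")


def find_references(text):
    """Return a list of (kind, value) pairs for each reference in text."""
    found = []
    for match in REFERENCE.finditer(text):
        number, name = match.groups()
        if number is not None:
            found.append(("char", number))
        else:
            found.append(("entity", name))
    return found


print(find_references("a &amp; b &lt c &#65;d &#66 e"))
```

A bare `&` followed by whitespace matches neither alternative, so it is left alone as ordinary text.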
Guido van Rossum 650ba37e1d typos in attrfind regex 1995-10-06 15:30:28 +00:00
Guido van Rossum e3d9320fc5 allow _ in attr names (Netscape!) 1995-09-30 16:49:36 +00:00
Guido van Rossum 3c0bfd0dee fix <!...!> parsing; added verbose option; don't lowercase entityrefs 1995-09-22 00:54:32 +00:00
Guido van Rossum cf9e27c72e support value-less attributes, using regex.group() 1995-09-01 20:34:29 +00:00
Guido van Rossum eae892d232 added note about missing features 1995-08-10 19:43:53 +00:00
Guido van Rossum 145b2e0168 changed comment parsing 1995-08-04 04:22:39 +00:00
Guido van Rossum efe5ac404f make reporting unbalanced tags an overridable method 1995-06-22 18:56:36 +00:00
Guido van Rossum 1dba24eeca remove redundant backslashes; some cosmetics 1995-03-04 22:28:49 +00:00
Guido van Rossum 7c750e1e09 added html parser and supporting cast 1995-02-27 13:16:55 +00:00