Commit Graph

17 Commits

Author SHA1 Message Date
Georg Brandl cd3c26a717 Reverting previous checkin. This breaks too much of HTMLParser to be applied
without thought. Anyway, such malformed HTML is better handled by something
like BeautifulSoup.
2005-09-01 06:25:34 +00:00
Georg Brandl 7847405a76 bug [ 761452 ] HTMLParser chokes on my.yahoo.com output 2005-08-31 22:08:45 +00:00
Fred Drake 49b4d19172 remove unnecessary override of base class method 2004-09-08 22:58:36 +00:00
Andrew M. Kuchling b7d8ce0275 [Bug #921657] Allow '@' in unquoted HTML attributes. Not strictly legal according to the HTML REC, but HTMLParser is already a pretty loose parser. Reported by Bernd Zimmermann. 2004-06-05 15:31:45 +00:00
Walter Dörwald 70a6b49821 Replace backticks with repr() or "%r"
From SF patch #852334.
2004-02-12 17:35:32 +00:00
Fred Drake 0834d77bc4 Accept commas in unquoted attribute values.
This closes SF patch #669683.
2003-03-14 16:21:57 +00:00
Fred Drake 30d59baecd Simplify code to remove an unnecessary test. 2002-05-14 15:50:11 +00:00
Fred Drake 248b04383f Convert to using string methods instead of the string module.
In goahead(), use a bound version of rawdata.startswith() since we use the
same method all the time and never change the value of rawdata.  This can
save a lot of bound method creation.
2001-12-03 17:09:50 +00:00
Fred Drake bfc8fea1e0 Re-factor the HTMLParser class to use the new markupbase.ParserBase class.
Use a new internal method, error(), consistently to raise parse errors;
the new base class also uses this.
2001-09-24 20:10:28 +00:00
Tim Peters b64bec3ec0 Whitespace normalization. 2001-09-18 02:26:39 +00:00
Fred Drake 7cf613dc77 HTMLParser is allowed to be more strict than sgmllib, so let's not
change their basic behavior:  When parsing something that cannot possibly
be valid in either HTML or XHTML, raise an exception.
2001-09-04 16:26:03 +00:00
Fred Drake 68eac2b574 Added reasonable parsing of the DOCTYPE declaration, fixed edge cases
regarding bare ampersands in content.
2001-09-04 15:10:16 +00:00
Fred Drake 029acfb922 Deal more appropriately with bare ampersands and pointy brackets; this
module has to deal with "classic" HTML-as-deployed as well as XHTML, so we
cannot be as strict as XHTML allows.

This closes SF bug #453059, but uses a different fix than suggested in
the bug comments.
2001-08-20 21:24:19 +00:00
Fred Drake 1d4601d306 Change some comments into docstrings.
Fix handling of hexadecimal character references (legal in XHTML) so that
they are properly interpreted as character references.
This fixes SF bug #445196.
2001-08-03 19:50:59 +00:00
Fred Drake 1c48eb74c9 Merge my changes to the offending comment with Guido's changes. 2001-05-23 04:53:44 +00:00
Guido van Rossum 07f353c560 Removed incorrect comment left over from sgmllib.py. 2001-05-22 23:39:10 +00:00
Guido van Rossum 8846d7178b A much improved HTML parser -- a replacement for sgmllib. The API is
derived from but not quite compatible with that of sgmllib, so it's a
new file.  I suppose it needs documentation, and htmllib needs to be
changed to use this instead of sgmllib, and sgmllib needs to be
declared obsolete.  But that can all be done later.

This code was first published as part of TAL (part of Zope Page
Templates), but that was strongly based on sgmllib anyway.  Authors
are Fred Drake and Guido van Rossum.
2001-05-18 14:50:52 +00:00
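
Several of the commits above concern lenient parsing: '@' and commas in unquoted attribute values, hexadecimal character references, and bare ampersands in content. The following is a minimal sketch of that behavior, assuming Python 3, where the HTMLParser module is available as html.parser. The Collector class and the parse() helper are hypothetical names introduced here for illustration, and the current parser may differ in detail from the 2001-2005 revisions listed in this log.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Record start-tag attributes and text so the parse can be inspected."""

    def __init__(self):
        # convert_charrefs defaults to True, so character references in
        # text are decoded before handle_data() is called.
        super().__init__()
        self.attrs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        self.attrs.extend(attrs)

    def handle_data(self, data):
        self.text.append(data)


def parse(markup):
    """Feed markup to a fresh Collector and return (attrs, joined text)."""
    parser = Collector()
    parser.feed(markup)
    parser.close()
    return parser.attrs, "".join(parser.text)


# Unquoted attribute value containing '@' and ',', plus a hexadecimal
# character reference and a bare ampersand in the element content.
attrs, text = parse("<a href=user@example.com,other>Q &#x41; & A</a>")
print(attrs)
print(text)
```

In this sketch the unquoted href value is kept whole, the hexadecimal reference is decoded to its character, and the bare ampersand is passed through as literal text rather than raising a parse error.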