cpython

Commit Graph

Author	SHA1	Message	Date
Ezio Melotti	5a88853bdc	#20288: fix handling of invalid numeric charrefs in HTMLParser.	2014-02-01 21:20:22 +02:00
Ezio Melotti	b814745226	#19480: HTMLParser now accepts all valid start-tag names as defined by the HTML5 standard.	2013-11-07 18:31:36 +02:00
Ezio Melotti	c45868ec69	#14538: HTMLParser can now parse correctly start tags that contain a bare /.	2012-04-18 19:08:41 -06:00
Ezio Melotti	36b7361fe7	HTMLParser is now able to handle slashes in the start tag.	2012-02-21 09:22:16 +02:00
Ezio Melotti	65d36dab4d	#13987: HTMLParser is now able to handle malformed start tags.	2012-02-15 13:19:10 +02:00
Ezio Melotti	d2307cb48a	#13987: HTMLParser is now able to handle EOFs in the middle of a construct.	2012-02-15 12:44:23 +02:00
Ezio Melotti	369cbd744e	Fix an index, add more tests, avoid raising errors for unknown declarations, and clean up comments.	2012-02-13 20:36:55 +02:00
Ezio Melotti	f117443cb8	#13993: HTMLParser is now able to handle broken end tags.	2012-02-13 16:28:54 +02:00
Ezio Melotti	4b92cc3f79	#13960: HTMLParser is now able to handle broken comments.	2012-02-13 16:10:44 +02:00
Ezio Melotti	00dc60beee	#13358: HTMLParser now calls handle_data only once for each CDATA.	2011-11-18 18:00:40 +02:00
Ezio Melotti	0f1571ce7f	#1745761, #755670, #13357, #12629, #1200313: improve attribute handling in HTMLParser.	2011-11-14 18:04:05 +02:00
Ezio Melotti	7e82b276dd	#670664: Fix HTMLParser to correctly handle the content of ``<script>...</script>`` and ``<style>...</style>``.	2011-11-01 14:09:56 +02:00
Éric Araujo	31890bc9ba	Fix display of html.parser.HTMLParser.feed docstring	2011-05-25 18:11:43 +02:00
Ezio Melotti	9f1ffb2ae9	#7311: fix HTMLParser to accept non-ASCII attribute values.	2011-04-05 20:40:52 +03:00
Senthil Kumaran	3f60f09eb2	Fix Issue10759 - HTMLParser.unescape() to handle malform charrefs.	2010-12-28 16:05:07 +00:00
Victor Stinner	554a3b82e4	Issue #6662: Fix parsing of malformatted charref (&#bad;)	2010-05-24 21:33:24 +00:00
Fred Drake	d995e1150c	revert creation of the html.entities and html.parser modules (http://bugs.python.org/issue2882)	2008-05-20 06:08:38 +00:00
Fred Drake	91ae250273	rename HTMLParser to html.parser, htmlentitydefs to html.entities (http://bugs.python.org/issue2882)	2008-05-17 20:30:04 +00:00
Martin v. Löwis	ab8a6bba25	Patch #912410: Replace HTML entity references for attribute values in HTMLParser.	2007-03-06 14:43:00 +00:00
Georg Brandl	cd3c26a717	Reverting previous checkin. This breaks too much of HTMLParser to be applied without thought. Anyway, such malformed HTML is better handled by something like BeautifulSoup.	2005-09-01 06:25:34 +00:00
Georg Brandl	7847405a76	bug [ 761452 ] HTMLParser chokes on my.yahoo.com output	2005-08-31 22:08:45 +00:00
Fred Drake	49b4d19172	remove unnecessary override of base class method	2004-09-08 22:58:36 +00:00
Andrew M. Kuchling	b7d8ce0275	[Bug #921657] Allow '@' in unquoted HTML attributes. Not strictly legal according to the HTML REC, but HTMLParser is already a pretty loose parser. Reported by Bernd Zimmermann.	2004-06-05 15:31:45 +00:00
Walter Dörwald	70a6b49821	Replace backticks with repr() or "%r" From SF patch #852334.	2004-02-12 17:35:32 +00:00
Fred Drake	0834d77bc4	Accept commas in unquoted attribute values. This closes SF patch #669683.	2003-03-14 16:21:57 +00:00
Fred Drake	30d59baecd	Simplify code to remove an unnecessary test.	2002-05-14 15:50:11 +00:00
Fred Drake	248b04383f	Convert to using string methods instead of the string module. In goahead(), use a bound version of rawdata.startswith() since we use the same method all the time and never change the value of rawdata. This can save a lot of bound method creation.	2001-12-03 17:09:50 +00:00
Fred Drake	bfc8fea1e0	Re-factor the HTMLParser class to use the new markupbase.ParserBase class. Use a new internal method, error(), consistently to raise parse errors; the new base class also uses this.	2001-09-24 20:10:28 +00:00
Tim Peters	b64bec3ec0	Whitespace normalization.	2001-09-18 02:26:39 +00:00
Fred Drake	7cf613dc77	HTMLParser is allowed to be more strict than sgmllib, so let's not change their basic behavior: When parsing something that cannot possibly be valid in either HTML or XHTML, raise an exception.	2001-09-04 16:26:03 +00:00
Fred Drake	68eac2b574	Added reasonable parsing of the DOCTYPE declaration, fixed edge cases regarding bare ampersands in content.	2001-09-04 15:10:16 +00:00
Fred Drake	029acfb922	Deal more appropriately with bare ampersands and pointy brackets; this module has to deal with "class" HTML-as-deployed as well as XHTML, so we cannot be as strict as XHTML allows. This closes SF bug #453059, but uses a different fix than suggested in the bug comments.	2001-08-20 21:24:19 +00:00
Fred Drake	1d4601d306	Change some comments into docstrings. Fix handling of hexadecimal character references (legal in XHTML) so that they are properly interpreted as character references. This fixes SF bug #445196.	2001-08-03 19:50:59 +00:00
Fred Drake	1c48eb74c9	Merge my changes to the offending comment with Guido's changes.	2001-05-23 04:53:44 +00:00
Guido van Rossum	07f353c560	Removed incorrect comment left over from sgmllib.py.	2001-05-22 23:39:10 +00:00
Guido van Rossum	8846d7178b	A much improved HTML parser -- a replacement for sgmllib. The API is derived from but not quite compatible with that of sgmllib, so it's a new file. I suppose it needs documentation, and htmllib needs to be changed to use this instead of sgmllib, and sgmllib needs to be declared obsolete. But that can all be done later. This code was first published as part of TAL (part of Zope Page Templates), but that was strongly based on sgmllib anyway. Authors are Fred drake and Guido van Rossum.	2001-05-18 14:50:52 +00:00

36 Commits