Commit Graph

16 Commits

Author SHA1 Message Date
Ezio Melotti 15cb489234 #13358: HTMLParser now calls handle_data only once for each CDATA. 2011-11-18 18:01:49 +02:00
Ezio Melotti c2fe57762b #1745761, #755670, #13357, #12629, #1200313: improve attribute handling in HTMLParser. 2011-11-14 18:53:33 +02:00
Ezio Melotti 7de56f6a04 #670664: Fix HTMLParser to correctly handle the content of ``<script>...</script>`` and ``<style>...</style>``. 2011-11-01 14:12:22 +02:00
Ezio Melotti f50ffa94ab #13273: fix a bug that prevented HTMLParser to properly detect some tags when strict=False. 2011-10-28 13:21:09 +03:00
Ezio Melotti d9e0b068af #12888: Fix a bug in HTMLParser.unescape that prevented it to escape more than 128 entities. Patch by Peter Otten. 2011-09-05 17:11:06 +03:00
Éric Araujo 51b7aedadd Merge 3.1 2011-05-25 18:13:49 +02:00
Éric Araujo 39f180bb1f Fix display of html.parser.HTMLParser.feed docstring 2011-05-04 15:55:47 +02:00
Ezio Melotti 2e3607c1e7 #7311: fix html.parser to accept non-ASCII attribute values. 2011-04-07 22:03:31 +03:00
Senthil Kumaran 6c85838489 Merged revisions 87542 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r87542 | senthil.kumaran | 2010-12-28 23:55:16 +0800 (Tue, 28 Dec 2010) | 3 lines

  Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax
........
2010-12-28 16:10:56 +00:00
Senthil Kumaran 164540fee1 Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax 2010-12-28 15:55:16 +00:00
R. David Murray b579dba119 #1486713: Add a tolerant mode to HTMLParser.
The motivation for adding this option is that the the functionality it
provides used to be provided by sgmllib in Python2, and was used by,
for example, BeautifulSoup.  Without this option, the Python3 version
of BeautifulSoup and the many programs that use it are crippled.

The original patch was by 'kxroberto'.  I modified it heavily but kept his
heuristics and test.  I also added additional heuristics to fix #975556,
#1046092, and part of #6191.  This patch should be completely backward
compatible:  the behavior with the default strict=True is unchanged.
2010-12-03 04:06:39 +00:00
Victor Stinner 30c223cff5 Merged revisions 81504 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r81504 | victor.stinner | 2010-05-24 23:46:25 +0200 (lun., 24 mai 2010) | 13 lines

  Recorded merge of revisions 81500-81501 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) | 2 lines

    Issue #6662: Fix parsing of malformatted charref (&#bad;)
  ........
    r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) | 2 lines

    Add the author of the last fix (Issue #6662)
  ........
................
2010-05-24 21:48:07 +00:00
Victor Stinner e021f4b206 Recorded merge of revisions 81500-81501 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) | 2 lines

  Issue #6662: Fix parsing of malformatted charref (&#bad;)
........
  r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) | 2 lines

  Add the author of the last fix (Issue #6662)
........
2010-05-24 21:46:25 +00:00
Antoine Pitrou fd036451bf #2834: Change re module semantics, so that str and bytes mixing is forbidden,
and str (unicode) patterns get full unicode matching by default. The re.ASCII
flag is also introduced to ask for ASCII matching instead.
2008-08-19 17:56:33 +00:00
Mark Dickinson f64dcf3ce0 Change test_htmlparser to reflect the HTMLParser -> html.parser
rename in r63439.

Also fix one occurrence of unichr() in html.parser.
2008-05-21 13:51:18 +00:00
Fred Drake 3c50ea4303 rename HTMLParser to html.parser and htmlentitydefs to html.entities;
includes merge of trunk revision 63432
2008-05-17 22:02:32 +00:00