Ezio Melotti
2e3607c1e7
#7311: fix html.parser to accept non-ASCII attribute values.
2011-04-07 22:03:31 +03:00
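To illustrate the behavior #7311 addresses, here is a minimal sketch, assuming a current Python 3 `html.parser`. The `AttrCollector` class and the sample markup are hypothetical and are not taken from the commit itself.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        self.seen.append((tag, attrs))


collector = AttrCollector()
# One quoted and one unquoted attribute value, both containing non-ASCII text
collector.feed('<a title="café"><img alt=naïve>')
collector.close()
print(collector.seen)
```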
Senthil Kumaran
6c85838489
Merged revisions 87542 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r87542 | senthil.kumaran | 2010-12-28 23:55:16 +0800 (Tue, 28 Dec 2010) | 3 lines
Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax
........
2010-12-28 16:10:56 +00:00
Senthil Kumaran
164540fee1
Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax
2010-12-28 15:55:16 +00:00
R. David Murray
b579dba119
#1486713: Add a tolerant mode to HTMLParser.
The motivation for adding this option is that the functionality it
provides used to be provided by sgmllib in Python2, and was used by,
for example, BeautifulSoup. Without this option, the Python3 version
of BeautifulSoup and the many programs that use it are crippled.
The original patch was by 'kxroberto'. I modified it heavily but kept his
heuristics and test. I also added additional heuristics to fix #975556,
#1046092, and part of #6191. This patch should be completely backward
compatible: the behavior with the default strict=True is unchanged.
2010-12-03 04:06:39 +00:00
Georg Brandl
1f7fffb308
#2830: add html.escape() helper and move cgi.escape() uses in the standard library to it. It defaults to quote=True and also escapes single quotes, which makes casual use safer. The cgi.escape() interface is not touched, but emits a (silent) PendingDeprecationWarning.
2010-10-15 15:57:45 +00:00
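The `quote=True` default described in #2830 can be sketched as follows. This is a small illustration assuming a current Python 3 `html` module; the sample string is made up for the example.

```python
import html

raw = "<a href='x'>Tom & \"Jerry\"</a>"

# Default: quote=True, so both double and single quotes are escaped
escaped = html.escape(raw)

# quote=False escapes only &, < and >, leaving quotes untouched
escaped_no_quotes = html.escape(raw, quote=False)

print(escaped)
print(escaped_no_quotes)
```

Escaping single quotes by default matters when the result is placed inside a single-quoted HTML attribute.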
Victor Stinner
30c223cff5
Merged revisions 81504 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k
................
r81504 | victor.stinner | 2010-05-24 23:46:25 +0200 (lun., 24 mai 2010) | 13 lines
Recorded merge of revisions 81500-81501 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) | 2 lines
Issue #6662: Fix parsing of malformatted charref (&#bad;)
........
r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) | 2 lines
Add the author of the last fix (Issue #6662)
........
................
2010-05-24 21:48:07 +00:00
Victor Stinner
e021f4b206
Recorded merge of revisions 81500-81501 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) | 2 lines
Issue #6662: Fix parsing of malformatted charref (&#bad;)
........
r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) | 2 lines
Add the author of the last fix (Issue #6662)
........
2010-05-24 21:46:25 +00:00
Antoine Pitrou
fd036451bf
#2834: Change re module semantics, so that str and bytes mixing is forbidden,
and str (unicode) patterns get full unicode matching by default. The re.ASCII
flag is also introduced to ask for ASCII matching instead.
2008-08-19 17:56:33 +00:00
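The `re` semantics described in #2834 can be sketched as below, assuming a current Python 3 interpreter. The variable names and sample text are illustrative only.

```python
import re

# str patterns match Unicode word characters by default
word = re.compile(r"\w+")
# re.ASCII restricts \w to [a-zA-Z0-9_]
ascii_word = re.compile(r"\w+", re.ASCII)

unicode_match = word.fullmatch("café")      # matches the whole string
ascii_match = ascii_word.fullmatch("café")  # None: "é" is not an ASCII word char
ascii_prefix = ascii_word.match("café")     # matches only the ASCII prefix

# Mixing a bytes pattern with a str subject is rejected
try:
    re.search(b"caf", "café")
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

print(unicode_match, ascii_match, ascii_prefix, mixing_error)
```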
Mark Dickinson
f64dcf3ce0
Change test_htmlparser to reflect the HTMLParser -> html.parser
rename in r63439.
Also fix one occurrence of unichr() in html.parser.
2008-05-21 13:51:18 +00:00
Fred Drake
3c50ea4303
rename HTMLParser to html.parser and htmlentitydefs to html.entities;
includes merge of trunk revision 63432
2008-05-17 22:02:32 +00:00