cpython

Commit Graph

Author	SHA1	Message	Date
Fred Drake	a136210a9f	SF bug #1504333: sgmlib should allow angle brackets in quoted values (modified patch by Sam Ruby; changed to use separate REs for start and end tags to reduce matching cost for end tags; extended tests; updated to avoid breaking previous changes to support IPv6 addresses in unquoted attribute values)	2006-06-29 00:51:53 +00:00
Fred Drake	2f99da636b	- SF bug #853506: IP6 address parsing in sgmllib ('[' and ']' were not accepted in unquoted attribute values) - cleaned up tests of character and entity reference decoding so the tests cover the documented relationships among handle_charref, handle_entityref, convert_charref, convert_codepoint, and convert_entityref, without bringing up Unicode issues that sgmllib cannot be involved in	2006-06-23 06:03:45 +00:00
Fred Drake	541660553d	fix change that broke the htmllib tests	2006-06-17 01:07:54 +00:00
Fred Drake	fab461a4b5	SF patch 1504676: Make sgmllib char and entity references pluggable (implementation/tests contributed by Sam Ruby)	2006-06-16 23:45:06 +00:00
Fred Drake	6ce9fe880b	explain an XXX in more detail	2006-06-14 05:15:51 +00:00
Tim Peters	480725d4c5	Whitespace normalization.	2006-04-03 02:46:44 +00:00
Georg Brandl	7f6b67c235	patch #1462498: handle entityrefs in attribute values.	2006-04-01 08:35:18 +00:00
Fred Drake	58ae830fd0	add name that should be considered public to __all__	2004-09-09 01:49:58 +00:00
Walter Dörwald	70a6b49821	Replace backticks with repr() or "%r" From SF patch #852334.	2004-02-12 17:35:32 +00:00
Martin v. Löwis	dc14ab13c4	Patch #793559: Reset __starttext_tag. Fixes #709491. Backported to 2.3.	2003-09-20 10:58:38 +00:00
Fred Drake	75ab1462d5	Allow "@" in unquoted attribute values. Added test that checks for characters allowed in the query part of URLs. Backport candidate.	2003-04-29 22:12:55 +00:00
Tim Peters	0eadaac7dc	Whitespace normalization.	2003-04-24 16:02:54 +00:00
Martin v. Löwis	3163a3b4b2	Patch #545300: Support marked sections.	2003-03-30 14:25:40 +00:00
Fred Drake	0834d77bc4	Accept commas in unquoted attribute values. This closes SF patch #669683.	2003-03-14 16:21:57 +00:00
Raymond Hettinger	f13eb55d59	Replace boolean test with is None.	2002-06-02 00:40:05 +00:00
Raymond Hettinger	54f0222547	SF 563203. Replaced 'has_key()' with 'in'.	2002-06-01 14:18:47 +00:00
Fred Drake	5445f078df	Re-arrange things and remove some unused variables/imports to keep pychecker happy. (This does not cover everything it complained about, though.)	2001-10-26 18:02:28 +00:00
Fred Drake	a3bae3369c	Re-factor the SGMLParser class to use the new markupbase.ParserBase class. Use a new internal method, error(), consistently to raise parse errors; the new base class also uses this. Adjust the parse_comment() method to return the new offset into the buffer instead of the number of characters scanned; this was the only helper method that did it this way, so we have better consistency now. Required to share the new base class. This fixes SF bug #448482 and #453706.	2001-09-24 20:15:51 +00:00
Martin v. Löwis	02d893cfae	Patch #444359: Remove unused imports.	2001-08-02 07:15:29 +00:00
Fred Drake	390e9dbd4f	Make the new docstrings better conform to Guido's style guide.	2001-07-19 20:57:23 +00:00
Fred Drake	08f8dd6d0c	Added docstrings based on a patch by Evelyn Mitchell. This closes SF patch #440153.	2001-07-19 20:08:04 +00:00
Fred Drake	fb38c76e0f	In CDATA mode, make sure entity-reference syntax is not interpreted; entity references are not allowed in that mode. Do a better job of scanning <!DOCTYPE ...> declarations; based on the code in HTMLParser.py.	2001-07-16 18:30:35 +00:00
Fred Drake	8600b47b61	Be more permissive in what is accepted as an attribute name; this makes this module slightly more resiliant in the face of XHTML input, or just colons in attribute names.	2001-07-14 05:50:33 +00:00
Fred Drake	dc19163b18	Allow underscores in tag names and quote characters in unquoted attribute values. The change for attribute values matches the way Mozilla and Navigator view the world, at least. This closes SF bug #436621.	2001-07-05 18:21:57 +00:00
Guido van Rossum	39d345127e	parse_declaration(): be more lenient in what we accept. We now basically accept <!...> where the dots can be single- or double-quoted strings or any other character except >. Background: I found a real-life example that failed to parse with the old assumption: http://www.opensource.org/licenses/jabberpl.html contains a few constructs of the form <![if !supportLists]>...<![endif]>.	2001-05-21 20:17:17 +00:00
Guido van Rossum	74cde5bb3e	Fix typo in exception name (SGMLParserError should be SGMLParseError) found by Neil Norwitz's PyChecker.	2001-04-15 13:01:41 +00:00
Fred Drake	669573726b	Change RuntimeError to SGMLParseError, which subclasses RuntimeError for backward compatibility. Add support for SGML declaration syntax (<!....>) to some reasonable degree. This does not support everything allowed in SGML, but should work with "real" HTML (internal subset in a DOCTYPE is not handled). The content of the declaration is passed to the .handle_decl() method, which can be overridden by subclasses.	2001-03-16 20:04:57 +00:00
Fred Drake	62dfed96be	Change "[%s]" % string.whitespace to r"\s" in regular expressions.	2001-03-14 16:18:56 +00:00
Guido van Rossum	b68c245662	SF Patch # 103839 byt dougfort: Allow ';' in attributes sgmllib does not recognize HTML attributes containing the semicolon ';' character. This may be in accordance with the HTML spec, but there are sites that use it (excite.com) and the browsers I regularly use (IE5, Netscape, Opera) all handle it. Doug Fort Downright Software LLC	2001-02-19 18:39:09 +00:00
Skip Montanaro	0de65807e6	bunch more __all__ lists also modified check_all function to suppress all warnings since they aren't relevant to what this test is doing (allows quiet checking of regsub, for instance)	2001-02-15 22:15:14 +00:00
Eric S. Raymond	18af564bef	Use ValueError instead of string.atoi.error, since we've switched to int().	2001-02-09 10:12:19 +00:00
Eric S. Raymond	1b645e8cd3	String method conversion.	2001-02-09 07:49:30 +00:00
Tim Peters	495ad3c8cc	Whitespace normalization.	2001-01-15 01:36:40 +00:00
Fred Drake	8152d32375	Update the code to better reflect recommended style: Use != instead of <> since <> is documented as "obsolescent". Use "is" and "is not" when comparing with None or type objects.	2000-12-12 23:20:45 +00:00
Fred Drake	b46696c0ed	[Old patch that hadn't been checked in.] get_starttag_text(): New method. Return the text of the most recently parsed start tag, from the '<' to the '>' or '/'. Not really useful for structure processing, but requested for Web-related use. May also be useful for being able to re-generate the input from the parse events, but there's no equivalent for end tags. attrfind: Be a little more forgiving of unquoted attribute values.	2000-06-29 18:50:59 +00:00
Jeremy Hylton	a05e293a21	typos fixed by Rob Hooft	2000-06-28 14:48:01 +00:00
Guido van Rossum	e7b146fb3b	The third and final doc-string sweep by Ka-Ping Yee. The attached patches update the standard library so that all modules have docstrings beginning with one-line summaries. A new docstring was added to formatter. The docstring for os.py was updated to mention nt, os2, ce in addition to posix, dos, mac.	2000-02-04 15:28:42 +00:00
Fred Drake	dfd8954e36	Allow recognition of attributes even if they don't have space in front of them. I.e., '<a name="foo"href="bar.html">' will now have two attributes recognized. Based on comments from newgroup.	1999-01-25 21:57:07 +00:00
Guido van Rossum	5fdf85254c	Patch by Chris Herborth (posted to comp.lang.python)to make it behave with tags that have - or . in their names.	1998-08-24 20:59:13 +00:00
Guido van Rossum	b84ef9bc61	Put back the call to report_unbalanced() that was lost when parse_endtag() was restructured in parse_endtag() and finish_endtag().	1998-07-07 22:46:11 +00:00
Guido van Rossum	1ad00717fb	Patch by Lars Marius Garshol: - Handle <? processing instructions >. - Allow . and - in entity names. Also fixed an oversight in the previous fix (in one place, [ \t\r\n] was used instead of string.whitespace).	1998-05-28 22:48:53 +00:00
Fred Drake	de2f708299	Fix regexp for attrfind; bug reported by Lars Marius Garshol <larsga@ifi.uio.no>.	1998-04-16 21:04:26 +00:00
Guido van Rossum	45e2fbc2e7	Mass check-in after untabifying all files that need it.	1998-03-26 21:13:24 +00:00
Guido van Rossum	1fef181183	Although it's hard to be sure, I think this is a working conversion from regex to re style regular expressions. This should make sgmllib and htmllib threadsafe, so I can now create a threaded version of webchecker...	1997-10-23 19:09:21 +00:00
Fred Drake	09bcf8c031	(sgmllib.py): Partial acceptance of patch from David Leonard <leonard@dstc.edu.au>; allows hyphen and period in the middle of attribute names. Still not allowed as first character; as first character these are illegal in the Reference Concrete Syntax, and we've not identified any use of these characters as the first char in an attribute name in deployment on the web.	1996-12-16 21:56:27 +00:00
Guido van Rossum	48766512a0	Reformatted with 4-space tab stops. Allow '=' and '~' in unquoted attribute values. Added overridable methods handle_starttag(tag, method, attrs) and handle_endtag(tag, method) so subclasses can decide whether they really want to call the method (e.g. when suppressing some portion of the document). Added support for a number of SGML shortcuts: shorthand full notation <tag>...<>... <tag>...<tag>... <tag>...</> <tag>...</tag> <tag/.../ <tag>...</tag> <tag1<tag2> <tag1><tag2> </tag1</tag2> </tag1></tag2> </tag1<tag2> </tag1><tag2> This required factoring out some common actions and rationalizing the interface to parse_endtag(), so as to make the code more readable. Fixed syntax for &entity and &#char references so the trailing semicolon is optional; removed explicit support for trailing period (which was a TBL mistake in HTML 0.0). Generalized the test program. Tried to speed things up a little. (More to come after the profile results are in.) Fix error recovery: call the end methods popped from the stack instead of the one that triggers. (Plus some complications because of the way HTML extensions are handled in Grail.)	1996-03-28 18:45:04 +00:00
Guido van Rossum	650ba37e1d	typos in attrfind regex	1995-10-06 15:30:28 +00:00
Guido van Rossum	e3d9320fc5	allow _ in attr names (Netscape!)	1995-09-30 16:49:36 +00:00
Guido van Rossum	3c0bfd0dee	fix <!...!> parsing; added verbose option; don't lowercase entityrefs	1995-09-22 00:54:32 +00:00
Guido van Rossum	cf9e27c72e	support value-less attributes, using regex.group()	1995-09-01 20:34:29 +00:00

55 Commits