e5605ba3c2
- Faster HTML parser derived from SGMLParser (Fred Gansevles).
- All manipulations of todo, done, ext, and bad are done via methods, so a derived class can override them. Also moved the 'done' marking to dopage(), so run() is much simpler.
- Added a method status() which returns a string containing the summary counts; added a "total" count.
- Dropped the guessing of the file type before opening the document: we still need to check those links for validity!
- Added a subroutine to close a connection which first slurps up the remaining data when it's an FTP URL; apparently closing an FTP connection without reading till the end makes it hang.
- Added -n option to skip running (only useful with -R).
- The Checker object now has an instance variable which is set to 1 when it is changed. This variable is not pickled.
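The bookkeeping changes in this commit can be illustrated with a minimal Python 3 sketch. It is hypothetical and is not the actual webchecker.py code. The method names newtodolink(), markdone() and seterror(), the status() string format, the reading of "total" as to-do plus done, the CountingChecker subclass, and the close_connection() helper are all assumptions made for illustration. The sketch shows three ideas: state changes are routed through overridable methods, a `changed` flag is excluded from pickling via `__getstate__`/`__setstate__`, and an ftp: stream is drained before it is closed.

```python
import io
import pickle


class Checker:
    """Hypothetical sketch: all bookkeeping goes through overridable methods."""

    def __init__(self):
        self.todo = {}    # URL -> list of referring pages, not yet checked
        self.done = {}    # URL -> list of referring pages, already checked
        self.ext = {}     # external URLs (recorded, not followed)
        self.bad = {}     # URL -> error message
        self.changed = 0  # set to 1 whenever the state above is modified

    def newtodolink(self, url, origin):
        self.todo.setdefault(url, []).append(origin)
        self.changed = 1

    def markdone(self, url):
        # Move a URL from todo to done once its page has been processed.
        self.done[url] = self.todo.pop(url, [])
        self.changed = 1

    def seterror(self, url, msg):
        self.bad[url] = msg
        self.changed = 1

    def status(self):
        # Summary counts; "total" is assumed here to mean to-do plus done.
        return "%d total, %d to do, %d done, %d external, %d bad" % (
            len(self.todo) + len(self.done), len(self.todo),
            len(self.done), len(self.ext), len(self.bad))

    def __getstate__(self):
        # Pickle everything except the 'changed' flag.
        state = self.__dict__.copy()
        state.pop("changed", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.changed = 0  # a freshly unpickled checker starts out unchanged


class CountingChecker(Checker):
    """Example override: a subclass hooks into markdone() without touching run()."""

    def __init__(self):
        super().__init__()
        self.ndone = 0

    def markdone(self, url):
        super().markdone(url)
        self.ndone += 1


def close_connection(f, url):
    # For ftp: URLs, read the remaining data before closing, since closing
    # an FTP connection early was observed to hang.
    if url.lower().startswith("ftp:"):
        try:
            while f.read(8192):
                pass
        except OSError:
            pass  # assumption: errors while draining are ignored
    f.close()


c = Checker()
c.newtodolink("http://www.example.com/a.html", "http://www.example.com/")
c.markdone("http://www.example.com/a.html")
c.seterror("http://www.example.com/b.html", "404 Not Found")
print(c.status())  # 1 total, 0 to do, 1 done, 0 external, 1 bad

restored = pickle.loads(pickle.dumps(c))
print(restored.changed)  # 0

stream = io.BytesIO(b"x" * 20000)
close_connection(stream, "ftp://ftp.example.com/pub/file")
print(stream.closed)  # True
```

Because every mutation goes through a method, a subclass such as CountingChecker can add behavior at a single point instead of patching the main loop.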
- README
- mimetypes.py
- robotparser.py
- webchecker.py

README
Webchecker
----------

This is a simple web tree checker, useful for finding bad links in a web tree. It currently checks links pointing within the same subweb for validity.

The main program is "webchecker.py". See its doc string (or invoke it with the option "-?") for more details.

The module robotparser.py was written by Skip Montanaro; the rest is original work.

Jan 29, 1997.

--Guido van Rossum (home page: http://www.python.org/~guido/)
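The README's core idea is to parse each page for links and check only those that lie inside the same subweb. A minimal Python 3 sketch of that idea follows. The 1997 parser was derived from sgmllib.SGMLParser, which was removed in Python 3.0, so this sketch substitutes the standard html.parser.HTMLParser. Several details are assumptions for illustration, not webchecker.py's actual code: the names LinkParser, in_subweb() and split_links(), the example URLs, the choice to collect only `<a href>` and `<img src>`, and the rule that "within the same subweb" means "starts with the root URL".

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect link targets from <a href> and <img src> (tag choice is an assumption)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._add_link(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self._add_link(attrs["src"])

    def _add_link(self, ref):
        # Resolve relative references against the page URL and drop any #fragment.
        url, _fragment = urldefrag(urljoin(self.base_url, ref))
        self.links.append(url)


def in_subweb(url, root):
    # Assumption: the subweb is every URL that starts with the root URL.
    return url.startswith(root)


def split_links(html, page_url, root):
    """Return (internal, external) link lists for one page."""
    parser = LinkParser(page_url)
    parser.feed(html)
    parser.close()
    internal = [u for u in parser.links if in_subweb(u, root)]
    external = [u for u in parser.links if not in_subweb(u, root)]
    return internal, external


page = ('<a href="b.html">B</a> <a href="../other/c.html#top">C</a> '
        '<img src="http://elsewhere.example.org/x.gif">')
internal, external = split_links(page, "http://www.example.com/web/a.html",
                                 "http://www.example.com/web/")
print(internal)  # ['http://www.example.com/web/b.html']
print(external)  # ['http://www.example.com/other/c.html', 'http://elsewhere.example.org/x.gif']
```

A plain prefix test is the simplest reading of "subweb". Note that, as written, it would also treat a root such as "http://www.example.com/web" (without the trailing slash) as matching "http://www.example.com/website.html", so the root should end in "/".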