cpython/Tools/webchecker
Guido van Rossum e5605ba3c2 Many misc changes.
- Faster HTML parser derived from SGMLparser (Fred Gansevles).

- All manipulations of todo, done, ext, bad are done via methods, so a
derived class can override.  Also moved the 'done' marking to
dopage(), so run() is much simpler.

- Added a method status() which returns a string containing the
summary counts; added a "total" count.

- Drop the guessing of the file type before opening the document -- we
still need to check those links for validity!

- Added a subroutine to close a connection which first slurps up the
remaining data when it's an ftp URL -- apparently closing an ftp
connection without reading till the end makes it hang.

- Added -n option to skip running (only useful with -R).

- The Checker object now has an instance variable which is set to 1
when it is changed.  This is not pickled.
1997-01-31 14:43:15 +00:00
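
The commit message above describes three related changes: state is
manipulated only through methods, a status() method returns the summary
counts, and a "changed" flag is kept out of the pickle.  The following is
a minimal sketch of how those pieces could fit together.  It is not the
code from webchecker.py: the class name MiniChecker, the method names
newtodolink(), markdone() and setbad(), and the meaning of "total" (todo
plus done) are all assumptions made for illustration.

```python
class MiniChecker:
    """Hypothetical, simplified stand-in for the Checker class."""

    def __init__(self):
        self.todo = {}    # URL -> list of pages that link to it (not yet checked)
        self.done = {}    # URL -> list of pages that link to it (already checked)
        self.bad = {}     # URL -> error message
        self.changed = 0  # set to 1 whenever the state above is modified

    # Every state change goes through a method, so a derived class can
    # override the bookkeeping without touching the main loop.
    def newtodolink(self, url, origin):
        self.todo.setdefault(url, []).append(origin)
        self.changed = 1

    def markdone(self, url):
        self.done[url] = self.todo.pop(url)
        self.changed = 1

    def setbad(self, url, msg):
        self.bad[url] = msg
        self.changed = 1

    def status(self):
        # Assumption: "total" means every link seen so far (todo + done).
        total = len(self.todo) + len(self.done)
        return "%d total, %d to do, %d done, %d bad" % (
            total, len(self.todo), len(self.done), len(self.bad))

    # pickle calls these hooks; dropping "changed" keeps it out of the file.
    def __getstate__(self):
        state = self.__dict__.copy()
        del state["changed"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.changed = 0  # a freshly loaded object has no unsaved changes
```

A caller would save the object only when the changed flag is 1.  A
subclass that wants, say, to log every bad link only needs to override
setbad().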
README          Basic README file                     1997-01-30 03:24:00 +00:00
mimetypes.py    mime types guesser                    1997-01-30 02:44:20 +00:00
robotparser.py  Skip Montanaro's robots.txt parser.   1997-01-30 03:18:23 +00:00
webchecker.py   Many misc changes.                    1997-01-31 14:43:15 +00:00

README

Webchecker
----------

This is a simple web tree checker, useful to find bad links in a web
tree.  It currently checks links pointing within the same subweb for
validity.  The main program is "webchecker.py".  See its doc string
(or invoke it with the option "-?") for more details.
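
As a rough illustration of what "within the same subweb" can mean, the
sketch below resolves a link against the page it appears on and tests
whether the result starts with the root URL.  This is an assumption
about the rule, not a copy of the logic in webchecker.py; the function
name inside_subweb() and the example URLs are made up.

```python
from urllib.parse import urldefrag, urljoin


def inside_subweb(root, page_url, href):
    """Return True if href, as found on page_url, points below root."""
    # Resolve relative links against the page that contains them,
    # then drop any "#fragment" part before comparing.
    absolute, _fragment = urldefrag(urljoin(page_url, href))
    # Assumed rule: a link is inside the subweb if it has root as a prefix.
    return absolute.startswith(root)
```

With the root "http://www.example.com/docs/", a link "intro.html" on
"http://www.example.com/docs/index.html" counts as inside, while
"../other/x.html" on the same page does not.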

The module robotparser.py was written by Skip Montanaro; the rest is
original work.

Jan 29, 1997.

--Guido van Rossum (home page: http://www.python.org/~guido/)