Commit Graph

64 Commits

Author SHA1 Message Date
Guido van Rossum 6133ec656e Process <img> and <frame> tags. Don't bother skipping second href. 1997-02-01 05:16:08 +00:00
Guido van Rossum de99d310cc Check in another copy of tktools.py... 1997-01-31 18:58:53 +00:00
Guido van Rossum 06981c328d Tk interface to webchecker. Not fully featured yet, but usable. 1997-01-31 18:58:12 +00:00
Guido van Rossum 0b0b5f0279 Spin off checking of external page in a subroutine.
Increase MAXPAGE to 150K.
Add back printing of __doc__ for usage message.
1997-01-31 18:57:23 +00:00
Guido van Rossum e5605ba3c2 Many misc changes.
- Faster HTML parser derivede from SGMLparser (Fred Gansevles).

- All manipulations of todo, done, ext, bad are done via methods, so a
derived class can override.  Also moved the 'done' marking to
dopage(), so run() is much simpler.

- Added a method status() which returns a string containing the
summary counts; added a "total" count.

- Drop the guessing of the file type before opening the document -- we
still need to check those links for validity!

- Added a subroutine to close a connection which first slurps up the
remaining data when it's an ftp URL -- apparently closing an ftp
connection without reading till the end makes it hang.

- Added -n option to skip running (only useful with -R).

- The Checker object now has an instance variable which is set to 1
when it is changed.  This is not pickled.
1997-01-31 14:43:15 +00:00
Guido van Rossum c59a5d449f Set proper User-agent header (Python-webchecker/<version>).
When -x is combined with -q, still do the checking, but don't print
the error in this phase -- they are reported by report_errors().
1997-01-30 06:04:00 +00:00
Guido van Rossum 2739cd74b3 Some refinements of the external-link checking code: insert the errors
in the 'bad' dictionary (sanitize them so they are picklable; the
sanitation code is now a subroutine); don't check mailto: URLs; omit
colon in Error message.
1997-01-30 04:26:57 +00:00
Guido van Rossum de66268588 Added -x option to check external links. Slooooow! 1997-01-30 03:58:21 +00:00
Guido van Rossum 325a64f207 Catch I/O errors when parsing robots.txt file.
Add version number, printed at startup in non-quited mode.
1997-01-30 03:30:20 +00:00
Guido van Rossum df47bafa1c Basic README file 1997-01-30 03:24:00 +00:00
Guido van Rossum 3edbb35023 Added robots.txt support, using Skip Montanaro's parser.
Fixed occasional inclusion of unpicklable objects (Message in errors).
Changed indent of a few messages.
1997-01-30 03:19:41 +00:00
Guido van Rossum bbf8c2fafd Skip Montanaro's robots.txt parser. 1997-01-30 03:18:23 +00:00
Guido van Rossum 272b37d686 web tree checker 1997-01-30 02:44:48 +00:00
Guido van Rossum d7e4705d8f mime types guesser 1997-01-30 02:44:20 +00:00