Commit Graph

67 Commits

Author SHA1 Message Date
Ronald Oussoren 9545a23c7f In a number of places code still refers
to "sys.platform == 'mac'" and that is
dead code because it refers to a platform
that is no longer supported (and hasn't been
supported for several releases).

Fixes issue #7908 for the trunk.
2010-05-05 19:09:31 +00:00
Georg Brandl a6168f9e0a Queue renaming reversal part 3: move module into place and
change imports and other references. Closes #2925.
2008-05-25 07:20:14 +00:00
Alexandre Vassalotti 30ece44f2e Added stub for the Queue module to be renamed in 3.0.
Use the 3.0 module name to avoid spurious warnings.
2008-05-11 19:39:48 +00:00
Christian Heimes c5f05e45cf Patch #2167 from calvin: Remove unused imports 2008-02-23 17:40:11 +00:00
Andrew M. Kuchling ab26004923 Use sys.exc_info() 2006-07-26 18:15:45 +00:00
Martin Blais 215f13dd11 Normalized a few cases of whitespace in function declarations.
Found them using::

  find . -name '*.py' | while read i ; do grep 'def[^(]*( ' $i /dev/null ; done
  find . -name '*.py' | while read i ; do grep ' ):' $i /dev/null ; done

(I was doing this all over my own code anyway, because I'd been using spaces in
all defs, so I thought I'd make a run on the Python code as well.  If you need
to do such fixes in your own code, you can use xx-rename or parenregu.el within
emacs.)
2006-06-06 12:46:55 +00:00
Tim Peters 182b5aca27 Whitespace normalization, via reindent.py. 2004-07-18 06:16:08 +00:00
Andrew M. Kuchling a982c44543 [Patch #918212] Support XHTML's 'id' attribute, which can be on any element. 2004-03-21 19:07:23 +00:00
Neal Norwitz 592c4cc460 SF bug 753592, websucker bug
Pass the proper variable when the user supplies a directory.
Will backport.
2003-07-01 04:14:28 +00:00
Mark Hammond ce56c377a0 When bad HTML is encountered, ignore the page rather than failing with
a traceback.
2003-02-27 06:59:10 +00:00
Fred Drake 0b9e3f750c Handle the Content-Type header a little more appropriately: if it
contains options, drop them to get the major/minor content type.
Modified from the supplied patch to support more whitespace variation.
Closes SF patch #613605.
2002-11-12 22:19:34 +00:00
Walter Dörwald aaab30e00c Apply diff2.txt from SF patch http://www.python.org/sf/572113
(with one small bugfix in bgen/bgen/scantools.py)

This replaces string module functions with string methods
for the stuff in the Tools directory. Several uses of
string.letters etc. are still remaining.
2002-09-11 20:36:02 +00:00
Walter Dörwald 88a20baa77 Apply diff.txt from SF patch http://www.python.org/sf/561478
This uses cgi.parse_header() in Checker.checkforhtml(), so that
webchecker recognises the mime type text/html even if options
are specified.
2002-06-06 17:01:21 +00:00
Andrew M. Kuchling 566c0c737f [Bug #512799] urllib.splittype() returns a 2-tuple. (Reported by seb bacon) 2002-03-08 17:19:10 +00:00
Guido van Rossum f0953b9dff Fix SF bug #482171: webchecker dies on file: URLs w/o robots.txt
The cause seems to be that when a file URL doesn't exist,
urllib.urlopen() raises OSError instead of IOError.  Simply add this
to the except clause.  Not elegant, but effective. :-)
2001-12-11 22:41:24 +00:00
Fred Drake a2133339ff Only catch NameError and TypeError when attempting to subclass an
exception (for compatibility with old versions of Python).
2001-05-11 19:40:10 +00:00
Fred Drake d34a9c98a9 Added more link attributes based on additional information from Chris
McCafferty <christopher.mccafferty@csg.ch>, and a bit of experimentation
with Navigator 4.7.

HTML-as-deployed is evil!
2001-04-05 18:14:50 +00:00
Fred Drake f3186e8242 A number of improvements based on a discussion with Chris McCafferty
<christopher.mccafferty@csg.ch>:

Add javascript: and telnet: to the types of URLs we ignore.

Add support for several additional URL-valued attributes on the BODY,
FRAME, IFRAME, LINK, OBJECT, and SCRIPT elements.
2001-04-04 17:47:25 +00:00
Guido van Rossum f3335e193b Patch inspired by Just van Rossum: on the Mac, in savefilename(), make
the path to save a relative path by prefixing it with os.sep (':').
Also fix an indent inconsistency in the same function.
2000-04-25 21:13:24 +00:00
Guido van Rossum 918429b3b2 Moved robotparser.py to the Lib directory.
If you do a "cvs update" in the Lib directory, it will pop up there.
2000-03-29 16:02:45 +00:00
Guido van Rossum 84306246f1 Fix suggested by Magnus Kessler: in class Page, it is possible for
self.parser to be None; in that case don't dereference it in
getnames().
2000-03-28 20:10:39 +00:00
Guido van Rossum dc8b7980e0 Skip Montanaro:
The robotparser.py module currently lives in Tools/webchecker.  In
preparation for its migration to Lib, I made the following changes:

    * renamed the test() function _test
    * corrected the URLs in _test() so they refer to actual documents
    * added an "if __name__ == '__main__'" catcher to invoke _test()
      when run as a main program
    * added doc strings for the two main methods, parse and can_fetch
    * replaced usage of regsub and regex with corresponding re code
2000-03-27 19:29:31 +00:00
Guido van Rossum 4755ee567d Complete the integration of Sam Bayer's fixes. 1999-11-17 15:41:47 +00:00
Guido van Rossum 497a19879d Changed from importing wcnew back to webchecker. 1999-11-17 15:40:48 +00:00
Guido van Rossum e284b21457 Integrated Sam Bayer's wcnew.py code. It seems silly to keep two
files.  Removed Sam's "SLB" change comments; otherwise this is the
same as wcnew.py.
1999-11-17 15:40:08 +00:00
Guido van Rossum 61b95db389 # *NOT* by Sam Bayer: reindented to use 4 spaces like the rest here,
# and removed trailing whitespace.
1999-11-17 15:13:21 +00:00
Guido van Rossum 64acb5ce93 Samuel L. Bayer:
- same trick with "import wcnew; webchecker = wcnew" as above
- updated readhtml() method to handle pair representation; used
  new name suppression infrastructure from wcnew.py to suppress
  processing name anchors

[And untabified --GvR]
1999-11-17 15:04:26 +00:00
Guido van Rossum a8946406df Samuel L. Bayer:
- added -t and -a arguments
- added "import wcnew; webchecker = wcnew" in place of "import
  webchecker" (I assume that if you're happy with the changes, you'll
  just replace webchecker.py with wcnew.py, but if I were to do that,
  the diffs would be incomprehensible)
- fixed buggy -v argument (I think you got out of sync with the
  way verbosity was handled in webchecker vs. wcgui between 1.5 and
  1.5.2)
- made -v actually do something by adding a call to c.setflags()
  (probably the same problem as above)
- updated references to URLs to accommodate wcnew.py's pair
  representation; added appropriate calls to format_url() to handle
  display; added argument to ListPanel() initialization to provide
  access to format_url()

[And untabified --GvR]
1999-11-17 15:03:52 +00:00
Guido van Rossum f97eecccb7 Samuel L. Bayer:
- same fixes from webchecker.py
- incorporated small diff between current webchecker.py and 1.5.2
- fixed bug where "extra roots" added with the -t argument were being
  checked as real roots, not just as possible continuations
- added -a argument to suppress checking of name anchors

[And untabified --GvR]
1999-11-17 15:02:53 +00:00
Guido van Rossum dbd5c3e63b Samuel L. Bayer:
- forced new done origins to set errors if they're in self.bad (fixes
  bug where only the first of a number of errorful references to a
  link is reported under some circumstances)
- suppressed adding duplicates to self.todo list (cleans up printout
  in wcgui details)
1999-11-17 15:00:14 +00:00
Guido van Rossum 0ec1493d0b Some changes (maybe not enough?) to make it work on Windows with local
file URLs.
1999-04-26 23:11:46 +00:00
Guido van Rossum 545006259d Added Samuel Bayer's new webchecker.
Unfortunately his code breaks wcgui.py in a way that's not easy
to fix.  I expect that this is a temporary situation --
eventually Sam's changes will be merged back in.
(The changes add a -t option to specify exceptions to the -x
option, and explicit checking for #foo style fragment ids.)
1999-03-24 19:09:00 +00:00
Guido van Rossum 909bc18188 Recover from failed saves; when a file turns out to be a directory,
create a directory and move the original file to the index.html.
1999-01-03 13:06:00 +00:00
Guido van Rossum a42c1ee21d Added note() message to Page class -- this was used but didn't exist.
(The alternative would be to call self.checker.note() but since
self.checker might be None that's not quite right.)
1998-08-06 21:31:13 +00:00
Guido van Rossum b77a68e6b1 Rewrite to support multiple suckers, each with their own thread. 1998-07-08 03:05:22 +00:00
Guido van Rossum 125700addb Instead of printing, use self.message() or self.note(). 1998-07-08 03:04:39 +00:00
Guido van Rossum 0a13f7f23a # This is a new module I wrote over the weekend. Again, you missed the
# checkin email because my PC doesn't have the "Mail" command.

Add threading (now that it works).  Also some small adaptations to
Unix again.
1998-06-15 14:49:16 +00:00
Guido van Rossum e3bd82117f Primitive GUI for websucker. 1998-06-15 12:35:19 +00:00
Guido van Rossum d328a9b5f4 Fix the way a trailing / is changed to /index.html so that it
doesn't depend on the value of os.sep.  (I.e. ported to Windows :-)
1998-06-15 12:34:41 +00:00
Guido van Rossum 6eb9d32c43 sort the urls in the todo list 1998-06-15 12:33:02 +00:00
Guido van Rossum bee64533d6 Use a try-except so that the pickle file is written even when we die
because of an unexpected exception.
1998-04-27 19:35:15 +00:00
Guido van Rossum 986abac1ba Give in to tabnanny 1998-04-06 14:29:28 +00:00
Guido van Rossum 88b02cf346 Use a better way to bind the checkext instance variable to a check
button widget, not involving a __getattr__() method but a callback on
the widget.
1998-03-05 20:12:18 +00:00
Guido van Rossum 1a7eae919a Adapt to new webchecker structure. Due to better structure of
getpage(), much less duplicate code is needed -- we only need to
override readhtml().
1998-02-21 20:08:39 +00:00
Guido van Rossum 00756bd4a6 Major overhaul. Don't use global variable (e.g. verbose); use
instance variables.  Make all global functions methods, for easy
overriding.  Restructure getpage() for easy overriding.  Add
save_pickle() method and load_pickle() global function to make it
easier for other programs to emulate the toplevel interface.
1998-02-21 20:02:09 +00:00
Guido van Rossum f326134e5c Map .shtml to text/html. 1997-10-07 14:56:42 +00:00
Guido van Rossum d57548023f A variant on webchecker that creates a mirror copy of a remote site. 1997-10-06 18:54:25 +00:00
Guido van Rossum 2237b73baf Several changes:
- Change the code that looks for robots.txt to always look in /, even
if the "root" path is somewhere deep down below.

- Add link processing in <AREA> tags.

- Change safeclose() to avoid crashing when the file has no geturl()
method.
1997-10-06 18:54:01 +00:00
Guido van Rossum 68bdad1015 Tiny script to play with it on a Mac. 1997-05-28 16:09:02 +00:00
Guido van Rossum 29f6533c7f Scroll to top of info window when done. 1997-05-09 03:19:29 +00:00