
205 Commits

Author SHA1 Message Date
Raymond Hettinger 42182ebaf6 SF 698520: Iterator for urllib.URLOpener
Contributed by Brett Cannon.
2003-03-09 05:33:33 +00:00
Guido van Rossum 68468eba63 Get rid of many apply() calls. 2003-02-27 20:14:51 +00:00
Jeremy Hylton 3bd6fde4e3 Use fdopen() to create file from fd. 2002-10-11 14:36:24 +00:00
Jack Jansen 4ef1103b71 When testing for localhost/ first map to lower case. Spotted by Skip. 2002-09-12 20:14:04 +00:00
Jack Jansen 3ae2dc5e5e Treat file://localhost/ as local too (same as file:/ and file:///).
Fixes #607789, bugfix candidate.
2002-09-12 19:47:52 +00:00
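Illustration for 3ae2dc5e5e and 4ef1103b71: a minimal sketch of the "is this file: URL local?" test, including the lower-casing of the host. It assumes Python 3's `urllib.parse` as a stand-in for the 2002-era module; `is_local_file_url` is a hypothetical helper, not a function from urllib.

```python
from urllib.parse import urlsplit


def is_local_file_url(url):
    """Hypothetical helper: True if a file: URL refers to the local machine."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        return False
    # .hostname is None when the URL has no host part and is lower-cased
    # otherwise, so "LocalHost" and "localhost" compare equal.
    return (parts.hostname or "") in ("", "localhost")


local_forms = ["file:/tmp/x", "file:///tmp/x", "file://localhost/tmp/x",
               "file://LocalHost/tmp/x"]
results = [is_local_file_url(u) for u in local_forms]
```

All four forms in `local_forms` are treated as local; a URL such as `file://example.com/tmp/x` is not.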
Raymond Hettinger f2e45dd9dd Modify splituser() method to allow an @ in the userinfo field.
Jeremy reported that this is not allowed by RFC 2396; however,
other tools support unescaped @'s so we should also.

Apply SF patch 596581 closing bug 581529.
2002-08-18 20:08:56 +00:00
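Illustration for f2e45dd9dd: one way to tolerate an unescaped @ in the userinfo field is to split on the last @ rather than the first. This is only a sketch under that assumption; `split_user` is a hypothetical name and may not match how splituser() is actually written.

```python
def split_user(netloc):
    """Hypothetical sketch: split 'userinfo@host:port' on the LAST '@'."""
    userinfo, sep, hostport = netloc.rpartition("@")
    if not sep:
        # No '@' at all: there is no userinfo part.
        return None, netloc
    return userinfo, hostport


with_at = split_user("joe@example.com:secret@ftp.example.org")
without_user = split_user("ftp.example.org:21")
```

Here `with_at` keeps `joe@example.com:secret` together as the userinfo, and `without_user` reports no userinfo.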
Guido van Rossum 3b0a3293c3 Massive changes from SF 589982 (tempfile.py rewrite, by Zack
Weinberg).  This changes all uses of deprecated tempfile functions to
the recommended ones.
2002-08-09 16:38:32 +00:00
Neal Norwitz 60e04cd317 Fix SF #565414, FancyURLopener() needs to support **kwargs
since the URLopener base class does and **kwargs are used in urlopen.
2002-06-11 13:38:51 +00:00
Walter Dörwald 65230a2de7 Remove uses of the string and types modules:
x in string.whitespace => x.isspace()
type(x) in types.StringTypes => isinstance(x, basestring)
isinstance(x, types.StringTypes) => isinstance(x, basestring)
type(x) is types.StringType => isinstance(x, str)
type(x) == types.StringType => isinstance(x, str)
string.split(x, ...) => x.split(...)
string.join(x, y) => y.join(x)
string.zfill(x, ...) => x.zfill(...)
string.count(x, ...) => x.count(...)
hasattr(types, "UnicodeType") => try: unicode except NameError:
type(x) != types.TupleType => not isinstance(x, tuple)
isinstance(x, types.TupleType) => isinstance(x, tuple)
type(x) is types.IntType => isinstance(x, int)

Do not mention the string module in the rlcompleter docstring.

This partially applies SF patch http://www.python.org/sf/562373
(with basestring instead of string). (It excludes the changes to
unittest.py and does not change the os.stat stuff.)
2002-06-03 15:58:32 +00:00
Raymond Hettinger 10ff706e27 Replaced boolean tests with is None. 2002-06-02 03:04:52 +00:00
Raymond Hettinger 54f0222547 SF 563203. Replaced 'has_key()' with 'in'. 2002-06-01 14:18:47 +00:00
Guido van Rossum 4b46c0a15f Don't require Unicode support. 2002-05-24 17:58:05 +00:00
Guido van Rossum a2da305211 Fix from SF bug #541980 (Jacques A. Vidrine).
When os.stat() for a file raises OSError, turn it into IOError per
documentation.

Bugfix candidate.
2002-04-15 00:25:01 +00:00
Fred Drake df6eca7eb7 Support manual proxy configuration for simple urlopen() operations.
This change is similar to the supplied patch, but does not save the opener
when a proxy configuration is specified.
This closes SF patch #523415.
2002-04-04 20:41:34 +00:00
Guido van Rossum 64e5aa9391 Fix for a bug in the fix for SF bug 503031. This time the OP verified
that it works.

Bugfix candidate (this and the previous checkin, obviously).
2002-04-02 14:38:16 +00:00
Guido van Rossum b955d6c41e Hopeful fix for SF bug 503031: urllib.py: open_http() host problem.
I really can't test this, but from reading the discussion in that bug
report, it's likely that this works.  It may also close a whole bunch
of other bug reports related to urllib and proxies on Windows, but who
knows.
2002-03-31 23:38:48 +00:00
Neal Norwitz aad1849e25 time and socket were already imported in the module, no need to re-import 2002-03-26 16:25:01 +00:00
Walter Dörwald 92b48b739f use stat attributes instead of tuple entries
and remove the unnecessary "import stat" statement.
2002-03-22 17:30:38 +00:00
Andrew M. Kuchling 56a42356b7 To make 'urllib.py -t' run again, change FTP URL to a file that actually
exists.
2002-03-18 22:18:46 +00:00
Neal Norwitz bc9bc187aa SF #515024 remove unused variable 2002-02-11 18:06:21 +00:00
Guido van Rossum b931bf3c55 SF patch #490515 (Joe A) urllib.open_https() protocol issue
open_http():
    In urllib.py library module, URLopener.open_https()
    returns a class instance of addinfourl() with its
    self.url property missing the protocol.

    Instead of "https://www.someurl.com", it becomes
    "://www.someurl.com".
2001-12-08 17:09:07 +00:00
Fred Drake c680ae8002 Added missing parameter in call to http_error_default();
reported by Neal Norwitz.
2001-10-13 18:37:07 +00:00
Jeremy Hylton 6d8c1aabff Add content-type header to ftp URLs (SF patch #454553)
Modify rfc822.formatdate() to always generate English names,
regardless of locale.  This is required by RFC 1123.

In open_local_file() of urllib and urllib2, use new formatdate() from
rfc822.
2001-08-27 20:16:53 +00:00
Guido van Rossum 88e0b5bee0 SF patch #454553 by Walter Dörwald: auto-guess content-type header for
ftp urls.
2001-08-23 13:38:15 +00:00
Martin v. Löwis 58682b7fe5 Only catch the errors that can actually occur, as reported in bug #411881. 2001-08-11 15:02:57 +00:00
Tim Peters ab9ba27dc0 Whitespace normalization. 2001-08-09 21:40:30 +00:00
Tim Peters 55c12d4d5b SF patch #403640: incomplete proxy handling in URLLIB
Look specific to Windows.  Don't know whether it works.
2001-08-09 18:04:14 +00:00
Guido van Rossum f0713d3f4d SF Patch #420725 by Walter Doerwald:
For local files urllib.py doesn't return the MIME
  headers that the documentation says it does:

  http://www.python.org/doc/current/lib/module-urllib.html#l2h-2187
  states that "When the method is local-file, returned headers will
  include a Date representing the file's last-modified time, a
  Content-Length giving file size, and a Content-Type containing a
  guess at the file's type"

  But in Python 2.1 the only header that gets returned
  is the Content-Type:

  >>> import urllib
  >>> f = urllib.urlopen("gurk.txt")
  >>> f.info().headers
  ['Content-Type: text/plain\n']
2001-08-09 17:43:35 +00:00
Fred Drake ec3dfdee6a Only write out one blank line before the request data.
This closes SF patch #419459.
2001-07-04 05:18:29 +00:00
Guido van Rossum b8bf3bece2 Fix SF bug [ #416231 ] urllib.basejoin fails to apply some ../.
Reported by Juan M. Bello Rivas.
2001-04-15 20:47:33 +00:00
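Illustration for b8bf3bece2: the behaviour the fix is after, shown with Python 3's `urllib.parse.urljoin`, which is assumed here to be the modern counterpart of basejoin(). The URLs are made up for the example.

```python
from urllib.parse import urljoin

base = "http://a/b/c/d"
one_up = urljoin(base, "../e")     # climbs from /b/c/ to /b/
two_up = urljoin(base, "../../e")  # climbs from /b/c/ to /
```

Each leading `../` removes one directory from the base path before the relative part is appended.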
Moshe Zadka 5d87d47295 fixing 408085 - redirect from https becomes http
Even though relative redirects are illegal, they are common.
urllib treated every relative redirect as though it was to http,
even if the original was https://
As long as we're compensating for server bugs, might as well do
it properly.
2001-04-09 14:54:21 +00:00
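Illustration for 5d87d47295: resolving a relative redirect against the original URL keeps the original scheme. The sketch uses Python 3's `urllib.parse.urljoin` and invented URLs; it shows the intended outcome, not the patched code.

```python
from urllib.parse import urljoin

original = "https://example.com/account/settings"
location = "/login"  # a relative Location value, as sent by some servers

target = urljoin(original, location)
```

The result stays on `https://` instead of silently falling back to `http://`.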
Skip Montanaro 44d5e0c418 updated __all__ to include several other names 2001-03-13 19:47:16 +00:00
Jack Jansen 282fed1363 Grr, splittag was also missing from __all__. 2001-03-05 13:45:38 +00:00
Jack Jansen 49985638fa Added url2pathname and pathname2url to __all__. 2001-03-05 13:41:14 +00:00
Guido van Rossum d74fb6b12a RISCOS changes by dschwertberger. 2001-03-02 06:43:49 +00:00
Skip Montanaro 40fc16059f final round of __all__ lists (I hope) - skipped urllib2 because Moshe may be
giving it a slight facelift
2001-03-01 04:27:19 +00:00
Tim Peters 85ba673b0a Whitespace normalization. 2001-02-28 08:26:44 +00:00
Moshe Zadka e99bd17ed6 Fixing bug #227562 by calling URLopener.http_error_default when
an invalid 401 request is being handled.
2001-02-27 06:27:04 +00:00
Skip Montanaro c3e11d6569 provide simple recovery/escape from apparent redirect recursion. If the
number of entries into http_error_302 exceeds the value set for the maxtries
attribute (which defaults to 10), the recursion is exited by calling
the http_error_500 method (or if that is not defined, http_error_default).
2001-02-15 16:56:36 +00:00
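Illustration for c3e11d6569: a toy model of the redirect guard. Only `maxtries`, its default of 10, and the handler names come from the commit message; the class, the `redirects` table, and the return values are hypothetical.

```python
class RedirectSketch:
    maxtries = 10  # default stated in the commit message

    def __init__(self, redirects):
        self.redirects = redirects  # hypothetical: URL -> redirect target
        self.tries = 0

    def open(self, url):
        if url in self.redirects:
            return self.http_error_302(url)
        self.tries = 0
        return ("ok", url)

    def http_error_302(self, url):
        self.tries += 1
        if self.maxtries and self.tries > self.maxtries:
            self.tries = 0
            # Fall back to http_error_default if http_error_500 is missing.
            handler = getattr(self, "http_error_500", self.http_error_default)
            return handler(url)
        return self.open(self.redirects[url])

    def http_error_default(self, url):
        return ("error", url)


finite = RedirectSketch({"a": "b", "b": "c"}).open("a")
looping = RedirectSketch({"a": "b", "b": "a"}).open("a")
```

A finite chain resolves normally, while the a/b loop is cut off once the counter passes `maxtries`.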
Tim Peters 658cba6706 Whitespace normalization. 2001-02-09 20:06:00 +00:00
Skip Montanaro 14f1ad4a94 allow first param urlencode to be a sequence of two-element tuples - in this
case, the order of parameters in the output matches the order of the inputs.
2001-01-28 21:11:12 +00:00
Skip Montanaro a5d23a19e6 modify urlencode so sequences in the dict are treated as multivalued
parameters.  This closes the code part of patch 103314.
2001-01-20 15:56:39 +00:00
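Illustration for 14f1ad4a94 and a5d23a19e6, using Python 3's `urllib.parse.urlencode`. In the current API the multivalued expansion is opt-in through `doseq=True`; the parameter names and values below are invented.

```python
from urllib.parse import urlencode

# A sequence of two-element tuples keeps the order of the inputs.
ordered = urlencode([("b", "2"), ("a", "1")])

# With doseq=True, each element of a sequence value becomes its own parameter.
multi = urlencode({"tag": ["x", "y"]}, doseq=True)
```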
Guido van Rossum e27a7b8074 Anonymous SF bug 129288: "The python 2.0 urllib has %%%x as a format
when quoting forbidden characters. There are scripts out there that
break with lower case, therefore I guess %%%X should be used."

I agree, so am fixing this.
2001-01-19 03:28:15 +00:00
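Illustration for e27a7b8074: the difference between the two format strings. `percent_encode` is a hypothetical helper written for this note (the zero-padding is an assumption); the last line shows that Python 3's `quote` also emits upper-case hex digits.

```python
from urllib.parse import quote


def percent_encode(byte_value, upper=True):
    """Hypothetical helper: format one byte value as a %XX escape."""
    return ("%%%02X" if upper else "%%%02x") % byte_value


lower = percent_encode(0x2F, upper=False)
upper = percent_encode(0x2F)
stdlib = quote("/", safe="")
```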
Guido van Rossum afc4f0413a - Make sure to quote the username and password (SF patch #103236 by
dogfort).

- Don't drop the data argument when calling open_https() from the
  authentication error handler.
2001-01-15 18:31:13 +00:00
Tim Peters e119006e7d Whitespace normalization. Top level of Lib now fixed-point for reindent.py! 2001-01-15 03:34:38 +00:00
Moshe Zadka b2a0a838e0 Fixed bug which caused HTTPS not to work at all with string URLs 2001-01-08 07:09:25 +00:00
Guido van Rossum b2493f855a Get rid of string functions, except maketrans() (which is *not*
obsolete!).

Fix a bug in ftpwrapper.retrfile() where somehow ftplib.error_perm was
assumed to be a string.  (The fix applies str().)

Also break some long lines and change the output from test() slightly.
2000-12-15 15:01:37 +00:00
Martin v. Löwis 1d99433a58 Convert Unicode strings to byte strings before passing them into specific
protocols. Closes bug #119822.
2000-12-03 18:30:10 +00:00
Jeremy Hylton d52755f41c Provide a clearer error message when urlopen fails because of an
invalid proxy setting.

Minor change to call of unknown_url; always pass data argument
explicitly since data defaults to None.

PEP 42: Add as a feature that urllib handles proxy settings that contain
only the host and port of the proxy.
2000-10-02 23:04:02 +00:00
Fredrik Lundh b49f88bfc1 - Improved handling of win32 proxy settings (addresses bug #114256).
The earlier code assumed "protocol=host;protocol=host;..." or "host",
but Windows may also use "protocol=host" (just one entry), as well as
"protocol://host".  This code needs some more work, so I'll leave the
bug open for now.
2000-09-24 18:51:25 +00:00
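Illustration for b49f88bfc1: a rough parser for the proxy string shapes listed in the message. `parse_proxy_server` is hypothetical, and the choice to map a bare host to both http and ftp is an assumption made for the example, not something the commit states.

```python
def parse_proxy_server(value):
    """Hypothetical parser for a Windows-style proxy setting string."""
    proxies = {}
    if "=" in value:
        # "protocol=host" or "protocol=host;protocol=host;..."
        for entry in value.split(";"):
            if not entry:
                continue
            protocol, address = entry.split("=", 1)
            if "://" not in address:
                address = "%s://%s" % (protocol, address)
            proxies[protocol] = address
    elif "://" in value:
        # "protocol://host": a proxy for that one protocol
        protocol = value.split("://", 1)[0]
        proxies[protocol] = value
    else:
        # bare "host": assumed here to serve both http and ftp
        for protocol in ("http", "ftp"):
            proxies[protocol] = "%s://%s" % (protocol, value)
    return proxies


several = parse_proxy_server("http=proxy:80;ftp=proxy:21")
single = parse_proxy_server("http=proxy:80")
url_form = parse_proxy_server("http://proxy:80")
bare = parse_proxy_server("proxy:80")
```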
Jeremy Hylton 7ae51bf82d Remove "," from the list of always_safe characters. It is a reserved
character according to RFC 2396. Add some text to quote doc string
that explains the quoting rules better.

This closes SF Bug #114427.

Add _fast_quote operation that uses a dictionary instead of a list
when the standard set of safe characters is used.
2000-09-14 16:59:07 +00:00
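Illustration for 7ae51bf82d: a dictionary-based quoting sketch in the spirit of _fast_quote. The safe set below (letters, digits, and "_.-", plus "/" as the default extra safe character) is an assumption about the code of that era, and the table only covers code points below 256.

```python
import string

# Assumed always-safe set; note that "," is not part of it.
ALWAYS_SAFE = string.ascii_letters + string.digits + "_.-"


def build_safe_map(safe="/"):
    """Precompute the quoted form of every character below 256."""
    table = {}
    for i in range(256):
        c = chr(i)
        table[c] = c if (c in ALWAYS_SAFE or c in safe) else "%%%02X" % i
    return table


_SAFE_MAP = build_safe_map()


def fast_quote(s):
    """Quote s with one dictionary lookup per character."""
    return "".join(_SAFE_MAP[c] for c in s)


quoted = fast_quote("a,b/c d")
```

Building the table once turns the per-character safety test into a single lookup, which is the point of using a dictionary instead of scanning a list.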
Jeremy Hylton 6102e29df2 fixes bug #111951
applies patch #101369 by Moshe Zadka
use explicit list of always safe characters instead of string.letters
add test case
2000-08-31 15:48:10 +00:00
Sjoerd Mullender d7b86f0056 Pass data on to retrieve method.
Don't people *test* their changes?
2000-08-25 11:23:36 +00:00
Guido van Rossum ba3113807d Promote the server version from a local variable to a class variable,
so that a subclass can override it.

This partly addresses Bug #112634 -- but the documentation is still
wrong, since it suggests that you can set self.version *after* calling
the base class __init__.  In fact it must be done *before*.

I'll fix that too.
2000-08-24 16:18:04 +00:00
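Illustration for ba3113807d: why a class variable can be overridden by a subclass, and why setting `self.version` after the base `__init__` is too late. The class names and version strings are hypothetical; only the `version` attribute comes from the commit message.

```python
class VersionedOpener:
    version = "Python-urllib/sketch"  # class variable, overridable

    def __init__(self):
        # The header is built once, from whatever version is visible now.
        self.addheaders = [("User-agent", self.version)]


class MyOpener(VersionedOpener):
    version = "MyApp/1.0"  # takes effect: resolved before __init__ runs


class TooLate(VersionedOpener):
    def __init__(self):
        super().__init__()
        self.version = "MyApp/1.0"  # header was already built


good = MyOpener().addheaders
stale = TooLate().addheaders
```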
Fred Drake 316a793a58 Randall Hopper <aa8vb@yahoo.com>:
Make it easier to use HTTP POST with urlretrieve().
2000-08-24 01:01:26 +00:00
Skip Montanaro 79f1c1778d * added doc strings to urlopen and unquote_plus
* fixed typo in doc string for quote
2000-08-22 03:00:52 +00:00
Fred Drake 567ca8e732 Patch from Paul Schreiber <paul@commerceflow.com>:
Patch description
-----------------
This addresses four issues:

(1) usernames and passwords in urls with special characters are now
    decoded properly. i.e. http://foo%2C:bar@www.whatever.com/

(2) Basic Auth support has been added to HTTPS, like it was in HTTP.

(3) Version 1.92 sent the POSTed data, but did not deal with errors
    (HTTP responses other than 200) properly. HTTPS now behaves the
    same way HTTP does.

(4) made URL-checking behave the same way with HTTPS as it does with
    HTTP (changed == to !=).
2000-08-21 21:42:42 +00:00
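Illustration for item (1) of 567ca8e732: decoding percent-escapes in the user name and password, using the URL from the message. The sketch relies on Python 3's `urllib.parse`, where `urlsplit` leaves the userinfo escaped and `unquote` decodes it.

```python
from urllib.parse import unquote, urlsplit

parts = urlsplit("http://foo%2C:bar@www.whatever.com/")

raw_user = parts.username         # still percent-encoded
user = unquote(parts.username)    # decoded for use in credentials
password = unquote(parts.password)
host = parts.hostname
```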
Mark Hammond 4f570b9239 Patch #100873 - Use registry to find proxies for urllib on Win32
Note that this patch looks worse than it is - an existing function (getproxies() for all platforms other than Win/Mac) has been moved, renamed and indentation changed, but the body of that function is identical.  Windows now allows the environment variables to override the registry.
2000-07-26 07:04:38 +00:00
Thomas Wouters 7e47402264 Spelling fixes supplied by Rob W. W. Hooft. All these are fixes in either
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").

There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
2000-07-16 12:04:32 +00:00
Fred Drake 9e94afd18d Fix bug #314, reported by Craig Allen <cba@mediaone.net>:
splittype():  Always lower-case the URL scheme; these are supposed to be
               normalized according to RFC 1738 anyway.
2000-07-01 07:03:30 +00:00
Andrew M. Kuchling 7ad4792307 Comment out an apparent debug print 2000-06-10 01:41:48 +00:00
Guido van Rossum c580dae6da Fix a problem reported by Oleg Broytmann, who complains that very
often, ftp URLs hang in the final close.  Further analysis suggests
that this is because the close hook in addclosehook() calls the hook
before actually closing the connection.  The hook, in this case, waits
for the '226 Transfer complete' status from the server on the command
socket.  However, more and more ftp servers only send this status when
the data socket has actually been closed -- causing a deadlock.

The fix is simple: in addclosehook.close(), call addbase.close()
*before* calling the closehook.
2000-05-24 13:21:46 +00:00
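Illustration for c580dae6da: the ordering of the fix, close the underlying file first, then run the hook. The classes are simplified stand-ins for addbase and addclosehook, not the module's real code, and the hook here merely records whether the file was already closed.

```python
import io


class AddBase:
    def __init__(self, fp):
        self.fp = fp

    def close(self):
        if self.fp is not None:
            self.fp.close()
            self.fp = None


class AddCloseHook(AddBase):
    def __init__(self, fp, closehook, *hookargs):
        super().__init__(fp)
        self.closehook = closehook
        self.hookargs = hookargs

    def close(self):
        # Close the data connection BEFORE calling the hook, so a hook that
        # waits for the server's final status cannot deadlock.
        super().close()
        if self.closehook is not None:
            hook, args = self.closehook, self.hookargs
            self.closehook = None
            self.hookargs = None
            hook(*args)


events = []
fp = io.BytesIO(b"data")
wrapped = AddCloseHook(fp, lambda: events.append(fp.closed))
wrapped.close()
```

When the hook runs, `fp.closed` is already true, which is the property the fix depends on.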
Andrew M. Kuchling 43c5af026f Fix to previous patch: send the request data when it's provided 2000-04-24 14:17:06 +00:00
Andrew M. Kuchling 141e9894b7 Fixed bug reported by JP Calderone: https:// URL's didn't work.
The fix also adds support for POSTing to an https URL
2000-04-23 02:53:11 +00:00
Guido van Rossum e7b146fb3b The third and final doc-string sweep by Ka-Ping Yee.
The attached patches update the standard library so that all modules
have docstrings beginning with one-line summaries.

A new docstring was added to formatter.  The docstring for os.py
was updated to mention nt, os2, ce in addition to posix, dos, mac.
2000-02-04 15:28:42 +00:00
Guido van Rossum 3c8baedaf8 Sjoerd Mullender writes:
Fixed a TypeError: not enough arguments; expected 4, got 3.
When authentication is needed, the default http_error_401 method calls
retry_http_basic_auth.  The default version of that method expected a
data argument which wasn't provided, so now we provide the argument if
it was given and we also made the data argument optional.

Also changed other calls where data was optional to not pass data if
it was not passed to the calling method (in line with other similar
occurrences).
2000-02-01 23:36:55 +00:00
Guido van Rossum 09c8b6c3e4 OpenSSL support. This is based on patches for a version of SSLeay by
Brian E Gallew, which were improved and adapted to OpenSSL 0.9.4 by
Laszlo Kovacs of HP.  Both have kindly given permission to include
the patches in the Python distribution.  Final formatting by GvR.
1999-12-07 21:37:17 +00:00
Guido van Rossum 5e006a3cc3 Patches by Michael Reilly to correctly deal with ftp URLs of the form
ftp://user@host//root/path: the double slash in the pathname means to
go to the root directory even if the initial directory isn't the root.
1999-08-18 17:40:33 +00:00
Guido van Rossum 3427c1f71b Sjoerd Mullender:
In splithost, accept empty host part in URLs.  This is required for
file URLs that can have an empty host part.  For such URLs, we should
not return the initial 2 slashes as part of the file name.
1999-07-01 23:20:56 +00:00
Guido van Rossum 336a201d4f Sjoerd Mullender writes:
Urllib makes the URL of the opened file available through the geturl
method of the returned object.  For local files, this consists of
file: plus the name of the file.  This results in an invalid URL if
the file name was relative.  This patch fixes this so that the
returned URL is just a relative URL in that case.  When the file name
is absolute, the URL returned is of the form file:///absolute/path.

[I guess that a URL of the form "file:foo.html" is illegal...  GvR]
1999-06-24 15:27:36 +00:00
Guido van Rossum 0dee4ee0f8 Updated lagging version#. Also added some comments about how quote()
and quote_plus() can be optimized tenfold.
1999-06-09 15:14:50 +00:00
Guido van Rossum 3527f59457 Hack so that if a 302 or 301 redirect contains a relative URL, the
right thing "just happens" (basejoin() with old URL).
1999-03-29 20:23:41 +00:00
Guido van Rossum 3764595c98 Yet another patch by Sjoerd Mullender:
Don't convert URLs to URLs using pathname2url.
1999-03-15 16:16:29 +00:00
Guido van Rossum 367ac80d3b From: Sjoerd Mullender
The filename to URL conversion didn't properly quote special
characters.
The URL to filename didn't properly unquote special characters.
1999-03-12 14:31:10 +00:00
Guido van Rossum 29aab7582f open_http also had the 'data is None' test backwards. don't call with the
extra argument if data is None.
1999-03-09 19:31:21 +00:00
Jeremy Hylton b30f52a471 http_error had the 'data is None' test backwards. don't call with the
extra argument if data is None.
1999-02-25 16:14:58 +00:00
Jeremy Hylton f90b002e31 change indentation from 8 spaces to 4 spaces 1999-02-25 16:12:12 +00:00
Jeremy Hylton 547c3f1c13 pleasing the tabnanny 1999-02-25 15:59:54 +00:00
Jeremy Hylton dbc8364e1f When performing a POST request, i.e. when the second argument to
urlopen is used to specify form data, make sure the second argument is
threaded through all of the http_error_NNN calls.  This allows error
handlers like the redirect and authorization handlers to properly
re-start the connection.
1999-02-24 18:42:38 +00:00
Guido van Rossum 4505895e68 As Des Barry points out, we need to call pathname2url(file) in two
calls to addinfourl() in open_file().
1999-02-22 19:01:42 +00:00
Guido van Rossum ed52a20c6e In open_ftp(), check that retrlen is not None before using it in a %d format! 1999-02-16 15:10:12 +00:00
Guido van Rossum 33add0a95a Sjoerd Mullender:
File names with "funny" characters get translated wrong by
pathname2url (any variety).  E.g. the (Unix) file "/ufs/sjoerd/#tmp"
gets translated into "/ufs/sjoerd/#tmp" which, when interpreted as a
URL is file "/ufs/sjoerd/" with fragment ID "tmp".

Here's an easy fix.  (An alternative fix would be to change the
various implementations of pathname2url and url2pathname to include
calls to quote and unquote.)

[The main problem is with the normal use of URLs:
	url = pathname2url(file)
	transmit url
	url, tag = splittag(url)
	urlopen(url)
]

In addition, this patch fixes some uses of unquote:
- the host part of URLs should be unquoted
- the file path in the FTP URL should be unquoted before it is split
  into components.
- because of the latter, I removed all unquoting from ftpwrapper,
  and moved it to the caller, but that is not essential
1998-12-18 15:25:22 +00:00
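Illustration for 33add0a95a: why an unquoted "#" in a file name is a problem, using the path from the message. The sketch uses Python 3's `urllib.parse` functions rather than the pathname2url of the time.

```python
from urllib.parse import quote, unquote, urldefrag

path = "/ufs/sjoerd/#tmp"

# Unquoted, the "#" starts a fragment and the file name is lost.
broken = urldefrag(path)

# Quoted, the "#" becomes %23 and survives a round trip.
quoted_url = quote(path)
intact = urldefrag(quoted_url)
restored = unquote(intact.url)
```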
Guido van Rossum 9ab96d40eb Changes by Eric Raymond:
1. Generate a correct Content-Length header visible through the info() method
   if a request to open an FTP URL gets a length in the response to RETR.

2. Take a third argument to urlretrieve() that makes it possible to progress-
   meter an urlretrieve call (this is what I needed the above change for).
   See the second patch band below for details.

3. To avoid spurious errors, I commented out the gopher test.  The target
   document no longer exists.
1998-09-28 14:07:00 +00:00
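Illustration for item 2 of 9ab96d40eb: how a progress callback can be driven from a block copy loop. `copy_with_progress` is a hypothetical stand-in, and the `(block number, block size, total size)` argument order is an assumption modelled on the reporthook of later urlretrieve() versions.

```python
import io


def copy_with_progress(src, dst, reporthook=None, blocksize=8192, totalsize=-1):
    """Copy src to dst in blocks, reporting progress after each block."""
    blocknum = 0
    if reporthook:
        reporthook(blocknum, blocksize, totalsize)  # initial call, nothing read yet
    while True:
        block = src.read(blocksize)
        if not block:
            break
        dst.write(block)
        blocknum += 1
        if reporthook:
            reporthook(blocknum, blocksize, totalsize)


calls = []
src = io.BytesIO(b"x" * 10)
dst = io.BytesIO()
copy_with_progress(src, dst, lambda n, bs, total: calls.append((n, bs, total)),
                   blocksize=4, totalsize=10)
```

A known total size (for FTP, the length from the RETR response, as item 1 describes) is what lets the callback show a percentage rather than just a byte count.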
Guido van Rossum 4163e708ed On the Mac, use Internet Config to find the proxies (Jack Jansen).
Also added two XXX comments about lingering thread unsafeness.
1998-08-06 13:39:09 +00:00
Guido van Rossum 810a3396d1 Speed up the implementation of quote().
Fix the implementation of quote_plus().  (It wouldn't treat '+' in the
original data right.)

Add urlencode(dict) which is handy to create the data for sending a
POST request with urlopen().
1998-07-22 21:33:23 +00:00
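Illustration for 810a3396d1: a literal "+" must be escaped by quote_plus() so that it is not read back as a space, and urlencode() builds POST data from a dictionary. The sketch uses Python 3's `urllib.parse`; the sample strings are invented.

```python
from urllib.parse import quote_plus, unquote_plus, urlencode

text = "a+b c"
encoded = quote_plus(text)          # "+" escaped, space becomes "+"
round_trip = unquote_plus(encoded)  # back to the original text

# In Python 3 the body passed to urlopen() must be bytes.
post_data = urlencode({"q": "1+1 = 2"}).encode("ascii")
```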
Guido van Rossum c94f16f156 Oops! Of course, Tim is right -- when the item is not a hex number,
the '%' should be put back in.
1998-06-29 00:42:54 +00:00
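Illustration for c94f16f156: a split-based unquote that puts the "%" back when the two characters after it are not hex digits. `unquote_sketch` is a hypothetical reimplementation for ASCII input, not the module's code.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def unquote_sketch(s):
    """Decode %XX escapes; leave malformed escapes untouched."""
    parts = s.split("%")
    result = [parts[0]]
    for item in parts[1:]:
        if len(item) >= 2 and item[0] in HEX_DIGITS and item[1] in HEX_DIGITS:
            result.append(chr(int(item[:2], 16)) + item[2:])
        else:
            # Not a hex escape: restore the "%" that split() removed.
            result.append("%" + item)
    return "".join(result)


decoded = unquote_sketch("a%20b")
kept = unquote_sketch("100%zz")
trailing = unquote_sketch("50%")
```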
Guido van Rossum 52e86ad05b Speed-up unquote(), inspired by post from Daniel Walton. 1998-06-28 23:49:35 +00:00
Guido van Rossum 2349015a87 Rewrite the (test) main program so that when used as a script, it can
retrieve one or more URLs to stdout.  Use -t to run the self-test.
1998-06-25 02:39:00 +00:00
Guido van Rossum ae9ee7329d Use the getpass module instead of having platform-specific echo on/off
code here.
1998-06-12 14:21:13 +00:00
Guido van Rossum e0c0da98d8 Patches to make the proxy code work again. (Why does that always break
as soon as I change things even just a little bit? :-)  Even works
when accessing a password-protected page through the proxy.  Prompted
by complaints from, and correct operation verified by, Nigel O'Brian.
1998-05-05 13:58:13 +00:00
Guido van Rossum 0eae8fba81 Feeble attempt at making urlopen more robust -- don't call splituser()
when splithost() returned no useable host, to avoid calling
splituser() on None.
1998-04-27 15:19:17 +00:00
Guido van Rossum c74521acc4 Oops -- remove some debug print statements! 1998-04-11 01:18:35 +00:00
Guido van Rossum 0454b51282 Oops, pulled over by the tab police :-) 1998-04-03 15:57:58 +00:00
Guido van Rossum b5916ab065 Change by Sjoerd (with minor reformatting):
guess the mime type of a local file.

Change suggested by Sjoerd (with different implementation):
  when retrieve() creates a temporary file, preserve the suffix.

Corollary of the first change:
  also return the mime type of a local file in retrieve().
1998-04-03 15:56:16 +00:00
Guido van Rossum a08fabad72 A few lines were indented using spaces instead of tabs -- fix them. 1998-03-30 17:17:24 +00:00
Guido van Rossum 7e7ca0ba17 A few lines were indented using spaces instead of tabs -- fix them. 1998-03-26 21:01:39 +00:00
Guido van Rossum 6d4d1c2a25 Added support for "data" URL, by Sjoerd Mullender. 1998-03-12 14:32:55 +00:00
Guido van Rossum 8a666e7c56 Fix a horrible race condition -- various routines were storing the
most recently opened URL in self.openedurl of the URLopener instance.
This doesn't really work if multiple threads share the same opener
instance!

Fix: openedurl was actually simply the type prefix (e.g. "http:")
followed by the rest of the URL; since the rest of the URL is
available and the type is effectively determined by where you are in
the code, I can reconstruct the full URL easily, e.g. "http:" + url.
1998-02-13 01:39:16 +00:00
Guido van Rossum 03710d2a40 Two suggested features by Sjoerd:
- use the tempcache in the open() method, too.

- use the "unwrap"ped url as key for the tempcache.
1998-02-05 16:22:27 +00:00
Guido van Rossum c5d8fed261 (1) Use matchobj.groups(), not matchobj.group() to get all groups.
(2) Provisional hack to avoid dying when trying to turn echo on or off
on Macs, where os.system() doesn't exist.
1998-02-05 16:21:28 +00:00
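Illustration for item (1) of c5d8fed261: the difference between `group()` and `groups()` on a match object. The pattern and URL are made up for the example.

```python
import re

match = re.match(r"([^:]+):(.*)", "http://www.python.org/")

whole = match.group()   # the entire matched string
pieces = match.groups()  # a tuple with one entry per parenthesized group
```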