Test suites for urllib and urlparse run with each other's function to verify
correctness of replacement and both test suites pass.
Bumped urllib's __version__ attribute up a minor number.
- When redirecting, always use GET. This is common practice and
more-or-less sanctioned by the HTTP standard.
- Add a handler for 307 redirection, which becomes an error for POST,
but a regular redirect for GET and HEAD.
Jeremy reported that this is not allowed by RFC 2396; however,
other tools support unescaped @'s so we should also.
Apply SF patch 596581 closing bug 581529.
I really can't test this, but from reading the discussion in that bug
report, it's likely that this works. It may also close a whole bunch
of other bug reports related to urllib and proxies on Windows, but who
knows.
open_http():
In urllib.py library module, URLopener.open_https()
returns a class instance of addinfourl() with its
self.url property missing the protocol.
Instead of "https://www.someurl.com", it becomes
"://www.someurl.com".
Modify rfc822.formatdate() to always generate English names,
regardless of locale. This is required by RFC 1123.
In open_local_file() of urllib and urllib2, use new formatdate() from
rfc822.
For local files urllib.py doesn't return the MIME
headers that the documentation says it does:
http://www.python.org/doc/current/lib/module-
urllib.html#l2h-2187 states that "When the method is
local-file, returned headers will include a Date
representing the file's last-modified time, a Content-
Length giving file size, and a Content-Type containing
a guess at the file's type"
But in Python 2.1 the only header that gets returned
is the Content-Type:
>>> import urllib
>>> f = urllib.urlopen("gurk.txt")
>>> f.info().headers
['Content-Type: text/plain\n']
Even though relative redirects are illegal, they are common
urllib treated every relative redirect as though it was to http,
even if the original was https://
As long as we're compensating for server bugs, might as well do
it properly.
number of entries into http_error_302 exceeds the value set for the maxtries
attribute (which defaults to 10), the recursion is exited by calling
the http_error_500 method (or if that is not defined, http_error_default).
when quoting forbidden characters. There are scripts out there that
break with lower case, therefore I guess %%%X should be used."
I agree, so am fixing this.
obsolete!).
Fix a bug in ftpwrapper.retrfile() where somehow ftplib.error_perm was
assumed to be a string. (The fix applies str().)
Also break some long lines and change the output from test() slightly.
invalid proxy setting.
Minor change to call of unknown_url; always pass data argument
explicitly since data defaults to None.
PEP 42: Add as a feature that urllib handle proxy setting that contain
only the host and port of the proxy.
The earlier code assumed "protocol=host;protocol=host;..." or "host",
but Windows may also use "protocol=host" (just one entry), as well as
"protocol://host". This code needs some more work, so I'll leave the
bug open for now.
character according to RFC 2396. Add some text to quote doc string
that explains the quoting rules better.
This closes SF Bug #114427.
Add _fast_quote operation that uses a dictionary instead of a list
when the standard set of safe characters is used.
so that a subclass can override it.
This partly addresses Bug #112634 -- but the documentation is still
wrong, since it suggests that you can set self.version *after* calling
the base class __init__. In fact it must be done *before*.
I'll fix that too.
Patch description
-----------------
This addresses four issues:
(1) usernames and passwords in urls with special characters are now
decoded properly. i.e. http://foo%2C:bar@www.whatever.com/
(2) Basic Auth support has been added to HTTPS, like it was in HTTP.
(3) Version 1.92 sent the POSTed data, but did not deal with errors
(HTTP responses other than 200) properly. HTTPS now behaves the
same way HTTP does.
(4) made URL-checking beahve the same way with HTTPS as it does with
HTTP (changed == to !=).
Note that this patch looks worse than it is - an existing function (getproxies() for all platforms other than Win/Mac) has been moved, renamed and indentation changed, but the body of that function is identical. Windows now allows the environment variables to override the registry.
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
often, ftp URLs hang in the final close. Further analysis suggests
that this is because the close hook in addclosehook() calls the hook
before acually closing the connection. The hook, in this case, waits
for the '226 Transfer complete' status from the server on the command
socket. However, more and more ftp servers only send this status when
the data socket has actually been closed -- causing a deadlock.
The fix is simple: in addclosehook.close(), call addbase.close()
*before* calling the closehook.
The attached patches update the standard library so that all modules
have docstrings beginning with one-line summaries.
A new docstring was added to formatter. The docstring for os.py
was updated to mention nt, os2, ce in addition to posix, dos, mac.
Fixed a TypeError: not enough arguments; expected 4, got 3.
When authentication is needed, the default http_error_401 method calls
retry_http_basic_auth. The default version of that method expected a
data argument which wasn't provided, so now we provide the argument if
it was given and we also made the data argument optional.
Also changed other calls where data was optional to not pass data if
it was not passed to the calling method (in line with other similar
occurances).
Brian E Gallew, which were improved and adapted to OpenSSL 0.9.4 by
Laszlo Kovacs of HP. Both have kindly given permission to include
the patches in the Python distribution. Final formatting by GvR.
In splithost, accept empty host part in URLs. This is required for
file URLs that can have an empty host part. For such URLs, we should
not return the initial 2 slashes as part of the file name.
Urllib makes the URL of the opened file available through the geturl
method of the returned object. For local files, this consists of
file: plus the name of the file. This results in an invalid URL if
the file name was relative. This patch fixes this so that the
returned URL is just a relative URL in that case. When the file name
is absolute, the URL returned is of the form file:///absolute/path.
[I guess that a URL of the form "file:foo.html" is illegal... GvR]
urlopen is used to specify form data, make sure the second argument is
threaded through all of the http_error_NNN calls. This allows error
handlers like the redirect and authorization handlers to properly
re-start the connection.
File names with "funny" characters get translated wrong by
pathname2url (any variety). E.g. the (Unix) file "/ufs/sjoerd/#tmp"
gets translated into "/ufs/sjoerd/#tmp" which, when interpreted as a
URL is file "/ufs/sjoerd/" with fragment ID "tmp".
Here's an easy fix. (An alternative fix would be to change the
various implementations of pathname2url and url2pathname to include
calls to quote and unquote.
[The main problem is with the normal use of URLs:
url = url2pathname(file)
transmit url
url, tag = splittag(url)
urlopen(url)
]
In addition, this patch fixes some uses of unquote:
- the host part of URLs should be unquoted
- the file path in the FTP URL should be unquoted before it is split
into components.
- because of the latter, I removed all unquoting from ftpwrapper,
and moved it to the caller, but that is not essential
1. Generate a correct Content-Length header visible through the info() method
if a request to open an FTP URL gets a length in the response to RETR.
2. Take a third argument to urlretrieve() that makes it possible to progress-
meter an urlretrieve call (this is what I needed the above change for).
See the second patch band below for details.
3. To avoid spurious errors, I commented out the gopher test. The target
document no longer exists.
Fix the implementation of quote_plus(). (It wouldn't treat '+' in the
original data right.)
Add urlencode(dict) which is handy to create the data for sending a
POST request with urlopen().