File names with "funny" characters get translated wrong by
pathname2url (any variety). E.g. the (Unix) file "/ufs/sjoerd/#tmp"
gets translated into "/ufs/sjoerd/#tmp" which, when interpreted as a
URL is file "/ufs/sjoerd/" with fragment ID "tmp".
Here's an easy fix. (An alternative fix would be to change the
various implementations of pathname2url and url2pathname to include
calls to quote and unquote.
[The main problem is with the normal use of URLs:
url = url2pathname(file)
transmit url
url, tag = splittag(url)
urlopen(url)
]
In addition, this patch fixes some uses of unquote:
- the host part of URLs should be unquoted
- the file path in the FTP URL should be unquoted before it is split
into components.
- because of the latter, I removed all unquoting from ftpwrapper,
and moved it to the caller, but that is not essential
1. Generate a correct Content-Length header visible through the info() method
if a request to open an FTP URL gets a length in the response to RETR.
2. Take a third argument to urlretrieve() that makes it possible to progress-
meter an urlretrieve call (this is what I needed the above change for).
See the second patch band below for details.
3. To avoid spurious errors, I commented out the gopher test. The target
document no longer exists.
Fix the implementation of quote_plus(). (It wouldn't treat '+' in the
original data right.)
Add urlencode(dict) which is handy to create the data for sending a
POST request with urlopen().
as soon as I change things even just a little bit? :-) Even works
when accessing a password-protected page through the proxy. Prompted
by complaints from, and correct operation verified by, Nigel O'Brian.
guess the mime type of a local file.
Change suggested by Sjoerd (with different implementation):
when retrieve() creates a temporary file, preserve the suffix.
Corrollary of the first change:
also return the mime type of a local file in retrieve().
most recently opened URL in self.openedurl of the URLopener instance.
This doesn't really work if multiple threads share the same opener
instance!
Fix: openedurl was actually simply the type prefix (e.g. "http:")
followed by the rest of the URL; since the rest of the URL is
available and the type is effectively determined by where you are in
the code, I can reconstruct the full URL easily, e.g. "http:" + url.
retrieving files from the same host and directory, you had to close
the previous instance before opening a new one; and retrieving a
non-existent file would return an empty file. (The latter fix relies
on maybe an undocumented property of NLST -- NLST of a file returns
just that file, while NLST of a non-existent file returns nothing. A
side effect, unfortunately, seems to be that now ftp-retrieving an
*empty* directory may fail. Ah well.)
Sjoerd: add separate administration of temporary files created y
URLopener.retrieve() so cleanup can properly remove them. The old
code removed everything in tempcache which was a bad idea if the user
had passed a non-temp file into it. (I added a line to delete the
tempcache in cleanup() -- it still seems to make sense.)
Jack: in basejoin(), interpret relative paths starting in "../". This
is necessary if the server uses symbolic links.
that multiple retrievals using the same connection will work.
This leaves open the more general problem that after
f = urlopen("ftp://...")
f must be closed before another retrieval from the same host should be
attempted.