The change to use the newer httplib interface admitted the possibility
that we'd get an HTTP/1.1 chunked response, but the code didn't handle
it correctly. The raw socket object can't be pass to addinfourl(),
because it would read the undecoded response. Instead, addinfourl()
must call HTTPResponse.read(), which will handle the decoding.
One extra wrinkle is that the HTTPReponse object can't be passed to
addinfourl() either, because it doesn't implement readline() or
readlines(). As a quick hack, use socket._fileobject(), which
implements those methods on top of a read buffer. (suggested by mwh)
Finally, add some tests based on test_urllibnet.
Thanks to Andrew Sawyers for originally reporting the chunked problem.
Invoke the standard error handlers for non-200 responses.
Always supply a "Connection: close" header to prevent the server from
leaving the connection open. Downstream users of the socket may
attempt recv()/read() with no arguments, which would block if the
connection were kept open.
The chief benefit of this change is that requests will now use
HTTP/1.1 instead of HTTP/1.0. Bump the module version number as part
of the change.
There are two possible incompatibilities that we'll need to watch out
for when we get to an alpha release. We may get a different class of
exceptions out of httplib, and the do_open() method changed its
signature. The latter is only important if anyone actually subclasses
AbstractHTTPHandler.
John J. Lee writes: "the patch makes it possible to implement
functionality like HTTP cookie handling, Refresh handling,
etc. etc. using handler objects. At the moment urllib2's handler
objects aren't quite up to the job, which results in a lot of
cut-n-paste and subclassing. I believe the changes are
backwards-compatible, with the exception of people who've
reimplemented build_opener()'s functionality -- those people would
need to call opener.add_handler(HTTPErrorProcessor).
The main change is allowing handlers to implement
methods like:
http_request(request)
http_response(request, response)
In addition to the usual
http_open(request)
http_error{_*}(...)
"
Note that the change isn't well documented at least in part because
handlers aren't well documented at all. Need to fix this.
Add a bunch of new tests. It appears that none of these tests
actually use the network, so they don't need to be guarded by a
resource flag.
The patch was tweaked slightly. It's get a different mechanism for
generating the cnonce which uses /dev/urandom when possible to
generate less-easily-guessed random input.
Also rearrange the imports so that they are alphabetical and
duplicates are eliminated.
Add a few XXX comments about things left undone and things that could
be improved.
have to insert it in front of other classes, nor do dirty tricks like
inserting a "dummy" HTTPHandler after a ProxyHandler when building an
opener with proxy support.
The latest changes to the redirect handler couldn't possibly have been
tested, because they did not compute a newurl and failed with a
NameError. The __name__ == "__main__": block has a test for
redirects.
Also, fix SF bug 723831. A urlopen() that failed because the host was
not found raised a socket.gaierror unlike earlier versions of
urllib2. The problem is that httplib actually establishes the
connection at a different point starting with Python 2.2. Move the
try/except to endheaders(), which is where the connection gets
established.
The HTTPError class tries to act as a regular response objects for
HTTP protocol errors that include full responses. If the failure is
more basic, like no valid response, the __init__ choked when it tried
to initialize its superclasses in addinfourl hierarchy that requires a
valid response.
The solution isn't elegant but seems to be effective. Do not
initialize the base classes if there isn't a file object containing
the response. In this case, user code expecting to use the addinfourl
methods will fail; but it was going to fail anyway.
It might be cleaner to factor out HTTPError into two classes, only one
of which inherits from addinfourl. Not sure that the extra complexity
would lead to any improved functionality, though.
Partial fix for SF bug # 563665.
Bug fix candidate for 2.1 and 2.2.
Fix contributed by Jeffrey C. Ollie.
I haven't tested the fix because the situation is non-trivial to
reproduce.
The basic solution is to get rid of the __current_realm attribute of
authentication handlers. Instead, prevent infinite retries by
checking for the presence of an Authenticate: header in the request
object that exactly matches the Authenticate: header that would be
added.
The problem prevent authentication from working correctly in the
presence of retries.
Ollie mentioned that digest authentication has the same problem and I
applied the same solution there.
When checking for strings use,
! if isinstance(uri, (types.StringType, types.UnicodeType)):
Also get rid of some dodgy code that tried to guess whether attributes
were callable or not.
Modify rfc822.formatdate() to always generate English names,
regardless of locale. This is required by RFC 1123.
In open_local_file() of urllib and urllib2, use new formatdate() from
rfc822.
Also fix another bug caught by pychecker-- HTTPError() raised when
redirect limit exceed did not pass an fp object. Had to change method
to keep fp object around until it's certain that the error won't be
raised.
Remove useless line in do_proxy().
- don't close the fp, since that appears to also close the socket
- join the original url with the redirect reponse to deal with
relative redirect URL
wrap two socket ops in try/except to turn them into URLErrors, so that
client code need only catch one exception.
in HTTPError.__del__ only close fp if fp is not None
style changes:
- use f(*args) instead of apply(f, args)
- use __super_init instead of super.__init__(self, ...)
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
The attached patches update the standard library so that all modules
have docstrings beginning with one-line summaries.
A new docstring was added to formatter. The docstring for os.py
was updated to mention nt, os2, ce in addition to posix, dos, mac.