In response to "shouldn't the client close the file?", the answer is
"no". The original design behind HTTPConnection is that the client did
not have to worry about it. The response would close itself when you
read the last of the data from it. This closing also dealt with
allowing the connection to perform another request/response (if it was
a persistent connection).
However... the auto-close behavior broke compatibility with the
classic httplib.HTTP class' behavior when a zero-length response body
was present. In that situation, the HTTPResponse object was
auto-closing it since there was no data present, and for an HTTP/1.0
connection-close socket (or an HTTP/0.9 request) connection, that also
ended up closing the socket. When an httplib.HTTP user went to read
the socket... boom. A patch to correct the auto-close (for compat with
old httplib users) was added in rev 1.22.
But for non-zero-length *chunked* bodies, we should keep the
auto-close behavior. The library user is not reading the socket (they
can't cuz of the chunked response we just got done handling), so they
should be immune to the response closing the socket. In fact, I would
like to see (one day) the auto-close restored, and the HTTP subclass
would simply have a flag to disable that behavior (for back-compat
purposes).
* Replaced "while 1" with "while True"
* Rewrote read() and readline() for clarity and speed.
* Replaced variable 'list' with 'hlist'
* Used augmented assignment in two places.
The implementation now stores all the lines of the request in a buffer
and makes a single send() call when the request is finished,
specifically when endheaders() is called.
This appears to improve performance. The old code called send() for
each line. The sends are all short, so they caused bad interactions
with the Nagle algorithm and delayed acknowledgements. In simple
tests, the second packet was delayed by 100s of ms. The second send was
delayed by the Nagle algorithm, waiting for the ack. The delayed ack
strategy delays the ack in hopes of piggybacking it on a data packet,
but the server won't send any data until it receives the complete
request.
This change minimizes the problem that Nagle + delayed ack will cause
a problem, although a request large enough to be broken into two
packets will still suffer some delay. Luckily the MSS is large enough
to accomodate most single packets.
XXX Bug fix candidate?
The recent SSL changes resulted in important, but subtle changes to
close() semantics. Since builtin socket makefile() is not called for
SSL connections, we don't get separately closeable fds for connection
and response. Comments in the code explain how to restore makefile
semantics.
Bug fix candidate.
If multiple header fields with the same name occur, they are combined
according to the rules in RFC 2616 sec 4.2:
Appending each subsequent field-value to the first, each separated by
a comma. The order in which header fields with the same field-name are
received is significant to the interpretation of the combined field
value.
Section 19.6 of RFC 2616 (HTTP/1.1):
It is beyond the scope of a protocol specification to mandate
compliance with previous versions. HTTP/1.1 was deliberately
designed, however, to make supporting previous versions easy....
And we would expect HTTP/1.1 clients to:
- recognize the format of the Status-Line for HTTP/1.0 and 1.1
responses;
- understand any valid response in the format of HTTP/0.9, 1.0, or
1.1.
The changes to the code do handle response in the format of HTTP/0.9.
Some users may consider this a bug because all responses with a
sufficiently corrupted status line will look like an HTTP/0.9
response. These users can pass strict=1 to the HTTP constructors to
get a BadStatusLine exception instead.
While this is a new feature of sorts, it enhances the robustness of
the code (be tolerant in what you accept). Thus, I consider it a bug
fix candidate.
XXX strict needs to be documented.
The HTTPResponse class now handles 100 continue responses, instead of
choking on them. It detects them internally in the _begin() method
and ignores them. Based on a patch by Bob Kline.
This closes SF bugs 498149 and 551273.
The FakeSocket class (for SSL) is now usable with HTTP/1.1
connections. The old version of the code could not work with
persistent connections, because the makefile() implementation read
until EOF before returning. If the connection is persistent, the
server sends a response and leaves the connection open. A client that
reads until EOF will block until the server gives up on the connection
-- more than a minute in my test case.
The problem was fixed by implementing a reasonable makefile(). It
reads data only when it is needed by the layers above it. It's
implementation uses an internal buffer with a default size of 8192.
Also, rename begin() method of HTTPResponse to _begin() because it
should only be called by the HTTPConnection.
FakeSocket class. Without it, the sendall() call got the method on
the underlying socket object, and that messed up SSL.
Does httplib use other methods of sockets that FakeSocket doesn't support?
Someone should take a look... (I'll try to give it a once-over.)
2.2.1 bugfix candidate.
In August, Greg said this looked good, so I'm going ahead with it.
The fix is different from the one in the bug report. Instead of using
a regular expression to extract the host from the url, I use
urlparse.urlsplit.
Martin commented that the patch doesn't address URLs that have basic
authentication username and password in the header. I don't see any
code anywhere in httplib that supports this feature, so I'm not going
to address it for this fix.
Bug fix candidate.
Try to be systematic about dealing with socket and ssl exceptions in
FakeSocket.makefile(). The previous version of the code caught all
ssl errors and treated them as EOF, even though most of the errors
don't mean EOF.
An SSL error can mean on of three things:
1. The SSL/TLS connection was closed.
2. The operation should be retried.
3. An error occurred.
Also, if a socket error occurred and the error was EINTR, retry the
call. Otherwise, it was a legitimate error and the caller should
receive the exception.
For the HTTPS class (when available), ensure that the x509 certificate data
gets passed through to the HTTPSConnection class. Create a new
HTTPS.__init__ to do this, and refactor the HTTP.__init__ into a new _setup
method for both init's to call.
Note: this is solved differently from the patch, which advocated a new
**x509 parameter on the base HTTPConnection class. But that would open
HTTPConnection to arbitrary (ignored) parameters, so was not as desirable.
Header
Dougfort's comments: httplib does not include ':port ' in the HTTP 1.1
'Host:' header. This causes problems if the server is not listening
on Port 80. The test case I use is the login to /manage under Zope,
with Zope listening on port 8080. Zope returns a <frameset> with the
<frame> source URLs lacking the :8080.
The ASCII-art diagram at the top of httplib contains a backslash at
the end of a line, which causes Python to remove the newline. This
one-character patch adds a space after the backslash so it will
appear at the end of the line in the docstring as intended.
socket in httplib.py.
The bug reports that on Windows, you must pass sock._sock to the
socket.ssl() call. But on Unix, you must pass sock itself. (sock is
a wrapper on Windows but not on Unix; the ssl() call wants the real
socket object, not the wrapper.)
So we see if sock has an _sock attribute and if so, extract it.
Unfortunately, the submitter of the bug didn't confirm that this patch
works, so I'll just have to believe it (can't test it myself since I
don't have OpenSSL on Windows set up, and that's a nontrivial thing I
believe).
interface consistent: The client is responsible for closing the
socket, regardless of the amount of data received.
Restore suport for set_debuglevel call.
Modify HTTP to use delegation instead of inheritance. The
_connection_class attribute of the class defines what class to
delegate to. The HTTPS class is a subclass of HTTP that redefines
_connection_class.
Brian E Gallew, which were improved and adapted to OpenSSL 0.9.4 by
Laszlo Kovacs of HP. Both have kindly given permission to include
the patches in the Python distribution. Final formatting by GvR.
(1) No longer close self.sock; close it on close(). (Guido)
(2) Don't use regular expressions for what can be done simply with
string.split() -- regex is thread unsafe. (Jeremy)
(3) Delete unused imports. (Jeremy)