Issue #12319: Always send file request bodies using chunked encoding

The previous attempt to determine the file’s Content-Length gave a false
positive for pipes on Windows.

Also, drop the special case for sending zero-length iterable bodies.
Martin Panter 2016-08-27 01:39:26 +00:00
parent 8f96a30630
commit ef91bb2660
7 changed files with 96 additions and 82 deletions

@@ -240,17 +240,17 @@ HTTPConnection Objects
The *headers* argument should be a mapping of extra HTTP headers to send
with the request.
If *headers* contains neither Content-Length nor Transfer-Encoding, a
Content-Length header will be added automatically if possible. If
If *headers* contains neither Content-Length nor Transfer-Encoding,
but there is a request body, one of those
header fields will be added automatically. If
*body* is ``None``, the Content-Length header is set to ``0`` for
methods that expect a body (``PUT``, ``POST``, and ``PATCH``). If
*body* is a string or bytes-like object, the Content-Length header is
set to its length. If *body* is a binary :term:`file object`
supporting :meth:`~io.IOBase.seek`, this will be used to determine
its size. Otherwise, the Content-Length header is not added
automatically. In cases where determining the Content-Length up
front is not possible, the body will be chunk-encoded and the
Transfer-Encoding header will automatically be set.
*body* is a string or a bytes-like object that is not also a
:term:`file <file object>`, the Content-Length header is
set to its length. Any other type of *body* (files
and iterables in general) will be chunk-encoded, and the
Transfer-Encoding header will automatically be set instead of
Content-Length.
The *encode_chunked* argument is only relevant if Transfer-Encoding is
specified in *headers*. If *encode_chunked* is ``False``, the
@@ -260,19 +260,18 @@ HTTPConnection Objects
.. note::
Chunked transfer encoding has been added to the HTTP protocol
version 1.1. Unless the HTTP server is known to handle HTTP 1.1,
the caller must either specify the Content-Length or must use a
body representation whose length can be determined automatically.
the caller must either specify the Content-Length, or must pass a
:class:`str` or bytes-like object that is not also a file as the
body representation.
.. versionadded:: 3.2
*body* can now be an iterable.
.. versionchanged:: 3.6
If neither Content-Length nor Transfer-Encoding are set in
*headers* and Content-Length cannot be determined, *body* will now
be automatically chunk-encoded. The *encode_chunked* argument
was added.
The Content-Length for binary file objects is determined with seek.
No attempt is made to determine the Content-Length for text file
*headers*, file and iterable *body* objects are now chunk-encoded.
The *encode_chunked* argument was added.
No attempt is made to determine the Content-Length for file
objects.
.. method:: HTTPConnection.getresponse()
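
The header selection described in this hunk can be observed without a network connection. The following is a minimal sketch, assuming a Python version that includes this change (3.6 or later); `RecordingSocket` and `send_and_capture` are hypothetical helpers written for illustration and are not part of `http.client`.

```python
import http.client
import io


class RecordingSocket:
    """Hypothetical stand-in for a real socket; it only records sent bytes."""

    def __init__(self):
        self.data = b""

    def sendall(self, data):
        self.data += bytes(data)

    def close(self):
        pass


def send_and_capture(body, headers=None):
    """Send a POST with *body*; return (lower-cased header block, payload)."""
    conn = http.client.HTTPConnection("example.com")
    conn.sock = RecordingSocket()  # a preset socket avoids a real connection
    conn.request("POST", "/", body, headers or {})
    head, _, payload = conn.sock.data.partition(b"\r\n\r\n")
    return head.decode("ascii").lower(), payload


# bytes-like body: Content-Length is derived from its length
bytes_head, bytes_payload = send_and_capture(b"body")

# file object body: sent chunked, no Content-Length is computed
file_head, file_payload = send_and_capture(io.BytesIO(b"body"))

# zero-length iterable: treated like any other iterable
empty_head, empty_payload = send_and_capture(())

# file object with an explicit Content-Length: sent as-is, not chunked
explicit_head, explicit_payload = send_and_capture(
    io.BytesIO(b"body"), {"Content-Length": "4"})
```

Comparing `bytes_head` with `file_head` shows which of the two header fields was added automatically; the payloads show whether chunk framing was applied.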

@@ -187,12 +187,11 @@ The following classes are provided:
server, or ``None`` if no such data is needed. Currently HTTP
requests are the only ones that use *data*. The supported object
types include bytes, file-like objects, and iterables. If no
``Content-Length`` header has been provided, :class:`HTTPHandler` will
try to determine the length of *data* and set this header accordingly.
If this fails, ``Transfer-Encoding: chunked`` as specified in
:rfc:`7230`, Section 3.3.1 will be used to send the data. See
:meth:`http.client.HTTPConnection.request` for details on the
supported object types and on how the content length is determined.
``Content-Length`` nor ``Transfer-Encoding`` header field
has been provided, :class:`HTTPHandler` will set these headers according
to the type of *data*. ``Content-Length`` will be used to send
bytes objects, while ``Transfer-Encoding: chunked`` as specified in
:rfc:`7230`, Section 3.3.1 will be used to send files and other iterables.
For an HTTP POST request method, *data* should be a buffer in the
standard :mimetype:`application/x-www-form-urlencoded` format. The
@@ -256,8 +255,8 @@ The following classes are provided:
.. versionchanged:: 3.6
Do not raise an error if the ``Content-Length`` has not been
provided and could not be determined. Fall back to use chunked
transfer encoding instead.
provided and *data* is neither ``None`` nor a bytes object.
Fall back to use chunked transfer encoding instead.
.. class:: OpenerDirector()
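
On the `urllib.request` side, the same rule can be checked by asking a handler to pre-process a request without sending it. This is a rough sketch, assuming Python 3.6 or later; `prepared_headers` is a hypothetical helper, and the handler is attached to a bare `OpenerDirector` only so that the default headers are available.

```python
import io
import urllib.request

handler = urllib.request.HTTPHandler()
handler.add_parent(urllib.request.OpenerDirector())


def prepared_headers(data, headers=None):
    """Return the headers the handler would send, with lower-cased names."""
    req = urllib.request.Request(
        "http://example.com/", data=data, headers=headers or {})
    req = handler.http_request(req)  # pre-processing only, nothing is sent
    return {name.lower(): value for name, value in req.header_items()}


bytes_headers = prepared_headers(b"body")
file_headers = prepared_headers(io.BytesIO(b"body"))
explicit_headers = prepared_headers(
    io.BytesIO(b"body"), {"Content-Length": "4"})
```

A bytes object yields a `Content-Length` header, a file object yields `Transfer-Encoding: chunked`, and a caller-supplied `Content-Length` suppresses the chunked fallback.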

@@ -579,8 +579,8 @@ The :class:`~unittest.mock.Mock` class has the following improvements:
urllib.request
--------------
If a HTTP request has a non-empty body but no Content-Length header
and the content length cannot be determined up front, rather than
If a HTTP request has a file or iterable body (other than a
bytes object) but no Content-Length header, rather than
throwing an error, :class:`~urllib.request.AbstractHTTPHandler` now
falls back to use chunked transfer encoding.
(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)
@@ -935,6 +935,13 @@ Changes in the Python API
This behavior has also been backported to earlier Python versions
by Setuptools 26.0.0.
* In the :mod:`urllib.request` module and the
:meth:`http.client.HTTPConnection.request` method, if no Content-Length
header field has been specified and the request body is a file object,
it is now sent with HTTP 1.1 chunked encoding. If a file object has to
be sent to a HTTP 1.0 server, the Content-Length value now has to be
specified by the caller. See :issue:`12319`.
Changes in the C API
--------------------
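
The porting note above leaves it to the caller to supply Content-Length when a file object must reach an HTTP 1.0 server. One possible way to do that for a regular, seekable binary file is sketched below; it mirrors the tell/seek approach this commit removes from the library, so the caveat from the commit message applies: it may report a misleading size for pipes on Windows. `remaining_length` and `headers_for_file` are hypothetical names, not library functions.

```python
import io


def remaining_length(fileobj):
    """Return the byte count from the current position to the end of
    *fileobj*, or None if the object cannot report a position."""
    try:
        pos = fileobj.tell()
        end = fileobj.seek(0, io.SEEK_END)
    except (AttributeError, OSError, TypeError):
        return None
    fileobj.seek(pos)  # restore the original position
    return end - pos


def headers_for_file(fileobj):
    """Build a headers mapping with Content-Length when it can be computed."""
    length = remaining_length(fileobj)
    return {} if length is None else {"Content-Length": str(length)}
```

The resulting mapping can be passed as the *headers* argument of `HTTPConnection.request()` or `urllib.request.Request`; an empty mapping leaves the chunked fallback in place.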

@@ -805,35 +805,21 @@ class HTTPConnection:
def _get_content_length(body, method):
"""Get the content-length based on the body.
If the body is "empty", we set Content-Length: 0 for methods
that expect a body (RFC 7230, Section 3.3.2). If the body is
set for other methods, we set the header provided we can
figure out what the length is.
If the body is None, we set Content-Length: 0 for methods that expect
a body (RFC 7230, Section 3.3.2). We also set the Content-Length for
any method if the body is a str or bytes-like object and not a file.
"""
if not body:
if body is None:
# do an explicit check for not None here to distinguish
# between unset and set but empty
if method.upper() in _METHODS_EXPECTING_BODY or body is not None:
if method.upper() in _METHODS_EXPECTING_BODY:
return 0
else:
return None
if hasattr(body, 'read'):
# file-like object.
if HTTPConnection._is_textIO(body):
# text streams are unpredictable because it depends on
# character encoding and line ending translation.
return None
else:
# Is it seekable?
try:
curpos = body.tell()
sz = body.seek(0, io.SEEK_END)
except (TypeError, AttributeError, OSError):
return None
else:
body.seek(curpos)
return sz - curpos
return None
try:
# does it implement the buffer protocol (bytes, bytearray, array)?
@@ -1266,8 +1252,7 @@ class HTTPConnection:
# the caller passes encode_chunked=True or the following
# conditions hold:
# 1. content-length has not been explicitly set
# 2. the length of the body cannot be determined
# (e.g. it is a generator or unseekable file)
# 2. the body is a file or iterable, but not a str or bytes-like
# 3. Transfer-Encoding has NOT been explicitly set by the caller
if 'content-length' not in header_names:
@@ -1280,7 +1265,7 @@ class HTTPConnection:
encode_chunked = False
content_length = self._get_content_length(body, method)
if content_length is None:
if body:
if body is not None:
if self.debuglevel > 0:
print('Unable to determine size of %r' % body)
encode_chunked = True

@@ -381,6 +381,16 @@ class TransferEncodingTest(TestCase):
# same request
self.assertNotIn('content-length', [k.lower() for k in headers])
def test_empty_body(self):
# Zero-length iterable should be treated like any other iterable
conn = client.HTTPConnection('example.com')
conn.sock = FakeSocket(b'')
conn.request('POST', '/', ())
_, headers, body = self._parse_request(conn.sock.data)
self.assertEqual(headers['Transfer-Encoding'], 'chunked')
self.assertNotIn('content-length', [k.lower() for k in headers])
self.assertEqual(body, b"0\r\n\r\n")
def _make_body(self, empty_lines=False):
lines = self.expected_body.split(b' ')
for idx, line in enumerate(lines):
@@ -652,7 +662,9 @@ class BasicTest(TestCase):
def test_send_file(self):
expected = (b'GET /foo HTTP/1.1\r\nHost: example.com\r\n'
b'Accept-Encoding: identity\r\nContent-Length:')
b'Accept-Encoding: identity\r\n'
b'Transfer-Encoding: chunked\r\n'
b'\r\n')
with open(__file__, 'rb') as body:
conn = client.HTTPConnection('example.com')
@@ -1717,7 +1729,7 @@ class RequestBodyTest(TestCase):
self.assertEqual("5", message.get("content-length"))
self.assertEqual(b'body\xc1', f.read())
def test_file_body(self):
def test_text_file_body(self):
self.addCleanup(support.unlink, support.TESTFN)
with open(support.TESTFN, "w") as f:
f.write("body")
@@ -1726,10 +1738,8 @@ class RequestBodyTest(TestCase):
message, f = self.get_headers_and_fp()
self.assertEqual("text/plain", message.get_content_type())
self.assertIsNone(message.get_charset())
# Note that the length of text files is unpredictable
# because it depends on character encoding and line ending
# translation. No content-length will be set, the body
# will be sent using chunked transfer encoding.
# No content-length will be determined for files; the body
# will be sent using chunked transfer encoding instead.
self.assertIsNone(message.get("content-length"))
self.assertEqual("chunked", message.get("transfer-encoding"))
self.assertEqual(b'4\r\nbody\r\n0\r\n\r\n', f.read())
@@ -1743,8 +1753,9 @@ class RequestBodyTest(TestCase):
message, f = self.get_headers_and_fp()
self.assertEqual("text/plain", message.get_content_type())
self.assertIsNone(message.get_charset())
self.assertEqual("5", message.get("content-length"))
self.assertEqual(b'body\xc1', f.read())
self.assertEqual("chunked", message.get("Transfer-Encoding"))
self.assertNotIn("Content-Length", message)
self.assertEqual(b'5\r\nbody\xc1\r\n0\r\n\r\n', f.read())
class HTTPResponseTest(TestCase):

@@ -913,40 +913,50 @@ class HandlerTests(unittest.TestCase):
self.assertEqual(req.unredirected_hdrs["Spam"], "foo")
def test_http_body_file(self):
# A regular file - Content Length is calculated unless already set.
# A regular file - chunked encoding is used unless Content Length is
# already set.
h = urllib.request.AbstractHTTPHandler()
o = h.parent = MockOpener()
file_obj = tempfile.NamedTemporaryFile(mode='w+b', delete=False)
file_path = file_obj.name
file_obj.write(b"Something\nSomething\nSomething\n")
file_obj.close()
self.addCleanup(os.unlink, file_path)
for headers in {}, {"Content-Length": 30}:
with open(file_path, "rb") as f:
req = Request("http://example.com/", f, headers)
newreq = h.do_request_(req)
self.assertEqual(int(newreq.get_header('Content-length')), 30)
with open(file_path, "rb") as f:
req = Request("http://example.com/", f, {})
newreq = h.do_request_(req)
te = newreq.get_header('Transfer-encoding')
self.assertEqual(te, "chunked")
self.assertFalse(newreq.has_header('Content-length'))
os.unlink(file_path)
with open(file_path, "rb") as f:
req = Request("http://example.com/", f, {"Content-Length": 30})
newreq = h.do_request_(req)
self.assertEqual(int(newreq.get_header('Content-length')), 30)
self.assertFalse(newreq.has_header("Transfer-encoding"))
def test_http_body_fileobj(self):
# A file object - Content Length is calculated unless already set.
# A file object - chunked encoding is used
# unless Content Length is already set.
# (Note that there are some subtle differences to a regular
# file, that is why we are testing both cases.)
h = urllib.request.AbstractHTTPHandler()
o = h.parent = MockOpener()
file_obj = io.BytesIO()
file_obj.write(b"Something\nSomething\nSomething\n")
for headers in {}, {"Content-Length": 30}:
file_obj.seek(0)
req = Request("http://example.com/", file_obj, headers)
newreq = h.do_request_(req)
self.assertEqual(int(newreq.get_header('Content-length')), 30)
req = Request("http://example.com/", file_obj, {})
newreq = h.do_request_(req)
self.assertEqual(newreq.get_header('Transfer-encoding'), 'chunked')
self.assertFalse(newreq.has_header('Content-length'))
headers = {"Content-Length": 30}
req = Request("http://example.com/", file_obj, headers)
newreq = h.do_request_(req)
self.assertEqual(int(newreq.get_header('Content-length')), 30)
self.assertFalse(newreq.has_header("Transfer-encoding"))
file_obj.close()
@@ -959,9 +969,7 @@ class HandlerTests(unittest.TestCase):
h = urllib.request.AbstractHTTPHandler()
o = h.parent = MockOpener()
cmd = [sys.executable, "-c",
r"import sys; "
r"sys.stdout.buffer.write(b'Something\nSomething\nSomething\n')"]
cmd = [sys.executable, "-c", r"pass"]
for headers in {}, {"Content-Length": 30}:
with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
req = Request("http://example.com/", proc.stdout, headers)
@@ -983,8 +991,6 @@ class HandlerTests(unittest.TestCase):
def iterable_body():
yield b"one"
yield b"two"
yield b"three"
for headers in {}, {"Content-Length": 11}:
req = Request("http://example.com/", iterable_body(), headers)
@@ -996,6 +1002,14 @@ class HandlerTests(unittest.TestCase):
else:
self.assertEqual(int(newreq.get_header('Content-length')), 11)
def test_http_body_empty_seq(self):
# Zero-length iterable body should be treated like any other iterable
h = urllib.request.AbstractHTTPHandler()
h.parent = MockOpener()
req = h.do_request_(Request("http://example.com/", ()))
self.assertEqual(req.get_header("Transfer-encoding"), "chunked")
self.assertFalse(req.has_header("Content-length"))
def test_http_body_array(self):
# array.array Iterable - Content Length is calculated

@@ -52,10 +52,9 @@ Library
- Issue #12319: Chunked transfer encoding support added to
http.client.HTTPConnection requests. The
urllib.request.AbstractHTTPHandler class does not enforce a Content-Length
header any more. If a HTTP request has a non-empty body, but no
Content-Length header, and the content length cannot be determined
up front, rather than throwing an error, the library now falls back
to use chunked transfer encoding.
header any more. If a HTTP request has a file or iterable body, but no
Content-Length header, the library now falls back to use chunked transfer-
encoding.
- A new version of typing.py from https://github.com/python/typing:
- Collection (only for 3.6) (Issue #27598)