* Set content-length for simple http server 301s
When http.server.SimpleHTTPRequestHandler sends a 301 (Moved
Permanently) due to a missing file, it does not set a Content-Length
of 0. Unfortunately, certain clients can be left waiting for the
connection to be closed in this circumstance, even though no body
will be sent. At time of writing, both curl and Firefox demonstrate
this behavior.
* Test Content-Length on simple http server redirect
When serving a redirect, the SimpleHTTPRequestHandler will now send
`Content-Length: 0`. Several tests for http.server already cover
various behaviors and checks including redirection. This change only
adds one check for the expected Content-Length on the simplest case
for a redirect.
* Add news entry for SimpleHTTPRequestHandler fix
* Clarify the specific kind of 301
Co-authored-by: Senthil Kumaran <skumaran@gatech.edu>
Fixes http.client potential denial of service where it could get stuck reading lines from a malicious server after a 100 Continue response.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
add:
* `_simple_enum` decorator to transform a normal class into an enum
* `_test_simple_enum` function to compare
* `_old_convert_` to enable checking `_convert_` generated enums
`_simple_enum` takes a normal class and converts it into an enum:
@simple_enum(Enum)
class Color:
RED = 1
GREEN = 2
BLUE = 3
`_old_convert_` works much like` _convert_` does, using the original logic:
# in a test file
import socket, enum
CheckedAddressFamily = enum._old_convert_(
enum.IntEnum, 'AddressFamily', 'socket',
lambda C: C.isupper() and C.startswith('AF_'),
source=_socket,
)
`_test_simple_enum` takes a traditional enum and a simple enum and
compares the two:
# in the REPL or the same module as Color
class CheckedColor(Enum):
RED = 1
GREEN = 2
BLUE = 3
_test_simple_enum(CheckedColor, Color)
_test_simple_enum(CheckedAddressFamily, socket.AddressFamily)
Any important differences will raise a TypeError
add:
_simple_enum decorator to transform a normal class into an enum
_test_simple_enum function to compare
_old_convert_ to enable checking _convert_ generated enums
_simple_enum takes a normal class and converts it into an enum:
@simple_enum(Enum)
class Color:
RED = 1
GREEN = 2
BLUE = 3
_old_convert_ works much like _convert_ does, using the original logic:
# in a test file
import socket, enum
CheckedAddressFamily = enum._old_convert_(
enum.IntEnum, 'AddressFamily', 'socket',
lambda C: C.isupper() and C.startswith('AF_'),
source=_socket,
)
test_simple_enum takes a traditional enum and a simple enum and
compares the two:
# in the REPL or the same module as Color
class CheckedColor(Enum):
RED = 1
GREEN = 2
BLUE = 3
_test_simple_enum(CheckedColor, Color)
_test_simple_enum(CheckedAddressFamily, socket.AddressFamily)
Any important differences will raise a TypeError
We now buffer the CONNECT request + tunnel HTTP headers into a single
send call. This prevents the OS from generating multiple network
packets for connection setup when not necessary, improving efficiency.
I've done the implementation for both non-chunked and chunked reads. I haven't benchmarked chunked reads because I don't currently have a convenient way to generate a high-bandwidth chunked stream, but I don't see any reason that it shouldn't enjoy the same benefits that the non-chunked case does. I've used the benchmark attached to the bpo bug to verify that performance now matches the unsized read case.
Automerge-Triggered-By: @methane
CGIHTTPRequestHandler of http.server now logs the CGI script exit
code, rather than the CGI script exit status of os.waitpid().
For example, if the script is killed by signal 11, it now logs:
"CGI script exit code -11."
The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).
LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and execution will be blocked for a long time.
The regex contained multiple overlapping \s* capture groups.
Ignoring the ?-optional capture groups the regex could be simplified to
\d+-\w+-\d+(\s*\s*\s*)$
Therefore, a long sequence of spaces can trigger bad performance.
Matching a malicious string such as
LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")
caused catastrophic backtracking.
The fix removes ambiguity about which \s* should match a particular
space.
You can create a malicious server which responds with Set-Cookie headers
to attack all python programs which access it e.g.
from http.server import BaseHTTPRequestHandler, HTTPServer
def make_set_cookie_value(n_spaces):
spaces = " " * n_spaces
expiry = f"1-c-1{spaces}!"
return f"b;Expires={expiry}"
class Handler(BaseHTTPRequestHandler):
def do_GET(self):
self.log_request(204)
self.send_response_only(204) # Don't bother sending Server and Date
n_spaces = (
int(self.path[1:]) # Can GET e.g. /100 to test shorter sequences
if len(self.path) > 1 else
65506 # Max header line length 65536
)
value = make_set_cookie_value(n_spaces)
for i in range(99): # Not necessary, but we can have up to 100 header lines
self.send_header("Set-Cookie", value)
self.end_headers()
if __name__ == "__main__":
HTTPServer(("", 44020), Handler).serve_forever()
This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.
Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :
import http.cookiejar, urllib.request
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://localhost:44020/")
The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):
import requests
requests.get("http://localhost:44020/")
* Regression test for http.cookiejar REDoS
If we regress, this test will take a very long time.
* Improve performance of http.cookiejar.ISO_DATE_RE
A string like
"444444" + (" " * 2000) + "A"
could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.
is_cgi() function of http.server library does not currently handle a
cgi script if one of the cgi_directories is located at the
sub-directory of given path. Since is_cgi() in CGIHTTPRequestHandler
class separates given path into (dir, rest) based on the first seen
'/', multi-level directories like /sub/dir/cgi-bin/hello.py is divided
into head=/sub, rest=dir/cgi-bin/hello.py then check whether '/sub'
exists in cgi_directories = [..., '/sub/dir/cgi-bin'].
This patch makes the is_cgi() keep expanding dir part to the next '/'
then checking if that expanded path exists in the cgi_directories.
Signed-off-by: Siwon Kang <kkangshawn@gmail.com>
https://bugs.python.org/issue38863
* bpo-38216: Allow bypassing input validation
* bpo-36274: Also allow the URL encoding to be overridden.
* bpo-38216, bpo-36274: Add tests demonstrating a hook for overriding validation, test demonstrating override encoding, and a test to capture expectation of the interface for the URL.
* Call with skip_host to avoid tripping on the host checking in the URL.
* Remove obsolete comment.
* Make _prepare_path_encoding its own attr.
This makes overriding just that simpler.
Also, don't use the := operator to make backporting easier.
* Add a news entry.
* _prepare_path_encoding -> _encode_prepared_path()
* Once again separate the path validation and request encoding, drastically simplifying the behavior. Drop the guarantee that all processing happens in _prepare_path.
Handle time comparison for cookies with `expires` attribute when `CookieJar.make_cookies` is called.
Co-authored-by: Demian Brecht <demianbrecht@gmail.com>
https://bugs.python.org/issue12144
Automerge-Triggered-By: @asvetlov
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected.
* Refactor cookie path check as per RFC 6265
* Add tests for prefix match of path
* Add news entry
* Fix set_ok_path and refactor tests
* Use slice for last letter
Don't send cookies of domain A without Domain attribute to domain B when domain A is a suffix match of domain B while using a cookiejar with `http.cookiejar.DefaultCookiePolicy` policy. Patch by Karthikeyan Singaravelan.
In http.server script, rely on getaddrinfo to bind to preferred address based on the bind parameter.
As a result, now IPv6 is used as the default (including IPv4 on dual-stack systems). Enhanced tests.
AIX allows a trailing slash on local file system paths, which isn't what we want
in http.server. Accordingly, check explicitly for this case in the server code,
rather than relying on the OS raising an exception.
Patch by Michael Felt.
with debuglevel=1 only the header keys got printed. With
this change the header values get printed as well and the single
header entries get '\n' as a separator.