bpo-38630: Fix subprocess.Popen.send_signal() race condition (GH-16984)

On Unix, subprocess.Popen.send_signal() now polls the process status. Polling reduces the risk of sending a signal to the wrong process if the process completed, the Popen.returncode attribute is still None, and the pid has been reassigned (recycled) to a new different process.
2020-01-15 17:38:55 +01:00 · 2020-01-15 17:38:55 +01:00 · e85a305503
parent ed154c387e
commit e85a305503
4 changed files with 57 additions and 3 deletions
--- a/Doc/library/subprocess.rst
+++ b/Doc/library/subprocess.rst
@ -774,6 +774,8 @@ Instances of the :class:`Popen` class have the following methods:

   Sends the signal *signal* to the child.

+   Do nothing if the process completed.
+
   .. note::

      On Windows, SIGTERM is an alias for :meth:`terminate`. CTRL_C_EVENT and
--- a/Lib/subprocess.py
+++ b/Lib/subprocess.py
@ -2061,8 +2061,30 @@ class Popen(object):

        def send_signal(self, sig):
            """Send a signal to the process."""
+            # bpo-38630: Polling reduces the risk of sending a signal to the
+            # wrong process if the process completed, the Popen.returncode
+            # attribute is still None, and the pid has been reassigned
+            # (recycled) to a new different process. This race condition can
+            # happens in two cases.
+            #
+            # Case 1. Thread A calls Popen.poll(), thread B calls
+            # Popen.send_signal(). In thread A, waitpid() succeed and returns
+            # the exit status. Thread B calls kill() because poll() in thread A
+            # did not set returncode yet. Calling poll() in thread B prevents
+            # the race condition thanks to Popen._waitpid_lock.
+            #
+            # Case 2. waitpid(pid, 0) has been called directly, without
+            # using Popen methods: returncode is still None is this case.
+            # Calling Popen.poll() will set returncode to a default value,
+            # since waitpid() fails with ProcessLookupError.
+            self.poll()
+            if self.returncode is not None:
                # Skip signalling a process that we know has already died.
-            if self.returncode is None:
+                return
+
+            # The race condition can still happen if the race condition
+            # described above happens between the returncode test
+            # and the kill() call.
            os.kill(self.pid, sig)

        def terminate(self):
--- a/Lib/test/test_subprocess.py
+++ b/Lib/test/test_subprocess.py
@ -3120,6 +3120,31 @@ class POSIXProcessTestCase(BaseTestCase):

        self.assertEqual(returncode, -3)

+    def test_send_signal_race(self):
+        # bpo-38630: send_signal() must poll the process exit status to reduce
+        # the risk of sending the signal to the wrong process.
+        proc = subprocess.Popen(ZERO_RETURN_CMD)
+
+        # wait until the process completes without using the Popen APIs.
+        pid, status = os.waitpid(proc.pid, 0)
+        self.assertEqual(pid, proc.pid)
+        self.assertTrue(os.WIFEXITED(status), status)
+        self.assertEqual(os.WEXITSTATUS(status), 0)
+
+        # returncode is still None but the process completed.
+        self.assertIsNone(proc.returncode)
+
+        with mock.patch("os.kill") as mock_kill:
+            proc.send_signal(signal.SIGTERM)
+
+        # send_signal() didn't call os.kill() since the process already
+        # completed.
+        mock_kill.assert_not_called()
+
+        # Don't check the returncode value: the test reads the exit status,
+        # so Popen failed to read it and uses a default returncode instead.
+        self.assertIsNotNone(proc.returncode)
+

@unittest.skipUnless(mswindows, "Windows specific tests")
 class Win32ProcessTestCase(BaseTestCase):
--- a/Misc/NEWS.d/next/Library/2019-10-29-12-21-10.bpo-38630.Dv6Xrm.rst
+++ b/Misc/NEWS.d/next/Library/2019-10-29-12-21-10.bpo-38630.Dv6Xrm.rst
@ -0,0 +1,5 @@
+On Unix, :meth:`subprocess.Popen.send_signal` now polls the process status.
+Polling reduces the risk of sending a signal to the wrong process if the
+process completed, the :attr:`subprocess.Popen.returncode` attribute is still
+``None``, and the pid has been reassigned (recycled) to a new different
+process.