This adds authentication to the forkserver control socket. In the past only filesystem permissions protected this socket from code injection into the forkserver process by limiting access to the same UID, which didn't exist when Linux abstract namespace sockets were used (see issue) meaning that any process in the same system network namespace could inject code. We've since stopped using abstract namespace sockets by default, but protecting our control sockets regardless of type is a good idea.
This reuses the HMAC based shared key auth already used by `multiprocessing.connection` sockets for other purposes.
Doing this is useful so that filesystem permissions are not relied upon and trust isn't implied by default between all processes running as the same UID with access to the unix socket.
### pyperformance benchmarks
No significant changes. Including `concurrent_imap` which exercises `multiprocessing.Pool.imap` in that suite.
### Microbenchmarks
This does _slightly_ slow down forkserver use. How much so appears to depend on the platform. Modern platforms and simple platforms are less impacted. This PR adds additional IPC round trips to the control socket to tell forkserver to spawn a new process. Systems with potentially high latency IPC are naturally impacted more.
Typically a 1-4% slowdown on a very targeted process creation microbenchmark, with a worst case overloaded system slowdown of 20%. No evidence that these slowdowns appear in practical sense. See the PR for details.
* gh-71936: Fix race condition in multiprocessing.Pool
Proxes of shared objects register a Finalizer in BaseProxy._incref(), and it
will call BaseProxy._decref() when it is GCed. This may cause a race condition
with Pool(maxtasksperchild=None) on Windows.
A connection would be closed and raised TypeError when a GC occurs between
_ConnectionBase._check_writable() and _ConnectionBase._send_bytes() in
_ConnectionBase.send() in the second or later task, and a new object
is allocated that shares the id() of a previously deleted one.
Instead of using the id() of the token (or the proxy), use a unique,
non-reusable number.
Co-Authored-By: Akinori Hattori <hattya@gmail.com>
gh-117378: Fix multiprocessing forkserver preload sys.path inheritance.
`sys.path` was not properly being sent from the parent process when launching
the multiprocessing forkserver process to preload imports. This bug has been
there since the forkserver start method was introduced in Python 3.4. It was
always _supposed_ to inherit `sys.path` the same way the spawn method does.
Observable behavior change: A `''` value in `sys.path` will now be replaced in
the forkserver's `sys.path` with an absolute pathname
`os.path.abspath(os.getcwd())` saved at the time that `multiprocessing` was
imported in the parent process as it already was when using the spawn start
method. **This will only be observable during forkserver preload imports**.
The code invoked before calling things in another process already correctly sets `sys.path`.
Which is likely why this went unnoticed for so long as a mere performance issue in
some configurations.
A workaround for the bug on impacted Pythons is to set PYTHONPATH in the
environment before multiprocessing's forkserver process was started. Not perfect
as that is then inherited by other children, etc, but likely good enough for many
people's purposes.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Change the default multiprocessing start method away from fork to forkserver or spawn on the remaining platforms where it was fork. See the issue for context. This makes the default far more thread safe (other than for people spawning threads at import time... - don't do that!).
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Increases the multiprocessing connection buffer size from 8k to 64k for efficiency, without overallocating.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
See https://github.com/python/cpython/issues/121313 for analysis, but this greatly reduces memory overallocation and overhead when multiprocessing is sending non-small data over its pipes between processes.
This flag was added as an escape hatch in gh-91401 and backported to
Python 3.10. The flag broke at some point between its addition and now.
As there is currently no publicly known environments that require this,
remove it rather than work on fixing it.
This leaves the flag in the subprocess module to not break code which
may have used / checked the flag itself.
discussion: https://discuss.python.org/t/subprocess-use-vfork-escape-hatch-broken-fix-or-remove/56915/2
Make error message for index() methods consistent
Remove the repr of the searched value (which can be arbitrary large)
from ValueError messages for list.index(), range.index(), deque.index(),
deque.remove() and ShareableList.index(). Make the error messages
consistent with error messages for other index() and remove()
methods.
Tools such as ruff can ignore "imported but unused" warnings if a
line ends with "# noqa: F401". It avoids the temptation to remove
an import which is used effectively.
This builds on https://github.com/python/cpython/pull/106807, which adds
a return code to ResourceTracker, to make future debugging easier.
Testing this “in situ” proved difficult, since the global ResourceTracker is
involved in test infrastructure. So, the tests here create a new instance and
feed it fake data.
---------
Co-authored-by: Yonatan Bitton <yonatan.bitton@perception-point.io>
Co-authored-by: Yonatan Bitton <bityob@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
We add _winapi.BatchedWaitForMultipleObjects to wait for larger numbers of handles.
This is an internal module, hence undocumented, and should be used with caution.
Check the docstring for info before using BatchedWaitForMultipleObjects.
`_multiprocessing` is only used under the `if _winapi:` block, this moves the import to be within the `_winapi` ImportError handling try/except for equivalent treatment.
Increase the backlog for multiprocessing.connection.Listener` objects created
by `multiprocessing.manager` and `multiprocessing.resource_sharer` to
significantly reduce the risk of getting a connection refused error when creating
a `multiprocessing.connection.Connection` to them.
On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
Add a track parameter to shared memory to allow resource tracking via the side-launched resource tracker process to be disabled on platforms that use it (POSIX).
This allows people who do not want automated cleanup at process exit because they are using the shared memory with processes not participating in Python's resource tracking to use the shared_memory API.
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Make `multiprocessing.managers.{DictProxy,ListProxy}` generic for type annotation use. `ListProxy[str]` for example.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
concurrent.futures: The *executor manager thread* now catches
exceptions when adding an item to the *call queue*. During Python
finalization, creating a new thread can now raise RuntimeError. Catch
the exception and call terminate_broken() in this case.
Add test_python_finalization_error() to test_concurrent_futures.
concurrent.futures._ExecutorManagerThread changes:
* terminate_broken() no longer calls shutdown_workers() since the
call queue is no longer working anymore (read and write ends of
the queue pipe are closed).
* terminate_broken() now terminates child processes, not only
wait until they complete.
* _ExecutorManagerThread.terminate_broken() now holds shutdown_lock
to prevent race conditons with ProcessPoolExecutor.submit().
multiprocessing.Queue changes:
* Add _terminate_broken() method.
* _start_thread() sets _thread to None on exception to prevent
leaking "dangling threads" even if the thread was not started
yet.
On Windows, multiprocessing Popen.terminate() now catchs
PermissionError and get the process exit code. If the process is
still running, raise again the PermissionError. Otherwise, the
process terminated as expected: store its exit code.
Follow-up of gh-107219.
* Only close the connection writer on Windows.
* Also use existing constant _winapi.ERROR_OPERATION_ABORTED instead of
WSA_OPERATION_ABORTED.
Fix a race condition in concurrent.futures. When a process in the
process pool was terminated abruptly (while the future was running or
pending), close the connection write end. If the call queue is
blocked on sending bytes to a worker process, closing the connection
write end interrupts the send, so the queue can be closed.
Changes:
* _ExecutorManagerThread.terminate_broken() now closes
call_queue._writer.
* multiprocessing PipeConnection.close() now interrupts
WaitForMultipleObjects() in _send_bytes() by cancelling the
overlapped operation.
gh-107275 introduced a regression where a SemLock would fail being passed along nested child processes, as the `is_fork_ctx` attribute would be left missing after the first deserialization.
---------
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Ensure multiprocessing SemLock is valid for spawn Process before serializing it.
Creating a multiprocessing SemLock with a fork context, and then trying to pass it to a spawn-created Process, would segfault if not detected early.
---------
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Adding the `rtype_cache` to the `warnings.warn` message improves the
previous, somewhat vague message from
```
/Users/username/cpython/Lib/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
```
to
```
/Users/username/cpython/Lib/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown: {'/mp-yor5cvj8', '/mp-10jx8eqr', '/mp-eobsx9tt', '/mp-0lml23vl', '/mp-9dgtsa_m', '/mp-frntyv4s'}
```
---------
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Prevent `multiprocessing.spawn` from failing to *import* in environments
where `sys.executable` is `None`. This regressed in 3.11 with the addition
of support for path-like objects in multiprocessing.
Adds a test decorator to have tests only run when part of test_multiprocessing_spawn to `_test_multiprocessing.py` so we can start to avoid re-running the same not-global-state specific test in all 3 modes when there is no need.
Fix a race condition in the internal `multiprocessing.process` cleanup
logic that could manifest as an unintended `AttributeError` when calling
`BaseProcess.close()`.
---------
Co-authored-by: Oleg Iarygin <oleg@arhadthedev.net>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
bpo-17258: `multiprocessing` now supports stronger HMAC algorithms for inter-process connection authentication rather than only HMAC-MD5.
Signed-off-by: Christian Heimes <christian@python.org>
gpshead: I Reworked to be more robust while keeping the idea.
The protocol modification idea remains, but we now take advantage of the
message length as an indicator of legacy vs modern protocol version. No
more regular expression usage. We now default to HMAC-SHA256, but do so
in a way that will be compatible when communicating with older clients
or older servers. No protocol transition period is needed.
More integration tests to verify these claims remain true are required. I'm
unaware of anyone depending on multiprocessing connections between
different Python versions.
---------
Signed-off-by: Christian Heimes <christian@python.org>
Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
This starts the process. Users who don't specify their own start method
and use the default on platforms where it is 'fork' will see a
DeprecationWarning upon multiprocessing.Pool() construction or upon
multiprocessing.Process.start() or concurrent.futures.ProcessPool use.
See the related issue and documentation within this change for details.
Describe the multiprocessing connection protocol.
It isn't a good protocol, but it is what it is. This way we can more
easily reason about making changes to it in a backwards compatible way.
Linux abstract sockets are insecure as they lack any form of filesystem
permissions so their use allows anyone on the system to inject code into
the process.
This removes the default preference for abstract sockets in
multiprocessing introduced in Python 3.9+ via
https://github.com/python/cpython/pull/18866 while fixing
https://github.com/python/cpython/issues/84031.
Explicit use of an abstract socket by a user now generates a
RuntimeWarning. If we choose to keep this warning, it should be
backported to the 3.7 and 3.8 branches.