test.regrtest now uses process groups in multiprocessing mode
(-jN command line option) when process groups are available, that is,
when the os.setsid() and os.killpg() functions are available.
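A minimal sketch of the pattern (illustrative, not the exact regrtest
code; spawn_worker() and kill_worker() are hypothetical helper names):

    import os
    import signal
    import subprocess
    import sys

    USE_PROCESS_GROUP = hasattr(os, 'setsid') and hasattr(os, 'killpg')

    def spawn_worker(args):
        # start_new_session=True calls os.setsid() in the child, making
        # the worker the leader of a new process group.
        return subprocess.Popen([sys.executable] + args,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                start_new_session=USE_PROCESS_GROUP)

    def kill_worker(popen):
        if USE_PROCESS_GROUP:
            # Signal the whole group: the worker and all of its children.
            os.killpg(popen.pid, signal.SIGKILL)
        else:
            popen.kill()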
bpo-37531, bpo-38207: On timeout, regrtest no longer attempts to call
popen.communicate() again: that call can hang until all child processes
using the stdout and stderr pipes complete. Instead, kill the worker
process and ignore its output.
Reenable test_regrtest.test_multiprocessing_timeout().
bpo-37531: Also change the faulthandler timeout of the main process
from 1 minute to 5 minutes, for the slowest Python buildbots.
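A sketch of the new timeout handling (simplified; JOIN_TIMEOUT and the
helper name are illustrative):

    import subprocess

    JOIN_TIMEOUT = 30.0   # illustrative grace period, in seconds

    def communicate_with_timeout(popen, timeout):
        try:
            return popen.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            popen.kill()   # or os.killpg() when process groups are used
            # Don't call communicate() without a timeout again: it could
            # block until every child sharing the stdout/stderr pipes has
            # exited.  Read whatever is left with a bounded wait, then
            # give up and ignore the output.
            try:
                popen.communicate(timeout=JOIN_TIMEOUT)
            except subprocess.TimeoutExpired:
                pass
            return None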
* Add log() method: prefix main messages with a timestamp and the
load average.
* WindowsLoadTracker:
* LOAD_FACTOR_1 is now computed using SAMPLING_INTERVAL
* Initialize the load to the arithmetic mean of the first 5 values
of the Processor Queue Length value (so over 5 seconds), rather
than 0.0.
* Handle BrokenPipeError and the case when typeperf exits.
* format_duration(1.5) now returns '1.5 sec', rather than
'1 sec 500 ms'
* Fix TestWorkerProcess.__repr__(): start_time is only valid
if _popen is not None.
* Fix _kill(): don't set _killed to True if _popen is None.
* _run_process(): only set _killed to False after calling
run_test_in_subprocess().
* Windows: Fix counter name in WindowsLoadTracker. Counter names are
localized: use the registry to get the counter name. Original
change written by Lorenz Mende.
* Regrtest.main() now ensures that the Windows load tracker is also
killed if an exception is raised.
* TestWorkerProcess now ensures that worker processes are no longer
running before exiting: worker processes are also killed when an
exception is raised.
* Enhance regrtest messages and warnings: include test name,
duration, add a worker identifier, etc.
* Rename MultiprocessRunner to TestWorkerProcess
* Use print_warning() to display warnings.
Co-Authored-By: Lorenz Mende <Lorenz.mende@gmail.com>
When using multiprocessing (-jN), the main process now uses a timeout
of 60 seconds instead of double the --timeout value. The buildbot
server stops a job that does not produce any output for 1200 seconds.
A root cause of bpo-37936 is that it's easy to write a .gitignore
rule that's intended to apply to a specific file (e.g., the
`pyconfig.h` generated by `./configure`) but actually applies to all
similarly-named files in the tree (e.g., `PC/pyconfig.h`).
Specifically, any rule with no non-trailing slashes is applied in an
"unrooted" way, to files anywhere in the tree. This means that if we
write the rules in the most obvious-looking way, then
* for specific files we want to ignore that happen to be in
subdirectories (like `Modules/config.c`), the rule will work
as intended, staying "rooted" to the top of the tree; but
* when a specific file we want to ignore happens to be at the root of
the repo (like `platform`), then the obvious rule (`platform`) will
apply much more broadly than intended: if someone tries to add a
file or directory named `platform` somewhere else in the tree, it
will unexpectedly get ignored.
That's surprising behavior that can make the .gitignore file's
behavior feel finicky and unpredictable.
To avoid it, we can simply always give a rule "rooted" behavior when
that's what's intended, by systematically using leading slashes.
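As an illustration (these are not the repository's actual rules), the
difference looks like this in .gitignore syntax:

    # Rooted automatically: a non-trailing slash anchors the rule to the
    # top of the tree.
    Modules/config.c

    # Rooted explicitly with a leading slash: ignores only the pyconfig.h
    # generated by ./configure at the top level, not PC/pyconfig.h.
    /pyconfig.h
    /platform

    # Unrooted: no non-trailing slash, so this would match a file named
    # pyconfig.h anywhere in the tree.
    pyconfig.h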
Further, to help make the pattern obvious when looking at the file and
minimize any need for thinking about the syntax when adding new rules:
separate the rules into one group for each type, with brief comments
identifying them.
For most of these rules it's clear whether they're meant to be rooted
or unrooted, but in a handful of cases I've only guessed. In those
cases the safer default (the choice that won't hide information) is the
narrower, rooted meaning, with a leading slash. If for some of these
the unrooted meaning is desired after all, it'll be easy to move them
to the unrooted section at the top.
* Write a message when killing a worker process
* Put a timeout on the second popen.communicate() call
(after killing the process)
* Put a timeout on popen.wait() call
* Catch popen.kill() and popen.wait() exceptions
Mark some individual tests to skip when --pgo is used. The marked
tests increase the PGO task time significantly and likely don't help
improve optimization of the final executable.
Reduce the number of unit tests run for the PGO generation task. This
speeds up the task by roughly 15x. Running the full unit test suite is
slow. This change may result in a slightly less optimized build
since not as many code branches will be executed. If you are willing to
wait for the much slower build, the old behavior can be restored using
'./configure [..] PROFILE_TASK="-m test --pgo-extended"'. We make no
guarantees as to which PGO task set produces a faster build. Users who
care should run their own relevant benchmarks as results can depend on
the environment, workload, and compiler tool chain.
bpo-15386, bpo-37473: test_import, regrtest and libregrtest no longer
import importlib as early as possible (as the very first import) "to
test bpo-15386": bpo-15386 is already tested by
test_import.test_there_can_be_only_one().
Also sort the test_import imports.
urllib.request tests now call urlcleanup() to remove temporary files
created by urlretrieve() tests and to clear the _opener global variable
set by urlopen() and by functions that call urlopen() indirectly.
regrtest now checks if urllib.request._url_tempfiles and
urllib.request._opener are changed by tests.
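A sketch of the cleanup the tests now perform (the test class name is
illustrative):

    import unittest
    import urllib.request

    class URLRetrieveTests(unittest.TestCase):   # illustrative name
        def tearDown(self):
            # Remove temporary files created by urlretrieve() and reset
            # the global opener installed via urlopen()/install_opener().
            urllib.request.urlcleanup()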
Under some conditions, the earlier fix for bpo-18075, "Infinite recursion
tests triggering a segfault on Mac OS X", now causes failures on macOS
when attempting to change the stack limit with resource.setrlimit and
resource.RLIMIT_STACK, as regrtest does when running the test suite.
The reverted change had specified a non-default stack size when linking
the python executable on macOS. As of macOS 10.14.4, the previous
code causes a hard failure when running tests, although similar
failures had been seen under some conditions on some earlier
systems. Reverting the change to the interpreter stack size at link
time helped for release builds but caused some tests to fail when
built --with-pydebug. Try the opposite approach: continue to build
the interpreter with an increased stack size on macOS and remove
the failing setrlimit call in regrtest initialization. This will
definitely avoid the resource.RLIMIT_STACK error and should have
no, or fewer, side effects.
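The removed initialization was roughly of this shape (the target value
is illustrative, not regrtest's exact code):

    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    # Try to raise the soft stack limit; on macOS 10.14.4+ this call can
    # fail hard when the executable was linked with a non-default stack
    # size.
    newsoft = max(soft, 16 * 1024 * 1024)        # illustrative target
    if hard != resource.RLIM_INFINITY:
        newsoft = min(newsoft, hard)
    resource.setrlimit(resource.RLIMIT_STACK, (newsoft, hard))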
* regrtest: Add --cleanup option to remove "test_python_*" directories
of previous failed test jobs.
* Add "make cleantest" to run "python3 -m test --cleanup".
regrtest now uses sys.unraisablehook() to mark a test as "environment
altered" (ENV_CHANGED) if it emits an "unraisable exception".
Moreover, regrtest logs a warning in this case.
Use "python3 -m test --fail-env-changed" to catch unraisable
exceptions in tests.
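A simplified sketch of the mechanism (the hook and flag names are
illustrative; the real hook lives in test.libregrtest):

    import sys

    environment_altered = False

    def regrtest_unraisablehook(unraisable, _orig_hook=sys.unraisablehook):
        global environment_altered
        environment_altered = True      # the test will be marked ENV_CHANGED
        print("Warning -- Unraisable exception", file=sys.stderr)
        _orig_hook(unraisable)          # still report the exception itself

    sys.unraisablehook = regrtest_unraisablehook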
When using multiprocessing (-jN option), worker processes now create
their temporary directory inside the temporary directory of the main
process. This way, the main process is able to remove the temporary
directories of worker processes even if they crash or are killed by
regrtest on KeyboardInterrupt (CTRL+c).
Also rework how multiprocessing arguments are parsed in main.py.
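A minimal sketch of the layout (prefixes and the hand-off to the worker
are illustrative):

    import shutil
    import tempfile

    # Main process: create the parent temporary directory.
    main_tmp_dir = tempfile.mkdtemp(prefix='test_python_')

    # Worker process: create its own directory *inside* the main process
    # directory (the parent path would be passed to the worker, e.g.
    # through an environment variable or the command line).
    worker_tmp_dir = tempfile.mkdtemp(prefix='worker_', dir=main_tmp_dir)

    # Main process cleanup: removing the parent directory also removes
    # anything left behind by crashed or killed workers.
    shutil.rmtree(main_tmp_dir)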
"python3 -m test -jN ..." now continues the execution of next tests
when a worker process crash (CHILD_ERROR state). Previously, the test
suite stopped immediately. Use --failfast to stop at the first error.
Moreover, --forever now also implies --failfast.
regrtest now always detects uncollectable objects. Previously, the
check was only enabled by --findleaks. The check now also works with
-jN/--multiprocess N.
--findleaks becomes a deprecated alias to --fail-env-changed.
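Roughly, the check amounts to the following (simplified; the real code
lives in libregrtest's runtest module):

    import gc

    def check_uncollectable(test_name):
        # Simplified version of the post-test check.
        gc.collect()
        if not gc.garbage:
            return False
        print(f"Warning -- {test_name} created {len(gc.garbage)} "
              f"uncollectable object(s)")
        # Move the objects aside so the next test starts clean; returning
        # True stands in for marking the test as ENV_CHANGED.
        gc.garbage.clear()
        return True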
Rewrite run_tests_multiprocess() function as a new MultiprocessRunner
class with multiple methods to better report errors and stop
immediately when needed.
Changes:
* Worker processes are now killed immediately if tests are
interrupted or if a test crashes (CHILD_ERROR).
* Rewrite how errors in a worker thread are reported to
the main thread. No longer ignore BaseException or parsing errors
silently.
* Remove 'finished' variable: use worker.is_alive() instead
* Always compute omitted tests. Add Regrtest.get_executed() method.
* Add TestResult and MultiprocessResult types to ensure that results
always have the same fields.
* runtest() now handles KeyboardInterrupt
* accumulate_result() and format_test_result() now take a TestResult
* cleanup_test_droppings() is now called by runtest() and marks the
test as ENV_CHANGED if the test leaks the support.TESTFN file.
* runtest() now includes code "around" the test in the test timing
* Add print_warning() in test.libregrtest.utils to standardize how
libregrtest logs warnings to ease parsing the test output.
* support.unload() is now called with abstest rather than test_name
* Rename 'test' variable/parameter to 'test_name'
* dash_R(): remove unused the_module parameter
* Remove unused imports
The dash_R() function of libregrtest no longer calls
support.gc_collect() directly: it is already called by
dash_R_cleanup(). Also call dash_R_cleanup() before starting the loop.
Fix reference leak hunting in regrtest: also compute deltas (of
reference count, allocated memory blocks, file descriptor count)
during warmup, to ensure that everything is initialized before
starting to hunt reference leaks.
Other changes:
* Replace gc.collect() with support.gc_collect()
* Move calls to read memory statistics from dash_R_cleanup() to
dash_R()
* Pass regrtest 'ns' to dash_R()
* dash_R() is now quieter with the --quiet option (don't display
progress).
* Precompute the full range for "for it in range(repcount):" to
ensure that the iteration doesn't allocate anything new.
* dash_R() is now responsible for calling warm_caches().
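A sketch of the overall loop structure (illustrative; the real logic
lives in dash_R()/dash_R_cleanup(), and sys.gettotalrefcount() only
exists in debug builds):

    import sys

    def hunt_refleaks(run_test, nwarmup=5, ntracked=4):
        get_rc = getattr(sys, 'gettotalrefcount', lambda: 0)
        deltas = []
        rc_before = get_rc()
        for i in range(nwarmup + ntracked):
            run_test()
            rc_after = get_rc()
            # The delta is computed on every iteration, including warmup,
            # so caches and lazy imports filled by the first runs are
            # absorbed there instead of being counted against the test.
            delta = rc_after - rc_before
            if i >= nwarmup:
                deltas.append(delta)
            rc_before = rc_after
        return deltas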
While Windows exposes the system processor queue length, the raw value
used for load calculations on Unix systems, it does not provide an API
to access the averaged value. Hence to calculate the load we must track
and average it ourselves. We can't use multiprocessing or a thread to
read it in the background while the tests run since using those would
conflict with test_multiprocessing and test_xxsubprocess.
Thus, we use Windows' asynchronous IO API to run the tracker in the
background, sampling at the correct rate. When we wish to access the
load, we check whether there is new data on the stream and, if so,
update our load values.
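A sketch of the averaging itself, with the asynchronous pipe handling
omitted (constant names mirror the ones mentioned earlier; values are
illustrative):

    import math

    SAMPLING_INTERVAL = 1                  # seconds between samples
    # Decay factor of a 1-minute exponential moving average, derived from
    # the sampling interval.
    LOAD_FACTOR_1 = 1 / math.exp(SAMPLING_INTERVAL / 60)

    def update_load(load, processor_queue_length):
        # The same smoothing Unix kernels apply to the 1-minute load
        # average, fed with Processor Queue Length samples.
        return (load * LOAD_FACTOR_1
                + processor_queue_length * (1.0 - LOAD_FACTOR_1))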
TextTestRunner of unittest.runner now uses time.perf_counter() rather
than time.time() to measure the execution time of a test: time.time()
can go backwards, whereas time.perf_counter() is monotonic.
A similar change was made in libregrtest, pprint and random.
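For reference, the measurement pattern is simply:

    import time

    def timed(func):
        # Measure how long func() takes using a monotonic clock that,
        # unlike time.time(), can never go backwards.
        start_time = time.perf_counter()
        func()
        return time.perf_counter() - start_time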