* Add attributes to Regrtest and RunTests:
* fail_env_changed
* num_workers
* Rename MultiprocessTestRunner to RunWorkers. Add num_workers
parameters to RunWorkers constructor. Remove RunWorkers.ns
attribute.
* Rename TestWorkerProcess to WorkerThread.
* get_running() now returns a string like: "running (...): ...".
* Regrtest.action_run_tests() now selects the number of worker
processes, instead of the command line parser.
* Add attributes to Regrtest and RunTests:
* gc_threshold
* memory_limit
* python_cmd
* use_resources
* Remove WorkerJob class. Add as_json() and from_json() methods to
RunTests. A worker process now only uses RunTests for all
parameters.
* Add tests on support.set_memlimit() in test_support. Create
_parse_memlimit() and also adds tests on it.
* Remove 'ns' parameter from runtest.py.
* Rename dash_R() runtest_refleak(). The function now gets
huntrleaks and quiet arguments, instead of 'ns' argument.
* Add attributes to Regrtest and RunTests:
* verbose
* quiet
* huntrleaks
* test_dir
* Add HuntRefleak class.
Refator Regrtest class:
* Rename finalize() finalize_tests().
* Pass tracer to run_test() and finalize_tests(). Remove Regrtest.tracer.
* run_test() does less things: move code to its caller.
* Regrtest.__init__() now copies 'ns' namespace attributes to
Regrtest attributes. Regrtest match_tests and ignore_tests
attributes have type FilterTuple (tuple), instead of a list.
* Add RunTests.copy(). Regrtest._rerun_failed_tests() now uses
RunTests.copy().
* Replace Regrtest.all_tests (list) with Regrtest.first_runtests
(RunTests).
* Make random_seed maximum 10x larger (9 digits, instead of 8).
* main() now calls _parse_args() and pass 'ns' to Regrtest
constructor. Remove kwargs argument from Regrtest.main().
* _parse_args() checks ns.huntrleaks.
* set_temp_dir() is now responsible to call expanduser().
* Regrtest.main() sets self.tests earlier.
* Add TestTuple and TestList types.
* Rename MatchTests to FilterTuple and rename MatchTestsDict
to FilterTestDict.
* TestResult.get_rerun_match_tests() return type
is now FilterTuple: return a tuple instead of a list.
RunTests.tests type becomes TestTuple.
When using --rerun option, regrtest now re-runs failed tests
in verbose mode in fresh worker processes to have more
deterministic behavior. So it can write its final report even
if a test killed a worker progress.
Add --fail-rerun option to regrtest: exit with non-zero exit code
if a test failed pass passed when re-run in verbose mode (in a
fresh process). That's now more useful since tests can pass
when re-run in a fresh worker progress, whereas they failed
when run after other tests when tests are run sequentially.
Rename --verbose2 option (-w) to --rerun. Keep --verbose2 as a
deprecated alias.
Changes:
* Fix and enhance statistics in regrtest summary. Add "(filtered)"
when --match and/or --ignore options are used.
* Add RunTests class.
* Add TestResult.get_rerun_match_tests() method
* Rewrite code to serialize/deserialize worker arguments as JSON
using a new WorkerJob class.
* Fix stats when a test is run with --forever --rerun.
* If failed test names cannot be parsed, log a warning and don't
filter tests.
* test_regrtest.test_rerun_success() now uses a marker file, since
the test is re-run in a separated process.
* Add tests on normalize_test_name() function.
* Add test_success() and test_skip() tests to test_regrtest.
test_netrc, test_pep646_syntax and test_xml_etree now return results
in the test_main() function.
Changes:
* Rewrite TestResult as a dataclass with a new State class.
* Add test.support.TestStats class and Regrtest.stats_dict attribute.
* libregrtest.runtest functions now modify a TestResult instance
in-place.
* libregrtest summary lists the number of run tests and skipped
tests, and denied resources.
* Add TestResult.has_meaningful_duration() method.
* Compute TestResult duration in the upper function.
* Use time.perf_counter() instead of time.monotonic().
* Regrtest: rename 'resource_denieds' attribute to 'resource_denied'.
* Rename CHILD_ERROR to MULTIPROCESSING_ERROR.
* Use match/case syntadx to have different code depending on the
test state.
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Split test_multiprocessing_fork, test_multiprocessing_forkserver and
test_multiprocessing_spawn into test packages. Each package is made
of 4 sub-tests: processes, threads, manager and misc. It allows
running more tests in parallel and so reduce the total test duration.
Currently, test_asyncio package is only splitted into sub-tests when
using command "./python -m test". With this change, it's also
splitted when passing it on the command line:
"./python -m test test_asyncio".
Remove the concept of "STDTESTS". Python is now mature enough to not
have to bother with that anymore. Removing STDTESTS simplify the
code.
When running the Python test suite with -jN option, if a worker stdout
cannot be decoded from the locale encoding report a failed testn so the
exitcode is non-zero.
Display the sanitizers present in libregrtest.
Having this in the CI output for tests with the relevant environment
variable displayed will help make it easier to do what we need to
create an equivalent local test run.
These are stubs to be used for adding hypothesis (https://hypothesis.readthedocs.io/en/latest/) tests to the standard library.
When the tests are run in an environment where `hypothesis` and its various dependencies are not installed, the stubs will turn any tests with examples into simple parameterized tests and any tests without examples are skipped.
It also adds hypothesis tests for the `zoneinfo` module, and a Github Actions workflow to run the hypothesis tests as a non-required CI job.
The full hypothesis interface is not stubbed out — missing stubs can be added as necessary.
Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>
This runs test_asyncio sub-tests in parallel using sharding from Cinder. This suite is typically the longest-pole in runs because it is a test package with a lot of further sub-tests otherwise run serially. By breaking out the sub-tests as independent modules we can run a lot more in parallel.
After porting we can see the direct impact on a multicore system.
Without this change:
Running make test is 5 min 26 seconds
With this change:
Running make test takes 3 min 39 seconds
That'll vary based on system and parallelism. On a `-j 4` run similar to what CI and buildbot systems often do, it reduced the overall test suite completion latency by 10%.
The drawbacks are that this implementation is hacky and due to the sorting of the tests it obscures when the asyncio tests occur and involves changing CPython test infrastructure but, the wall time saved it is worth it, especially in low-core count CI runs as it pulls a long tail. The win for productivity and reserved CI resource usage is significant.
Future tests that deserve to be refactored into split up suites to benefit from are test_concurrent_futures and the way the _test_multiprocessing suite gets run for all start methods. As exposed by passing the -o flag to python -m test to get a list of the 10 longest running tests.
---------
Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Gregory P. Smith <greg@krypto.org> [Google, LLC]
This is the implementation of PEP683
Motivation:
The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.
Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
Remove the distutils package. It was deprecated in Python 3.10 by PEP
632 "Deprecate distutils module". For projects still using distutils
and cannot be updated to something else, the setuptools project can
be installed: it still provides distutils.
* Remove Lib/distutils/ directory
* Remove test_distutils
* Remove references to distutils
* Skip test_check_c_globals and test_peg_generator since they use
distutils
The Python test suite now fails wit exit code 4 if no tests ran. It
should help detecting typos in test names and test methods.
* Add "EXITCODE_" constants to Lib/test/libregrtest/main.py.
* Fix a typo: "NO TEST RUN" becomes "NO TESTS RAN"
On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than using UTF-8 which can lead to decoding
errors.
- Emscripten's default umask is too strict, see
https://github.com/emscripten-core/emscripten/issues/17269
- getuid/getgid and geteuid/getegid are stubs that always return 0
(root). Disable effective uid/gid syscalls and fix tests that use
chmod() current user.
- Cannot drop X bit from directory.
regrtest now also implements checking for leaked temporary files and
directories when using -jN for N >= 2. Use tempfile.mkdtemp() to
create the temporary directory. Skip this check on WASI.
When running tests with -jN, create a temporary directory per process
and mark a test as "environment changed" if a test leaks a temporary
file or directory.
Replace the child process `typeperf.exe` with a daemon thread that reads the performance counters directly. This prevents the issues that arise from inherited handles in grandchild processes (see issue37531 for discussion).
We only use the load tracker when running tests in multiprocess mode. This prevents inadvertent interactions with tests expecting a single threaded environment. Displaying load is really only helpful for buildbots running in multiprocess mode anyway.
Remove the asyncore and asynchat modules, deprecated in Python
3.6: use the asyncio module instead.
Remove the smtpd module, deprecated in Python 3.6: the aiosmtpd
module can be used instead, it is based on asyncio.
* Remove asyncore, asynchat and smtpd documentation
* Remove test_asyncore, test_asynchat and test_smtpd
* Rename Lib/asynchat.py to Lib/test/support/_asynchat.py
* Rename Lib/asyncore.py to Lib/test/support/_asyncore.py
* Rename Lib/smtpd.py to Lib/test/support/_smtpd.py
* Remove DeprecationWarning from private _asyncore, _asynchat and
_smtpd modules
* _smtpd: remove deprecated properties
Remove the --findleaks command line option of regrtest: use the
--fail-env-changed option instead. Since Python 3.7, it was a
deprecated alias to the --fail-env-changed option.
support.print_warning() now stores the original value of
sys.__stderr__ and uses it to log warnings. libregrtest uses the same
stream to log unraisable exceptions and uncaught threading
exceptions.
Partially revert commit dbe213de7ef28712bbfdb9d94a33abb9c33ef0c2:
libregrtest no longer replaces sys.__stdout__, sys.__stderr__, and
stdout and stderr file descriptors.
Remove also a few unused imports in libregrtest.
libregrtest -W/--verbose3 now also replace sys.__stdout__,
sys.__stderr__, and stdout and stderr file descriptors (fd 1 and fd
2).
support.print_warning() messages are now logged in the expected
order.
The "./python -m test test_eintr -W" command no longer logs into
stdout if the test pass.
When libregrtest spawns a worker process, stderr is now written into
stdout to keep messages order. Use a single pipe for stdout and
stderr, rather than two pipes. Previously, messages were out of order
which made analysis of buildbot logs harder
libregrtest now clears the type cache later to reduce the risk of
false alarm when checking for reference leaks. Previously, the type
cache was cleared too early and libregrtest raised a false alarm
about reference leaks under very specific conditions.
Move also support.gc_collect() outside clear/cleanup functions to
make the garbage collection more explicit.
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
* Move to a static argparse.Namespace subclass
* Roughly annotate runtest.py
* Refactor libregrtest to use lossless test result objects
* Only re-run test methods that match names of previously failing test methods
* Adopt tests to cover test method name matching
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* Add co_firstinstr field to code object.
* Implement barebones quickening.
* Use non-quickened bytecode when tracing.
* Add NEWS item
* Add new file to Windows build.
* Don't specialize instructions with EXTENDED_ARG.
test.libregrtest now marks a test as ENV_CHANGED (altered the
execution environment) if a thread raises an exception but does not
catch it. It sets a hook on threading.excepthook. Use
--fail-env-changed option to mark the test as failed.
libregrtest regrtest_unraisable_hook() explicitly flushs
sys.stdout, sys.stderr and sys.__stderr__.
RegressionTestResult.USE_XML must now be set to True to get the JUnit
XML output.
Reduce the number of imports when --junit-xml=FILE option is not
used: 153 => 144 (-9).
Move clear_caches() from libregrtest.refleak to libregrtest.utils to
avoid importing libregrtest.refleak when it's not needed.
clear_caches() now only calls re.purge() if 're' is in sys.modules.
Reduce the number of modules imported by libregrtest.
saved_test_environment no longer imports modules at startup, but try
to get them from sys.modules. If an module is missing, skip the test.
It also sets directly support.environment_altered.
runtest() now now two saved_test_environment instances: one before
importing the test module, one after importing it.
Remove imports from test.libregrtest.save_env:
* asyncio
* logging
* multiprocessing
* shutil
* sysconfig
* urllib.request
* warnings
When a test method imports a module (ex: warnings) and the test
has a side effect (ex: add a warnings filter), the side effect is not
detected, because the module was not imported when Python
enters the saved_test_environment context manager.
test_repl.test_close_stdin() now calls
support.suppress_msvcrt_asserts() to fix the test on Windows.
* Move suppress_msvcrt_asserts() from test.libregrtest.setup to
test.support. Make its verbose parameter optional: verbose=False by
default.
* Add msvcrt.GetErrorMode().
* SuppressCrashReport now uses GetErrorMode() and SetErrorMode() of
the msvcrt module, rather than using ctypes.
* Remove also an unused variable (deadline) in wait_process().
Log "Warning -- ..." test warnings into sys.__stderr__ rather than
sys.stderr, to ensure to display them even if sys.stderr is captured.
test.libregrtest.utils.print_warning() now calls
test.support.print_warning().
When building Python in some uncommon platforms there are some known tests that will fail. Right now, the test suite has the ability to ignore entire tests using the -x option and to receive a filter file using the --matchfile filter. The problem with the --matchfile option is that it receives a file with patterns to accept and when you want to ignore a couple of tests and subtests, is too cumbersome to lists ALL tests that are not the ones that you want to accept and he problem with -x is that is not easy to ignore just a subtests that fail and the whole test needs to be ignored.
For these reasons, add a new option to allow to ignore a list of test and subtests for these situations.
test.regrtest now uses process groups in the multiprocessing mode
(-jN command line option) if process groups are available: if
os.setsid() and os.killpg() functions are available.
bpo-37531, bpo-38207: On timeout, regrtest no longer attempts to call
`popen.communicate() again: it can hang until all child processes
using stdout and stderr pipes completes. Kill the worker process and
ignores its output.
Reenable test_regrtest.test_multiprocessing_timeout().
bpo-37531: Change also the faulthandler timeout of the main process
from 1 minute to 5 minutes, for Python slowest buildbots.
* Add log() method: add timestamp and load average prefixes
to main messages.
* WindowsLoadTracker:
* LOAD_FACTOR_1 is now computed using SAMPLING_INTERVAL
* Initialize the load to the arithmetic mean of the first 5 values
of the Processor Queue Length value (so over 5 seconds), rather
than 0.0.
* Handle BrokenPipeError and when typeperf exit.
* format_duration(1.5) now returns '1.5 sec', rather than
'1 sec 500 ms'
* Fix TestWorkerProcess.__repr__(): start_time is only valid
if _popen is not None.
* Fix _kill(): don't set _killed to True if _popen is None.
* _run_process(): only set _killed to False after calling
run_test_in_subprocess().
* Windows: Fix counter name in WindowsLoadTracker. Counter names are
localized: use the registry to get the counter name. Original
change written by Lorenz Mende.
* Regrtest.main() now ensures that the Windows load tracker is also
killed if an exception is raised
* TestWorkerProcess now ensures that worker processes are no longer
running before exiting: kill also worker processes when an
exception is raised.
* Enhance regrtest messages and warnings: include test name,
duration, add a worker identifier, etc.
* Rename MultiprocessRunner to TestWorkerProcess
* Use print_warning() to display warnings.
Co-Authored-By: Lorenz Mende <Lorenz.mende@gmail.com>
When using multiprocesss (-jN), the main process now uses a timeout
of 60 seconds instead of the double of the --timeout value. The
buildbot server stops a job which does not produce any output in 1200
seconds.
A root cause of bpo-37936 is that it's easy to write a .gitignore
rule that's intended to apply to a specific file (e.g., the
`pyconfig.h` generated by `./configure`) but actually applies to all
similarly-named files in the tree (e.g., `PC/pyconfig.h`.)
Specifically, any rule with no non-trailing slashes is applied in an
"unrooted" way, to files anywhere in the tree. This means that if we
write the rules in the most obvious-looking way, then
* for specific files we want to ignore that happen to be in
subdirectories (like `Modules/config.c`), the rule will work
as intended, staying "rooted" to the top of the tree; but
* when a specific file we want to ignore happens to be at the root of
the repo (like `platform`), then the obvious rule (`platform`) will
apply much more broadly than intended: if someone tries to add a
file or directory named `platform` somewhere else in the tree, it
will unexpectedly get ignored.
That's surprising behavior that can make the .gitignore file's
behavior feel finicky and unpredictable.
To avoid it, we can simply always give a rule "rooted" behavior when
that's what's intended, by systematically using leading slashes.
Further, to help make the pattern obvious when looking at the file and
minimize any need for thinking about the syntax when adding new rules:
separate the rules into one group for each type, with brief comments
identifying them.
For most of these rules it's clear whether they're meant to be rooted
or unrooted, but in a handful of cases I've only guessed. In that
case the safer default (the choice that won't hide information) is the
narrower, rooted meaning, with a leading slash. If for some of these
the unrooted meaning is desired after all, it'll be easy to move them
to the unrooted section at the top.
* Write a message when killing a worker process
* Put a timeout on the second popen.communicate() call
(after killing the process)
* Put a timeout on popen.wait() call
* Catch popen.kill() and popen.wait() exceptions
Mark some individual tests to skip when --pgo is used. The tests
marked increase the PGO task time significantly and likely don't
help improve optimization of the final executable.
Reduce the number of unit tests run for the PGO generation task. This
speeds up the task by a factor of about 15x. Running the full unit test
suite is slow. This change may result in a slightly less optimized build
since not as many code branches will be executed. If you are willing to
wait for the much slower build, the old behavior can be restored using
'./configure [..] PROFILE_TASK="-m test --pgo-extended"'. We make no
guarantees as to which PGO task set produces a faster build. Users who
care should run their own relevant benchmarks as results can depend on
the environment, workload, and compiler tool chain.
bpo-15386, bpo-37473: test_import, regrtest and libregrtest no longer
import importlib as soon as possible, as the first import, "to test
bpo-15386".
It is tested by test_import.test_there_can_be_only_one().
Sort test_import imports.
urllib.request tests now call urlcleanup() to remove temporary files
created by urlretrieve() tests and to clear the _opener global
variable set by urlopen() and functions calling indirectly urlopen().
regrtest now checks if urllib.request._url_tempfiles and
urllib.request._opener are changed by tests.
Under some conditions the earlier fix for bpo-18075, "Infinite recursion
tests triggering a segfault on Mac OS X", now causes failures on macOS
when attempting to change stack limit with resource.setrlimit
resource.RLIMIT_STACK, like regrtest does when running the test suite.
The reverted change had specified a non-default stack size when linking
the python executable on macOS. As of macOS 10.14.4, the previous
code causes a hard failure when running tests, although similar
failures had been seen under some conditions under some earlier
systems. Reverting the change to the interpreter stack size at link
time helped for release builds but caused some tests to fail when
built --with-pydebug. Try the opposite approach: continue to build
the interpreter with an increased stack size on macOS and remove
the failing setrlimit call in regrtest initialization. This will
definitely avoid the resource.RLIMIT_STACK error and should have
no, or fewer, side effects.
* regrtest: Add --cleanup option to remove "test_python_*" directories
of previous failed test jobs.
* Add "make cleantest" to run "python3 -m test --cleanup".
regrtest now uses sys.unraisablehook() to mark a test as "environment
altered" (ENV_CHANGED) if it emits an "unraisable exception".
Moreover, regrtest logs a warning in this case.
Use "python3 -m test --fail-env-changed" to catch unraisable
exceptions in tests.
When using multiprocessing (-jN option), worker processes now create
their temporary directory inside the temporary directory of the
main process. So the main process is able to remove temporary
directories of worker processes even if they crash or when they are
killed by regrtest on KeyboardInterrupt (CTRL+c).
Rework also how multiprocessing arguments are parsed in main.py.
"python3 -m test -jN ..." now continues the execution of next tests
when a worker process crash (CHILD_ERROR state). Previously, the test
suite stopped immediately. Use --failfast to stop at the first error.
Moreover, --forever now also implies --failfast.
regrtest now always detects uncollectable objects. Previously, the
check was only enabled by --findleaks. The check now also works with
-jN/--multiprocess N.
--findleaks becomes a deprecated alias to --fail-env-changed.
Rewrite run_tests_multiprocess() function as a new MultiprocessRunner
class with multiple methods to better report errors and stop
immediately when needed.
Changes:
* Worker processes are now killed immediately if tests are
interrupted or if a test does crash (CHILD_ERROR): worker
processes are killed.
* Rewrite how errors in a worker thread are reported to
the main thread. No longer ignore BaseException or parsing errors
silently.
* Remove 'finished' variable: use worker.is_alive() instead
* Always compute omitted tests. Add Regrtest.get_executed() method.
* Add TestResult and MultiprocessResult types to ensure that results
always have the same fields.
* runtest() now handles KeyboardInterrupt
* accumulate_result() and format_test_result() now takes a TestResult
* cleanup_test_droppings() is now called by runtest() and mark the
test as ENV_CHANGED if the test leaks support.TESTFN file.
* runtest() now includes code "around" the test in the test timing
* Add print_warning() in test.libregrtest.utils to standardize how
libregrtest logs warnings to ease parsing the test output.
* support.unload() is now called with abstest rather than test_name
* Rename 'test' variable/parameter to 'test_name'
* dash_R(): remove unused the_module parameter
* Remove unused imports
dash_R() function of libregrtest doesn't call support.gc_collect()
directly anymore: it's already called by dash_R_cleanup().
Call dash_R_cleanup() before starting the loop.
Fix reference leak hunting in regrtest: compute also deltas (of
reference count, allocated memory blocks, file descriptor count)
during warmup, to ensure that everything is initialized before
starting to hunt reference leaks.
Other changes:
* Replace gc.collect() with support.gc_collect()
* Move calls to read memory statistics from dash_R_cleanup() to
dash_R()
* Pass regrtest 'ns' to dash_R()
* dash_R() is now more quiet with --quiet option (don't display
progress).
* Precompute the full range for "for it in range(repcount):" to
ensure that the iteration doesn't allocate anything new.
* dash_R() now is responsible to call warm_caches().
While Windows exposes the system processor queue length, the raw value
used for load calculations on Unix systems, it does not provide an API
to access the averaged value. Hence to calculate the load we must track
and average it ourselves. We can't use multiprocessing or a thread to
read it in the background while the tests run since using those would
conflict with test_multiprocessing and test_xxsubprocess.
Thus, we use Window's asynchronous IO API to run the tracker in the
background with it sampling at the correct rate. When we wish to access
the load we check to see if there's new data on the stream, if there is,
we update our load values.
TextTestRunner of unittest.runner now uses time.perf_counter() rather
than time.time() to measure the execution time of a test: time.time()
can go backwards, whereas time.perf_counter() is monotonic.
Similar change made in libregrtest, pprint and random.
Fix bug in `Lib/test/libregrtest/runtest.py` that makes running tests an extra time than the specified number of runs.
Add check for invalid --huntrleaks/-R parameters.
If tests are re-run, use "xxx then yyy" result format (ex: "FAILURE
then SUCCESS") to show that some failing tests have been re-run.
Add also test_regrtest.test_rerun_fail() test.
* "running:" progress: Format number of seconds as hours and minutes
* format_duration(): count also minutes as hours
* Create Lib/test/libregrtest/utils.py