Apply BOLT optimizations to libpython for shared builds. Most of the C
code is in libpython so it is critical to apply BOLT there fully realize
BOLT benefits.
This change also reworks how BOLT instrumentation is applied. It
effectively removes the readelf based logic added in gh-101525 and
replaces it with a mechanism that saves a copy of the pre-bolt binary
and restores that copy when necessary. This allows us to perform BOLT
optimizations without having to manually delete the output binary to
force a new bolt run.
Also:
- add a clean-bolt target for purging BOLT files and hook that up to the
clean target
- .gitignore BOLT related files
Before and after this refactor, `make` will no-op after a previous run.
Both versions should also share common make DAG deficiencies where
targets fail to trigger as often as they need to or can trigger
prematurely in certain scenarios. e.g. after this change you may need to
`rm profile-bolt-stamp` to force a BOLT run because there aren't
appropriate non-phony targets for BOLT's make target to depend on.
To make it easier to iterate on custom BOLT settings, the flags to pass
to instrumentation and application are now defined in configure and can
be overridden by passing BOLT_INSTRUMENT_FLAGS and BOLT_APPLY_FLAGS.
The word 'dependent' is both an adjective and a noun. A 'dependant' is a British alternative spelling for the noun form. In idlelib.sidebar, 'OS-dependant' is an adjective and clearly wrong. In 'Using', 'dependant' as a noun would be acceptable in Britain, but we use American spellings in Python docs.
https://www.merriam-webster.com/words-at-play/spelling-variants-dependent-vs-dependant
Remove the bundled setuptools wheel from ensurepip, and stop installing setuptools in environments created by venv.
Co-Authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Oleg Iarygin <oleg@arhadthedev.net>
It has had no effect on non-macOS platforms for a long time, and has had
the non-obvious effect of invoking `pkg_config` and not setting
`-DUSING_APPLE_OS_LIBFFI` on macOS since GH-22855.
Add COMPILEALL_OPTS variable in Makefile to override compileall
options (default: -j0) in "make install". Also merge the compileall
commands into a single command building PYC files for the all
optimization levels (0, 1, 2) at once.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
This adds support for comparing pystats collected from two different builds.
- The `--json-output` can be used to load in a set of raw stats and output a
JSON file.
- Two of these JSON files can be provided on the next run, and then comparative
results between the two are output.
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
* gh-96132: Add some comments and minor fixes missed in the original PR
* Update Doc/using/cmdline.rst
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
⚠️⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️⚠️
If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**.
If you have any **refinements or optimizations** please, wait until the first version is merged before starting hacking or proposing those so we can keep this PR productive.
* Add support for the BOLT post-link binary optimizer
Using [bolt](https://github.com/llvm/llvm-project/tree/main/bolt)
provides a fairly large speedup without any code or functionality
changes. It provides roughly a 1% speedup on pyperformance, and a
4% improvement on the Pyston web macrobenchmarks.
It is gated behind an `--enable-bolt` configure arg because not all
toolchains and environments are supported. It has been tested on a
Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6
sources (their binary distribution of this version did not include bolt).
Compared to [a previous attempt](https://github.com/faster-cpython/ideas/issues/224),
this commit uses bolt's preferred "instrumentation" approach, as well as adds some non-PIE
flags which enable much better optimizations from bolt.
The effects of this change are a bit more dependent on CPU microarchitecture
than other changes, since it optimizes i-cache behavior which seems
to be a bit more variable between architectures. The 1%/4% numbers
were collected on an Intel Skylake CPU, and on an AMD Zen 3 CPU I
got a slightly larger speedup (2%/4%), and on a c6i.xlarge EC2 instance
I got a slightly lower speedup (1%/3%).
The low speedup on pyperformance is not entirely unexpected, because
BOLT improves i-cache behavior, and the benchmarks in the pyperformance
suite are small and tend to fit in i-cache.
This change uses the existing pgo profiling task (`python -m test --pgo`),
though I was able to measure about a 1% macrobenchmark improvement by
using the macrobenchmarks as the training task. I personally think that
both the PGO and BOLT tasks should be updated to use macrobenchmarks,
but for the sake of splitting up the work this PR uses the existing pgo task.
* Simplify the build flags
* Add a NEWS entry
* Update Makefile.pre.in
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Update configure.ac
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Add myself to ACKS
* Add docs
* Other review comments
* fix tab/space issue
* Make it more clear that --enable-bolt is experimental
* Add link to bolt's github page
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
Remove the "configure --with-cxx-main" build option: it didn't work
for many years. Remove the MAINCC variable from configure and
Makefile.
The MAINCC variable was added by the issue gh-42471: commit
0f48d98b74. Previously, --with-cxx-main
was named --with-cxx.
Keep CXX and LDCXXSHARED variables, even if they are no longer used
by Python build system.
If an HTTP link is redirected to a same looking HTTPS link, the latter can
be used directly without changes in readability and behavior.
It protects from a men-in-the-middle attack.
This change does not affect Python examples.