If Py_NOGIL is defined and Py_SET_REFCNT() is called with a reference
count larger than UINT32_MAX, make the object immortal.
Set _Py_IMMORTAL_REFCNT constant type to Py_ssize_t to fix the
following compiler warning:
Include/internal/pycore_global_objects_fini_generated.h:14:24:
warning: comparison of integers of different signs: 'Py_ssize_t'
(aka 'long') and 'unsigned int' [-Wsign-compare]
if (Py_REFCNT(obj) < _Py_IMMORTAL_REFCNT) {
~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~
The "Check if generated files are up to date" job of GitHub Actions
now runs the "autoreconf -ivf -Werror" command instead of the "make
regen-configure" command to avoid depending on the external quay.io
server.
Add Tools/build/regen-configure.sh script to regenerate the configure
with an Ubuntu container image. The
"quay.io/tiran/cpython_autoconf:271" container image
(https://github.com/tiran/cpython_autoconf) is no longer used.
* Add mimalloc v2.12
Modified src/alloc.c to remove include of alloc-override.c and not
compile new handler.
Did not include the following files:
- include/mimalloc-new-delete.h
- include/mimalloc-override.h
- src/alloc-override-osx.c
- src/alloc-override.c
- src/static.c
- src/region.c
mimalloc is thread safe and shares a single heap across all runtimes,
therefore finalization and getting global allocated blocks across all
runtimes is different.
* mimalloc: minimal changes for use in Python:
- remove debug spam for freeing large allocations
- use same bytes (0xDD) for freed allocations in CPython and mimalloc
This is important for the test_capi debug memory tests
* Don't export mimalloc symbol in libpython.
* Enable mimalloc as Python allocator option.
* Add mimalloc MIT license.
* Log mimalloc in Lib/test/pythoninfo.py.
* Document new mimalloc support.
* Use macro defs for exports as done in:
https://github.com/python/cpython/pull/31164/
Co-authored-by: Sam Gross <colesbury@gmail.com>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
Remove replace_stdout(): call sys.stdout.reconfigure() instead of set
the error handler to backslashreplace.
display_header() logs an empty line and flush stdout.
Remove encoding workaround in display_header() since stdout error
handler is now set to backslashreplace earlier.
* Add --fast-ci and --slow-ci options to libregrtest:
* --fast-ci uses a default timeout of 10 minutes and "-u all,-cpu"
(skip slowest tests).
* --slow-ci uses a default timeout of 20 minues and "-u all" (run
all tests).
* regrtest header now lists test resources.
* Makefile changes:
* "make test", "make hostrunnertest" and "make coverage-report" now
use --fast-ci option and TESTTIMEOUT variable.
* "make buildbottest" now uses "--slow-ci". Remove options which
became redundant with "--slow-ci".
* "make testall" and "make testuniversal" now use --slow-ci option
and TESTTIMEOUT variable.
* "make testall" now uses "find -exec rm ..." instead of
"find ... -print|xargs rm ...", same as "make clean".
* GitHub Actions workflow:
* Ubuntu and Address Sanitizer jobs now use "make test". Remove
options which became redundant with "--fast-ci".
* Windows jobs now use --fast-ci option.
* Use -j0 to detect the number of CPUs.
* Set Makefile TESTTIMEOUT default to an empty string, since
--slow-ci and --fast-ci use different default timeout. It's now
accepted to pass "--timeout=" to regrtest: treated as not timeout.
* Tools/scripts/run_tests.py now uses --fast-ci option.
* Tools/buildbot/test.bat now uses --slow-ci option. Remove
--timeout=1200 option, redundant with --slow-ci.
Document PyMODINIT_FUNC macro.
Remove links to PyAPI_FUNC() and PyAPI_DATA() macros since they are
not documented. These macros should only be used to define the Python
C API. They should not be used outside Python code base.
Fix a race condition in "make regen-all". The deepfreeze.c source and
files generated by Argument Clinic are now generated or updated
before generating "global objects". Previously, some identifiers may
miss depending on the order in which these files were generated.
* "make regen-global-objects": Make sure that deepfreeze.c is
generated and up to date, and always run "make clinic".
* "make clinic" no longer runs generate_global_objects.py script.
* "make regen-deepfreeze" now only updates deepfreeze.c (C file).
It doesn't build deepfreeze.o (object) anymore.
* Remove misleading messages in "make regen-global-objects" and
"make clinic". They are now outdated, these commands are now
safe to use.
* Document generates files in Doc/using/configure.rst.
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Statistics gathering is now off by default. Use the "-X pystats"
command line option or set the new PYTHONSTATS environment variable
to 1 to turn statistics gathering on at Python startup.
Statistics are no longer dumped at exit if statistics gathering was
off or statistics have been cleared.
Changes:
* Add PYTHONSTATS environment variable.
* sys._stats_dump() now returns False if statistics are not dumped
because they are all equal to zero.
* Add PyConfig._pystats member.
* Add tests on sys functions and on setting PyConfig._pystats to 1.
* Add Include/cpython/pystats.h and Include/internal/pycore_pystats.h
header files.
* Rename '_py_stats' variable to '_Py_stats'.
* Exclude Include/cpython/pystats.h from the Py_LIMITED_API.
* Move pystats.h include from object.h to Python.h.
* Add _Py_StatsOn() and _Py_StatsOff() functions. Remove
'_py_stats_struct' variable from the API: make it static in
specialize.c.
* Document API in Include/pystats.h and Include/cpython/pystats.h.
* Complete pystats documentation in Doc/using/configure.rst.
* Don't write "all zeros" stats: if _stats_off() and _stats_clear()
or _stats_dump() were called.
* _PyEval_Fini() now always call _Py_PrintSpecializationStats() which
does nothing if stats are all zeros.
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
Python built with "configure --with-trace-refs" (tracing references)
is now ABI compatible with Python release build and debug build.
Moreover, it now also supports the Limited API.
Change Py_TRACE_REFS build:
* Remove _PyObject_EXTRA_INIT macro.
* The PyObject structure no longer has two extra members (_ob_prev
and _ob_next).
* Use a hash table (_Py_hashtable_t) to trace references (all
objects): PyInterpreterState.object_state.refchain.
* Py_TRACE_REFS build is now ABI compatible with release build and
debug build.
* Limited C API extensions can now be built with Py_TRACE_REFS:
xxlimited, xxlimited_35, _testclinic_limited.
* No longer rename PyModule_Create2() and PyModule_FromDefAndSpec2()
functions to PyModule_Create2TraceRefs() and
PyModule_FromDefAndSpec2TraceRefs().
* _Py_PrintReferenceAddresses() is now called before
finalize_interp_delete() which deletes the refchain hash table.
* test_tracemalloc find_trace() now also filters by size to ignore
the memory allocated by _PyRefchain_Trace().
Test changes for Py_TRACE_REFS:
* Add test.support.Py_TRACE_REFS constant.
* Add test_sys.test_getobjects() to test sys.getobjects() function.
* test_exceptions skips test_recursion_normalizing_with_no_memory()
and test_memory_error_in_PyErr_PrintEx() if Python is built with
Py_TRACE_REFS.
* test_repl skips test_no_memory().
* test_capi skisp test_set_nomemory().
Apply BOLT optimizations to libpython for shared builds. Most of the C
code is in libpython so it is critical to apply BOLT there fully realize
BOLT benefits.
This change also reworks how BOLT instrumentation is applied. It
effectively removes the readelf based logic added in gh-101525 and
replaces it with a mechanism that saves a copy of the pre-bolt binary
and restores that copy when necessary. This allows us to perform BOLT
optimizations without having to manually delete the output binary to
force a new bolt run.
Also:
- add a clean-bolt target for purging BOLT files and hook that up to the
clean target
- .gitignore BOLT related files
Before and after this refactor, `make` will no-op after a previous run.
Both versions should also share common make DAG deficiencies where
targets fail to trigger as often as they need to or can trigger
prematurely in certain scenarios. e.g. after this change you may need to
`rm profile-bolt-stamp` to force a BOLT run because there aren't
appropriate non-phony targets for BOLT's make target to depend on.
To make it easier to iterate on custom BOLT settings, the flags to pass
to instrumentation and application are now defined in configure and can
be overridden by passing BOLT_INSTRUMENT_FLAGS and BOLT_APPLY_FLAGS.
It has had no effect on non-macOS platforms for a long time, and has had
the non-obvious effect of invoking `pkg_config` and not setting
`-DUSING_APPLE_OS_LIBFFI` on macOS since GH-22855.
Add COMPILEALL_OPTS variable in Makefile to override compileall
options (default: -j0) in "make install". Also merge the compileall
commands into a single command building PYC files for the all
optimization levels (0, 1, 2) at once.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
This adds support for comparing pystats collected from two different builds.
- The `--json-output` can be used to load in a set of raw stats and output a
JSON file.
- Two of these JSON files can be provided on the next run, and then comparative
results between the two are output.
* Add support for the BOLT post-link binary optimizer
Using [bolt](https://github.com/llvm/llvm-project/tree/main/bolt)
provides a fairly large speedup without any code or functionality
changes. It provides roughly a 1% speedup on pyperformance, and a
4% improvement on the Pyston web macrobenchmarks.
It is gated behind an `--enable-bolt` configure arg because not all
toolchains and environments are supported. It has been tested on a
Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6
sources (their binary distribution of this version did not include bolt).
Compared to [a previous attempt](https://github.com/faster-cpython/ideas/issues/224),
this commit uses bolt's preferred "instrumentation" approach, as well as adds some non-PIE
flags which enable much better optimizations from bolt.
The effects of this change are a bit more dependent on CPU microarchitecture
than other changes, since it optimizes i-cache behavior which seems
to be a bit more variable between architectures. The 1%/4% numbers
were collected on an Intel Skylake CPU, and on an AMD Zen 3 CPU I
got a slightly larger speedup (2%/4%), and on a c6i.xlarge EC2 instance
I got a slightly lower speedup (1%/3%).
The low speedup on pyperformance is not entirely unexpected, because
BOLT improves i-cache behavior, and the benchmarks in the pyperformance
suite are small and tend to fit in i-cache.
This change uses the existing pgo profiling task (`python -m test --pgo`),
though I was able to measure about a 1% macrobenchmark improvement by
using the macrobenchmarks as the training task. I personally think that
both the PGO and BOLT tasks should be updated to use macrobenchmarks,
but for the sake of splitting up the work this PR uses the existing pgo task.
* Simplify the build flags
* Add a NEWS entry
* Update Makefile.pre.in
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Update configure.ac
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Add myself to ACKS
* Add docs
* Other review comments
* fix tab/space issue
* Make it more clear that --enable-bolt is experimental
* Add link to bolt's github page
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
Remove the "configure --with-cxx-main" build option: it didn't work
for many years. Remove the MAINCC variable from configure and
Makefile.
The MAINCC variable was added by the issue gh-42471: commit
0f48d98b74. Previously, --with-cxx-main
was named --with-cxx.
Keep CXX and LDCXXSHARED variables, even if they are no longer used
by Python build system.
When Python is built with "./configure --enable-pystats" (if the
Py_STATS macro is defined), the _Py_GetSpecializationStats() function
must be exported, since it's used by the _opcode extension which is
built as a shared library.
- Remove ``--with-tclk-*`` options from `configure`
- Use pkg-config to detect `_tkinter` dependencies (Tcl/Tk, X11)
- Manual override via environment variables `TCLTK_CFLAGS` and `TCLTK_LIBS`