Commit Graph

318 Commits

Author SHA1 Message Date
mpage 2e95c5ba3b
gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
2024-11-04 11:13:32 -08:00
Eric Snow 6d93690954
gh-125604: Move _Py_AuditHookEntry, etc. Out of pycore_runtime.h (gh-125605)
This is essentially a cleanup, moving a handful of API declarations to the header files where they fit best, creating new ones when needed.

We do the following:

* add pycore_debug_offsets.h and move _Py_DebugOffsets, etc. there
* inline struct _getargs_runtime_state and struct _gilstate_runtime_state in _PyRuntimeState
* move struct _reftracer_runtime_state to the existing pycore_object_state.h
* add pycore_audit.h and move to it _Py_AuditHookEntry , _PySys_Audit(), and _PySys_ClearAuditHooks
* add audit.h and cpython/audit.h and move the existing audit-related API there
*move the perfmap/trampoline API from cpython/sysmodule.h to cpython/ceval.h, and remove the now-empty cpython/sysmodule.h
2024-10-18 09:26:08 -06:00
Sam Gross b482538523
gh-124218: Refactor per-thread reference counting (#124844)
Currently, we only use per-thread reference counting for heap type objects and
the naming reflects that. We will extend it to a few additional types in an
upcoming change to avoid scaling bottlenecks when creating nested functions.

Rename some of the files and functions in preparation for this change.
2024-10-01 17:05:42 +00:00
Wulian 27a62e7371
gh-124102: Update internal PCbuild docs to accurately list build dependencies (GH-124103) 2024-09-23 23:09:22 +00:00
Irit Katriel 1a9d8917a3
gh-121404: split compile.c into compile.c and codegen.c (#123651) 2024-09-09 18:21:51 +01:00
Jonathan Protzenko 325e9b8ef4
gh-99108: Add HACL* Blake2 implementation to hashlib (GH-119316)
This replaces the existing hashlib Blake2 module with a single implementation that uses HACL\*'s Blake2b/Blake2s implementations. We added support for all the modes exposed by the Python API, including tree hashing, leaf nodes, and so on. We ported and merged all of these changes upstream in HACL\*, added test vectors based on Python's existing implementation, and exposed everything needed for hashlib.

This was joint work done with @R1kM.

See the PR for much discussion and benchmarking details.   TL;DR: On many systems, 8-50% faster (!) than `libb2`, on some systems it appeared 10-20% slower than `libb2`.
2024-08-13 21:42:19 +00:00
Sam Gross dc09301067
gh-122417: Implement per-thread heap type refcounts (#122418)
The free-threaded build partially stores heap type reference counts in
distributed manner in per-thread arrays. This avoids reference count
contention when creating or destroying instances.

Co-authored-by: Ken Jin <kenjin@python.org>
2024-08-06 14:36:57 -04:00
Sam Gross 5716cc3529
gh-100240: Use a consistent implementation for freelists (#121934)
This combines and updates our freelist handling to use a consistent
implementation. Objects in the freelist are linked together using the
first word of memory block.

If configured with freelists disabled, these operations are essentially
no-ops.
2024-07-22 12:08:27 -04:00
Victor Stinner f8373db153
gh-112136: Restore removed _PyArg_Parser (#121262)
Restore the private _PyArg_Parser structure and the private
_PyArg_ParseTupleAndKeywordsFast() function, previously removed
in Python 3.13 alpha 1.

Recreate Include/cpython/modsupport.h header file.
2024-07-03 18:36:57 +02:00
Ken Jin 22b0de2755
gh-117139: Convert the evaluation stack to stack refs (#118450)
This PR sets up tagged pointers for CPython.

The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.

Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.

This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.

The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.

Please read Include/internal/pycore_stackref.h for more information!

---------

Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
2024-06-27 03:10:43 +08:00
Victor Stinner 9e4a81f00f
gh-120642: Move private PyCode APIs to the internal C API (#120643)
* Move _Py_CODEUNIT and related functions to pycore_code.h.
* Move _Py_BackoffCounter to pycore_backoff.h.
* Move Include/cpython/optimizer.h content to pycore_optimizer.h.
* Remove Include/cpython/optimizer.h.
* Remove PyUnstable_Replace_Executor().

Rename functions:

* PyUnstable_GetExecutor() => _Py_GetExecutor()
* PyUnstable_GetOptimizer() => _Py_GetOptimizer()
* PyUnstable_SetOptimizer() => _Py_SetTier2Optimizer()
* PyUnstable_Optimizer_NewCounter() => _PyOptimizer_NewCounter()
* PyUnstable_Optimizer_NewUOpOptimizer() => _PyOptimizer_NewUOpOptimizer()
2024-06-26 13:54:03 +02:00
Sam Gross 8f17d69b7b
gh-119344: Make critical section API public (#119353)
This makes the following macros public as part of the non-limited C-API for
locking a single object or two objects at once.

* `Py_BEGIN_CRITICAL_SECTION(op)` / `Py_END_CRITICAL_SECTION()`
* `Py_BEGIN_CRITICAL_SECTION2(a, b)` / `Py_END_CRITICAL_SECTION2()`

The supporting functions and structs used by the macros are also exposed for
cases where C macros are not available.
2024-06-21 15:50:18 -04:00
Sam Gross 3af7263037
gh-117511: Make PyMutex public in the non-limited API (#117731) 2024-06-20 11:29:08 -04:00
Victor Stinner f9d47fed9f
gh-119853: Add Include/refcount.h to projects (#119860) 2024-05-31 21:21:30 +02:00
Steve Dower 460546529b
gh-118734: Fixes Windows build when Use_TIER2 is unspecified (#118735) 2024-05-07 23:01:18 +02:00
Pablo Galindo Salgado 1b22d801b8
gh-118518: Allow perf to work without frame pointers (#112254) 2024-05-05 03:07:29 +02:00
Guido van Rossum 7d83f7bcc4
gh-118335: Configure Tier 2 interpreter at build time (#118339)
The code for Tier 2 is now only compiled when configured
with `--enable-experimental-jit[=yes|interpreter]`.

We drop support for `PYTHON_UOPS` and -`Xuops`,
but you can disable the interpreter or JIT
at runtime by setting `PYTHON_JIT=0`.
You can also build it without enabling it by default
using `--enable-experimental-jit=yes-off`;
enable with `PYTHON_JIT=1`.

On Windows, the `build.bat` script supports
`--experimental-jit`, `--experimental-jit-off`,
`--experimental-interpreter`.

In the C code, `_Py_JIT` is defined as before
when the JIT is enabled; the new variable
`_Py_TIER2` is defined when the JIT *or* the
interpreter is enabled. It is actually a bitmask:
1: JIT; 2: default-off; 4: interpreter.
2024-04-30 18:26:34 -07:00
Ken Jin dc6b12d1b2
gh-117139: Add header for tagged pointers (GH-118330)
---------

Co-authored-by: Sam Gross <655866+colesbury@users.noreply.github.com>
2024-05-01 04:46:13 +08:00
Eric Snow 03e3e31723
gh-76785: Rename _xxsubinterpreters to _interpreters (gh-117791)
See https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147/26.
2024-04-24 16:18:24 +00:00
Irit Katriel 04697bcfaf
gh-117494: extract the Instruction Sequence data structure into a separate file (#117496) 2024-04-04 15:47:26 +00:00
Guido van Rossum 060a96f1a9
gh-116968: Reimplement Tier 2 counters (#117144)
Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``),
shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The
API used for adaptive specialization counters is changed but the behavior is
(supposed to be) identical.

The behavior of the Tier 2 counters is changed:
- There are no longer dynamic thresholds (we never varied these).
- All counters now use the same exponential backoff.
- The counter for ``JUMP_BACKWARD`` starts counting down from 16.
- The ``temperature`` in side exits starts counting down from 64.
2024-04-04 15:03:27 +00:00
Eric Snow f341d6017d
gh-76785: Add PyInterpreterConfig Helpers (gh-117170)
These helpers make it easier to customize and inspect the config used to initialize interpreters.  This is especially valuable in our tests.  I found inspiration from the PyConfig API for the PyInterpreterConfig dict conversion stuff.  As part of this PR I've also added a bunch of tests.
2024-04-02 20:35:52 +00:00
Sam Gross 19c1dd60c5
gh-117323: Make `cell` thread-safe in free-threaded builds (#117330)
Use critical sections to lock around accesses to cell contents. The critical sections are no-ops in the default (with GIL) build.
2024-03-29 13:35:43 -04:00
Eric Snow 617158e078
gh-76785: Drop PyInterpreterID_Type (gh-117101)
I added it quite a while ago as a strategy for managing interpreter lifetimes relative to the PEP 554 (now 734) implementation.  Relatively recently I refactored that implementation to no longer rely on InterpreterID objects.  Thus now I'm removing it.
2024-03-21 17:15:02 +00:00
Guido van Rossum 9c7b3688e6
gh-108716: Cleanup remaining deepfreeze infrastructure (#116919)
Keep Tools/build/deepfreeze.py around (we may repurpose it for deepfreezing non-code objects),
and keep basic "clean" targets that remove the output of former deep-freeze activities,
to keep the build directories of current devs clean.
2024-03-18 11:13:11 -07:00
Mark Shannon 10fbcd6c5d
GH-115816: Make tier2 optimizer symbols testable, and add a few tests. (GH-115953) 2024-02-27 10:51:26 +00:00
Steve Dower 37a13b9413
gh-115582: Make default PC/pyconfig.h work for free-threaded builds with manual /DPy_GIL_DISABLED (GH-115850) 2024-02-26 19:14:14 +00:00
Sam Gross 5903190727
gh-115103: Implement delayed memory reclamation (QSBR) (#115180)
This adds a safe memory reclamation scheme based on FreeBSD's "GUS" and
quiescent state based reclamation (QSBR). The API provides a mechanism
for callers to detect when it is safe to free memory that may be
concurrently accessed by readers.
2024-02-16 15:25:19 -05:00
mpage a95b1a56bb
gh-115041: Add wrappers that are atomic only in free-threaded builds (#115046)
These are intended to be used in places where atomics are required in
free-threaded builds but not in the default build. We don't want to
introduce the potential performance overhead of an atomic operation in the
default build.
2024-02-14 15:15:05 -05:00
Sam Gross a3af3cb4f4
gh-110481: Implement inter-thread queue for biased reference counting (#114824)
Biased reference counting maintains two refcount fields in each object:
`ob_ref_local` and `ob_ref_shared`. The true refcount is the sum of these two
fields. In some cases, when refcounting operations are split across threads,
the ob_ref_shared field can be negative (although the total refcount must be
at least zero). In this case, the thread that decremented the refcount
requests that the owning thread give up ownership and merge the refcount
fields.
2024-02-09 17:08:32 -05:00
Brandt Bucher f6d9e5926b
GH-113464: Add a JIT backend for tier 2 (GH-113465)
Add an option (--enable-experimental-jit for configure-based builds
or --experimental-jit for PCbuild-based ones) to build an
*experimental* just-in-time compiler, based on copy-and-patch (https://fredrikbk.com/publications/copy-and-patch.pdf).

See Tools/jit/README.md for more information on how to install the required build-time tooling.
2024-01-28 18:48:48 -08:00
Sam Gross b52fc70d1a
gh-112529: Implement GC for free-threaded builds (#114262)
* gh-112529: Implement GC for free-threaded builds

This implements a mark and sweep GC for the free-threaded builds of
CPython. The implementation relies on mimalloc to find GC tracked
objects (i.e., "containers").
2024-01-25 10:27:36 -08:00
Sam Gross 1d6d5e854c
gh-112529: Use GC heaps for GC allocations in free-threaded builds (gh-114157)
* gh-112529: Use GC heaps for GC allocations in free-threaded builds

The free-threaded build's garbage collector implementation will need to
find GC objects by traversing mimalloc heaps. This hooks up the
allocation calls with the correct heaps by using a thread-local
"current_obj_heap" variable.

* Refactor out setting heap based on type
2024-01-21 01:14:45 +09:00
Brandt Bucher 30e6cbdba2
GH-113860: Get rid of `_PyUOpExecutorObject` (GH-113954) 2024-01-12 11:58:23 +00:00
Donghee Na 57bdc6c30d
gh-111968: Introduce _PyFreeListState and _PyFreeListState_GET API (gh-113584) 2024-01-10 08:04:41 +09:00
Pablo Galindo Salgado a03ec20bcd
gh-110721: Remove unused code from suggestions.c after moving PyErr_Display to use the traceback module (#113712) 2024-01-08 15:10:45 +00:00
Sam Gross 99854ce170
gh-113688: Split up gcmodule.c (gh-113715)
This splits part of Modules/gcmodule.c of into Python/gc.c, which
now contains the core garbage collection implementation. The Python
module remain in the Modules/gcmodule.c file.
2024-01-05 12:17:16 -08:00
Itamar Oren 178919cf21
gh-113258: Write frozen modules to the build tree on Windows (GH-113303)
This ensures the source directory is not modified at build time, and different builds (e.g. different versions or GIL vs no-GIL) do not have conflicts.
2024-01-03 17:30:20 +00:00
Itamar Oren 2feec0fc7f
gh-113039: Avoid using leading dots in the include path for frozen getpath.py (GH-113022) 2023-12-18 17:04:40 +00:00
Steve Dower 79dad03747
gh-111650: Ensure pyconfig.h includes Py_GIL_DISABLED on Windows (GH-112778) 2023-12-13 15:38:45 +00:00
Eric Snow a49b427b02
gh-76785: More Fixes for test.support.interpreters (gh-113012)
This brings the module (along with the associated extension modules) mostly in sync with PEP 734.  There are only a few small things to wrap up.
2023-12-12 17:43:30 +00:00
Sam Gross db460735af
gh-112538: Add internal-only _PyThreadStateImpl "wrapper" for PyThreadState (gh-112560)
Every PyThreadState instance is now actually a _PyThreadStateImpl.
It is safe to cast from `PyThreadState*` to `_PyThreadStateImpl*` and back.
The _PyThreadStateImpl will contain fields that we do not want to expose
in the public C API.
2023-12-07 12:11:45 -07:00
Victor Stinner 62802b6228
gh-111545: Add Include/cpython/pyhash.h header file (#112063)
Move non-limited C API to a new Include/cpython/pyhash.h header file.
2023-11-15 01:19:20 +01:00
Sam Gross 31c90d5838
gh-111569: Implement Python critical section API (gh-111571)
Critical sections are helpers to replace the global interpreter lock
with finer grained locking.  They provide similar guarantees to the GIL
and avoid the deadlock risk that plain locking involves.  Critical
sections are implicitly ended whenever the GIL would be released.  They
are resumed when the GIL would be acquired.  Nested critical sections
behave as if the sections were interleaved.
2023-11-08 15:39:29 -07:00
Guido van Rossum 7e135a48d6
gh-111520: Integrate the Tier 2 interpreter in the Tier 1 interpreter (#111428)
- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`,
  you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging
  to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
2023-11-01 13:13:02 -07:00
Dino Viehland c42347d025
gh-90815: Exclude mimalloc .c files from Windows build (#111532)
* Don't include mimalloc .c's in Windows build
* Fix warnings on Windows related to mimalloc
2023-10-31 11:54:35 -07:00
Eric Snow c6fe0869ab
gh-76785: Move the Cross-Interpreter Code to Its Own File (gh-111502)
This is partly to clear this stuff out of pystate.c, but also in preparation for moving some code out of _xxsubinterpretersmodule.c.  This change also moves this stuff to the internal API (new: Include/internal/pycore_crossinterp.h).  @vstinner did this previously and I undid it.  Now I'm re-doing it. :/
2023-10-30 16:53:10 -06:00
Dino Viehland 05f2f0ac92
gh-90815: Add mimalloc memory allocator (#109914)
* Add mimalloc v2.12

Modified src/alloc.c to remove include of alloc-override.c and not
compile new handler.

Did not include the following files:

 - include/mimalloc-new-delete.h
 - include/mimalloc-override.h
 - src/alloc-override-osx.c
 - src/alloc-override.c
 - src/static.c
 - src/region.c

mimalloc is thread safe and shares a single heap across all runtimes,
therefore finalization and getting global allocated blocks across all
runtimes is different.

* mimalloc: minimal changes for use in Python:

 - remove debug spam for freeing large allocations
 - use same bytes (0xDD) for freed allocations in CPython and mimalloc
   This is important for the test_capi debug memory tests

* Don't export mimalloc symbol in libpython.
* Enable mimalloc as Python allocator option.
* Add mimalloc MIT license.
* Log mimalloc in Lib/test/pythoninfo.py.
* Document new mimalloc support.
* Use macro defs for exports as done in:
  https://github.com/python/cpython/pull/31164/

Co-authored-by: Sam Gross <colesbury@gmail.com>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
2023-10-30 15:43:11 +00:00
Donghee Na 2dcc57008b
gh-109693: Remove pycore_atomic.h (gh-110992) 2023-10-18 00:33:50 +09:00
Victor Stinner be5e8a0103
gh-110964: Remove private _PyArg functions (#110966)
Move the following private functions and structures to
pycore_modsupport.h internal C API:

* _PyArg_BadArgument()
* _PyArg_CheckPositional()
* _PyArg_NoKeywords()
* _PyArg_NoPositional()
* _PyArg_ParseStack()
* _PyArg_ParseStackAndKeywords()
* _PyArg_Parser structure
* _PyArg_UnpackKeywords()
* _PyArg_UnpackKeywordsWithVararg()
* _PyArg_UnpackStack()
* _Py_ANY_VARARGS()

Changes:

* Python/getargs.h now includes pycore_modsupport.h to export
  functions.
* clinic.py now adds pycore_modsupport.h when one of these functions
  is used.
* Add pycore_modsupport.h includes when a C extension uses one of
  these functions.
* Define Py_BUILD_CORE_MODULE in C extensions which now include
  directly or indirectly (via code generated by Argument Clinic)
  pycore_modsupport.h:

  * _csv
  * _curses_panel
  * _dbm
  * _gdbm
  * _multiprocessing.posixshmem
  * _sqlite.row
  * _statistics
  * grp
  * resource
  * syslog

* _testcapi: bad_get() no longer uses METH_FASTCALL calling
  convention but METH_VARARGS. Replace _PyArg_UnpackStack() with
  PyArg_ParseTuple().
* _testcapi: add PYTESTCAPI_NEED_INTERNAL_API macro which is defined
  by _testcapi sub-modules which need the internal C API
  (pycore_modsupport.h): exceptions.c, float.c, vectorcall.c,
  watchers.c.
* Remove Include/cpython/modsupport.h header file.
  Include/modsupport.h no longer includes the removed header file.
* Fix mypy clinic.py
2023-10-17 14:30:31 +02:00