This is essentially a cleanup, moving a handful of API declarations to the header files where they fit best, creating new ones when needed.
We do the following:
* add pycore_debug_offsets.h and move _Py_DebugOffsets, etc. there
* inline struct _getargs_runtime_state and struct _gilstate_runtime_state in _PyRuntimeState
* move struct _reftracer_runtime_state to the existing pycore_object_state.h
* add pycore_audit.h and move to it _Py_AuditHookEntry , _PySys_Audit(), and _PySys_ClearAuditHooks
* add audit.h and cpython/audit.h and move the existing audit-related API there
*move the perfmap/trampoline API from cpython/sysmodule.h to cpython/ceval.h, and remove the now-empty cpython/sysmodule.h
Use a `_PyStackRef` and defer the reference to `f_executable` when
possible. This avoids some reference count contention in the common case
of executing the same code object from multiple threads concurrently in
the free-threaded build.
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.
PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.
A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).