The existence of background threads running on a subinterpreter was preventing interpreters from getting properly destroyed, as well as impacting the ability to run the interpreter again. It also affected how we wait for non-daemon threads to finish.
We add PyInterpreterState.threads.main, with some internal C-API functions.
* Remove unused <locale.h> includes.
* Remove unused <fcntl.h> include in traceback.h.
* Remove redundant <assert.h> and <stddef.h> includes. They are already
included by "Python.h".
* Remove <object.h> include in faulthandler.c. Python.h already includes it.
* Add missing <stdbool.h> in pycore_pythread.h if HAVE_PTHREAD_STUBS
is defined.
* Fix also warnings in pthread_stubs.h: don't redefine macros if they
are already defined, like the __NEED_pthread_t macro.
* pycore_pythread.h is now the central place to make sure that
_POSIX_THREADS and _POSIX_SEMAPHORES macros are defined if
available.
* Make sure that pycore_pythread.h is included when _POSIX_THREADS
and _POSIX_SEMAPHORES macros are tested.
* PY_TIMEOUT_MAX is now defined as a constant, since its value
depends on _POSIX_THREADS, instead of being defined as a macro.
* Prevent integer overflow in the preprocessor when computing
PY_TIMEOUT_MAX_VALUE on Windows:
replace "0xFFFFFFFELL * 1000 < LLONG_MAX"
with "0xFFFFFFFELL < LLONG_MAX / 1000".
* Document the change and give hints how to fix affected code.
* Add an exception for PY_TIMEOUT_MAX name to smelly.py
* Add PY_TIMEOUT_MAX to the stable ABI
These are the most popular specializations of `LOAD_ATTR` and `STORE_ATTR`
that weren't already viable uops:
* Split LOAD_ATTR_METHOD_WITH_VALUES
* Split LOAD_ATTR_METHOD_NO_DICT
* Split LOAD_ATTR_SLOT
* Split STORE_ATTR_SLOT
* Split STORE_ATTR_INSTANCE_VALUE
Also:
* Add `-v` flag to code generator which prints a list of non-viable uops
(easter-egg: it can print execution counts -- see source)
* Double _Py_UOP_MAX_TRACE_LENGTH to 128
I had dropped one of the DEOPT_IF() calls! :-(
PyMutex is a one byte lock with fast, inlineable lock and unlock functions for the common uncontended case. The design is based on WebKit's WTF::Lock.
PyMutex is built using the _PyParkingLot APIs, which provides a cross-platform futex-like API (based on WebKit's WTF::ParkingLot). This internal API will be used for building other synchronization primitives used to implement PEP 703, such as one-time initialization and events.
This also includes tests and a mini benchmark in Tools/lockbench/lockbench.py to compare with the existing PyThread_type_lock.
Uncontended acquisition + release:
* Linux (x86-64): PyMutex: 11 ns, PyThread_type_lock: 44 ns
* macOS (arm64): PyMutex: 13 ns, PyThread_type_lock: 18 ns
* Windows (x86-64): PyMutex: 13 ns, PyThread_type_lock: 38 ns
PR Overview:
The primary purpose of this PR is to implement PyMutex, but there are a number of support pieces (described below).
* PyMutex: A 1-byte lock that doesn't require memory allocation to initialize and is generally faster than the existing PyThread_type_lock. The API is internal only for now.
* _PyParking_Lot: A futex-like API based on the API of the same name in WebKit. Used to implement PyMutex.
* _PyRawMutex: A word sized lock used to implement _PyParking_Lot.
* PyEvent: A one time event. This was used a bunch in the "nogil" fork and is useful for testing the PyMutex implementation, so I've included it as part of the PR.
* pycore_llist.h: Defines common operations on doubly-linked list. Not strictly necessary (could do the list operations manually), but they come up frequently in the "nogil" fork. ( Similar to https://man.freebsd.org/cgi/man.cgi?queue)
---------
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
There is a WIP proposal to enable webassembly stack switching which have been
implemented in v8:
https://github.com/WebAssembly/js-promise-integration
It is not possible to switch stacks that contain JS frames so the Emscripten JS
trampolines that allow calling functions with the wrong number of arguments
don't work in this case. However, the js-promise-integration proposal requires
the [type reflection for Wasm/JS API](https://github.com/WebAssembly/js-types)
proposal, which allows us to actually count the number of arguments a function
expects.
For better compatibility with stack switching, this PR checks if type reflection
is available, and if so we use a switch block to decide the appropriate
signature. If type reflection is unavailable, we should use the current EMJS
trampoline.
We cache the function argument counts since when I didn't cache them performance
was negatively affected.
Co-authored-by: T. Wouters <thomas@python.org>
Co-authored-by: Brett Cannon <brett@python.org>
This makes the internal representation in the code generator simpler: there's a list of ops, and a list of macros, and there's no special-casing needed for ops that aren't macros. (There's now special-casing for ops that are also macros, but that's simpler.)
* Rename SAVE_IP to _SET_IP
* Rename EXIT_TRACE to _EXIT_TRACE
* Rename SAVE_CURRENT_IP to _SAVE_CURRENT_IP
* Rename INSERT to _INSERT (This is for Ken Jin's abstract interpreter)
* Rename IS_NONE to _IS_NONE
* Rename JUMP_TO_TOP to _JUMP_TO_TOP
This adds a 16-bit inline cache entry to the conditional branch instructions POP_JUMP_IF_{FALSE,TRUE,NONE,NOT_NONE} and their instrumented variants, which is used to keep track of the branch direction.
Each time we encounter these instructions we shift the cache entry left by one and set the bottom bit to whether we jumped.
Then when it's time to translate such a branch to Tier 2 uops, we use the bit count from the cache entry to decided whether to continue translating the "didn't jump" branch or the "jumped" branch.
The counter is initialized to a pattern of alternating ones and zeros to avoid bias.
The .pyc file magic number is updated. There's a new test, some fixes for existing tests, and a few miscellaneous cleanups.
Fix _thread.start_new_thread() race condition. If a thread is created
during Python finalization, the newly spawned thread now exits
immediately instead of trying to access freed memory and lead to a
crash.
thread_run() calls PyEval_AcquireThread() which checks if the thread
must exit. The problem was that tstate was dereferenced earlier in
_PyThreadState_Bind() which leads to a crash most of the time.
Move _PyThreadState_CheckConsistency() from thread_run() to
_PyThreadState_Bind().
thread_run() of _threadmodule.c now calls
_PyThreadState_CheckConsistency() to check if tstate is a dangling
pointer when Python is built in debug mode.
Rename ceval_gil.c is_tstate_valid() to
_PyThreadState_CheckConsistency() to reuse it in _threadmodule.c.
Statistics gathering is now off by default. Use the "-X pystats"
command line option or set the new PYTHONSTATS environment variable
to 1 to turn statistics gathering on at Python startup.
Statistics are no longer dumped at exit if statistics gathering was
off or statistics have been cleared.
Changes:
* Add PYTHONSTATS environment variable.
* sys._stats_dump() now returns False if statistics are not dumped
because they are all equal to zero.
* Add PyConfig._pystats member.
* Add tests on sys functions and on setting PyConfig._pystats to 1.
* Add Include/cpython/pystats.h and Include/internal/pycore_pystats.h
header files.
* Rename '_py_stats' variable to '_Py_stats'.
* Exclude Include/cpython/pystats.h from the Py_LIMITED_API.
* Move pystats.h include from object.h to Python.h.
* Add _Py_StatsOn() and _Py_StatsOff() functions. Remove
'_py_stats_struct' variable from the API: make it static in
specialize.c.
* Document API in Include/pystats.h and Include/cpython/pystats.h.
* Complete pystats documentation in Doc/using/configure.rst.
* Don't write "all zeros" stats: if _stats_off() and _stats_clear()
or _stats_dump() were called.
* _PyEval_Fini() now always call _Py_PrintSpecializationStats() which
does nothing if stats are all zeros.
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
Move the private _PyErr_WriteUnraisableMsg() functions to the
internal C API (pycore_pyerrors.h).
Move write_unraisable_exc() from _testcapi to _testinternalcapi.
* Move <ctype.h>, <limits.h> and <stdarg.h> standard includes to
Python.h.
* Move "pystats.h" include from object.h to Python.h.
* Remove redundant "pymem.h" include in objimpl.h and "pyport.h"
include in pymem.h; Python.h already includes them earlier.
* Remove redundant <wchar.h> include in unicodeobject.h; Python.h
already includes it.
* Move _SGI_MP_SOURCE define from Python.h to pyport.h.
* pycore_condvar.h includes explicitly <unistd.h> for the
_POSIX_THREADS macro.
pycore_create_interpreter() now returns a status, rather than
calling Py_FatalError().
* PyInterpreterState_New() now calls Py_ExitStatusException() instead
of calling Py_FatalError() directly.
* Replace Py_FatalError() with PyStatus in init_interpreter() and
_PyObject_InitState().
* _PyErr_SetFromPyStatus() now raises RuntimeError, instead of
ValueError. It can now call PyErr_NoMemory(), raise MemoryError,
if it detects _PyStatus_NO_MEMORY() error message.
Move the private _PyLong_Sign() and _PyLong_NumBits() functions
to the internal C API (pycore_long.h).
Modules/_testcapi/long.c now uses the internal C API.
This adds a new header that provides atomic operations on common data
types. The intention is that this will be exposed through Python.h,
although that is not the case yet. The only immediate use is in
the test file.
Co-authored-by: Sam Gross <colesbury@gmail.com>
Python built with "configure --with-trace-refs" (tracing references)
is now ABI compatible with Python release build and debug build.
Moreover, it now also supports the Limited API.
Change Py_TRACE_REFS build:
* Remove _PyObject_EXTRA_INIT macro.
* The PyObject structure no longer has two extra members (_ob_prev
and _ob_next).
* Use a hash table (_Py_hashtable_t) to trace references (all
objects): PyInterpreterState.object_state.refchain.
* Py_TRACE_REFS build is now ABI compatible with release build and
debug build.
* Limited C API extensions can now be built with Py_TRACE_REFS:
xxlimited, xxlimited_35, _testclinic_limited.
* No longer rename PyModule_Create2() and PyModule_FromDefAndSpec2()
functions to PyModule_Create2TraceRefs() and
PyModule_FromDefAndSpec2TraceRefs().
* _Py_PrintReferenceAddresses() is now called before
finalize_interp_delete() which deletes the refchain hash table.
* test_tracemalloc find_trace() now also filters by size to ignore
the memory allocated by _PyRefchain_Trace().
Test changes for Py_TRACE_REFS:
* Add test.support.Py_TRACE_REFS constant.
* Add test_sys.test_getobjects() to test sys.getobjects() function.
* test_exceptions skips test_recursion_normalizing_with_no_memory()
and test_memory_error_in_PyErr_PrintEx() if Python is built with
Py_TRACE_REFS.
* test_repl skips test_no_memory().
* test_capi skisp test_set_nomemory().
Remove _PyErr_ChainExceptions(), _PyErr_ChainExceptions1() and
_PyErr_SetFromPyStatus() functions from the public C API.
* Move the private _PyErr_ChainExceptions() and
_PyErr_ChainExceptions1() function to the internal C API
(pycore_pyerrors.h).
* Move the private _PyErr_SetFromPyStatus() to the internal C API
(pycore_initconfig.h).
* No longer export the _PyErr_ChainExceptions() function.
* Move run_in_subinterp_with_config() from _testcapi to
_testinternalcapi.
There is no need to export the _Py_ForgetReference() function of the
Py_TRACE_REFS build. It's not used by shared extensions. Enhance also
its comment.
Move PyUnstable_ExecutableKinds and associated macros from the
internal C API to the public C API.
Rename constants: replace "PY_" prefix with "PyUnstable_" prefix.
Also remove NOP instructions.
The "stubs" are not optimized in this fashion (their SAVE_IP should always be preserved since it's where to jump next, and they don't contain NOPs by their nature).
Move the following private API to the internal C API (pycore_long.h):
* _PyLong_Copy()
* _PyLong_FromDigits()
* _PyLong_New()
No longer export most of these functions.
The remove private _PyGILState_GetInterpreterStateUnsafe() function
from the public C API: move it the internal C API (pycore_pystate.h).
No longer export the function.
The remove private _Py_UniversalNewlineFgetsWithSize() function from
the public C API: move it the internal C API (pycore_fileutils.h).
No longer export the function.
Remove these private functions from the public C API:
* _PyRun_AnyFileObject()
* _PyRun_InteractiveLoopObject()
* _PyRun_SimpleFileObject()
* _Py_SourceAsString()
Move them to the internal C API: add a new pycore_pythonrun.h header
file. No longer export these functions.
Remove the private _Py_Identifier type and related private functions
from the public C API:
* _PyObject_GetAttrId()
* _PyObject_LookupSpecialId()
* _PyObject_SetAttrId()
* _PyType_LookupId()
* _Py_IDENTIFIER()
* _Py_static_string()
* _Py_static_string_init()
Move them to the internal C API: add a new pycore_identifier.h header
file. No longer export these functions.
Move these private functions to the internal C API
(pycore_abstract.h):
* _Py_convert_optional_to_ssize_t()
* _PyNumber_Index()
Argument Clinic now emits #include "pycore_abstract.h" when these
functions are used.
The parser of the c-analyzer tool now uses a list of files which use
the limited C API, rather than a list of files using the internal C
API.
Move the private _PyLong converter functions to the internal C API
* _PyLong_FileDescriptor_Converter(): moved to pycore_fileutils.h
* _PyLong_Size_t_Converter(): moved to pycore_long.h
Argument Clinic now emits includes for pycore_fileutils.h and
pycore_long.h when these functions are used.
Move these private functions to the internal C API (pycore_long.h):
* _PyLong_UnsignedInt_Converter()
* _PyLong_UnsignedLongLong_Converter()
* _PyLong_UnsignedLong_Converter()
* _PyLong_UnsignedShort_Converter()
Argument Clinic now emits #include "pycore_long.h" when these
functions are used.
Instead of using `GO_TO_INSTRUCTION(CALL_PY_EXACT_ARGS)` we just add the macro elements of the latter to the macro for the former. This requires lengthening the uops array in struct opcode_macro_expansion. (It also required changes to stacking.py that were merged already.)
Move private functions to the internal C API (pycore_sysmodule.h):
* _PySys_GetAttr()
* _PySys_GetSizeOf()
No longer export most of these functions.
Fix also a typo in Include/cpython/optimizer.h: add a missing space.
Move private functions to the internal C API (pycore_dict.h):
* _PyDictView_Intersect()
* _PyDictView_New()
* _PyDict_ContainsId()
* _PyDict_DelItemId()
* _PyDict_DelItem_KnownHash()
* _PyDict_GetItemIdWithError()
* _PyDict_GetItem_KnownHash()
* _PyDict_HasSplitTable()
* _PyDict_NewPresized()
* _PyDict_Next()
* _PyDict_Pop()
* _PyDict_SetItemId()
* _PyDict_SetItem_KnownHash()
* _PyDict_SizeOf()
No longer export most of these functions.
Move also the _PyDictViewObject structure to the internal C API.
Move dict_getitem_knownhash() function from _testcapi to the
_testinternalcapi extension. Update test_capi.test_dict for this
change.
Move private _PyEval functions to the internal C API
(pycore_ceval.h):
* _PyEval_GetBuiltin()
* _PyEval_GetBuiltinId()
* _PyEval_GetSwitchInterval()
* _PyEval_MakePendingCalls()
* _PyEval_SetProfile()
* _PyEval_SetSwitchInterval()
* _PyEval_SetTrace()
No longer export most of these functions.
106320: Remove private float C API functions
Remove private C API functions:
* _Py_parse_inf_or_nan()
* _Py_string_to_number_with_underscores()
Move these functions to the internal C API and no longer export them.
No longer export _PyCompile_AstOptimize() internal C API function.
Change comment style to "// comment" and add comment explaining why
other functions have to be exported.
No longer export _PyUnicode_FromId() internal C API function.
Change comment style to "// comment" and add comment explaining why
other functions have to be exported.
Update Tools/build/generate_token.py to update Include/internal/pycore_token.h
comments.
Remove the internal _PyDict_GetItemStringWithError() function. It can
now be replaced with the new public PyDict_ContainsString() and
PyDict_GetItemStringRef() functions.
getargs.c now now uses a strong reference for current_arg.
find_keyword() returns a strong reference.
No longer export these 5 internal C API functions:
* _PyArena_AddPyObject()
* _PyArena_Free()
* _PyArena_Malloc()
* _PyArena_New()
* _Py_FatalRefcountErrorFunc()
Change comment style to "// comment" and add comment explaining why
other functions have to be exported.
No longer export these 2 internal C API functions:
* _PyTime_AsNanoseconds()
* _PyTime_GetSystemClockWithInfo()
Change comment style to "// comment" and add comment explaining why
other functions have to be exported.
Remove private _PyDict_GetItemStringWithError() function of the
public C API: the new PyDict_GetItemStringRef() can be used instead.
* Move private _PyDict_GetItemStringWithError() to the internal C API.
* _testcapi get_code_extra_index() uses PyDict_GetItemStringRef().
Avoid using private functions in _testcapi which tests the public C
API.
* Add missing includes.
* Remove unused includes.
* Update old include/symbol names to newer names.
* Mention at least one included symbol.
* Sort includes.
* Update Tools/cases_generator/generate_cases.py used to generated
pycore_opcode_metadata.h.
* Update Parser/asdl_c.py used to generate pycore_ast.h.
* Cleanup also includes in _testcapimodule.c and _testinternalcapi.c.
* pycore_intrinsics.h does nothing if included twice
(add #ifndef and #define).
* Update Tools/cases_generator/generate_cases.py to generate the
Py_BUILD_CORE test.
* _bz2, _lzma, _opcode and zlib extensions now define the
Py_BUILD_CORE_MODULE macro to use internal headers
(pycore_code.h, pycore_intrinsics.h and pycore_blocks_output_buffer.h).
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
Introducing a new file, stacking.py, that takes over several responsibilities related to symbolic evaluation of push/pop operations, with more generality.
The linked list of objects was a global variable, which broke isolation between interpreters, causing crashes. To solve this, we've moved the linked list to each interpreter.
There's no need to use a dummy uop to skip unused cache entries. The macro syntax lets you write `unused/1` instead.
Similarly, move `unused/5` from op `_LOAD_ATTR_INSTANCE_VALUE` to macro `LOAD_ATTR_INSTANCE_VALUE`.
This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed?
We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter.
We also remove the accommodations for "detached" thread states, which were a dubious idea to start with.
The _xxsubinterpreters module should not rely on internal API. Some of the functions it uses were recently moved there however. Here we move them back (and expose them properly).
We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict.
Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.
No longer export these 5 internal C API variables:
* _PyBufferWrapper_Type
* _PyImport_FrozenBootstrap
* _PyImport_FrozenStdlib
* _PyImport_FrozenTest
* _Py_SwappedOp
Fix the definition of these internal functions, replace PyAPI_DATA()
with PyAPI_FUNC():
* _PyImport_ClearExtension()
* _PyObject_IsFreed()
* _PyThreadState_GetCurrent()
No longer export these 2 internal C API functions:
* _PyEval_SignalAsyncExc()
* _PyEval_SignalReceived()
Add also comments explaining why some internal functions have to be
exported, and update existing comments.
Move private _PyMem functions to the internal C API (pycore_pymem.h):
* _PyMem_GetCurrentAllocatorName()
* _PyMem_RawStrdup()
* _PyMem_RawWcsdup()
* _PyMem_Strdup()
No longer export these functions.
Move pymem_getallocatorsname() function from _testcapi to _testinternalcapi,
since the API moved to the internal C API.
Move private _PyDict functions to the internal C API (pycore_dict.h):
* _PyDict_Contains_KnownHash()
* _PyDict_DebugMallocStats()
* _PyDict_DelItemIf()
* _PyDict_GetItemWithError()
* _PyDict_HasOnlyStringKeys()
* _PyDict_MaybeUntrack()
* _PyDict_MergeEx()
No longer export these functions.
Move private _PyObject and private _PyType functions to the internal
C API (pycore_object.h):
* _PyObject_GetMethod()
* _PyObject_IsAbstract()
* _PyObject_NextNotImplemented()
* _PyType_CalculateMetaclass()
* _PyType_GetDocFromInternalDoc()
* _PyType_GetTextSignatureFromInternalDoc()
No longer export these functions.
Move private _PyBytes functions to the internal C API
(pycore_bytesobject.h):
* _PyBytes_DecodeEscape()
* _PyBytes_FormatEx()
* _PyBytes_FromHex()
* _PyBytes_Join()
No longer export these functions.
Move private debug _PyObject functions to the internal C API
(pycore_object.h):
* _PyDebugAllocatorStats()
* _PyObject_CheckConsistency()
* _PyObject_DebugTypeStats()
* _PyObject_IsFreed()
No longer export most of these functions, except of
_PyObject_IsFreed().
Move test functions using _PyObject_IsFreed() from _testcapi to
_testinternalcapi. check_pyobject_is_freed() test no longer catch
_testcapi.error: the tested function cannot raise _testcapi.error.
Move private _PyModule API to the internal C API
(pycore_moduleobject.h):
* _PyModule_Clear()
* _PyModule_ClearDict()
* _PyModuleSpec_IsInitializing()
* _PyModule_IsExtension()
No longer export these functions.
Rename private C API constants:
* Rename PY_MONITORING_UNGROUPED_EVENTS to _PY_MONITORING_UNGROUPED_EVENTS
* Rename PY_MONITORING_EVENTS to _PY_MONITORING_EVENTS
Move the private _PyInterpreterID C API to the internal C API: add a
new pycore_interp_id.h header file.
Remove Include/interpreteridobject.h and
Include/cpython/interpreteridobject.h header files.
Move private _PyGen API to internal C API:
* _PyAsyncGenAThrow_Type
* _PyAsyncGenWrappedValue_Type
* _PyCoroWrapper_Type
* _PyGen_FetchStopIterationValue()
* _PyGen_Finalize()
* _PyGen_SetStopIterationValue()
No longer these symbols, except of the ones by the _asyncio shared
extensions.
* No longer export most private _PyHash symbols, only export the ones
which are needed by shared extensions.
* Modules/_xxtestfuzz/fuzzer.c now uses the internal C API.
By turning `assert(kwnames == NULL)` into a macro that is not in the "forbidden" list, many instructions that formerly were skipped because they contained such an assert (but no other mention of `kwnames`) are now supported in Tier 2. This covers 10 instructions in total (all specializations of `CALL` that invoke some C code):
- `CALL_NO_KW_TYPE_1`
- `CALL_NO_KW_STR_1`
- `CALL_NO_KW_TUPLE_1`
- `CALL_NO_KW_BUILTIN_O`
- `CALL_NO_KW_BUILTIN_FAST`
- `CALL_NO_KW_LEN`
- `CALL_NO_KW_ISINSTANCE`
- `CALL_NO_KW_METHOD_DESCRIPTOR_O`
- `CALL_NO_KW_METHOD_DESCRIPTOR_NOARGS`
- `CALL_NO_KW_METHOD_DESCRIPTOR_FAST`
This moves EXIT_TRACE, SAVE_IP, JUMP_TO_TOP, and
_POP_JUMP_IF_{FALSE,TRUE} from ceval.c to bytecodes.c.
They are no less special than before, but this way
they are discoverable o the copy-and-patch tooling.