* Remove _PyThreadState_Current
* Replace GET_TSTATE() with PyThreadState_GET()
* Replace GET_INTERP_STATE() with _PyInterpreterState_GET_UNSAFE()
* Replace direct access to _PyThreadState_Current with
PyThreadState_GET()
* Replace _PyThreadState_Current with
_PyRuntime.gilstate.tstate_current
* Rename SET_TSTATE() to _PyThreadState_SET(), name more
consistent with _PyThreadState_GET()
* Update outdated comments
`list.append([], None)` was profiled but `list.append([], None, **{})` was not profiled.
Enable profiling for later case.
https://bugs.python.org/issue34125
This will prevent emitting a resource warning when the execution was
interrupted by Ctrl-C between calling open() and entering a 'with' block
in "with open()".
* Added new opcode END_ASYNC_FOR.
* Setting global StopAsyncIteration no longer breaks "async for" loops.
* Jumping into an "async for" loop is now disabled.
* Jumping out of an "async for" loop no longer corrupts the stack.
* Simplify the compiler.
* Add coro.cr_origin and sys.set_coroutine_origin_tracking_depth
* Use coroutine origin information in the unawaited coroutine warning
* Stop using set_coroutine_wrapper in asyncio debug mode
* In BaseEventLoop.set_debug, enable debugging in the correct thread
This kludge is from 1992. Any C99 compiler is going to be able to handle the
ceval dispatch switch.
Anyway, we have much bigger switches than the ceval dispatch one around. (See,
e.g., Objects/unicodetype_db.h.)
The concrete PyDict_* API is used to interact with PyInterpreterState.modules in a number of places. This isn't compatible with all dict subclasses, nor with other Mapping implementations. This patch switches the concrete API usage to the corresponding abstract API calls.
We also add a PyImport_GetModule() function (and some other helpers) to reduce a bunch of code duplication.
* Add Py_UNREACHABLE() as an alias to abort().
* Use Py_UNREACHABLE() instead of assert(0)
* Convert more unreachable code to use Py_UNREACHABLE()
* Document Py_UNREACHABLE() and a few other macros.
PR #1638, for bpo-28411, causes problems in some (very) edge cases. Until that gets sorted out, we're reverting the merge. PR #3506, a fix on top of #1638, is also getting reverted.
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals
Other globals are excluded (see globals.txt and check-c-globals.py).
f_trace_lines: enable/disable line trace events
f_trace_opcodes: enable/disable opcode trace events
These are intended primarily for testing of the interpreter
itself, as they make it much easier to emulate signals
arriving at unfortunate times.
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals
Other globals are excluded (see globals.txt and check-c-globals.py).
* Improve signal delivery
Avoid using Py_AddPendingCall from signal handler, to avoid calling signal-unsafe functions.
* Remove unused function
* Improve comments
* Add stress test
* Adapt for --without-threads
* Add second stress test
* Add NEWS blurb
* Address comments @haypo
If we have a chain of generators/coroutines that are 'yield from'ing
each other, then resuming the stack works like:
- call send() on the outermost generator
- this enters _PyEval_EvalFrameDefault, which re-executes the
YIELD_FROM opcode
- which calls send() on the next generator
- which enters _PyEval_EvalFrameDefault, which re-executes the
YIELD_FROM opcode
- ...etc.
However, every time we enter _PyEval_EvalFrameDefault, the first thing
we do is to check for pending signals, and if there are any then we
run the signal handler. And if it raises an exception, then we
immediately propagate that exception *instead* of starting to execute
bytecode. This means that e.g. a SIGINT at the wrong moment can "break
the chain" – it can be raised in the middle of our yield from chain,
with the bottom part of the stack abandoned for the garbage collector.
The fix is pretty simple: there's already a special case in
_PyEval_EvalFrameEx where it skips running signal handlers if the next
opcode is SETUP_FINALLY. (I don't see how this accomplishes anything
useful, but that's another story.) If we extend this check to also
skip running signal handlers when the next opcode is YIELD_FROM, then
that closes the hole – now the exception can only be raised at the
innermost stack frame.
This shouldn't have any performance implications, because the opcode
check happens inside the "slow path" after we've already determined
that there's a pending signal or something similar for us to process;
the vast majority of the time this isn't true and the new check
doesn't run at all.
* bpo-6532: Make the thread id an unsigned integer.
From C API side the type of results of PyThread_start_new_thread() and
PyThread_get_thread_ident(), the id parameter of
PyThreadState_SetAsyncExc(), and the thread_id field of PyThreadState
changed from "long" to "unsigned long".
* Restore a check in thread_get_ident().
When LOAD_METHOD is used for calling C mehtod, PyMethodDescrObject
was passed to profilefunc from 5566bbb.
But lsprof traces only PyCFunctionObject. Additionally, there can be
some third party extension which assumes passed arg is
PyCFunctionObject without calling PyCFunction_Check().
So make PyCFunctionObject from PyMethodDescrObject when
tstate->c_profilefunc is set.
When you use `'%s' % SubClassOfStr()`, where `SubClassOfStr.__rmod__` exists, the reverse operation is ignored as normally such string formatting operations use the `PyUnicode_Format()` fast path. This patch tests for subclasses of `str` first and picks the slow path in that case.
Patch by Martijn Pieters.
* Move all functions to call objects in a new Objects/call.c file.
* Rename fast_function() to _PyFunction_FastCallKeywords().
* Copy null_error() from Objects/abstract.c
* Inline type_error() in call.c to not have to copy it, it was only
called once.
* Export _PyEval_EvalCodeWithName() since it is now called
from call.c.
* Move all functions to call objects in a new Objects/call.c file.
* Rename fast_function() to _PyFunction_FastCallKeywords().
* Copy null_error() from Objects/abstract.c
* Inline type_error() in call.c to not have to copy it, it was only
called once.
* Export _PyEval_EvalCodeWithName() since it is now called
from call.c.
Issue #29227: Inline call_function() into _PyEval_EvalFrameDefault() using
Py_LOCAL_INLINE to reduce the stack consumption.
It reduces the stack consumption, bytes per call, before => after:
test_python_call: 1152 => 1040 (-112 B)
test_python_getitem: 1008 => 976 (-32 B)
test_python_iterator: 1232 => 1120 (-112 B)
=> total: 3392 => 3136 (- 256 B)
Special thanks to INADA Naoki for pushing the patch through
the last mile, Serhiy Storchaka for reviewing the code, and to
Victor Stinner for suggesting the idea (originally implemented
in the PyPy project).
The PEP 523 modified PyEval_EvalFrameEx(): it's now an indirection to
interp->eval_frame().
Inline the call in performance critical code. Leave PyEval_EvalFrame()
unchanged, this function is only kept for backward compatibility.
Issue #28838: Rename parameters of the "calls" functions of the Python C API.
* Rename 'callable_object' and 'func' to 'callable': any Python callable object
is accepted, not only Python functions
* Rename 'method' and 'nameid' to 'name' (method name)
* Rename 'o' to 'obj'
* Move, fix and update documentation of PyObject_CallXXX() functions
in abstract.h
* Update also the documentaton of the C API (update parameter names)
Issue #28858: The change b9c9691c72c5 introduced a regression. It seems like
_PyObject_CallArg1() uses more stack memory than
PyObject_CallFunctionObjArgs().
* PyObject_CallFunctionObjArgs(func, NULL) => _PyObject_CallNoArg(func)
* PyObject_CallFunctionObjArgs(func, arg, NULL) => _PyObject_CallArg1(func, arg)
PyObject_CallFunctionObjArgs() allocates 40 bytes on the C stack and requires
extra work to "parse" C arguments to build a C array of PyObject*.
_PyObject_CallNoArg() and _PyObject_CallArg1() are simpler and don't allocate
memory on the C stack.
This change is part of the fastcall project. The change on listsort() is
related to the issue #23507.
Issue #28799:
* Remove the PyEval_GetCallStats() function.
* Deprecate the untested and undocumented sys.callstats() function.
* Remove the CALL_PROFILE special build
Use the sys.setprofile() function, cProfile or profile module to profile
function calls.
Issue #28782: Fix a bug in the implementation ``yield from`` when checking
if the next instruction is YIELD_FROM. Regression introduced by WORDCODE
(issue #26647).
Reviewed by Serhiy Storchaka and Yury Selivanov.
When Python is not compiled with PGO, the performance of Python on call_simple
and call_method microbenchmarks depend highly on the code placement. In the
worst case, the performance slowdown can be up to 70%.
The GCC __attribute__((hot)) attribute helps to keep hot code close to reduce
the risk of such major slowdown. This attribute is ignored when Python is
compiled with PGO.
The following functions are considered as hot according to statistics collected
by perf record/perf report:
* _PyEval_EvalFrameDefault()
* call_function()
* _PyFunction_FastCall()
* PyFrame_New()
* frame_dealloc()
* PyErr_Occurred()