* Add _PyObject_HasFastCall()
* partial_call() now avoids temporary tuple to pass positional
arguments if the callable supports the FASTCALL calling convention
for positional arguments.
* Fix also a performance regression in partial_call() if the callable
doesn't support FASTCALL.
PyEval_Call* APIs are not documented and they doesn't respect PY_SSIZE_T_CLEAN.
So add comment block which recommends PyObject_Call* APIs to ceval.h.
This commit also changes PyEval_CallMethod and PyEval_CallFunction
implementation same to PyObject_CallMethod and PyObject_CallFunction
to reduce future maintenance cost. Optimization to avoid temporary
tuple are copied too.
PyEval_CallFunction(callable, "i", (int)i) now calls callable(i) instead of
raising TypeError. But accepting this edge case is backward compatible.
allocated.
On PyMem_Realloc failure, _PyCode_SetExtra should free co_extra if
co_extra->ce_extras could not be allocated.
On PyMem_Realloc success, _PyCode_SetExtra should set all unused slots in
co_extra->ce_extras to NULL.
* Move all functions to call objects in a new Objects/call.c file.
* Rename fast_function() to _PyFunction_FastCallKeywords().
* Copy null_error() from Objects/abstract.c
* Inline type_error() in call.c to not have to copy it, it was only
called once.
* Export _PyEval_EvalCodeWithName() since it is now called
from call.c.
* Move all functions to call objects in a new Objects/call.c file.
* Rename fast_function() to _PyFunction_FastCallKeywords().
* Copy null_error() from Objects/abstract.c
* Inline type_error() in call.c to not have to copy it, it was only
called once.
* Export _PyEval_EvalCodeWithName() since it is now called
from call.c.
Issue #29507: Optimize slots calling Python methods. For Python methods, get
the unbound Python function and prepend arguments with self, rather than
calling the descriptor which creates a temporary PyMethodObject.
Add a new _PyObject_FastCall_Prepend() function used to call the unbound Python
method with self. It avoids the creation of a temporary tuple to pass
positional arguments.
Avoiding temporary PyMethodObject and avoiding temporary tuple makes Python
slots up to 1.46x faster. Microbenchmark on a __getitem__() method implemented
in Python:
Median +- std dev: 121 ns +- 5 ns -> 82.8 ns +- 1.0 ns: 1.46x faster (-31%)
Co-Authored-by: INADA Naoki <songofacandy@gmail.com>
Issue #29259, #29465: PyCFunction_Call() doesn't create anymore a redundant
tuple to pass positional arguments for METH_VARARGS.
Add a new cfunction_call() subfunction.
* *PyCFunction_*Call*() functions now call Py_EnterRecursiveCall().
* PyObject_Call() now calls directly _PyFunction_FastCallDict() and
PyCFunction_Call() to avoid calling Py_EnterRecursiveCall() twice per
function call
Decreased density gives better collision statistics (average of 2.5 probes in a
full table versus 3.0 previously) and fewer occurences of starting a second
possibly overlapping sequence of 10 linear probes. Makes resizes a little more
frequent but each with less work (fewer insertions and fewer collisions).
add_methods(), add_members(), and add_getset() used PyDict_SetItemString()
to register descriptor to the type's dict.
So descr_new() and PyDict_SetItemString() creates interned unicode from same
C string.
This patch takes interned unicode from descriptor, and use PyDict_SetItem()
instead of PyDict_SetItemString().
python_startup_no_site:
default: Median +- std dev: 12.7 ms +- 0.1 ms
patched: Median +- std dev: 12.5 ms +- 0.1 ms
a macro if Py_LIMITED_API is not set or set to the value between 0x03050400
and 0x03060000 (not including) or 0x03060100 or higher. Added functions
PySlice_Unpack() and PySlice_AdjustIndices().
Issue #29311: dict.get() and dict.setdefault() methods now use Argument Clinic
to parse arguments. Their calling convention changes from METH_VARARGS to
METH_FASTCALL which avoids the creation of a temporary tuple.
The signature of docstrings is also enhanced. For example,
get(...)
becomes:
get(self, key, default=None, /)
Issue #29259: use a different case for METH_VARARGS and
METH_VARARGS|METH_KEYWORDS to avoid testing again flags to decide if keywords
should be checked or not.
Issue #29259. We had 3 versions of similar code:
* PyCFunction_Call()
* _PyCFunction_FastCallDict()
* _PyCFunction_FastCallKeywords()
PyCFunction_Call() now calls _PyCFunction_FastCallDict() to factorize the code.
Issue #29259:
* Move also the !PyErr_Occurred() assertion to the top, similar to
other functions.
* Fix also comment/error messages: the function was renamed to
_PyMethodDef_RawFastCallDict()
Issue #29259, #29263. methoddescr_call() creates a PyCFunction object, call it
and the destroy it. Add a new _PyMethodDef_RawFastCallDict() method to avoid
the temporary PyCFunction object.
Issue #29286. Run Argument Clinic to get the new faster METH_FASTCALL calling
convention for functions using "boring" positional arguments.
Manually fix _elementtree: _elementtree_XMLParser_doctype() must remain
consistent with the clinic code.
Issue #29259: Write fast path in _PyCFunction_FastCallKeywords() for
METH_FASTCALL, avoid the creation of a temporary dictionary for keyword
arguments.
Cleanup also _PyCFunction_FastCallDict():
* Don't dereference func before checking that it's not NULL
* Move code to raise the "no keyword argument" exception into a new
no_keyword_error label.
Update python-gdb.py for the change.
* Indent versionchanged at method level, not class level
* Mark up ``--help`` to avoid generating an en dash
* Use forward slash in Unix command line with a dollar sign ($) prompt
Issue #29234: Inlining _PyStack_AsTuple() into callers increases their stack
consumption, Disable inlining to optimize the stack consumption.
Add _Py_NO_INLINE: use __attribute__((noinline)) of GCC and Clang.
It reduces the stack consumption, bytes per call, before => after:
test_python_call: 1040 => 976 (-64 B)
test_python_getitem: 976 => 912 (-64 B)
test_python_iterator: 1120 => 1056 (-64 B)
=> total: 3136 => 2944 (- 192 B)
Issue #29233: Replace the inefficient _PyObject_VaCallFunctionObjArgs() with
_PyObject_FastCall() in call_method() and call_maybe().
Only a few functions call call_method() and call it with a fixed number of
arguments. Avoid the complex and expensive _PyObject_VaCallFunctionObjArgs()
function, replace it with an array allocated on the stack with the exact number
of argumlents.
It reduces the stack consumption, bytes per call, before => after:
test_python_call: 1168 => 1152 (-16 B)
test_python_getitem: 1344 => 1008 (-336 B)
test_python_iterator: 1568 => 1232 (-336 B)
Remove the _PyObject_VaCallFunctionObjArgs() function which became useless.
Rename it to object_vacall() and make it private.
function_call() now simply calls _PyFunction_FastCallDict().
_PyFunction_FastCallDict() is more efficient: it contains fast paths for the
common case (optimized code object and no keyword argument).
Issue #28870: Add a new _PY_FASTCALL_SMALL_STACK constant, size of "small
stacks" allocated on the C stack to pass positional arguments to
_PyObject_FastCall().
_PyObject_Call_Prepend() now uses a small stack of 5 arguments (40 bytes)
instead of 8 (64 bytes), since it is modified to use _PY_FASTCALL_SMALL_STACK.
Special thanks to INADA Naoki for pushing the patch through
the last mile, Serhiy Storchaka for reviewing the code, and to
Victor Stinner for suggesting the idea (originally implemented
in the PyPy project).
The PEP 523 modified PyEval_EvalFrameEx(): it's now an indirection to
interp->eval_frame().
Inline the call in performance critical code. Leave PyEval_EvalFrame()
unchanged, this function is only kept for backward compatibility.
Issue #28915: Replace PyObject_CallFunction() with
PyObject_CallFunctionObjArgs() when the format string was only made of "O"
formats, PyObject* arguments.
PyObject_CallFunctionObjArgs() avoids the creation of a temporary tuple and
doesn't have to parse a format string.
Issue #28915: Replace _PyObject_CallMethodId() with
_PyObject_CallMethodIdObjArgs() when the format string only use the format 'O'
for objects, like "(O)".
_PyObject_CallMethodIdObjArgs() avoids the code to parse a format string and
avoids the creation of a temporary tuple.
Issue #28915: Use _Py_VaBuildStack() to build a C array of PyObject* and then
use _PyObject_FastCall().
The function has a special case if the stack only contains one parameter and
the parameter is a tuple: "unpack" the tuple of arguments in this case.