Make a non-Py_DEBUG, asserts-enabled build of CPython possible. This means
making sure helper functions are defined when NDEBUG is not defined, not
just when Py_DEBUG is defined.
Also fix a division-by-zero in obmalloc.c that went unnoticed because in Py_DEBUG mode, elsize is never zero.
Issue #29507: Optimize slots calling Python methods. For Python methods, get
the unbound Python function and prepend arguments with self, rather than
calling the descriptor which creates a temporary PyMethodObject.
Add a new _PyObject_FastCall_Prepend() function used to call the unbound Python
method with self. It avoids the creation of a temporary tuple to pass
positional arguments.
Avoiding temporary PyMethodObject and avoiding temporary tuple makes Python
slots up to 1.46x faster. Microbenchmark on a __getitem__() method implemented
in Python:
Median +- std dev: 121 ns +- 5 ns -> 82.8 ns +- 1.0 ns: 1.46x faster (-31%)
Co-Authored-by: INADA Naoki <songofacandy@gmail.com>
add_methods(), add_members(), and add_getset() used PyDict_SetItemString()
to register descriptor to the type's dict.
So descr_new() and PyDict_SetItemString() creates interned unicode from same
C string.
This patch takes interned unicode from descriptor, and use PyDict_SetItem()
instead of PyDict_SetItemString().
python_startup_no_site:
default: Median +- std dev: 12.7 ms +- 0.1 ms
patched: Median +- std dev: 12.5 ms +- 0.1 ms
Issue #29233: Replace the inefficient _PyObject_VaCallFunctionObjArgs() with
_PyObject_FastCall() in call_method() and call_maybe().
Only a few functions call call_method() and call it with a fixed number of
arguments. Avoid the complex and expensive _PyObject_VaCallFunctionObjArgs()
function, replace it with an array allocated on the stack with the exact number
of argumlents.
It reduces the stack consumption, bytes per call, before => after:
test_python_call: 1168 => 1152 (-16 B)
test_python_getitem: 1344 => 1008 (-336 B)
test_python_iterator: 1568 => 1232 (-336 B)
Remove the _PyObject_VaCallFunctionObjArgs() function which became useless.
Rename it to object_vacall() and make it private.
Issue #28838: Rename parameters of the "calls" functions of the Python C API.
* Rename 'callable_object' and 'func' to 'callable': any Python callable object
is accepted, not only Python functions
* Rename 'method' and 'nameid' to 'name' (method name)
* Rename 'o' to 'obj'
* Move, fix and update documentation of PyObject_CallXXX() functions
in abstract.h
* Update also the documentaton of the C API (update parameter names)
Modify type_setattro() to call directly _PyObject_GenericSetAttrWithDict()
instead of PyObject_GenericSetAttr().
PyObject_GenericSetAttr() is a thin wrapper to
_PyObject_GenericSetAttrWithDict().
Replace
_PyObject_CallArg1(func, arg)
with
PyObject_CallFunctionObjArgs(func, arg, NULL)
Using the _PyObject_CallArg1() macro increases the usage of the C stack, which
was unexpected and unwanted. PyObject_CallFunctionObjArgs() doesn't have this
issue.
Handling zero-argument super() in __init_subclass__ and
__set_name__ involved moving __class__ initialisation to
type.__new__. This requires cooperation from custom
metaclasses to ensure that the new __classcell__ entry
is passed along appropriately.
The initial implementation of that change resulted in abruptly
broken zero-argument super() support in metaclasses that didn't
adhere to the new requirements (such as Django's metaclass for
Model definitions).
The updated approach adopted here instead emits a deprecation
warning for those cases, and makes them work the same way they
did in Python 3.5.
This patch also improves the related class machinery documentation
to cover these details and to include more reader-friendly
cross-references and index entries.
Issue #28858: The change b9c9691c72c5 introduced a regression. It seems like
_PyObject_CallArg1() uses more stack memory than
PyObject_CallFunctionObjArgs().
* PyObject_CallFunctionObjArgs(func, NULL) => _PyObject_CallNoArg(func)
* PyObject_CallFunctionObjArgs(func, arg, NULL) => _PyObject_CallArg1(func, arg)
PyObject_CallFunctionObjArgs() allocates 40 bytes on the C stack and requires
extra work to "parse" C arguments to build a C array of PyObject*.
_PyObject_CallNoArg() and _PyObject_CallArg1() are simpler and don't allocate
memory on the C stack.
This change is part of the fastcall project. The change on listsort() is
related to the issue #23507.
* Callable object: callable, o, callable_object => func
* Object for method calls: o => obj
* Method name: name or nameid => method
Cleanup also the C code:
* Don't initialize variables to NULL if they are not used before their first
assignement
* Add braces for readability