When Python is not compiled with PGO, the performance of Python on the call_simple
and call_method microbenchmarks depends heavily on code placement. In the
worst case, the slowdown can be as much as 70%.
The GCC __attribute__((hot)) attribute helps keep hot code close together,
reducing the risk of such a major slowdown. This attribute is ignored when
Python is compiled with PGO.
The following functions are considered hot according to statistics collected
by perf record/perf report:
* _PyEval_EvalFrameDefault()
* call_function()
* _PyFunction_FastCall()
* PyFrame_New()
* frame_dealloc()
* PyErr_Occurred()
* BUILD_TUPLE_UNPACK and BUILD_MAP_UNPACK_WITH_CALL are no longer generated for
a single tuple or dict.
* Restored more informative error messages for incorrect var-positional and
var-keyword arguments.
* Removed code duplication in _PyEval_EvalCodeWithName().
* Removed redundant runtime checks and parameters in _PyStack_AsDict().
* Added a workaround and enabled a previously disabled test in test_traceback.
* Removed dead code from the dis module.
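The restored error messages can be observed from Python. A minimal sketch; `f` and `call_error` are illustrative names, and since the exact wording differs between Python versions, only the key words of each message are relied on:

```python
def f(*args, **kwargs):
    """Accept anything; only the argument unpacking matters here."""
    return args, kwargs


def call_error(thunk):
    """Call thunk() and return its TypeError message, or None on success."""
    try:
        thunk()
    except TypeError as exc:
        return str(exc)
    return None


# A non-iterable after * and a non-mapping after ** are both reported with
# the callee's name and the offending type (wording varies by version).
star_msg = call_error(lambda: f(*1))
dstar_msg = call_error(lambda: f(**1))
print(star_msg)
print(dstar_msg)

# A single tuple and a single dict are accepted as-is.
print(call_error(lambda: f(*(1, 2), **{"k": 3})))  # None
```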
Tested on macOS 10.11 dtrace, Ubuntu 16.04 SystemTap, and libbcc.
Largely based on an initial patch by Jesús Cea Avión, with some
influence from Dave Malcolm's SystemTap patch and Nikhil Benesch's
unification patch.
Things deliberately left out for simplicity:
- ustack helpers: I have no way of testing them at this point since
they are Solaris-specific
- PyFrameObject * in function__entry/function__return, this is
SystemTap-specific
- SPARC support
- dynamic tracing
- sys module dtrace facility introspection
All of those might be added later.
Issue #27830: Add _PyObject_FastCallKeywords(): avoid the creation of a
temporary dictionary for keyword arguments.
Other changes:
* Clean up call_function() and fast_function() (ex: rename nk to nkwargs)
* Remove the now-useless do_call(), replaced with _PyObject_FastCallKeywords()
Issue #27213: Rework CALL_FUNCTION* opcodes to produce shorter and more
efficient bytecode:
* CALL_FUNCTION now only accepts positional arguments
* CALL_FUNCTION_KW accepts positional and keyword arguments, but the keys
of keyword arguments are packed into a constant tuple.
* CALL_FUNCTION_EX is the most generic: it expects a tuple and a dict for
positional and keyword arguments.
CALL_FUNCTION_VAR and CALL_FUNCTION_VAR_KW opcodes have been removed.
Two tests in test_traceback are currently broken and have been skipped; issue
#28050 was created to track the problem.
Patch by Demur Rumed, design by Serhiy Storchaka, reviewed by Serhiy Storchaka
and Victor Stinner.
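The call shapes described in this change can be inspected with the `dis` module. A minimal sketch: opcode names for keyword calls changed again in later CPython releases, so instead of matching a specific opcode it checks that keyword names end up in a constant tuple and that the removed opcodes are gone. The helpers `opnames` and `consts` are illustrative names:

```python
import dis


def opnames(source):
    """Return the opcode names of the expression compiled from source."""
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


def consts(source):
    """Return the constants of the expression compiled from source."""
    return compile(source, "<example>", "eval").co_consts


# Keyword argument names are packed into a constant tuple.
print(consts("f(1, 2, c=3)"))          # contains ('c',)
# The generic form taking a tuple and a dict.
print(opnames("f(*args, **kwargs)"))
# The removed opcodes no longer exist.
print("CALL_FUNCTION_VAR" in dis.opmap, "CALL_FUNCTION_VAR_KW" in dis.opmap)
```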
Issue #27830: Similar to _PyObject_FastCallDict(), but keyword arguments are
also passed in the same C array as positional arguments, rather than being
passed as a Python dict.
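This calling convention can be modelled in pure Python. A sketch only: `stack_as_dict` and `call_keywords` are hypothetical names that mirror the roles of _PyStack_AsDict() and _PyObject_FastCallKeywords(); the real functions work on C arrays of PyObject pointers, not Python lists:

```python
def stack_as_dict(values, kwnames):
    """Pair keyword values with their names (the role of _PyStack_AsDict)."""
    return dict(zip(kwnames, values))


def call_keywords(func, stack, nargs, kwnames=()):
    """Call func from one flat 'stack'.

    stack[:nargs] holds the positional arguments; the remaining
    len(kwnames) items hold keyword values, in the order of kwnames.
    """
    if len(stack) != nargs + len(kwnames):
        raise ValueError("stack does not match nargs + len(kwnames)")
    args = stack[:nargs]
    # A dict is only built here, at the boundary with a callee needing one.
    kwargs = stack_as_dict(stack[nargs:], kwnames)
    return func(*args, **kwargs)


def show(*args, **kwargs):
    return args, kwargs


# Equivalent to show(1, 2, c=3, d=4): one array plus a tuple of names.
print(call_keywords(show, [1, 2, 3, 4], 2, ("c", "d")))
# ((1, 2), {'c': 3, 'd': 4})
```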
Issue #27809: PyEval_CallObjectWithKeywords() doesn't temporarily increment the
reference count of the args tuple (positional arguments): the caller already
holds a strong reference to it.
Issue #27128: When a Python function is called with no arguments but all
parameters have a default value, use the default values as arguments for the
fast path.
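The decision can be sketched in Python with function introspection attributes. This is a simplified model: `fast_path_args` is a hypothetical helper that only handles positional calls, and the real C code additionally checks code-object flags (for example, that the function has no free variables), which are omitted here apart from *args/**kwargs:

```python
import inspect


def fast_path_args(func, args):
    """Return the argument tuple the modelled fast path would use, or None."""
    code = func.__code__
    star_flags = inspect.CO_VARARGS | inspect.CO_VARKEYWORDS
    if code.co_kwonlyargcount or code.co_flags & star_flags:
        return None  # keyword-only parameters, *args or **kwargs: slow path
    defaults = func.__defaults__
    if defaults is None and len(args) == code.co_argcount:
        return tuple(args)  # every parameter passed positionally
    if not args and defaults is not None and len(defaults) == code.co_argcount:
        return defaults  # the new case: no arguments, every parameter defaulted
    return None


def all_defaults(a=1, b=2):
    return a + b


def some_defaults(a, b=2):
    return a + b


def no_defaults(a, b):
    return a + b


print(fast_path_args(all_defaults, ()))     # (1, 2)
print(fast_path_args(some_defaults, ()))    # None: 'a' has no default
print(fast_path_args(no_defaults, (3, 4)))  # (3, 4)
```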
Issue #27128: Modify PyEval_CallObjectWithKeywords() to use
_PyObject_FastCall() when args==NULL and kw==NULL. It avoids the creation of a
temporary empty tuple for positional arguments.
Issue #27128: Add _PyObject_FastCall(), a new calling convention avoiding a
temporary tuple to pass positional parameters in most cases, but creating a
temporary tuple if needed (ex: for the tp_call slot).
The API is prepared to support keyword parameters, but the full implementation
will come later (_PyFunction_FastCall() doesn't support keyword parameters
yet).
Also add:
* _PyStack_AsTuple() helper function: convert a "stack" of parameters to
a tuple.
* _PyCFunction_FastCall(): fast call implementation for C functions
* _PyFunction_FastCall(): fast call implementation for Python functions
Issue #27558: Fix a SystemError in the implementation of the "raise" statement.
In a brand new thread, raise a RuntimeError since there is no active
exception to reraise.
Patch written by Xiang Zhang.
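The fixed behaviour can be reproduced from Python: a bare `raise` executed in a fresh thread, where no exception is being handled, fails with a RuntimeError. A minimal sketch; `messages` and `reraise_without_active_exception` are illustrative names:

```python
import threading

messages = []


def reraise_without_active_exception():
    try:
        raise  # nothing to re-raise in a brand-new thread
    except RuntimeError as exc:
        messages.append(str(exc))


worker = threading.Thread(target=reraise_without_active_exception)
worker.start()
worker.join()
print(messages)  # ['No active exception to reraise']
```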
Issue #27128, #18295: Replace the int type with Py_ssize_t for index variables
used for positional arguments. This should help avoid integer overflow and help
the compiler emit better machine code for "i++" (no trap needed for overflow).
Also make the total_args variable constant.
* Add comments
* Add empty lines for readability
* PEP 7 style for if block
* Remove useless assert(globals != NULL); (globals is tested a few lines
before)
Don't fall back to PyDict_GetItemWithError() if the hash is unknown: compute the
hash instead. Also add comments to explain the optimization a little.
requested name doesn't exist in globals: clear the KeyError exception before
calling PyObject_GetItem(). Also fail if the raised exception is not a
KeyError.
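The lookup order can be observed with a dict subclass used as a function's globals, which makes LOAD_GLOBAL take its generic path through PyObject_GetItem(). A sketch under these assumptions: `LoggingGlobals`, `make_func` and `outcome` are illustrative names, and `__missing__` (the hook dict subclasses use for failed lookups) chooses which exception the globals lookup raises:

```python
class LoggingGlobals(dict):
    """dict subclass used as globals; logs misses, raises a chosen error."""

    def __init__(self, error=KeyError):
        super().__init__()
        self.error = error
        self.misses = []

    def __missing__(self, name):
        self.misses.append(name)
        raise self.error(name)


def make_func(namespace, expr):
    """Define f() returning expr, with namespace as its globals."""
    exec("def f():\n    return " + expr, namespace)
    return namespace["f"]


def outcome(namespace, expr):
    """Call the generated f() and return its result or exception name."""
    try:
        return make_func(namespace, expr)()
    except Exception as exc:
        return type(exc).__name__


# KeyError from globals is cleared, then the builtins are consulted.
ns = LoggingGlobals()
print(outcome(ns, "len('abc')"), ns.misses)               # 3 ['len']
# Missing from both globals and builtins: NameError.
print(outcome(LoggingGlobals(), "undefined_name"))        # NameError
# Any other exception raised by the globals lookup propagates.
print(outcome(LoggingGlobals(ValueError), "len('abc')"))  # ValueError
```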
Summary of changes:
1. Coroutines now have a distinct type at the C level, separate from
generators: PyCoro_Type, with a new typedef PyCoroObject.
PyCoroObject shares the initial segment of its struct layout with
PyGenObject, making it possible to reuse the existing generator
machinery. The new type is exposed as 'types.CoroutineType'.
As a consequence of having a new type, the CO_GENERATOR flag is
no longer applied to coroutines.
2. Having a separate type for coroutines made it possible to add
an __await__ method to the type. Although it is not used by the
interpreter (see details on that below), it makes coroutines
naturally (without using __instancecheck__) conform to
collections.abc.Coroutine and collections.abc.Awaitable ABCs.
[The __instancecheck__ is still used for generator-based
coroutines, as we don't want to add __await__ for generators.]
3. Add a new opcode: GET_YIELD_FROM_ITER. The opcode is needed to
allow passing native coroutines to the YIELD_FROM opcode.
Before this change, the 'yield from o' expression was compiled to:
(o)
GET_ITER
LOAD_CONST
YIELD_FROM
Now, we use GET_YIELD_FROM_ITER instead of GET_ITER.
The reason for adding a new opcode is that GET_ITER is used
in some contexts (such as 'for .. in' loops) where passing
a coroutine object is invalid.
4. Add two new introspection functions to the inspect module:
getcoroutinestate(c) and getcoroutinelocals(c).
5. inspect.iscoroutine(o) is updated to test if 'o' is a native
coroutine object. Before this commit it used abc.Coroutine,
and it was requested to update inspect.isgenerator(o) to use
abc.Generator; it was decided, however, that inspect functions
should really be tailored for checking for native types.
6. The sys.set_coroutine_wrapper(w) API is updated to work only with
native coroutines. Since the types.coroutine decorator now supports
any type of callable, it would be confusing if it did not work for
all types of coroutines.
7. The exception logic in the generators' C implementation was updated
to raise clearer messages for coroutines:
Before: RuntimeError("generator raised StopIteration")
After: RuntimeError("coroutine raised StopIteration")
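The user-visible effects of items 1 to 5 and 7 can be checked from Python without an event loop (item 6 is not shown: sys.set_coroutine_wrapper() was removed in Python 3.8). A minimal sketch assuming a current CPython 3 release; `native`, `suspend`, `delegate`, `worker`, `leaky`, `run` and the `check_*` helpers are illustrative names. Note that a StopIteration escaping a coroutine surfaces as a RuntimeError:

```python
import collections.abc
import inspect
import types


async def native():
    return 42


@types.coroutine
def suspend():
    """Generator-based coroutine that suspends exactly once."""
    yield


@types.coroutine
def delegate():
    # Item 3: 'yield from' can delegate to a native coroutine here because
    # types.coroutine marks this generator as an iterable coroutine.
    return (yield from native())


async def worker(x):
    y = x * 2
    await suspend()
    return y


async def leaky():
    raise StopIteration


def run(coro):
    """Drive coro to completion without an event loop; return its result."""
    while True:
        try:
            coro.send(None)
        except StopIteration as exc:
            return exc.value


def check_type_and_abcs():
    """Items 1, 2 and 5: separate type, flags, ABCs and inspect checks."""
    coro = native()
    try:
        return {
            "coroutine_type": type(coro) is types.CoroutineType,
            "is_generator": isinstance(coro, types.GeneratorType),
            "CO_GENERATOR": bool(native.__code__.co_flags & inspect.CO_GENERATOR),
            "has_await": hasattr(type(coro), "__await__"),
            "abc_coroutine": isinstance(coro, collections.abc.Coroutine),
            "iscoroutine": inspect.iscoroutine(coro),
            "generator_based_iscoroutine": inspect.iscoroutine(suspend()),
        }
    finally:
        coro.close()  # never awaited; closing it avoids a RuntimeWarning


def check_for_loop():
    """Item 3: a 'for' loop (GET_ITER) rejects a native coroutine."""
    coro = native()
    try:
        for _ in coro:
            pass
    except TypeError as exc:
        return str(exc)
    finally:
        coro.close()
    return None


def check_states():
    """Item 4: getcoroutinestate() and getcoroutinelocals()."""
    coro = worker(21)
    states = [inspect.getcoroutinestate(coro)]
    coro.send(None)  # runs until 'await suspend()' yields
    states.append(inspect.getcoroutinestate(coro))
    local_vars = dict(inspect.getcoroutinelocals(coro))
    result = run(coro)
    states.append(inspect.getcoroutinestate(coro))
    return states, local_vars, result


def check_stopiteration():
    """Item 7: a StopIteration escaping a coroutine is replaced."""
    try:
        leaky().send(None)
    except RuntimeError as exc:
        return str(exc)
    return None


print(check_type_and_abcs())
print(check_for_loop())       # 'coroutine' object is not iterable
print(run(delegate()))        # 42
print(check_states())
print(check_stopiteration())  # coroutine raised StopIteration
```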