When Python is not compiled with PGO, the performance of Python on call_simple
and call_method microbenchmarks depend highly on the code placement. In the
worst case, the performance slowdown can be up to 70%.
The GCC __attribute__((hot)) attribute helps to keep hot code close to reduce
the risk of such major slowdown. This attribute is ignored when Python is
compiled with PGO.
The following functions are considered as hot according to statistics collected
by perf record/perf report:
* _PyEval_EvalFrameDefault()
* call_function()
* _PyFunction_FastCall()
* PyFrame_New()
* frame_dealloc()
* PyErr_Occurred()
* BUILD_TUPLE_UNPACK and BUILD_MAP_UNPACK_WITH_CALL no longer generated with
single tuple or dict.
* Restored more informative error messages for incorrect var-positional and
var-keyword arguments.
* Removed code duplications in _PyEval_EvalCodeWithName().
* Removed redundant runtime checks and parameters in _PyStack_AsDict().
* Added a workaround and enabled previously disabled test in test_traceback.
* Removed dead code from the dis module.
Tested on macOS 10.11 dtrace, Ubuntu 16.04 SystemTap, and libbcc.
Largely based by an initial patch by Jesús Cea Avión, with some
influence from Dave Malcolm's SystemTap patch and Nikhil Benesch's
unification patch.
Things deliberately left out for simplicity:
- ustack helpers, I have no way of testing them at this point since
they are Solaris-specific
- PyFrameObject * in function__entry/function__return, this is
SystemTap-specific
- SPARC support
- dynamic tracing
- sys module dtrace facility introspection
All of those might be added later.
Issue #27830: Add _PyObject_FastCallKeywords(): avoid the creation of a
temporary dictionary for keyword arguments.
Other changes:
* Cleanup call_function() and fast_function() (ex: rename nk to nkwargs)
* Remove now useless do_call(), replaced with _PyObject_FastCallKeywords()
Issue #27213: Rework CALL_FUNCTION* opcodes to produce shorter and more
efficient bytecode:
* CALL_FUNCTION now only accepts position arguments
* CALL_FUNCTION_KW accepts position arguments and keyword arguments, but keys
of keyword arguments are packed into a constant tuple.
* CALL_FUNCTION_EX is the most generic, it expects a tuple and a dict for
positional and keyword arguments.
CALL_FUNCTION_VAR and CALL_FUNCTION_VAR_KW opcodes have been removed.
2 tests of test_traceback are currently broken: skip test, the issue #28050 was
created to track the issue.
Patch by Demur Rumed, design by Serhiy Storchaka, reviewed by Serhiy Storchaka
and Victor Stinner.
Issue #27830: Similar to _PyObject_FastCallDict(), but keyword arguments are
also passed in the same C array than positional arguments, rather than being
passed as a Python dict.
Issue #27809: PyEval_CallObjectWithKeywords() doesn't increment temporary the
reference counter of the args tuple (positional arguments). The caller already
holds a strong reference to it.
Issue #27128. When a Python function is called with no arguments, but all
parameters have a default value: use default values as arguments for the fast
path.
Issue #27128: Modify PyEval_CallObjectWithKeywords() to use
_PyObject_FastCall() when args==NULL and kw==NULL. It avoids the creation of a
temporary empty tuple for positional arguments.
Issue #27128: Add _PyObject_FastCall(), a new calling convention avoiding a
temporary tuple to pass positional parameters in most cases, but create a
temporary tuple if needed (ex: for the tp_call slot).
The API is prepared to support keyword parameters, but the full implementation
will come later (_PyFunction_FastCall() doesn't support keyword parameters
yet).
Add also:
* _PyStack_AsTuple() helper function: convert a "stack" of parameters to
a tuple.
* _PyCFunction_FastCall(): fast call implementation for C functions
* _PyFunction_FastCall(): fast call implementation for Python functions
Issue #27558: Fix a SystemError in the implementation of "raise" statement.
In a brand new thread, raise a RuntimeError since there is no active
exception to reraise.
Patch written by Xiang Zhang.
Issue #27128, #18295: replace int type with Py_ssize_t for index variables used
for positional arguments. It should help to avoid integer overflow and help to
emit better machine code for "i++" (no trap needed for overflow).
Make also the total_args variable constant.
* Add comments
* Add empty lines for readability
* PEP 7 style for if block
* Remove useless assert(globals != NULL); (globals is tested a few lines
before)
Don't fallback to PyDict_GetItemWithError() if the hash is unknown: compute the
hash instead. Add also comments to explain the optimization a little bit.
requested name doesn't exist in globals: clear the KeyError exception before
calling PyObject_GetItem(). Fail also if the raised exception is not a
KeyError.
Summary of changes:
1. Coroutines now have a distinct, separate from generators
type at the C level: PyGen_Type, and a new typedef PyCoroObject.
PyCoroObject shares the initial segment of struct layout with
PyGenObject, making it possible to reuse existing generators
machinery. The new type is exposed as 'types.CoroutineType'.
As a consequence of having a new type, CO_GENERATOR flag is
no longer applied to coroutines.
2. Having a separate type for coroutines made it possible to add
an __await__ method to the type. Although it is not used by the
interpreter (see details on that below), it makes coroutines
naturally (without using __instancecheck__) conform to
collections.abc.Coroutine and collections.abc.Awaitable ABCs.
[The __instancecheck__ is still used for generator-based
coroutines, as we don't want to add __await__ for generators.]
3. Add new opcode: GET_YIELD_FROM_ITER. The opcode is needed to
allow passing native coroutines to the YIELD_FROM opcode.
Before this change, 'yield from o' expression was compiled to:
(o)
GET_ITER
LOAD_CONST
YIELD_FROM
Now, we use GET_YIELD_FROM_ITER instead of GET_ITER.
The reason for adding a new opcode is that GET_ITER is used
in some contexts (such as 'for .. in' loops) where passing
a coroutine object is invalid.
4. Add two new introspection functions to the inspec module:
getcoroutinestate(c) and getcoroutinelocals(c).
5. inspect.iscoroutine(o) is updated to test if 'o' is a native
coroutine object. Before this commit it used abc.Coroutine,
and it was requested to update inspect.isgenerator(o) to use
abc.Generator; it was decided, however, that inspect functions
should really be tailored for checking for native types.
6. sys.set_coroutine_wrapper(w) API is updated to work with only
native coroutines. Since types.coroutine decorator supports
any type of callables now, it would be confusing that it does
not work for all types of coroutines.
7. Exceptions logic in generators C implementation was updated
to raise clearer messages for coroutines:
Before: TypeError("generator raised StopIteration")
After: TypeError("coroutine raised StopIteration")
* adds missing INCREF in WITH_CLEANUP_START
* adds missing DECREF in WITH_CLEANUP_FINISH
* adds several new tests Yury created while investigating this
which returned an invalid result (result+error or no result without error) in
the exception message.
Add also unit test to check that the exception contains the name of the
function.
Special case: the final _PyEval_EvalFrameEx() check doesn't mention the
function since it didn't execute a single function but a whole frame.
raise a SystemError if a function returns a result and raises an exception.
The SystemError is chained to the previous exception.
Refactor also PyObject_Call() and PyCFunction_Call() to make them more readable.
Remove some checks which became useless (duplicate checks).
Change reviewed by Serhiy Storchaka.
At entry, save or swap the exception state even if PyEval_EvalFrameEx() is
called with throwflag=0. At exit, the exception state is now always restored or
swapped, not only if why is WHY_YIELD or WHY_RETURN. Patch co-written with
Antoine Pitrou.
name, and use it in the representation of a generator (``repr(gen)``). The
default name of the generator (``__name__`` attribute) is now get from the
function instead of the code. Use ``gen.gi_code.co_name`` to get the name of
the code.
crash when a generator is created in a C thread that is destroyed while the
generator is still used. The issue was that a generator contains a frame, and
the frame kept a reference to the Python state of the destroyed C thread. The
crash occurs when a trace function is setup.
with an assertion error if they are called with an exception set
(PyErr_Occurred()).
If these functions are called with an exception set, the exception may be
cleared and so the caller looses its exception.
Add also assertions to PyEval_CallObjectWithKeywords() and call_function() to
check if the function succeed with no exception set, or the function failed
with an exception set.
ImportError.
The exception is raised by import when a module could not be found.
Technically this is defined as no viable loader could be found for the
specified module. This includes ``from ... import`` statements so that
the module usage is consistent for all situations where import
couldn't find what was requested.
This should allow for the common idiom of::
try:
import something
except ImportError:
pass
to be updated to using ModuleNotFoundError and not accidentally mask
ImportError messages that should propagate (e.g. issues with a
loader).
This work was driven by the fact that the ``from ... import``
statement needed to be able to tell the difference between an
ImportError that simply couldn't find a module (and thus silence the
exception so that ceval can raise it) and an ImportError that
represented an actual problem.
Note that this is a potentially disruptive change since it may
release some system resources which would otherwise remain
perpetually alive (e.g. database connections kept in thread-local
storage).
- Make many variables local to the opcode; Kill u, v, w, and x.
- Force every opcode to end with DISPATCH or jump to error handling.
- Simplify error handling.
- Check error statuses in more places.
Closes#16191.
It is now possible to use a custom type for the __builtins__ namespace, instead
of a dict. It can be used for sandboxing for example. Raise also a NameError
instead of ImportError if __build_class__ name if not found in __builtins__.
This allows generators that are using yield from to be seen by debuggers. It
also kills the f_yieldfrom field on frame objects.
Patch mostly from Mark Shannon with a few tweaks by me.
These were just an artifact of the old unicode concatenation hack and likely
just penalized other kinds of adding. Also, this fixes __(i)add__ on string
subclasses.