Issue #29259, #29263. methoddescr_call() creates a PyCFunction object, call it
and the destroy it. Add a new _PyMethodDef_RawFastCallDict() method to avoid
the temporary PyCFunction object.
Issue #29234: Inlining _PyStack_AsTuple() into callers increases their stack
consumption, Disable inlining to optimize the stack consumption.
Add _Py_NO_INLINE: use __attribute__((noinline)) of GCC and Clang.
It reduces the stack consumption, bytes per call, before => after:
test_python_call: 1040 => 976 (-64 B)
test_python_getitem: 976 => 912 (-64 B)
test_python_iterator: 1120 => 1056 (-64 B)
=> total: 3136 => 2944 (- 192 B)
Issue #29233: Replace the inefficient _PyObject_VaCallFunctionObjArgs() with
_PyObject_FastCall() in call_method() and call_maybe().
Only a few functions call call_method() and call it with a fixed number of
arguments. Avoid the complex and expensive _PyObject_VaCallFunctionObjArgs()
function, replace it with an array allocated on the stack with the exact number
of argumlents.
It reduces the stack consumption, bytes per call, before => after:
test_python_call: 1168 => 1152 (-16 B)
test_python_getitem: 1344 => 1008 (-336 B)
test_python_iterator: 1568 => 1232 (-336 B)
Remove the _PyObject_VaCallFunctionObjArgs() function which became useless.
Rename it to object_vacall() and make it private.
Issue #28838: The documentation is of the Python C API is more complete and
more up to date than this old comment.
Removal suggested by Antoine Pitrou.
Issue #28870: Add a new _PY_FASTCALL_SMALL_STACK constant, size of "small
stacks" allocated on the C stack to pass positional arguments to
_PyObject_FastCall().
_PyObject_Call_Prepend() now uses a small stack of 5 arguments (40 bytes)
instead of 8 (64 bytes), since it is modified to use _PY_FASTCALL_SMALL_STACK.
Rewrite all comments to use the same style than other Python header files:
comment functions *before* their declaration, no newline between the comment
and the declaration.
Reformat some comments, add newlines, to make them easier to read.
Quote argument like 'arg' to mention an argument in a comment.
Special thanks to INADA Naoki for pushing the patch through
the last mile, Serhiy Storchaka for reviewing the code, and to
Victor Stinner for suggesting the idea (originally implemented
in the PyPy project).
Issue #28838: Rename parameters of the "calls" functions of the Python C API.
* Rename 'callable_object' and 'func' to 'callable': any Python callable object
is accepted, not only Python functions
* Rename 'method' and 'nameid' to 'name' (method name)
* Rename 'o' to 'obj'
* Move, fix and update documentation of PyObject_CallXXX() functions
in abstract.h
* Update also the documentaton of the C API (update parameter names)
Replace
_PyObject_CallArg1(func, arg)
with
PyObject_CallFunctionObjArgs(func, arg, NULL)
Using the _PyObject_CallArg1() macro increases the usage of the C stack, which
was unexpected and unwanted. PyObject_CallFunctionObjArgs() doesn't have this
issue.
* Callable object: callable, o, callable_object => func
* Object for method calls: o => obj
* Method name: name or nameid => method
Cleanup also the C code:
* Don't initialize variables to NULL if they are not used before their first
assignement
* Add braces for readability
Issue #28799:
* Remove the PyEval_GetCallStats() function.
* Deprecate the untested and undocumented sys.callstats() function.
* Remove the CALL_PROFILE special build
Use the sys.setprofile() function, cProfile or profile module to profile
function calls.
When Python is not compiled with PGO, the performance of Python on call_simple
and call_method microbenchmarks depend highly on the code placement. In the
worst case, the performance slowdown can be up to 70%.
The GCC __attribute__((hot)) attribute helps to keep hot code close to reduce
the risk of such major slowdown. This attribute is ignored when Python is
compiled with PGO.
The following functions are considered as hot according to statistics collected
by perf record/perf report:
* _PyEval_EvalFrameDefault()
* call_function()
* _PyFunction_FastCall()
* PyFrame_New()
* frame_dealloc()
* PyErr_Occurred()
new exception with setting current exception as __cause__.
_PyErr_FormatFromCause(exception, format, args...) is equivalent to Python
raise exception(format % args) from sys.exc_info()[1]
new exception with setting current exception as __cause__.
_PyErr_FormatFromCause(exception, format, args...) is equivalent to Python
raise exception(format % args) from sys.exc_info()[1]
Issue #28127: Add a function to check that a dictionary remains consistent
after any change.
By default, tables are not checked, only basic attributes. Define DEBUG_PYDICT
(ex: gcc -D DEBUG_PYDICT) to also check dictionary "content".
* BUILD_TUPLE_UNPACK and BUILD_MAP_UNPACK_WITH_CALL no longer generated with
single tuple or dict.
* Restored more informative error messages for incorrect var-positional and
var-keyword arguments.
* Removed code duplications in _PyEval_EvalCodeWithName().
* Removed redundant runtime checks and parameters in _PyStack_AsDict().
* Added a workaround and enabled previously disabled test in test_traceback.
* Removed dead code from the dis module.
Issue #27810:
* Modify vgetargskeywordsfast() to work on a C array of PyObject* rather than
working on a tuple directly.
* Add _PyArg_ParseStack()
* Argument Clinic now emits code using the new METH_FASTCALL calling convention
Issue #27810: Add a new calling convention for C functions:
PyObject* func(PyObject *self, PyObject **args,
Py_ssize_t nargs, PyObject *kwnames);
Where args is a C array of positional arguments followed by values of keyword
arguments. nargs is the number of positional arguments, kwnames are keys of
keyword arguments. kwnames can be NULL.
Tested on macOS 10.11 dtrace, Ubuntu 16.04 SystemTap, and libbcc.
Largely based by an initial patch by Jesús Cea Avión, with some
influence from Dave Malcolm's SystemTap patch and Nikhil Benesch's
unification patch.
Things deliberately left out for simplicity:
- ustack helpers, I have no way of testing them at this point since
they are Solaris-specific
- PyFrameObject * in function__entry/function__return, this is
SystemTap-specific
- SPARC support
- dynamic tracing
- sys module dtrace facility introspection
All of those might be added later.
Issue #27830: Add _PyObject_FastCallKeywords(): avoid the creation of a
temporary dictionary for keyword arguments.
Other changes:
* Cleanup call_function() and fast_function() (ex: rename nk to nkwargs)
* Remove now useless do_call(), replaced with _PyObject_FastCallKeywords()
Issue #27213: Rework CALL_FUNCTION* opcodes to produce shorter and more
efficient bytecode:
* CALL_FUNCTION now only accepts position arguments
* CALL_FUNCTION_KW accepts position arguments and keyword arguments, but keys
of keyword arguments are packed into a constant tuple.
* CALL_FUNCTION_EX is the most generic, it expects a tuple and a dict for
positional and keyword arguments.
CALL_FUNCTION_VAR and CALL_FUNCTION_VAR_KW opcodes have been removed.
2 tests of test_traceback are currently broken: skip test, the issue #28050 was
created to track the issue.
Patch by Demur Rumed, design by Serhiy Storchaka, reviewed by Serhiy Storchaka
and Victor Stinner.
Issue #26058: Add a new private version to the builtin dict type, incremented
at each dictionary creation and at each dictionary change.
Implementation of the PEP 509.
Issue #27350: `dict` implementation is changed like PyPy. It is more compact
and preserves insertion order.
_PyDict_Dummy() function has been removed.
Disable test_gdb: python-gdb.py is not updated yet to the new structure of
compact dictionaries (issue #28023).
Patch written by INADA Naoki.
Issue #27776: The os.urandom() function does now block on Linux 3.17 and newer
until the system urandom entropy pool is initialized to increase the security.
This change is part of the PEP 524.
Issue #27841: Add _PyObject_Call_Prepend() helper function to prepend an
argument to existing arguments to call a function. This helper uses fast calls.
Modify method_call() and slot_tp_new() to use _PyObject_Call_Prepend().
Issue #27830: Similar to _PyObject_FastCallDict(), but keyword arguments are
also passed in the same C array than positional arguments, rather than being
passed as a Python dict.
Multi-phase extension module import now correctly allows the
``m_methods`` field to be used to add module level functions
to instances of non-module types returned from ``Py_create_mod``.
Patch by Xiang Zhang.
Issue #27128: Add _PyObject_FastCall(), a new calling convention avoiding a
temporary tuple to pass positional parameters in most cases, but create a
temporary tuple if needed (ex: for the tp_call slot).
The API is prepared to support keyword parameters, but the full implementation
will come later (_PyFunction_FastCall() doesn't support keyword parameters
yet).
Add also:
* _PyStack_AsTuple() helper function: convert a "stack" of parameters to
a tuple.
* _PyCFunction_FastCall(): fast call implementation for C functions
* _PyFunction_FastCall(): fast call implementation for Python functions
* Optimize tracemalloc_add_trace(): modify hashtable entry data (trace) if the
memory block is already tracked, rather than trying to remove the old trace
and then add a new trace.
* Add _Py_HASHTABLE_ENTRY_WRITE_DATA() macro
Issue #26530:
* Add C functions _PyTraceMalloc_Track() and _PyTraceMalloc_Untrack() to track
memory blocks using the tracemalloc module.
* Add _PyTraceMalloc_GetTraceback() to get the traceback of an object.
Issue #26567:
* Add a new function PyErr_ResourceWarning() function to pass the destroyed
object
* Add a source attribute to warnings.WarningMessage
* Add warnings._showwarnmsg() which uses tracemalloc to get the traceback where
source object was allocated.
Issue #26563:
* Add _PyGILState_GetInterpreterStateUnsafe() function: the single
PyInterpreterState used by this process' GILState implementation.
* Enhance _Py_DumpTracebackThreads() to retrieve the interpreter state from
autoInterpreterState in last resort. The function now accepts NULL for interp
and current_tstate parameters.
* test_faulthandler: fix a ResourceWarning when test is interrupted by CTRL+c
Issue #26564:
* Expose _Py_DumpASCII() and _Py_DumpDecimal() in traceback.h
* Change the type of the second _Py_DumpASCII() parameter from int to unsigned
long
* Rewrite _Py_DumpDecimal() and dump_hexadecimal() to write directly characters
in the expected order, avoid the need of reversing the string.
* dump_hexadecimal() limits width to the size of the buffer
* _Py_DumpASCII() does nothing if the object is not a Unicode string
* dump_frame() wrtites "???" as the line number if the line number is negative
Issue #10915, #15751, #26558:
* PyGILState_Check() now returns 1 (success) before the creation of the GIL and
after the destruction of the GIL. It allows to use the function early in
Python initialization and late in Python finalization.
* Add a flag to disable PyGILState_Check(). Disable PyGILState_Check() when
Py_NewInterpreter() is called
* Add assert(PyGILState_Check()) to: _Py_dup(), _Py_fstat(), _Py_read()
and _Py_write()
Issue #26516:
* Add PYTHONMALLOC environment variable to set the Python memory
allocators and/or install debug hooks.
* PyMem_SetupDebugHooks() can now also be used on Python compiled in release
mode.
* The PYTHONMALLOCSTATS environment variable can now also be used on Python
compiled in release mode. It now has no effect if set to an empty string.
* In debug mode, debug hooks are now also installed on Python memory allocators
when Python is configured without pymalloc.
Issue #26146: Add a new kind of AST node: ast.Constant. It can be used by
external AST optimizers, but the compiler does not emit directly such node.
An optimizer can replace the following AST nodes with ast.Constant:
* ast.NameConstant: None, False, True
* ast.Num: int, float, complex
* ast.Str: str
* ast.Bytes: bytes
* ast.Tuple if items are constants too: tuple
* frozenset
Update code to accept ast.Constant instead of ast.Num and/or ast.Str:
* compiler
* docstrings
* ast.literal_eval()
* Tools/parser/unparse.py
Issue #26161: Use Py_uintptr_t instead of void* for atomic pointers in
pyatomic.h. Use atomic_uintptr_t when <stdatomic.h> is used.
Using void* causes compilation warnings depending on which implementation of
atomic types is used.
Issue #25843: When compiling code, don't merge constants if they are equal but
have a different types. For example, "f1, f2 = lambda: 1, lambda: 1.0" is now
correctly compiled to two different functions: f1() returns 1 (int) and f2()
returns 1.0 (int), even if 1 and 1.0 are equal.
Add a new _PyCode_ConstantKey() private function.
Issue #25843: When compiling code, don't merge constants if they are equal but
have a different types. For example, "f1, f2 = lambda: 1, lambda: 1.0" is now
correctly compiled to two different functions: f1() returns 1 (int) and f2()
returns 1.0 (int), even if 1 and 1.0 are equal.
Add a new _PyCode_ConstantKey() private function.
issue25909 - Correct the documentation of PyMapping_Items, PyMapping_Keys and
PyMapping_Values in Include/abstract.h and Doc/c-api/mapping.rst.
Patch contributed by Sonali Gupta.
Issue #26107: The format of the co_lnotab attribute of code objects changes to
support negative line number delta.
Changes:
* assemble_lnotab(): if line number delta is less than -128 or greater than
127, emit multiple (offset_delta, lineno_delta) in co_lnotab
* update functions decoding co_lnotab to use signed 8-bit integers
- dis.findlinestarts()
- PyCode_Addr2Line()
- _PyCode_CheckLineNumber()
- frame_setlineno()
* update lnotab_notes.txt
* increase importlib MAGIC_NUMBER to 3361
* document the change in What's New in Python 3.6
* cleanup also PyCode_Optimize() to use better variable names
Issue #26154: Add a new private _PyThreadState_UncheckedGet() function which
gets the current thread state, but don't call Py_FatalError() if it is NULL.
Python 3.5.1 removed the _PyThreadState_Current symbol from the Python C API to
no more expose complex and private atomic types. Atomic types depends on the
compiler or can even depend on compiler options. The new function
_PyThreadState_UncheckedGet() allows to get the variable value without having
to care of the exact implementation of atomic types.
Changes:
* Replace direct usage of the _PyThreadState_Current variable with a call to
_PyThreadState_UncheckedGet().
* In pystate.c, replace direct usage of the _PyThreadState_Current variable
with the PyThreadState_GET() macro for readability.
* Document also PyThreadState_Get() in pystate.h
Also document that the separate functions that delete objects are preferred;
using PyObject_SetAttr(), _SetAttrString(), and PySequence_SetItem() to
delete is deprecated.
This changes the main documentation, doc strings, source code comments, and a
couple error messages in the test suite. In some cases the word was removed
or edited some other way to fix the grammar.
* Don't overallocate by 400% when recode is needed: only overallocate on demand
using _PyBytesWriter.
* Use _PyLong_DigitValue to convert hexadecimal digit to int
* Create _PyBytes_DecodeEscapeRecode() subfunction
Issue #25401: Optimize bytes.fromhex() and bytearray.fromhex(): they are now
between 2x and 3.5x faster. Changes:
* Use a fast-path working on a char* string for ASCII string
* Use a slow-path for non-ASCII string
* Replace slow hex_digit_to_int() function with a O(1) lookup in
_PyLong_DigitValue precomputed table
* Use _PyBytesWriter API to handle the buffer
* Add unit tests to check the error position in error messages
Issue #25399: Don't create temporary bytes objects: modify _PyBytes_Format() to
create work directly on bytearray objects.
* Rename _PyBytes_Format() to _PyBytes_FormatEx() just in case if something
outside CPython uses it
* _PyBytes_FormatEx() now uses (char*, Py_ssize_t) for the input string, so
bytearray_format() doesn't need tot create a temporary input bytes object
* Add use_bytearray parameter to _PyBytes_FormatEx() which is passed to
_PyBytesWriter, to create a bytearray buffer instead of a bytes buffer
Most formatting operations are now between 2.5 and 5 times faster.
Issue #25274: sys.setrecursionlimit() now raises a RecursionError if the new
recursion limit is too low depending at the current recursion depth. Modify
also the "lower-water mark" formula to make it monotonic. This mark is used to
decide when the overflowed flag of the thread state is reset.