mirror of https://github.com/python/cpython
gh-119786: move frames documentation to InternalDocs and add details (#121009)
This commit is contained in:
parent
9e4a81f00f
commit
c87876763e
|
@ -14,6 +14,8 @@ it is not, please report that through the
|
|||
|
||||
[Compiler Design](compiler.md)
|
||||
|
||||
[Frames](frames.md)
|
||||
|
||||
[Adaptive Instruction Families](adaptive.md)
|
||||
|
||||
[The Source Code Locations Table](locations.md)
|
||||
|
|
|
@ -1,51 +1,47 @@
|
|||
# The Frame Stack
|
||||
# Frames
|
||||
|
||||
Each call to a Python function has an activation record,
|
||||
commonly known as a "frame".
|
||||
Python semantics allows frames to outlive the activation,
|
||||
so they have (before 3.11) been allocated on the heap.
|
||||
This is expensive as it requires many allocations and
|
||||
results in poor locality of reference.
|
||||
|
||||
In 3.11, rather than have these frames scattered about memory,
|
||||
as happens for heap-allocated objects, frames are allocated
|
||||
contiguously in a per-thread stack.
|
||||
This improves performance significantly for two reasons:
|
||||
* It reduces allocation overhead to a pointer comparison and increment.
|
||||
* Stack allocated data has the best possible locality and will always be in
|
||||
CPU cache.
|
||||
|
||||
Generator and coroutines still need heap allocated activation records, but
|
||||
can be linked into the per-thread stack so as to not impact performance too much.
|
||||
|
||||
## Layout
|
||||
|
||||
Each activation record consists of four conceptual sections:
|
||||
Each call to a Python function has an activation record, commonly known as a
|
||||
"frame". It contains information about the function being executed, consisting
|
||||
of three conceptual sections:
|
||||
|
||||
* Local variables (including arguments, cells and free variables)
|
||||
* Evaluation stack
|
||||
* Specials: The per-frame object references needed by the VM: globals dict,
|
||||
code object, etc.
|
||||
* Linkage: Pointer to the previous activation record, stack depth, etc.
|
||||
* Specials: The per-frame object references needed by the VM, including
|
||||
globals dict, code object, instruction pointer, stack depth, the
|
||||
previous frame, etc.
|
||||
|
||||
### Layout
|
||||
The definition of the ``_PyInterpreterFrame`` struct is in
|
||||
[Include/internal/pycore_frame.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_frame.h).
|
||||
|
||||
The specials and linkage sections are a fixed size, so are grouped together.
|
||||
# Allocation
|
||||
|
||||
Python semantics allows frames to outlive the activation, so they need to
|
||||
be allocated outside the C call stack. To reduce overhead and improve locality
|
||||
of reference, most frames are allocated contiguously in a per-thread stack
|
||||
(see ``_PyThreadState_PushFrame`` in
|
||||
[Python/pystate.c](https://github.com/python/cpython/blob/main/Python/pystate.c)).
|
||||
|
||||
Frames of generators and coroutines are embedded in the generator and coroutine
|
||||
objects, so are not allocated in the per-thread stack. See ``PyGenObject`` in
|
||||
[Include/internal/pycore_genobject.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_genobject.h).
|
||||
|
||||
## Layout
|
||||
|
||||
Each activation record is laid out as:
|
||||
* Specials and linkage
|
||||
* Specials
|
||||
* Locals
|
||||
* Stack
|
||||
|
||||
This seems to provide the best performance without excessive complexity.
|
||||
It needs the interpreter to hold two pointers, a frame pointer and a stack pointer.
|
||||
The specials have a fixed size, so the offset of the locals is know. The
|
||||
interpreter needs to hold two pointers, a frame pointer and a stack pointer.
|
||||
|
||||
#### Alternative layout
|
||||
|
||||
An alternative layout that was used for part of 3.11 alpha was:
|
||||
|
||||
* Locals
|
||||
* Specials and linkage
|
||||
* Specials
|
||||
* Stack
|
||||
|
||||
This has the advantage that no copying is required when making a call,
|
||||
|
@ -53,19 +49,6 @@ as the arguments on the stack are (usually) already in the correct
|
|||
location for the parameters. However, it requires the VM to maintain
|
||||
an extra pointer for the locals, which can hurt performance.
|
||||
|
||||
A variant that only needs the need two pointers is to reverse the numbering
|
||||
of the locals, so that the last one is numbered `0`, and the first in memory
|
||||
is numbered `N-1`.
|
||||
This allows the locals, specials and linkage to accessed from the frame pointer.
|
||||
We may implement this in the future.
|
||||
|
||||
#### Note:
|
||||
|
||||
> In a contiguous stack, we would need to save one fewer registers, as the
|
||||
> top of the caller's activation record would be the same at the base of the
|
||||
> callee's. However, since some activation records are kept on the heap we
|
||||
> cannot do this.
|
||||
|
||||
### Generators and Coroutines
|
||||
|
||||
Generators and coroutines contain a `_PyInterpreterFrame`
|
||||
|
@ -92,25 +75,23 @@ section. The `frame_obj` field is initially `NULL`.
|
|||
The `PyFrameObject` may outlive a stack-allocated `_PyInterpreterFrame`.
|
||||
If it does then `_PyInterpreterFrame` is copied into the `PyFrameObject`,
|
||||
except the evaluation stack which must be empty at this point.
|
||||
The linkage section is updated to reflect the new location of the frame.
|
||||
The previous frame link is updated to reflect the new location of the frame.
|
||||
|
||||
This mechanism provides the appearance of persistent, heap-allocated
|
||||
frames for each activation, but with low runtime overhead.
|
||||
|
||||
### Generators and Coroutines
|
||||
|
||||
|
||||
Generator objects have a `_PyInterpreterFrame` embedded in them.
|
||||
This means that creating a generator requires only a single allocation,
|
||||
reducing allocation overhead and improving locality of reference.
|
||||
The embedded frame is linked into the per-thread frame when iterated or
|
||||
awaited.
|
||||
Generators (objects of type ``PyGen_Type``, ``PyCoro_Type`` or
|
||||
``PyAsyncGen_Type``) have a `_PyInterpreterFrame` embedded in them, so
|
||||
that they can be created with a single memory allocation.
|
||||
When such an embedded frame is iterated or awaited, it can be linked with
|
||||
frames on the per-thread stack via the linkage fields.
|
||||
|
||||
If a frame object associated with a generator outlives the generator, then
|
||||
the embedded `_PyInterpreterFrame` is copied into the frame object.
|
||||
|
||||
|
||||
All the above applies to coroutines and async generators as well.
|
||||
the embedded `_PyInterpreterFrame` is copied into the frame object (see
|
||||
``take_ownership()`` in
|
||||
[Python/frame.c](https://github.com/python/cpython/blob/main/Python/frame.c)).
|
||||
|
||||
### Field names
|
||||
|
||||
|
@ -119,7 +100,7 @@ Thus, some of the field names may be a bit misleading.
|
|||
|
||||
For example the `f_globals` field has a `f_` prefix implying it belongs to the
|
||||
`PyFrameObject` struct, although it belongs to the `_PyInterpreterFrame` struct.
|
||||
We may rationalize this naming scheme for 3.12.
|
||||
We may rationalize this naming scheme for a later version.
|
||||
|
||||
|
||||
### Shim frames
|
Loading…
Reference in New Issue