This reduces the size of the data segment by **300 KB** of the executable because if the modules are deep-frozen then the marshalled frozen data just wastes space. This was inspired by comment by @gvanrossum in https://github.com/python/cpython/pull/29118#issuecomment-958521863. Note: There is a new option `--deepfreeze-only` in `freeze_modules.py` to change this behavior, it is on be default to save disk space.
```console
# du -s ./python before
27892 ./python
# du -s ./python after
27524 ./python
```
Automerge-Triggered-By: GH:ericsnowcurrently
This gains 10% or more in startup time for `python -c pass` on UNIX-ish systems.
The Makefile.pre.in generating code builds on Eric's work for bpo-45020, but the .c file generator is new.
Windows version TBD.
Currently custom modules (the array set on PyImport_FrozenModules) replace all the frozen stdlib modules. That can be problematic and is unlikely to be what the user wants. This change treats the custom frozen modules as additions instead. They take precedence over all other frozen modules except for those needed to bootstrap the import system. If the "code" field of an entry in the custom array is NULL then that frozen module is treated as disabled, which allows a custom entry to disable a frozen stdlib module.
This change allows us to get rid of is_essential_frozen_module() and simplifies the logic for which frozen modules should be ignored.
https://bugs.python.org/issue45395
Rename Include/namespaceobject.h to
Include/internal/pycore_namespace.h.
The _testmultiphase extension is now built with the
Py_BUILD_CORE_MODULE macro defined to access _PyNamespace_Type.
object.c: remove unused "pycore_context.h" include.
Currently frozen modules do not have __file__ set. In their spec, origin is set to "frozen" and they are marked as not having a location. (Similarly, for frozen packages __path__ is set to an empty list.) However, for frozen stdlib modules we are able to extrapolate __file__ as long as we can determine the stdlib directory at runtime. (We now do so since gh-28586.) Having __file__ set is helpful for a number of reasons. Likewise, having a non-empty __path__ means we can import submodules of a frozen package from the filesystem (e.g. we could partially freeze the encodings module).
This change sets __file__ (and adds to __path__) for frozen stdlib modules. It uses sys._stdlibdir (from gh-28586) and the frozen module alias information (from gh-28655). All that work is done in FrozenImporter (in Lib/importlib/_bootstrap.py).
Also, if a frozen module is imported before importlib is bootstrapped (during interpreter initialization) then we fix up that module and its spec during the importlib bootstrapping step (i.e. imporlib._bootstrap._setup()) to match what gets set by FrozenImporter, including setting the file info (if the stdlib dir is known). To facilitate this, modules imported using PyImport_ImportFrozenModule() have __origname__ set using the frozen module alias info. __origname__ is popped off during importlib bootstrap.
(To be clear, even with this change the new code to set __file__ during fixups in imporlib._bootstrap._setup() doesn't actually get triggered yet. This is because sys._stdlibdir hasn't been set yet in interpreter initialization at the point importlib is bootstrapped. However, we do fix up such modules at that point to otherwise match the result of importing through FrozenImporter, just not the __file__ and __path__ parts. Doing so will require changes in the order in which things happen during interpreter initialization. That can be addressed separately. Once it is, the file-related fixup code from this PR will kick in.)
Here are things this change does not do:
* set __file__ for non-stdlib modules (no way of knowing the parent dir)
* set __file__ if the stdlib dir is not known (nor assume the expense of finding it)
* relatedly, set __file__ if the stdlib is in a zip file
* verify that the filename set to __file__ actually exists (too expensive)
* update __path__ for frozen packages that alias a non-package (since there is no package dir)
Other things this change skips, but we may do later:
* set __file__ on modules imported using PyImport_ImportFrozenModule()
* set co_filename when we unmarshal the frozen code object while importing the module (e.g. in FrozenImporter.exec_module()) -- this would allow tracebacks to show source lines
* implement FrozenImporter.get_filename() and FrozenImporter.get_source()
https://bugs.python.org/issue21736
In the list of generated frozen modules at the top of Tools/scripts/freeze_modules.py, you will find that some of the modules have a different name than the module (or .py file) that is actually frozen. Let's call each case an "alias". Aliases do not come into play until we get to the (generated) list of modules in Python/frozen.c. (The tool for freezing modules, Programs/_freeze_module, is only concerned with the source file, not the module it will be used for.)
Knowledge of which frozen modules are aliases (and the identity of the original module) normally isn't important. However, this information is valuable when we go to set __file__ on frozen stdlib modules. This change updates Tools/scripts/freeze_modules.py to map aliases to the original module name (or None if not a stdlib module) in Python/frozen.c. We also add a helper function in Python/import.c to look up a frozen module's alias and add the result of that function to the frozen info returned from find_frozen().
https://bugs.python.org/issue45020
Before this change we end up duplicating effort and throwing away data in FrozenImporter.find_spec(). Now we do the work once in find_spec() and the only thing we do in FrozenImporter.exec_module() is turn the raw frozen data into a code object and then exec it.
We've added _imp.find_frozen(), add an arg to _imp.get_frozen_object(), and updated FrozenImporter. We've also moved some code around to reduce duplication, get a little more consistency in outcomes, and be more efficient.
Note that this change is mostly necessary if we want to set __file__ on frozen stdlib modules. (See https://bugs.python.org/issue21736.)
https://bugs.python.org/issue45324
Currently we freeze several modules into the runtime. For each of these modules it is essential to bootstrapping the runtime that they be frozen. Any other stdlib module that we later freeze into the runtime is not essential. We can just as well import from the .py file. This PR lets users explicitly choose which should be used, with the new "-X frozen_modules=[on|off]" CLI flag. The default is "off" for now.
https://bugs.python.org/issue45020
There are a few things I missed in gh-27980. This is a follow-up that will make subsequent PRs cleaner. It includes fixes to tests and tools that reference the frozen modules.
https://bugs.python.org/issue45019
Py_RunMain() now resets PyImport_Inittab to its initial value at
exit. It must be possible to call PyImport_AppendInittab() or
PyImport_ExtendInittab() at each Python initialization.
* pycore_ast.h no longer defines the Yield macro.
* Fix a compiler warning on Windows: "warning C4005: 'Yield': macro
redefinition".
* Python-ast.c now defines directly functions with their real
_Py_xxx() name, rather than xxx().
* Remove "#undef Yield" in C files including pycore_ast.h.
These functions were undocumented and excluded from the limited C
API.
Most names defined by these header files were not prefixed by "Py"
and so could create names conflicts. For example, Python-ast.h
defined a "Yield" macro which was conflict with the "Yield" name used
by the Windows <winbase.h> header.
Use the Python ast module instead.
* Move Include/asdl.h to Include/internal/pycore_asdl.h.
* Move Include/Python-ast.h to Include/internal/pycore_ast.h.
* Remove ast.h header file.
* pycore_symtable.h no longer includes Python-ast.h.
Pass the current interpreter (interp) rather than the current Python
thread state (tstate) to internal functions which only use the
interpreter.
Modified functions:
* _PyXXX_Fini() and _PyXXX_ClearFreeList() functions
* _PyEval_SignalAsyncExc(), make_pending_calls()
* _PySys_GetObject(), sys_set_object(), sys_set_object_id(), sys_set_object_str()
* should_audit(), set_flags_from_config(), make_flags()
* _PyAtExit_Call()
* init_stdio_encoding()
* etc.
Convert the _imp extension module to the multi-phase initialization
API (PEP 489).
* Add _PyImport_BootstrapImp() which fix a bootstrap issue: import
the _imp module before importlib is initialized.
* Add create_builtin() sub-function, used by _imp_create_builtin().
* Initialize PyInterpreterState.import_func earlier, in
pycore_init_builtins().
* Remove references to _PyImport_Cleanup(). This function has been
renamed to finalize_modules() and moved to pylifecycle.c.
* Call _PyTime_Init() and _PyWarnings_InitState() earlier during the
Python initialization.
* Inline _PyImportHooks_Init() into _PySys_InitCore().
* The _warnings initialization function no longer call
_PyWarnings_InitState() to prevent resetting filters_version to 0.
* _PyWarnings_InitState() now returns an int and no longer clear the
state in case of error (it's done anyway at Python exit).
* Rework init_importlib(), fix refleaks on errors.
Move private _PyGC_CollectNoFail() to the internal C API.
Remove the private _PyGC_CollectIfEnabled() which was just an alias
to the public PyGC_Collect() function since Python 3.8.
Rename functions:
* collect() => gc_collect_main()
* collect_with_callback() => gc_collect_with_callback()
* collect_generations() => gc_collect_generations()
If PyDict_GetItemWithError is only used to check whether the key is in dict,
it is better to use PyDict_Contains instead.
And if it is used in combination with PyDict_SetItem, PyDict_SetDefault can
replace the combination.
* PyMapping_HasKey() is not safe because it silences all exceptions and can return incorrect result.
* Informative exceptions from PyMapping_DelItem() are overridden with RuntimeError and
the original exception raised before calling remove_module() is lost.
* There is a race condition between PyMapping_HasKey() and PyMapping_DelItem().
Since _PyImport_ReInitLock() now calls _PyThread_at_fork_reinit() on
the import lock, the lock is now in a known state: unlocked. It
became safe to acquire it after fork.
PyOS_AfterFork_Child() helper functions now return a PyStatus:
PyOS_AfterFork_Child() is now responsible to handle errors.
* Move _PySignal_AfterFork() to the internal C API
* Add #ifdef HAVE_FORK on _PyGILState_Reinit(), _PySignal_AfterFork()
and _PyInterpreterState_DeleteExceptMain().
I can add another commit with the new test case I wrote to verify that the warning was being printed before my change, stopped printing after my change, and that the function does not return null after my change.
Automerge-Triggered-By: @brettcannon
Otherwise we leave a dangling pointer to free'd memory. If we
then initialize a new interpreter in the same process and call
PyImport_ExtendInittab, we will (likely) crash when calling
PyMem_RawRealloc(inittab_copy, ...) since the pointer address
is bogus.
Automerge-Triggered-By: @brettcannon
PyFrame_GetCode(frame): return a borrowed reference to the frame
code.
Replace frame->f_code with PyFrame_GetCode(frame) in most code,
except in frameobject.c, genobject.c and ceval.c.
Also add PyFrame_GetLineNumber() to the limited C API.
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).
Add also "assert(tstate != NULL);" to the function.
Don't access PyInterpreterState.config member directly anymore, but
use new functions:
* _PyInterpreterState_GetConfig()
* _PyInterpreterState_SetConfig()
* _Py_GetConfig()
Replace _PyInterpreterState_Get() function call with
_PyInterpreterState_GET_UNSAFE() macro which is more efficient but
don't check if tstate or interp is NULL.
_Py_GetConfigsAsDict() now uses _PyThreadState_GET().
The Py_FatalError() function is replaced with a macro which logs
automatically the name of the current function, unless the
Py_LIMITED_API macro is defined.
Changes:
* Add _Py_FatalErrorFunc() function.
* Remove the function name from the message of Py_FatalError() calls
which included the function name.
* Update tests.
The bulk of this patch was generated automatically with:
for name in \
PyObject_Vectorcall \
Py_TPFLAGS_HAVE_VECTORCALL \
PyObject_VectorcallMethod \
PyVectorcall_Function \
PyObject_CallOneArg \
PyObject_CallMethodNoArgs \
PyObject_CallMethodOneArg \
;
do
echo $name
git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
done
old=_PyObject_FastCallDict
new=PyObject_VectorcallDict
git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"
and then cleaned up:
- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
Currently, during runtime destruction, `_PyImport_Cleanup` is clearing the interpreter state before clearing out the modules themselves. This leads to a segfault on modules that rely on the module state to clear themselves up.
For example, let's take the small snippet added in the issue by @DinoV :
```
import _struct
class C:
def __init__(self):
self.pack = _struct.pack
def __del__(self):
self.pack('I', -42)
_struct.x = C()
```
The module `_struct` uses the module state to run `pack`. Therefore, the module state has to be alive until after the module has been cleared out to successfully run `C.__del__`. This happens at line 606, when `_PyImport_Cleanup` calls `_PyModule_Clear`. In fact, the loop that calls `_PyModule_Clear` has in its comments:
> Now, if there are any modules left alive, clear their globals to minimize potential leaks. All C extension modules actually end up here, since they are kept alive in the interpreter state.
That means that we can't clear the module state (which is used by C Extensions) before we run that loop.
Moving `_PyInterpreterState_ClearModules` until after it, fixes the segfault in the code snippet.
Finally, this updates a test in `io` to correctly assert the error that it now throws (since it now finds the io module state). The test that uses this is: `test_create_at_shutdown_without_encoding`. Given this test is now working is a proof that the module state now stays alive even when `__del__` is called at module destruction time. Thus, I didn't add a new tests for this.
https://bugs.python.org/issue38076