Python now supports checking bytecode cache up-to-dateness with a hash of the
source contents rather than volatile source metadata. See the PEP for details.
While a fairly straightforward idea, quite a lot of code had to be modified due
to the pervasiveness of pyc implementation details in the codebase. Changes in
this commit include:
- The core changes to importlib to understand how to read, validate, and
regenerate hash-based pycs.
- Support for generating hash-based pycs in py_compile and compileall.
- Modifications to our siphash implementation to support passing a custom
key. We then expose it to importlib through _imp.
- Updates to all places in the interpreter, standard library, and tests that
manually generate or parse pyc files to grok the new format.
- Support in the interpreter command line code for long options like
--check-hash-based-pycs.
- Tests and documentation for all of the above.
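As a rough illustration of the new mode (a sketch using real stdlib
APIs; the source file name is hypothetical), a checked hash-based pyc
can be requested through py_compile, and the new flags word in the pyc
header records which mode was used:

    import py_compile
    import struct

    # Ask for a checked hash-based pyc instead of a timestamp-based one.
    cache_path = py_compile.compile(
        "example.py",
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )

    # Header: magic (4 bytes) + flags (4 bytes) + 8 bytes of either the
    # source hash (hash-based) or mtime+size (timestamp-based).
    with open(cache_path, "rb") as f:
        magic, flags = struct.unpack("<4sI", f.read(8))
    print(flags & 0x1)  # 1 -> hash-based pyc
    print(flags & 0x2)  # 2 -> "check source" bit set (CHECKED_HASH)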
PyImport_ExtendInittab() now uses PyMem_RawRealloc() rather than
PyMem_Realloc(). PyImport_ExtendInittab() can be called before
Py_Initialize() whereas only the PyMem_Raw allocator is supposed to
be used before Py_Initialize().
Add _PyImport_Fini2() to release the memory allocated by
PyImport_ExtendInittab() at exit. PyImport_ExtendInittab() now forces
the usage of the default raw allocator, to be able to release memory
in _PyImport_Fini2().
Don't export these functions in the C API anymore; they are now only
available when Py_BUILD_CORE is defined:
* _PyExc_Fini()
* _PyImport_Fini()
* _PyGC_DumpShutdownStats()
* _PyGC_Fini()
* _PyType_Fini()
* _Py_HashRandomization_Fini()
Add import_find_and_load() helper function. The addition of
the importtime option has made PyImport_ImportModuleLevelObject() large
and so using a helper seems worthwhile. It also makes it clearer that
abs_name is the only argument needed by _find_and_load().
Py_Main() now handles two more -X options:
* -X showrefcount: new _PyCoreConfig.show_ref_count field
* -X showalloccount: new _PyCoreConfig.show_alloc_count field
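For example (a small sketch; the refcount/alloc summaries themselves
are only emitted by debug or special builds), the options can be
inspected from Python code via sys._xoptions:

    import sys

    # Run as:  python -X showrefcount script.py
    # -X options are recorded in sys._xoptions; an option given without
    # "=value" shows up as True.
    print(sys._xoptions.get("showrefcount"))  # True when passed on the command line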
Parse more env vars in Py_Main():
* Add more options to _PyCoreConfig:
* faulthandler
* tracemalloc
* importtime
* Move code to parse environment variables from _Py_InitializeCore()
to Py_Main(). This change fixes a regression from Python 3.6:
PYTHONUNBUFFERED is now read before calling pymain_init_stdio().
* _PyFaulthandler_Init() and _PyTraceMalloc_Init() now take an
argument to decide if the module has to be enabled at startup.
* tracemalloc_start() is now responsible for checking the maximum number
  of frames.
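A quick way to confirm that these startup options took effect (a sketch
using real stdlib APIs; the option values are arbitrary examples):

    # Run as:  PYTHONTRACEMALLOC=5 python -X faulthandler -X importtime script.py
    import faulthandler
    import sys
    import tracemalloc

    print(faulthandler.is_enabled())          # True when enabled at startup
    print(tracemalloc.is_tracing())           # True when enabled at startup
    print(tracemalloc.get_traceback_limit())  # e.g. 5, from PYTHONTRACEMALLOC=5
    print(sys._xoptions)                      # the -X options that were passed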
Other changes:
* Cleanup Py_Main():
* Rename some pymain_xxx() subfunctions
* Add pymain_run_python() subfunction
* Cleanup Py_NewInterpreter()
* _PyInterpreterState_Enable() now reports failure
* init_hash_secret() now treats pyurandom() failure as a "user error":
  it no longer fails with abort().
* pymain_optlist_append() and pymain_strdup() now set err on memory
  allocation failure.
* The "Python runtime" is no longer used to parse command line options
  or to read environment variables: pymain_init() now enforces a strict
  separation.
* Use an error message rather than "crashing" directly with
  Py_FatalError(), and limit the number of calls to Py_FatalError().
  This prepares the code to handle errors more gracefully later.
* Warnings options (-W, PYTHONWARNINGS) and "XOptions" (-X) are now
only added to the sys module once Python core is properly
initialized.
* _PyMain is now the well-identified owner of some important strings
  such as the warnings options, XOptions, and the "program name". The
  program name string is now properly freed at exit.
  pymain_free() is now responsible for freeing the "command" string.
* Rename most functions in Modules/main.c to use a "pymain_" prefix to
  avoid conflicts and ease debugging.
* Replace _Py_CommandLineDetails_INIT with memset(0)
* Reorder a lot of code to fix the initialization ordering. For
example, initializing standard streams now comes before parsing
PYTHONWARNINGS.
* Py_Main() now handles errors when adding warnings options and
XOptions.
* Add _PyMem_GetDefaultRawAllocator() private function.
* Cleanup _PyMem_Initialize(): remove useless global constants by moving
  them into _PyMem_Initialize().
* Call _PyRuntime_Initialize() as soon as possible:
_PyRuntime_Initialize() now returns an error message on failure.
* Add _PyInitError structure and following macros:
* _Py_INIT_OK()
* _Py_INIT_ERR(msg)
* _Py_INIT_USER_ERR(msg): "user" error, don't abort() in that case
* _Py_INIT_FAILED(err)
* Rewrite win_perf_counter() to only use integers internally.
* Add _PyTime_MulDiv(), which computes "ticks * mul / div" in two parts
  (integer part and remainder) to prevent integer overflow; see the
  sketch below.
* Clock frequency is checked at initialization for integer overflow.
* Also enhance pymonotonic() to reduce precision loss on macOS
  (mach_absolute_time() clock).
time.clock() and time.perf_counter() now use a C double internally
again.
Remove also _PyTime_GetWinPerfCounterWithInfo(): use
_PyTime_GetPerfCounterDoubleWithInfo() instead on Windows.
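The split done by _PyTime_MulDiv() can be sketched in Python (Python
integers do not overflow, so this only demonstrates that the
decomposition is exact; in C the point is to keep the intermediate
product small):

    def muldiv(ticks, mul, div):
        # compute ticks * mul / div without forming the full ticks * mul product
        intpart, rem = divmod(ticks, div)
        return intpart * mul + rem * mul // div

    ticks, mul, div = 123456789, 1_000_000_000, 3_579_545
    assert muldiv(ticks, mul, div) == ticks * mul // div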
The concrete PyDict_* API is used to interact with
PyInterpreterState.modules in a number of places. This isn't compatible
with all dict subclasses, nor with other Mapping implementations. This
patch switches the concrete API usage to the corresponding abstract API
calls.
We also add a PyImport_GetModule() function (and some other helpers) to
reduce a bunch of code duplication.
A bunch of code currently uses PyInterpreterState.modules directly
instead of PyImport_GetModuleDict(). This complicates efforts to make
changes relative to sys.modules. This patch switches to using
PyImport_GetModuleDict() uniformly. Also, a number of related uses of
sys.modules are updated for uniformity for the same reason.
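A pure-Python illustration of why the abstract calls matter (the class
is hypothetical, not code from the patch): a dict subclass that
customises lookup is invisible to concrete PyDict_GetItem() calls in C,
but is honoured by the abstract PyObject_GetItem():

    class LoggingModules(dict):
        """A sys.modules replacement that logs lookups."""
        def __getitem__(self, name):
            print("module lookup:", name)
            return super().__getitem__(name)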
Note that this code was already reviewed and merged as part of #1638. I
reverted that and am now splitting it up into more focused parts.
PR #1638, for bpo-28411, causes problems in some (very) edge cases.
Until that gets sorted out, we're reverting the merge. PR #3506, a fix
on top of #1638, is also getting reverted.
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals
Other globals are excluded (see globals.txt and check-c-globals.py).
* Rewrite importlib _get_module_lock(): it is now responsible for
  holding the imp lock directly.
* _find_and_load() now holds the module lock while checking whether
  name is in sys.modules, to prevent a race condition.
* bpo-6532: Make the thread id an unsigned integer.
On the C API side, the result type of PyThread_start_new_thread() and
PyThread_get_thread_ident(), the id parameter of
PyThreadState_SetAsyncExc(), and the thread_id field of PyThreadState
changed from "long" to "unsigned long".
* Restore a check in thread_get_ident().
Issue #28915: Replace _PyObject_CallMethodId() with
_PyObject_CallMethodIdObjArgs() in various modules when the format string was
only made of "O" formats, PyObject* arguments.
_PyObject_CallMethodIdObjArgs() avoids the creation of a temporary tuple and
doesn't have to parse a format string.
Issue #28858: The change b9c9691c72c5 introduced a regression. It seems like
_PyObject_CallArg1() uses more stack memory than
PyObject_CallFunctionObjArgs().
* PyObject_CallFunctionObjArgs(func, NULL) => _PyObject_CallNoArg(func)
* PyObject_CallFunctionObjArgs(func, arg, NULL) => _PyObject_CallArg1(func, arg)
PyObject_CallFunctionObjArgs() allocates 40 bytes on the C stack and requires
extra work to "parse" C arguments to build a C array of PyObject*.
_PyObject_CallNoArg() and _PyObject_CallArg1() are simpler and don't allocate
memory on the C stack.
This change is part of the fastcall project. The change on listsort() is
related to the issue #23507.
or builtins for importing submodules or "from import". Fixed a crash
when raising a warning about being unable to resolve the package from
__spec__ or __package__.
* Replace PyUnicode_RPartition() with PyUnicode_FindChar() and
PyUnicode_Substring() to avoid the creation of a temporary tuple.
* Use PyUnicode_FromFormat() to build a string and avoid the single_dot ('.')
singleton
Thanks Serhiy Storchaka for your review.
with no known parent package.
Previously SystemError was raised if the parent package didn't exist
(e.g., __package__ was set to '').
Thanks to Florent Xicluna and Yongzhi Pan for reporting the issue.
In a previous change, __spec__.parent was prioritized over
__package__. That is a backwards-compatibility break, but we do
eventually want __spec__ to be the ground truth for module details. So
this change reverts the change in semantics and instead raises an
ImportWarning when __package__ != __spec__.parent to give people time
to adjust to using spec objects.
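Concretely (a small sketch using a stdlib package; any package-qualified
module shows the same thing), the two attributes normally agree, and a
mismatch now produces an ImportWarning rather than __spec__.parent
silently winning:

    import email.mime.text as m

    print(m.__package__)      # 'email.mime'
    print(m.__spec__.parent)  # 'email.mime' -- normally identical to __package__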
not defined for a relative import.
This is the start of work to try and clean up import semantics to rely
more on a module's spec than on the myriad attributes that get set on
a module. Thanks to Rose Ames for the patch.
Known limitations of the current implementation:
- documentation changes are incomplete
- there's a reference leak I haven't tracked down yet
The leak is most visible by running:
./python -m test -R3:3 test_importlib
However, you can also see it by running:
./python -X showrefcount
Importing the array or _testmultiphase modules, and
then deleting them from both sys.modules and the local
namespace shows significant increases in the total
number of active references each cycle. By contrast,
with _testcapi (which continues to use single-phase
initialisation) the global refcounts stabilise after
a couple of cycles.
__file__.
This causes _frozen_importlib to no longer have __file__ set as well
as any frozen module imported using imp.init_frozen() (which is
deprecated).
The new syntax is highly human readable while still preventing false
positives. The syntax also extends Python syntax to denote "self" and
positional-only parameters, allowing inspect.Signature objects to be
totally accurate for all supported builtins in Python 3.4.
annotate text signatures in docstrings, resulting in fewer false
positives. "self" parameters are also explicitly marked, allowing
inspect.Signature() to authoritatively detect (and skip) said parameters.
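For instance (using a builtin that has been converted), the resulting
signatures are visible through the inspect module, including the
positional-only marker:

    import inspect

    # Builtins with Argument Clinic text signatures are introspectable;
    # the "/" marks positional-only parameters.
    print(inspect.signature(ord))  # (c, /)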
Issue #20326: Argument Clinic now generates separate checksums for the
input and output sections of the block, allowing external tools to verify
that the input has not changed (and thus the output is not out-of-date).
PyMethodDescr_Type, _PyMethodWrapper_Type, and PyWrapperDescr_Type)
have been modified to provide introspection information for builtins.
Also: many additional Lib, test suite, and Argument Clinic fixes.
Forgot to raise ModuleNotFoundError when None is found in sys.modules.
This led to introducing the C function PyErr_SetImportErrorSubclass()
to make setting ModuleNotFoundError easier.
Also updated the reference docs to mention ModuleNotFoundError
appropriately. Updated the docs for ModuleNotFoundError to mention the
None in sys.modules case.
Lastly, it was noticed that PyErr_SetImportError() was not setting an
exception when returning None in one case. That issue is now fixed.
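The None-in-sys.modules case the text refers to can be reproduced
directly (the module name is hypothetical):

    import sys

    sys.modules["some_missing_module"] = None
    try:
        import some_missing_module
    except ModuleNotFoundError as exc:
        print(exc)  # import of some_missing_module halted; None in sys.modules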
Previously __path__ was set to [__name__], but that could lead to bad
results if someone managed to circumvent the frozen importer and
somehow ended up with a finder that thought __name__ was a legit
directory/location.
Lib/imp.py for imp.source_from_cache() instead of its own C version.
Also change PyImport_ExecCodeModuleObject() to not infer the source
path from the bytecode path like
PyImport_ExecCodeModuleWithPathnames() does. This makes the function
less magical.
This also has the side-effect of removing all uses of MAXPATHLEN in
Python/import.c; those uses could cause failures with really long
filenames.
defined in sysmodule.c instead of straight out of a Unicode object.
Thanks to Amaury Forgeot d'Arc for noticing the bug and Eric Snow for
writing the patch.
This introduces a new function, imp.extension_suffixes(), which is
currently undocumented. That is forthcoming once issue #14657 is
resolved and how to expose file suffixes is decided.
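Usage is straightforward (the exact suffixes vary by platform and
build; imp is deprecated in later releases, where
importlib.machinery.EXTENSION_SUFFIXES exposes the same list):

    import imp

    print(imp.extension_suffixes())
    # e.g. ['.cpython-37m-x86_64-linux-gnu.so', '.abi3.so', '.so'] on Linux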
* In debug mode, fill the string data with invalid characters
* Also simplify reference counting in PyCodec_BackslashReplaceErrors()
  and PyCodec_XMLCharRefReplaceErrors()
__loader__.
Since import now sets __loader__ on all modules it creates and
imp.reload() already relied on the attribute for modules that import
didn't create, the only potential compatibility issue is if people
were deleting the attribute on modules and expecting imp.reload() to
continue to work.
rewriting functionality in pure Python.
To start, imp.new_module() has been rewritten in pure Python, put into
importlib (privately) and then publicly exposed in imp.
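The pure-Python version is roughly equivalent to constructing a bare
module object (a sketch, not the exact importlib code):

    import types

    def new_module(name):
        # an empty module object; it is *not* inserted into sys.modules
        return types.ModuleType(name)

    m = new_module("spam")
    print(m.__name__)  # 'spam'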
of sys.modules when possible.
This is being done for two reasons. One is to gain a little bit of
performance by skipping an unnecessary dict lookup in sys.modules. But
the other (and main) reason is to be a little bit more clear in how
things should work from the perspective of import's interactions with
loaders. Otherwise loaders can easily forget to return the module even
though PEP 302 explicitly states they are expected to return the module
they loaded.
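A minimal sketch of the loader contract in question (the loader and
attribute names are hypothetical): per PEP 302, load_module() must
return the module it loaded, whether it created it or found it already
in sys.modules:

    import sys
    import types

    class GreetingLoader:
        def load_module(self, fullname):
            module = sys.modules.setdefault(fullname, types.ModuleType(fullname))
            module.__loader__ = self
            module.greeting = "hello"
            return module  # the step loaders sometimes forgot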
importlib._bootstrap is now frozen into Python/importlib.h and stored
as _frozen_importlib in sys.modules. Py_Initialize() loads the frozen
code along with sys and imp and then uses _frozen_importlib._install()
to set builtins.__import__() w/ _frozen_importlib.__import__().
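The frozen module is visible at runtime; in current CPython it is the
same object that importlib re-exports as importlib._bootstrap:

    import importlib._bootstrap
    import sys

    print(sys.modules["_frozen_importlib"] is importlib._bootstrap)  # True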
This allows generators that are using yield from to be seen by debuggers. It
also kills the f_yieldfrom field on frame objects.
Patch mostly from Mark Shannon with a few tweaks by me.
find_module() now raises a RuntimeError, instead of ImportError, on an
error on sys.path or sys.meta_path, because load_package() and
import_submodule() return None and clear the exception if an
ImportError occurred.
paths.
__import__ does a little trick when importing from bytecode by
back-patching the co_filename paths to point to the file location
where the code object was loaded from, *not* where the code object was
originally created. This allows co_filename to point to a valid path.
Problem is that co_filename is immutable from Python, so a private
function -- imp._fix_co_filename() -- had to be introduced in order to
get things working properly. Originally the plan was to add a file
argument to marshal.loads(), but that failed as the algorithm used by
__import__ is not fully recursive as one might expect, so to be fully
backwards-compatible the code used by __import__ needed to be exposed.
This closes issue #6811 by taking a different approach than outlined
in the issue.
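The back-patching helper can be exercised directly (it was exposed as
imp._fix_co_filename() at the time; later releases keep it in the _imp
module, used here; the target path is hypothetical):

    import _imp

    code = compile("x = 1", "<original>", "exec")
    print(code.co_filename)  # <original>

    # Recursively rewrite co_filename on the code object (and any nested
    # code objects) to point at the actual file location.
    _imp._fix_co_filename(code, "/real/location/mod.py")
    print(code.co_filename)  # /real/location/mod.py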