I recently added some tests to test_imp, but @warsaw is removing that file in gh-98573. The tests are worth keeping so here I'm moving them to test_import.
This is related to fixing the refleaks introduced by commit 096d009. I haven't been able to find the leak yet, but these changes are a consequence of that effort. This includes some cleanup, some tweaks to the existing tests, and a bunch of new test cases. The only change here that might have impact outside the tests in question is in imp.py, where I update imp.load_dynamic() to use spec_from_file_location() instead of creating a ModuleSpec directly.
Also note that I've updated the tests to only skip if we're checking for refleaks (regrtest's --huntrleaks), whereas in gh-101969 I had skipped the tests entirely. The tests will be useful for some upcoming work and I'd rather the refleaks not hold that up. (It isn't clear how quickly we'll be able to fix the leaking code, though it will certainly be done in the short term.)
https://github.com/python/cpython/issues/102251
gh-101891 is causing failures under `$> ./python -m test test_imp -R 3:3`. Furthermore, with that fixed, "test_singlephase_variants" is leaking references. This change addresses the first part, but skips the leaking tests until we can follow up with a fix.
https://github.com/python/cpython/issues/101758
This change is almost entirely moving code around and hiding import state behind internal API. We introduce no changes to behavior, nor to non-internal API. (Since there was already going to be a lot of churn, I took this as an opportunity to re-organize import.c into topically-grouped sections of code.) The motivation is to simplify a number of upcoming changes.
Specific changes:
* move existing import-related code to import.c, wherever possible
* add internal API for interacting with import state (both global and per-interpreter)
* use only API outside of import.c (to limit churn there when changing the location, etc.)
* consolidate the import-related state of PyInterpreterState into a single struct field (this changes layout slightly)
* add macros for import state in import.c (to simplify changing the location)
* group code in import.c into sections
*remove _PyState_AddModule()
https://github.com/python/cpython/issues/101758
The new test exercises the most important variants for single-phase init extension modules. We also add some explanation about those variants to import.c.
https://github.com/python/cpython/issues/101758
Editors don't agree that `test_source_encoding.py` was valid koi8-r, making it
hard to edit that file without the editor breaking it in some way (see gh-96272).
Only one test actually relied on the koi8-r encoding and it was a duplicate of a
test from the deprecated `imp` module's `test_imp`, so here we replace
`test_pep263` with `test_import_encoded_module` stolen from `test_imp` and
set `test_source_encoding.py`'s encoding to utf-8 to make editing it easier
going forward.
Doing this provides significant performance gains for runtime startup (~15% with all the imported modules frozen). We don't yet freeze all the imported modules because there are a few hiccups in the build systems we need to sort out first. (See bpo-45186 and bpo-45188.)
Note that in PR GH-28320 we added a command-line flag (-X frozen_modules=[on|off]) that allows users to opt out of (or into) using frozen modules. The default is still "off" but we will change it to "on" as soon as we can do it in a way that does not cause contributors pain.
https://bugs.python.org/issue45020
* bpo-35321: Set the spec origin to frozen in frozen modules
This fix correctly sets the spec origin to
"frozen" for the _frozen_importlib module. Note that the
origin was already correctly set in _frozen_importlib_external.
* 📜🤖 Added by blurb_it.
* Always return bytes from _HackedGetData.get_data().
Ensure the imp.load_source shim always returns bytes by reopening the file in
binary mode if needed. Hash-based pycs have to receive the source code in bytes.
It's tempting to change imp.get_suffixes() to always return 'rb' as a mode, but
that breaks some stdlib tests and likely 3rdparty code, too.
Python now supports checking bytecode cache up-to-dateness with a hash of the
source contents rather than volatile source metadata. See the PEP for details.
While a fairly straightforward idea, quite a lot of code had to be modified due
to the pervasiveness of pyc implementation details in the codebase. Changes in
this commit include:
- The core changes to importlib to understand how to read, validate, and
regenerate hash-based pycs.
- Support for generating hash-based pycs in py_compile and compileall.
- Modifications to our siphash implementation to support passing a custom
key. We then expose it to importlib through _imp.
- Updates to all places in the interpreter, standard library, and tests that
manually generate or parse pyc files to grok the new format.
- Support in the interpreter command line code for long options like
--check-hash-based-pycs.
- Tests and documentation for all of the above.
Based on patch by Victor Stinner.
Add private C API function _PyUnicode_AsUnicode() which is similar to
PyUnicode_AsUnicode(), but checks for null characters.
To resolve a compatibility problem found with py2exe and
pywin32, imp.load_dynamic() once again ignores previously loaded modules
to support Python modules replacing themselves with extension modules.
Patch by Petr Viktorin.
The concept of .pyo files no longer exists. Now .pyc files have an
optional `opt-` tag which specifies if any extra optimizations beyond
the peepholer were applied.
To make sure there is no issue with code that is both Python 2 and 3
compatible, there are no plans to remove the module any sooner than
Python 4 (unless the community moves to Python 3 solidly before then).