Python now supports checking bytecode cache up-to-dateness with a hash of the
source contents rather than volatile source metadata. See the PEP for details.
While a fairly straightforward idea, quite a lot of code had to be modified due
to the pervasiveness of pyc implementation details in the codebase. Changes in
this commit include:
- The core changes to importlib to understand how to read, validate, and
regenerate hash-based pycs.
- Support for generating hash-based pycs in py_compile and compileall.
- Modifications to our siphash implementation to support passing a custom
key. We then expose it to importlib through _imp.
- Updates to all places in the interpreter, standard library, and tests that
manually generate or parse pyc files to grok the new format.
- Support in the interpreter command line code for long options like
--check-hash-based-pycs.
- Tests and documentation for all of the above.
Use sys.modules.get() in the "with _ModuleLockManager(name):" block
to protect the dictionary key with the module lock and use an atomic
get to prevent race condition.
Remove also _bootstrap._POPULATE since it was unused
(_bootstrap_external now has its own _POPULATE object), add a new
_SENTINEL object instead.
* Rewrite importlib _get_module_lock(): it is now responsible to hold
the imp lock directly.
* _find_and_load() now holds the module lock to check if name is in
sys.modules to prevent a race condition
Previously AttributeError was raised, but that's not very reflective of the fact that the requested module can't be found since the specified parent isn't actually a package.
PEP 432 specifies a number of large changes to interpreter startup code, including exposing a cleaner C-API. The major changes depend on a number of smaller changes. This patch includes all those smaller changes.
Special thanks to INADA Naoki for pushing the patch through
the last mile, Serhiy Storchaka for reviewing the code, and to
Victor Stinner for suggesting the idea (originally implemented
in the PyPy project).
Handling zero-argument super() in __init_subclass__ and
__set_name__ involved moving __class__ initialisation to
type.__new__. This requires cooperation from custom
metaclasses to ensure that the new __classcell__ entry
is passed along appropriately.
The initial implementation of that change resulted in abruptly
broken zero-argument super() support in metaclasses that didn't
adhere to the new requirements (such as Django's metaclass for
Model definitions).
The updated approach adopted here instead emits a deprecation
warning for those cases, and makes them work the same way they
did in Python 3.5.
This patch also improves the related class machinery documentation
to cover these details and to include more reader-friendly
cross-references and index entries.
The __class__ cell used by zero-argument super() is now initialized
from type.__new__ rather than __build_class__, so class methods
relying on that will now work correctly when called from metaclass
methods during class creation.
Patch by Martin Teichmann.
Issue #27213: Rework CALL_FUNCTION* opcodes to produce shorter and more
efficient bytecode:
* CALL_FUNCTION now only accepts position arguments
* CALL_FUNCTION_KW accepts position arguments and keyword arguments, but keys
of keyword arguments are packed into a constant tuple.
* CALL_FUNCTION_EX is the most generic, it expects a tuple and a dict for
positional and keyword arguments.
CALL_FUNCTION_VAR and CALL_FUNCTION_VAR_KW opcodes have been removed.
2 tests of test_traceback are currently broken: skip test, the issue #28050 was
created to track the issue.
Patch by Demur Rumed, design by Serhiy Storchaka, reviewed by Serhiy Storchaka
and Victor Stinner.
Windows.
Originally only b'PYTHONCASEOK' was being checked for in os.environ,
but that won't work under Windows where all environment variables are
strings (on OS X they are bytes).
Thanks to Eryk Sun for the bug report.
modules can't be lazily loaded.
Thanks to Python 3.6 allowing for types.ModuleType to have its
__class__ mutated, the restriction can be lifted by calling
create_module() on the wrapped loader.
Issue #26637: The importlib module now emits an ImportError rather than a
TypeError if __import__() is tried during the Python shutdown process but
sys.path is already cleared (set to None).
Issue #26538: libregrtest: Fix setup_tests() to keep module.__path__ type
(_NamespacePath), don't convert to a list.
Add _NamespacePath.__setitem__() method to importlib._bootstrap_external.
importlib.util.LazyLoader.
The class was checking its argument as to whether its implementation
of create_module() came directly from importlib.abc.Loader. The
problem is that the classes coming from imoprtlib.machinery do not
directly inherit from the ABC as they come from _frozen_importlib.
Because the documentation has always said that create_module() was
ignored, the check has simply been removed.
In a previous change, __spec__.parent was prioritized over
__package__. That is a backwards-compatibility break, but we do
eventually want __spec__ to be the ground truth for module details. So
this change reverts the change in semantics and instead raises an
ImportWarning when __package__ != __spec__.parent to give people time
to adjust to using spec objects.
Issue #26107: The format of the co_lnotab attribute of code objects changes to
support negative line number delta.
Changes:
* assemble_lnotab(): if line number delta is less than -128 or greater than
127, emit multiple (offset_delta, lineno_delta) in co_lnotab
* update functions decoding co_lnotab to use signed 8-bit integers
- dis.findlinestarts()
- PyCode_Addr2Line()
- _PyCode_CheckLineNumber()
- frame_setlineno()
* update lnotab_notes.txt
* increase importlib MAGIC_NUMBER to 3361
* document the change in What's New in Python 3.6
* cleanup also PyCode_Optimize() to use better variable names
not defined for a relative import.
This is the start of work to try and clean up import semantics to rely
more on a module's spec than on the myriad attributes that get set on
a module. Thanks to Rose Ames for the patch.
Summary of changes:
1. Coroutines now have a distinct, separate from generators
type at the C level: PyGen_Type, and a new typedef PyCoroObject.
PyCoroObject shares the initial segment of struct layout with
PyGenObject, making it possible to reuse existing generators
machinery. The new type is exposed as 'types.CoroutineType'.
As a consequence of having a new type, CO_GENERATOR flag is
no longer applied to coroutines.
2. Having a separate type for coroutines made it possible to add
an __await__ method to the type. Although it is not used by the
interpreter (see details on that below), it makes coroutines
naturally (without using __instancecheck__) conform to
collections.abc.Coroutine and collections.abc.Awaitable ABCs.
[The __instancecheck__ is still used for generator-based
coroutines, as we don't want to add __await__ for generators.]
3. Add new opcode: GET_YIELD_FROM_ITER. The opcode is needed to
allow passing native coroutines to the YIELD_FROM opcode.
Before this change, 'yield from o' expression was compiled to:
(o)
GET_ITER
LOAD_CONST
YIELD_FROM
Now, we use GET_YIELD_FROM_ITER instead of GET_ITER.
The reason for adding a new opcode is that GET_ITER is used
in some contexts (such as 'for .. in' loops) where passing
a coroutine object is invalid.
4. Add two new introspection functions to the inspec module:
getcoroutinestate(c) and getcoroutinelocals(c).
5. inspect.iscoroutine(o) is updated to test if 'o' is a native
coroutine object. Before this commit it used abc.Coroutine,
and it was requested to update inspect.isgenerator(o) to use
abc.Generator; it was decided, however, that inspect functions
should really be tailored for checking for native types.
6. sys.set_coroutine_wrapper(w) API is updated to work with only
native coroutines. Since types.coroutine decorator supports
any type of callables now, it would be confusing that it does
not work for all types of coroutines.
7. Exceptions logic in generators C implementation was updated
to raise clearer messages for coroutines:
Before: TypeError("generator raised StopIteration")
After: TypeError("coroutine raised StopIteration")
Known limitations of the current implementation:
- documentation changes are incomplete
- there's a reference leak I haven't tracked down yet
The leak is most visible by running:
./python -m test -R3:3 test_importlib
However, you can also see it by running:
./python -X showrefcount
Importing the array or _testmultiphase modules, and
then deleting them from both sys.modules and the local
namespace shows significant increases in the total
number of active references each cycle. By contrast,
with _testcapi (which continues to use single-phase
initialisation) the global refcounts stabilise after
a couple of cycles.
The concept of .pyo files no longer exists. Now .pyc files have an
optional `opt-` tag which specifies if any extra optimizations beyond
the peepholer were applied.
importlib.abc.Loader.exec_module() is also defined.
Before this change, create_module() was optional **and** could return
None to trigger default semantics. This change now reduces the
options for choosing default semantics to one and in the most
backporting-friendly way (define create_module() to return None).
Along the way, dismantle importlib._bootstrap._SpecMethods as it was
no longer relevant and constructing the new function required
partially dismantling the class anyway.
This code was an artifact of issuing a DeprecationWarning for the lack
of loader.exec_module(). However, we have deferred such warnings to
later Python versions.
old methods now provide implementations when PEP 451 APIs are present.
This should help with backwards-compatibility with code which has not
been updated to work with PEP 451.
Early in the PEP 451 implementation some of the importlib loaders had
their own _get_spec() methods to simplify accommodating them. However,
later implementations removed the need. They simply failed to remove
this code at the same time. :)
module loaders.
Due to the fact that the call signatures for extension modules and
built-in modules does not allow for the specifying of what module to
initialize and that on Windows all extension modules are built-in
modules, work to clean up built-in and extension module initialization
will have to wait until Python 3.5. Because of this the semantics of
exec_module() would be incorrect, so removing the methods for now is
the best option; load_module() is still used as a fallback by
importlib and so this won't affect semantics.
importlib.machinery.FileFinder.
While originally moved to stop special-casing '' as PathFinder farther
up the typical call chain now uses the cwd in the instance of '', it
was deemed an unnecessary risk to breaking subclasses of FileFinder to
take the special-casing out.
exists when checking for a package.
Before there was an isdir check and then various isfile checks for
possible __init__ files when looking for a package.
This change drops the isdir check by leaning
on the assumption that a directory will not contain something named
after the module being imported which is not a directory. If the module
is a package then it saves a stat call. If there is nothing in the
directory with the potential package name it also saves a stat call.
Only if there is something in the directory named the same thing as
the potential package will the number of stat calls increase
(due to more wasteful __init__ checks).
Semantically there is no change as the isdir check moved
down so that namespace packages continue to have no chance of
accidentally collecting non-existent directories.
and stop importlib.machinery.FileFinder treating '' as '.'.
Previous PathFinder transformed '' into '.' which led to __file__ for
modules imported from the cwd to always be relative paths. This meant
the values of the attribute were wrong as soon as the cwd changed.
This change now means that as long as the site module is run (which
makes all entries in sys.path absolute) then all values for __file__
will also be absolute unless it's for __main__ when specified by file
path in a relative way (modules imported by runpy will have an
absolute path).
Now that PathFinder is no longer treating '' as '.' it only makes
sense for FileFinder to stop doing so as well. Now no transformation
is performed for the directory given to the __init__ method.
Thanks to Madison May for the initial patch.
This commit fixes a regression that sneaked into Python 3.3 where importlib
was not respecting -E when checking for the PYTHONCASEOK environment variable.
This commit fixes a regression that sneaked into Python 3.3 where importlib
was not respecting -E when checking for the PYTHONCASEOK environment variable.
importlib._bootstrap._get_sourcefile().
Thanks to its only use by the C API, it was never properly tested
until now.
Thanks to Neal Norwitz for discovering the bug and Madison May for the patch.
The private attribute was leaking out of importlib and led to at least
one person noticing it. Switch to another hack which won't leak
outside of importlib and is nearly as robust.
The helper function makes it easier to implement
imoprtlib.abc.InspectLoader.get_source() by making that function
require just the raw bytes for source code and handling all other
details.
UnicodeDecodeError as ImportError. That was over-reaching the point of
raising ImportError in get_source() (which is to signal the source
code was not found when it should have). Conflating the two exceptions
with ImportError could lead to masking errors with the source which
should be known outside of whether there was an error simply getting
the source to begin with.
Forgot to raise ModuleNotFoundError when None is found in sys.modules.
This led to introducing the C function PyErr_SetImportErrorSubclass()
to make setting ModuleNotFoundError easier.
Also updated the reference docs to mention ModuleNotFoundError
appropriately. Updated the docs for ModuleNotFoundError to mention the
None in sys.modules case.
Lastly, it was noticed that PyErr_SetImportError() was not setting an
exception when returning None in one case. That issue is now fixed.
ImportError.
The exception is raised by import when a module could not be found.
Technically this is defined as no viable loader could be found for the
specified module. This includes ``from ... import`` statements so that
the module usage is consistent for all situations where import
couldn't find what was requested.
This should allow for the common idiom of::
try:
import something
except ImportError:
pass
to be updated to using ModuleNotFoundError and not accidentally mask
ImportError messages that should propagate (e.g. issues with a
loader).
This work was driven by the fact that the ``from ... import``
statement needed to be able to tell the difference between an
ImportError that simply couldn't find a module (and thus silence the
exception so that ceval can raise it) and an ImportError that
represented an actual problem.
importlib.abc.Loader.init_module_attrs() and implement
importlib.abc.InspectLoader.load_module().
The importlib.abc.Loader.init_module_attrs() method sets the various
attributes on the module being loaded. It is done unconditionally to
support reloading. Typically people used
importlib.util.module_for_loader, but since that's a decorator there
was no way to override it's actions, so init_module_attrs() came into
existence to allow for overriding. This is also why module_for_loader
is now pending deprecation (having its other use replaced by
importlib.util.module_to_load).
All of this allowed for importlib.abc.InspectLoader.load_module() to
be implemented. At this point you can now implement a loader with
nothing more than get_code() (which only requires get_source();
package support requires is_package()). Thanks to init_module_attrs()
the implementation of load_module() is basically a context manager
containing 2 methods calls, a call to exec(), and a return statement.
handle providing (and cleaning up if needed) the module to be loaded.
A future commit will use the context manager in
Lib/importlib/_bootstrap.py and thus why the code is placed there
instead of in Lib/importlib/util.py.
While the previous location was fine, it makes more sense to have the
method higher up in the inheritance chain, especially at a point where
get_source() is defined which is the earliest source_to_code() could
programmatically be used in the inheritance tree in importlib.abc.
attributes to None.
The long-term goal is for people to be able to rely on these
attributes existing and checking for None to see if they have been
set. Since import itself sets these attributes when a loader does not
the only instances when the attributes are None are from someone
overloading __import__() and not using a loader or someone creating a
module from scratch.
This patch also unifies module initialization. Before you could have
different attributes with default values depending on how the module
object was created. Now the only way to not get the same default set
of attributes is to circumvent initialization by calling
ModuleType.__new__() directly.
the default exception/value when called instead of raising/returning
NotimplementedError/NotImplemented (except where appropriate).
This should allow for the ABCs to act as the bottom/end of the MRO with expected
default results.
As part of this work, also make importlib.abc.Loader.module_repr()
optional instead of an abstractmethod.
__loader__ is not set on a module. This brings the exception in line
with when __loader__ is None (which is equivalent to not having the
attribute defined).
First, because the mtime can exceed 4 bytes, make sure to mask it down to 4
bytes before getting its little-endian representation for writing out to a .pyc
file.
Two, cap an rsplit() call to 1 split, else can lead to too many values being
returned for unpacking.
importlib.machinery.FileFinder when the directory has become
unreadable or a file. This brings semantics in line with Python 3.2
import.
Reported and diagnosed by David Pritchard.
fromlist of __import__ propagate.
The problem previously was that if something listed in fromlist didn't
exist then that's okay. The fix for that was too broad in terms of
catching ImportError.
The trick with the solution to this issue is that the proper
refactoring of import thanks to importlib doesn't allow for a way to
distinguish (portably) between an ImportError because finders couldn't
find a loader, or a loader raised the exception. In Python 3.4 the
hope is to introduce a new exception (e.g. ModuleNotFound) to make it
clean to differentiate why ImportError was raised.
When the fromlist argument is specified for __import__() and the
attribute doesn't already exist, an import is attempted. If that fails
(e.g. module doesn't exist), the ImportError will now be silenced (for
backwards-compatibility). This *does not* affect
``from ... import ...`` statements.
Thanks to Eric Snow for the patch and Simon Feltman for reporting the
regression.
state of the import system. Also make importlib.invalidate_caches()
work with sys.meta_path instead of sys.path_importer_cache to
completely separate the path-based import system from the overall
import system.
Patch by Eric Snow.
Lib/imp.py for imp.source_from_cache() instead of its own C version.
Also change PyImport_ExecCodeModuleObject() to not infer the source
path from the bytecode path like
PyImport_ExecCodeModuleWithPathnames() does. This makes the function
less magical.
This also has the side-effect of removing all uses of MAXPATHLEN in
Python/import.c which can cause failures on really long filenames.
statement (e.g. ``from distutils import msvc9compiler``) that triggers
an ImportError of its own (e.g. the non-existence of winreg), let that
exception propagate instead of raising a generic ImportError for the
module being requested (e.g. msvc9compiler).
module name into consideration when determining whether a module is a
package or not. This prevents importing a module's __init__ module
directly and having it considered a package, which can lead to
duplicate sub-modules.
Thanks to Ronan Lamy for reporting the bug.
The long-term goal is to deprecate imp.find_module() in favour of this
API, but it will take some time as some APIs explicitly return/use what
imp.find_module() returns.
importlib.abc.FileLoader.load_module()/get_filename() and
importlib.machinery.ExtensionFileLoader.load_module() have their
single argument be optional as the loader's constructor has all the
ncessary information.
This allows for the deprecation of
imp.load_source()/load_compile()/load_package().
importlib.machinery that provide the suffix details for import.
The attributes were not put on imp so as to compartmentalize
everything importlib needs for setting up imports in
importlib.machinery.
This also led to an indirect deprecation of inspect.getmoduleinfo() as
it directly returned imp.get_suffix's returned tuple which no longer
makes sense.
This introduces a new function, imp.extension_suffixes(), which is
currently undocumented. That is forthcoming once issue #14657 is
resolved and how to expose file suffixes is decided.
importlib.util.module_for_loader also will set __loader__ along with
__package__. This is in conjunction to a forthcoming update to PEP 302
which will make these two attributes required for loaders to set.
be implicit.
Added a warning for when sys.path_hooks is found to be empty. Also
changed the meaning of None in sys.path_importer_cache to represent
trying sys.path_hooks again (an interpretation of previous semantics).
Also added a warning for when None was found.
The long-term goal is for None in sys.path_importer_cache to represent
the same as imp.NullImporter: no finder found for that sys.path entry.
importlib.machinery.(FileFinder, SourceFileLoader,
_SourcelessFileLoader, ExtensionFileLoader).
This exposes all of importlib's mechanisms that will become public on
the sys module.
for performance. While get_magic() could move to Lib/imp.py, having to
support PyImport_GetMagicNumber() would lead to equal, if not more, C
code than sticking with the status quo.
rewriting functionality in pure Python.
To start, imp.new_module() has been rewritten in pure Python, put into
importlib (privately) and then publicly exposed in imp.
of sys.modules when possible.
This is being done for two reasons. One is to gain a little bit of
performance by skipping an unnecessary dict lookup in sys.modules. But
the other (and main) reason is to be a little bit more clear in how
things should work from the perspective of import's interactions with
loaders. Otherwise loaders can easily forget to return the module even
though PEP 302 explicitly states they are expected to return the module
they loaded.
importlib._bootstrap is now frozen into Python/importlib.h and stored
as _frozen_importlib in sys.modules. Py_Initialize() loads the frozen
code along with sys and imp and then uses _frozen_importlib._install()
to set builtins.__import__() w/ _frozen_importlib.__import__().
importation, then respect that injection.
Discovered thanks to Lib/xml/parsers/expat.py injecting
xml.parsers.expat.errors and etree now importing that directly as a
module.
It seems better to cache the finder for the cwd under its full path
insetad of '' in case the cwd changes. Otherwise FileFinder needs to
dynamically change itself based on whether it is given '' instead of
caching a finder for every change to the cwd.
Thanks to os.environ under Windows only updating the dict and not the
environment itself (as exposed by nt.environ), tests using
PYTHONCASEOK always fail. Now the tests are skipped when os.environ
does not do what is expected.
This required updating the code to use posix instead of os. This is
all being done to make bootstrapping easier to removing dependencies
that are kept in importlib.__init__ and thus outside of the single
file to bootstrap from.