This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.
As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via
the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
This is the implementation of PEP683
Motivation:
The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.
Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
* The majority of the monitoring code is in instrumentation.c
* The new instrumentation bytecodes are in bytecodes.c
* legacy_tracing.c adapts the new API to the old sys.setrace and sys.setprofile APIs
Enforcing (optionally) the restriction set by PEP 489 makes sense. Furthermore, this sets the stage for a potential restriction related to a per-interpreter GIL.
This change includes the following:
* add tests for extension module subinterpreter compatibility
* add _PyInterpreterConfig.check_multi_interp_extensions
* add Py_RTFLAGS_MULTI_INTERP_EXTENSIONS
* add _PyImport_CheckSubinterpIncompatibleExtensionAllowed()
* fail iff the module does not implement multi-phase init and the current interpreter is configured to check
https://github.com/python/cpython/issues/98627
`warnings.warn()` gains the ability to skip stack frames based on code
filename prefix rather than only a numeric `stacklevel=` via a new
`skip_file_prefixes=` keyword argument.
builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
⚠️⚠️ Note for reviewers, hackers and fellow systems/low-level/compiler engineers ⚠️⚠️
If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**.
If you have any **refinements or optimizations** please, wait until the first version is merged before starting hacking or proposing those so we can keep this PR productive.
* gh-93503: Add APIs to set profiling and tracing functions in all threads in the C-API
* Use a separate API
* Fix NEWS entry
* Add locks around the loop
* Document ignoring exceptions
* Use the new APIs in the sys module
* Update docs
We only statically initialize for core code and builtin modules. Extension modules still create
the tuple at runtime. We'll solve that part of interpreter isolation separately.
This change includes generated code. The non-generated changes are in:
* Tools/clinic/clinic.py
* Python/getargs.c
* Include/cpython/modsupport.h
* Makefile.pre.in (re-generate global strings after running clinic)
* very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c
All other changes are generated code (clinic, global strings).
Add a closure keyword-only parameter to exec(). It can only be specified when exec-ing a code object that uses free variables. When specified, it must be a tuple, with exactly the number of cell variables referenced by the code object. closure has a default value of None, and it must be None if the code object doesn't refer to any free variables.
Currently frozen modules do not have __file__ set. In their spec, origin is set to "frozen" and they are marked as not having a location. (Similarly, for frozen packages __path__ is set to an empty list.) However, for frozen stdlib modules we are able to extrapolate __file__ as long as we can determine the stdlib directory at runtime. (We now do so since gh-28586.) Having __file__ set is helpful for a number of reasons. Likewise, having a non-empty __path__ means we can import submodules of a frozen package from the filesystem (e.g. we could partially freeze the encodings module).
This change sets __file__ (and adds to __path__) for frozen stdlib modules. It uses sys._stdlibdir (from gh-28586) and the frozen module alias information (from gh-28655). All that work is done in FrozenImporter (in Lib/importlib/_bootstrap.py).
Also, if a frozen module is imported before importlib is bootstrapped (during interpreter initialization) then we fix up that module and its spec during the importlib bootstrapping step (i.e. imporlib._bootstrap._setup()) to match what gets set by FrozenImporter, including setting the file info (if the stdlib dir is known). To facilitate this, modules imported using PyImport_ImportFrozenModule() have __origname__ set using the frozen module alias info. __origname__ is popped off during importlib bootstrap.
(To be clear, even with this change the new code to set __file__ during fixups in imporlib._bootstrap._setup() doesn't actually get triggered yet. This is because sys._stdlibdir hasn't been set yet in interpreter initialization at the point importlib is bootstrapped. However, we do fix up such modules at that point to otherwise match the result of importing through FrozenImporter, just not the __file__ and __path__ parts. Doing so will require changes in the order in which things happen during interpreter initialization. That can be addressed separately. Once it is, the file-related fixup code from this PR will kick in.)
Here are things this change does not do:
* set __file__ for non-stdlib modules (no way of knowing the parent dir)
* set __file__ if the stdlib dir is not known (nor assume the expense of finding it)
* relatedly, set __file__ if the stdlib is in a zip file
* verify that the filename set to __file__ actually exists (too expensive)
* update __path__ for frozen packages that alias a non-package (since there is no package dir)
Other things this change skips, but we may do later:
* set __file__ on modules imported using PyImport_ImportFrozenModule()
* set co_filename when we unmarshal the frozen code object while importing the module (e.g. in FrozenImporter.exec_module()) -- this would allow tracebacks to show source lines
* implement FrozenImporter.get_filename() and FrozenImporter.get_source()
https://bugs.python.org/issue21736
In the list of generated frozen modules at the top of Tools/scripts/freeze_modules.py, you will find that some of the modules have a different name than the module (or .py file) that is actually frozen. Let's call each case an "alias". Aliases do not come into play until we get to the (generated) list of modules in Python/frozen.c. (The tool for freezing modules, Programs/_freeze_module, is only concerned with the source file, not the module it will be used for.)
Knowledge of which frozen modules are aliases (and the identity of the original module) normally isn't important. However, this information is valuable when we go to set __file__ on frozen stdlib modules. This change updates Tools/scripts/freeze_modules.py to map aliases to the original module name (or None if not a stdlib module) in Python/frozen.c. We also add a helper function in Python/import.c to look up a frozen module's alias and add the result of that function to the frozen info returned from find_frozen().
https://bugs.python.org/issue45020
Before this change we end up duplicating effort and throwing away data in FrozenImporter.find_spec(). Now we do the work once in find_spec() and the only thing we do in FrozenImporter.exec_module() is turn the raw frozen data into a code object and then exec it.
We've added _imp.find_frozen(), add an arg to _imp.get_frozen_object(), and updated FrozenImporter. We've also moved some code around to reduce duplication, get a little more consistency in outcomes, and be more efficient.
Note that this change is mostly necessary if we want to set __file__ on frozen stdlib modules. (See https://bugs.python.org/issue21736.)
https://bugs.python.org/issue45324
Currently we freeze several modules into the runtime. For each of these modules it is essential to bootstrapping the runtime that they be frozen. Any other stdlib module that we later freeze into the runtime is not essential. We can just as well import from the .py file. This PR lets users explicitly choose which should be used, with the new "-X frozen_modules=[on|off]" CLI flag. The default is "off" for now.
https://bugs.python.org/issue45020
There are a few things I missed in gh-27980. This is a follow-up that will make subsequent PRs cleaner. It includes fixes to tests and tools that reference the frozen modules.
https://bugs.python.org/issue45019
* Add co_firstinstr field to code object.
* Implement barebones quickening.
* Use non-quickened bytecode when tracing.
* Add NEWS item
* Add new file to Windows build.
* Don't specialize instructions with EXTENDED_ARG.
This adds a new function named sys._current_exceptions() which is equivalent ot
sys._current_frames() except that it returns the exceptions currently handled
by other threads. It is equivalent to calling sys.exc_info() for each running
thread.
Previously, the result could have been an instance of a subclass of int.
Also revert bpo-26202 and make attributes start, stop and step of the range
object having exact type int.
Add private function _PyNumber_Index() which preserves the old behavior
of PyNumber_Index() for performance to use it in the conversion functions
like PyLong_AsLong().
In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values
(like in the optional third parameter of getattr). "None" should be used if None is accepted
as argument and passing None has the same effect as not passing the argument at all.