- io.TextIOWrapper (and hence the open() builtin) now use the
internal codec marking system added for issue #19619
- also tweaked the C code to only look up the encoding once,
rather than multiple times
- the existing output type checks remain in place to deal with
unmarked third party codecs.
annotate text signatures in docstrings, resulting in fewer false
positives. "self" parameters are also explicitly marked, allowing
inspect.Signature() to authoritatively detect (and skip) said parameters.
Issue #20326: Argument Clinic now generates separate checksums for the
input and output sections of the block, allowing external tools to verify
that the input has not changed (and thus the output is not out-of-date).
PyMethodDescr_Type, _PyMethodWrapper_Type, and PyWrapperDescr_Type)
have been modified to provide introspection information for builtins.
Also: many additional Lib, test suite, and Argument Clinic fixes.
crash when a generator is created in a C thread that is destroyed while the
generator is still used. The issue was that a generator contains a frame, and
the frame kept a reference to the Python state of the destroyed C thread. The
crash occurs when a trace function is setup.
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.
The latter mechanism remains in place for third party non-text
encodings.
- output type errors now redirect users to the type-neutral
convenience functions in the codecs module
- stateless errors that occur during encoding and decoding
will now be automatically wrapped in exceptions that give
the name of the codec involved
_PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString()
when both strings are equal and interned.
Add also _PyId_builtins identifier for "builtins" common string.
instead of creating temporary Unicode string objects
Add also more identifiers in pythonrun.c to avoid temporary Unicode string
objets for the interactive interpreter.
- don't call PyErr_NoMemory with interpreter is not initialised
- note that it's OK to call _PyMem_RawStrDup here
- don't include this in the limited API
- capitalise "IO"
- be explicit that a non-zero return indicates an error
- include versionadded marker in docs
This new pre-initialization API allows embedding
applications like Blender to force a particular
encoding and error handler for the standard IO streams.
Also refactors Modules/_testembed.c to let us start
testing multiple embedding scenarios.
(Initial patch by Bastien Montagne)
The setobject freelist was consuming memory but not providing much value.
Even when a freelisted setobject was available, most of the setobject
fields still needed to be initialized and the small table still required
a memset(). This meant that the custom freelisting scheme for sets was
providing almost no incremental benefit over the default Python freelist
scheme used by _PyObject_Malloc() in Objects/obmalloc.c.
-I
Run Python in isolated mode. This also implies -E and -s. In isolated mode
sys.path contains neither the script’s directory nor the user’s
site-packages directory. All PYTHON* environment variables are ignored,
too. Further restrictions may be imposed to prevent the user from
injecting malicious code.
PyStructSequence_InitType() except that it has a return value (0 on success,
-1 on error).
* PyStructSequence_InitType2() now raises MemoryError on memory allocation failure
* Fix also some calls to PyDict_SetItemString(): handle error
Add new enum:
* PyMemAllocatorDomain
Add new structures:
* PyMemAllocator
* PyObjectArenaAllocator
Add new functions:
* PyMem_RawMalloc(), PyMem_RawRealloc(), PyMem_RawFree()
* PyMem_GetAllocator(), PyMem_SetAllocator()
* PyObject_GetArenaAllocator(), PyObject_SetArenaAllocator()
* PyMem_SetupDebugHooks()
Changes:
* PyMem_Malloc()/PyObject_Realloc() now always call malloc()/realloc(), instead
of calling PyObject_Malloc()/PyObject_Realloc() in debug mode.
* PyObject_Malloc()/PyObject_Realloc() now falls back to
PyMem_Malloc()/PyMem_Realloc() for allocations larger than 512 bytes.
* Redesign debug checks on memory block allocators as hooks, instead of using C
macros
* Add a new PyMemAllocators structure
* New functions:
- PyMem_RawMalloc(), PyMem_RawRealloc(), PyMem_RawFree(): GIL-free memory
allocator functions
- PyMem_GetRawAllocators(), PyMem_SetRawAllocators()
- PyMem_GetAllocators(), PyMem_SetAllocators()
- PyMem_SetupDebugHooks()
- _PyObject_GetArenaAllocators(), _PyObject_SetArenaAllocators()
* Add unit test for PyMem_Malloc(0) and PyObject_Malloc(0)
* Add unit test for new get/set allocators functions
* PyObject_Malloc() now falls back on PyMem_Malloc() instead of malloc() if
size is bigger than SMALL_REQUEST_THRESHOLD, and PyObject_Realloc() falls
back on PyMem_Realloc() instead of realloc()
* PyMem_Malloc() and PyMem_Realloc() now always call malloc() and realloc(),
instead of calling PyObject_Malloc() and PyObject_Realloc() in debug mode
Forgot to raise ModuleNotFoundError when None is found in sys.modules.
This led to introducing the C function PyErr_SetImportErrorSubclass()
to make setting ModuleNotFoundError easier.
Also updated the reference docs to mention ModuleNotFoundError
appropriately. Updated the docs for ModuleNotFoundError to mention the
None in sys.modules case.
Lastly, it was noticed that PyErr_SetImportError() was not setting an
exception when returning None in one case. That issue is now fixed.
ImportError.
The exception is raised by import when a module could not be found.
Technically this is defined as no viable loader could be found for the
specified module. This includes ``from ... import`` statements so that
the module usage is consistent for all situations where import
couldn't find what was requested.
This should allow for the common idiom of::
try:
import something
except ImportError:
pass
to be updated to using ModuleNotFoundError and not accidentally mask
ImportError messages that should propagate (e.g. issues with a
loader).
This work was driven by the fact that the ``from ... import``
statement needed to be able to tell the difference between an
ImportError that simply couldn't find a module (and thus silence the
exception so that ceval can raise it) and an ImportError that
represented an actual problem.
Note that this is a potentially disruptive change since it may
release some system resources which would otherwise remain
perpetually alive (e.g. database connections kept in thread-local
storage).
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() has no more argument (except the writer itself):
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
Write a function to enable more optimizations:
* If the substring is the whole string and overallocation is disabled, just
keep a reference to the string, don't copy characters
* Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
possible
computation as the overflow behavior of signed integers is undefined.
NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done. I added the comment that my change in 3.2 added so that the
code would match up. Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent. We could work to get rid of the -fwrapv requirement
in 3.4 but that requires more planning.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
computation as the overflow behavior of signed integers is undefined.
NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done. I added the comment that my change in 3.2 added so that the
code would match up. Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
computation as the overflow behavior of signed integers is undefined.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
longer required as of Python 2.5+ when the gc_refs changed from an int (4
bytes) to a Py_ssize_t (8 bytes) as the minimum size is 16 bytes.
The use of a 'long double' triggered a warning by Clang trunk's
Undefined-Behavior Sanitizer as on many platforms a long double requires
16-byte alignment but the Python memory allocator only guarantees 8 byte
alignment.
So our code would allocate and use these structures with technically improper
alignment. Though it didn't matter since the 'dummy' field is never used.
This silences that warning.
Spelunking into code history, the double was added in 2001 to force better
alignment on some platforms and changed to a long double in 2002 to appease
Tru64. That issue should no loner be present since the upgrade from int to
Py_ssize_t where the minimum structure size increased to 16 (unless anyone
knows of a platform where ssize_t is 4 bytes?) or 24 bytes depending on if the
build uses 4 or 8 byte pointers.
We can probably get rid of the double and this union hack all together today.
That is a slightly more invasive change that can be left for later.
A more correct non-hacky alternative if any alignment issues are still found
would be to use a compiler specific alignment declaration on the structure and
determine which value to use at configure time.
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
Previously, excessive nesting in expressions would blow the
stack and segfault the interpreter. Now, a hard limit based
on the configured recursion limit and a hardcoded scaling
factor is applied.
... (unsigned long and unsigned int) to avoid an undefined behaviour with
Py_TPFLAGS_TYPE_SUBCLASS ((1 << 31). PyType_GetFlags() result type is now
unsigned too (unsigned long, instead of long).
* Simplify the code: replace 4 steps with one unique step using the
_PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids to
store intermediate results which require to allocate an array of pointers on
the heap.
* Use the _PyUnicodeWriter API for speed (and its convinient API):
overallocate the buffer to reduce the number of "realloc()"
* Implement "width" and "precision" in Python, don't rely on sprintf(). It
avoids to need of a temporary buffer allocated on the heap: only use a small
buffer allocated in the stack.
* Add _PyUnicodeWriter_WriteCstr() function
* Split PyUnicode_FromFormatV() into two functions: add
unicode_fromformat_arg().
* Inline parse_format_flags(): the format of an argument is now only parsed
once, it's no more needed to have a subfunction.
* Optimize PyUnicode_FromFormatV() for characters between two "%" arguments:
search the next "%" and copy the substring in one chunk, instead of copying
character per character.