builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
The implementation of __sizeof__() methods using _PyObject_SIZE() now
use an unsigned type (size_t) to compute the size, rather than a signed
type (Py_ssize_t).
Cast explicitly signed (Py_ssize_t) values to unsigned type
(Py_ssize_t).
Fix potential race condition in code patterns:
* Replace "Py_DECREF(var); var = new;" with "Py_SETREF(var, new);"
* Replace "Py_XDECREF(var); var = new;" with "Py_XSETREF(var, new);"
* Replace "Py_CLEAR(var); var = new;" with "Py_XSETREF(var, new);"
Other changes:
* Replace "old = var; var = new; Py_DECREF(var)"
with "Py_SETREF(var, new);"
* Replace "old = var; var = new; Py_XDECREF(var)"
with "Py_XSETREF(var, new);"
* And remove the "old" variable.
We only statically initialize for core code and builtin modules. Extension modules still create
the tuple at runtime. We'll solve that part of interpreter isolation separately.
This change includes generated code. The non-generated changes are in:
* Tools/clinic/clinic.py
* Python/getargs.c
* Include/cpython/modsupport.h
* Makefile.pre.in (re-generate global strings after running clinic)
* very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c
All other changes are generated code (clinic, global strings).
`TextIOWrapper.__init__()` called `os.device_encoding(file.fileno())` if fileno is 0-2 and encoding=None.
But it is very rarely works, and never documented behavior.
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules.
The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).
https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.
The core of the change is in:
* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers
I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config.
The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.
The following are not changed (yet):
* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init
https://bugs.python.org/issue46541
* Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
* _ssl, _sqlite and _testcapi extensions now call the public
PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
* _lsprof extension is now built with Py_BUILD_CORE_MODULE macro
defined to get access to internal _PyObject_CallNoArgs().
Fix typo in the private _PyObject_CallNoArg() function name: rename
it to _PyObject_CallNoArgs() to be consistent with the public
function PyObject_CallNoArgs().
* Constructors of subclasses of some buitin classes (e.g. tuple, list,
frozenset) no longer accept arbitrary keyword arguments.
* Subclass of set can now define a __new__() method with additional
keyword parameters without overriding also __init__().
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
This works by not caching the handle and instead getting the handle from
the file descriptor each time, so that if the actual handle changes by
fd redirection closing/opening the console handle beneath our feet, we
will keep working correctly.
See [PEP 597](https://www.python.org/dev/peps/pep-0597/).
* Add `-X warn_default_encoding` and `PYTHONWARNDEFAULTENCODING`.
* Add EncodingWarning
* Add io.text_encoding()
* open(), TextIOWrapper() emits EncodingWarning when encoding is omitted and warn_default_encoding is enabled.
* _pyio.TextIOWrapper() uses UTF-8 as fallback default encoding used when failed to import locale module. (used during building Python)
* bz2, configparser, gzip, lzma, pathlib, tempfile modules use io.text_encoding().
* What's new entry
When very large data remains in TextIOWrapper, flush() may fail forever.
So prevent that data larger than chunk_size is remained in TextIOWrapper internal
buffer.
Co-Authored-By: Eryk Sun
* Rename _Py_GetLocaleEncoding() to _Py_GetLocaleEncodingObject()
* Add _Py_GetLocaleEncoding() which returns a wchar_t* string to
share code between _Py_GetLocaleEncodingObject()
and config_get_locale_encoding().
* _Py_GetLocaleEncodingObject() now decodes nl_langinfo(CODESET)
from the current locale encoding with surrogateescape,
rather than using UTF-8.
_io.TextIOWrapper no longer calls getpreferredencoding(False) of
_bootlocale to get the locale encoding, but calls
_Py_GetLocaleEncoding() instead.
Add config_get_fs_encoding() sub-function. Reorganize also
config_get_locale_encoding() code.
Use _PyLong_GetZero() and _PyLong_GetOne() in Modules/ directory.
_cursesmodule.c and zoneinfo.c are now built with
Py_BUILD_CORE_MODULE macro defined.
Use _PyType_HasFeature() in the _io module and in structseq
implementation. Replace PyType_HasFeature() opaque function call with
_PyType_HasFeature() inlined function.
Previously, the result could have been an instance of a subclass of int.
Also revert bpo-26202 and make attributes start, stop and step of the range
object having exact type int.
Add private function _PyNumber_Index() which preserves the old behavior
of PyNumber_Index() for performance to use it in the conversion functions
like PyLong_AsLong().
Move PyInterpreterState.fs_codec into a new
PyInterpreterState.unicode structure.
Give a name to the fs_codec structure and use this structure in
unicodeobject.c.
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).
Add also "assert(tstate != NULL);" to the function.
Don't access PyInterpreterState.config member directly anymore, but
use new functions:
* _PyInterpreterState_GetConfig()
* _PyInterpreterState_SetConfig()
* _Py_GetConfig()
Update _asyncio, _bz2, _csv, _curses, _datetime,
_io, _operator, _pickle, _queue, blake2,
multibytecodec and overlapped C extension modules
to use PyModule_AddType().
The truncate() method of io.BufferedReader() should raise
UnsupportedOperation when it is called on a read-only
io.BufferedReader() instance.
https://bugs.python.org/issue35950
Automerge-Triggered-By: @methane
gcc -Wcast-qual turns up a number of instances of casting away constness of pointers. Some of these can be safely modified, by either:
Adding the const to the type cast, as in:
- return _PyUnicode_FromUCS1((unsigned char*)s, size);
+ return _PyUnicode_FromUCS1((const unsigned char*)s, size);
or, Removing the cast entirely, because it's not necessary (but probably was at one time), as in:
- PyDTrace_FUNCTION_ENTRY((char *)filename, (char *)funcname, lineno);
+ PyDTrace_FUNCTION_ENTRY(filename, funcname, lineno);
These changes will not change code, but they will make it much easier to check for errors in consts
The bulk of this patch was generated automatically with:
for name in \
PyObject_Vectorcall \
Py_TPFLAGS_HAVE_VECTORCALL \
PyObject_VectorcallMethod \
PyVectorcall_Function \
PyObject_CallOneArg \
PyObject_CallMethodNoArgs \
PyObject_CallMethodOneArg \
;
do
echo $name
git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
done
old=_PyObject_FastCallDict
new=PyObject_VectorcallDict
git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"
and then cleaned up:
- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
When called on a closed object, readinto() segfaults on account
of a write to a freed buffer:
==220553== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==220553== Access not within mapped region at address 0x2A
==220553== at 0x48408A0: memmove (vg_replace_strmem.c:1272)
==220553== by 0x58DB0C: _buffered_readinto_generic (bufferedio.c:972)
==220553== by 0x58DCBA: _io__Buffered_readinto_impl (bufferedio.c:1053)
==220553== by 0x58DCBA: _io__Buffered_readinto (bufferedio.c.h:253)
Reproducer:
reader = open ("/dev/zero", "rb")
_void = reader.read (42)
reader.close ()
reader.readinto (bytearray (42)) ### BANG!
The problem exists since 2012 when commit dc469454ec added code
to free the read buffer on close().
Signed-off-by: Philipp Gesang <philipp.gesang@intra2net.com>
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values
(like in the optional third parameter of getattr). "None" should be used if None is accepted
as argument and passing None has the same effect as not passing the argument at all.
* Use the 'p' format unit instead of manually called PyObject_IsTrue().
* Pass boolean value instead 0/1 integers to functions that needs boolean.
* Convert some arguments to boolean only once.
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().
By default, for best performances, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.
It is now allowed to add new fields at the end of the PyTypeObject struct without having to allocate a dedicated compatibility flag in tp_flags.
This will reduce the risk of running out of bits in the 32-bit tp_flags value.
In development mode (-X dev) and in debug build, the io.IOBase
destructor now logs close() exceptions. These exceptions are silent
by default in release mode.
The previous code hardcoded `SEEK_SET`, etc. While it's very unlikely
that these values will change, it's best to use the definitions to avoid
there being mismatches in behavior with the code in the future.
Signed-off-by: Enji Cooper <yaneurabeya@gmail.com>
Use _PyArg_CheckPositional() and inlined code instead of
PyArg_UnpackTuple() and _PyArg_UnpackStack() if all parameters
are positional and use the "object" converter.
Fix invalid function cast warnings with gcc 8
for method conventions different from METH_NOARGS, METH_O and
METH_VARARGS in Argument Clinic generated code.
Two kind of mistakes:
1. Missed space. After concatenating there is no space between words.
2. Missed comma. Causes unintentional concatenating in a list of strings.
The accu.h header is no longer part of the Python C API: it has been
moved to the "internal" headers which are restricted to Python
itself.
Replace #include "accu.h" with #include "pycore_accu.h".
If buffering=1 is specified for open() in binary mode, it is silently
treated as buffering=-1 (i.e., the default buffer size).
Coupled with the fact that line buffering is always supported in Python 2,
such behavior caused several issues (e.g., bpo-10344, bpo-21332).
Warn that line buffering is not supported if open() is called with
binary mode and buffering=1.
They can be exposed when some C API calls fail due to lack of
memory.
* Failed Py_BuildValue() could cause an assertion error in the
following TextIOWrapper.tell().
* input_chunk could be decrefed twice in TextIOWrapper.seek()
after failed Py_BuildValue().
* initvalue could leak in StringIO.__getstate__() after failed
PyDict_Copy().
METH_NOARGS functions need only a single argument but they are cast
into a PyCFunction, which takes two arguments. This triggers an
invalid function cast warning in gcc8 due to the argument mismatch.
Fix this by adding a dummy unused argument.
In _io_FileIO_readall_impl(), lseek() and _Py_fstat_noraise() were called
without releasing the GIL. This can cause all threads to hang for
unlimited time when calling FileIO.read() and the NFS server is not
accessible.
kB (*kilo* byte) unit means 1000 bytes, whereas KiB ("kibibyte")
means 1024 bytes. KB was misused: replace kB or KB with KiB when
appropriate.
Same change for MB and GB which become MiB and GiB.
Change the output of Tools/iobench/iobench.py.
Round also the size of the documentation from 5.5 MB to 5 MiB.
* Maintain a list of BufferedWriter objects. Flush them on exit.
In Python 3, the buffer and the underlying file object are separate
and so the order in which objects are finalized matters. This is
unlike Python 2 where the file and buffer were a single object and
finalization was done for both at the same time. In Python 3, if
the file is finalized and closed before the buffer then the data in
the buffer is lost.
This change adds a doubly linked list of open file buffers. An atexit
hook ensures they are flushed before proceeding with interpreter
shutdown. This is addition does not remove the need to properly close
files as there are other reasons why buffered data could get lost during
finalization.
Initial patch by Armin Rigo.
* Use weakref.WeakSet instead of WeakKeyDictionary.
* Simplify buffered double-linked list types.
* In _flush_all_writers(), suppress errors from flush().
* Remove NEWS entry, use blurb.
* Take more care when flushing file buffers from atexit.
The previous implementation was not careful enough to avoid
causing issues in multi-threaded cases. Check for buf->ok
and buf->finalizing before actually doing the flush. Also,
increase the refcnt to ensure the object does not disappear.
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals
Other globals are excluded (see globals.txt and check-c-globals.py).
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals
Other globals are excluded (see globals.txt and check-c-globals.py).
* Maintain a list of BufferedWriter objects. Flush them on exit.
In Python 3, the buffer and the underlying file object are separate
and so the order in which objects are finalized matters. This is
unlike Python 2 where the file and buffer were a single object and
finalization was done for both at the same time. In Python 3, if
the file is finalized and closed before the buffer then the data in
the buffer is lost.
This change adds a doubly linked list of open file buffers. An atexit
hook ensures they are flushed before proceeding with interpreter
shutdown. This is addition does not remove the need to properly close
files as there are other reasons why buffered data could get lost during
finalization.
Initial patch by Armin Rigo.
* Use weakref.WeakSet instead of WeakKeyDictionary.
* Simplify buffered double-linked list types.
* In _flush_all_writers(), suppress errors from flush().
* Remove NEWS entry, use blurb.