* Split _Py_InitializeCore_impl() into subfunctions: add multiple pycore_init_xxx() functions
* Preliminary sys.stderr is now set earlier to get an usable
sys.stderr ealier.
* Move code into _Py_Initialize_ReconfigureCore() to be able to call
it from _Py_InitializeCore().
* Split _PyExc_Init(): create a new _PyBuiltins_AddExceptions()
function.
* Call _PyExc_Init() earlier in _Py_InitializeCore_impl()
and new_interpreter() to get working exceptions earlier.
* _Py_ReadyTypes() now returns _PyInitError rather than calling
Py_FatalError().
* Misc code cleanup
* The PyByteArray_Init() and PyByteArray_Fini() functions have been
removed. They did nothing since Python 2.7.4 and Python 3.2.0, were
excluded from the limited API (stable ABI), and were not
documented.
* Move "_PyXXX_Init()" and "_PyXXX_Fini()" declarations from
Include/cpython/pylifecycle.h to
Include/internal/pycore_pylifecycle.h. Replace
"PyAPI_FUNC(TYPE)" with "extern TYPE".
* _PyExc_Init() now returns an error on failure rather than calling
Py_FatalError(). Move macros inside _PyExc_Init() and undefine them
when done. Rewrite macros to make them look more like statement:
add ";" when using them, add "do { ... } while (0)".
* _PyUnicode_Init() now returns a _PyInitError error rather than call
Py_FatalError().
* Move stdin check from _PySys_BeginInit() to init_sys_streams().
* _Py_ReadyTypes() now returns a _PyInitError error rather than
calling Py_FatalError().
Format characters "%s" and "%V" in PyUnicode_FromFormat() and "%s" in PyBytes_FromFormat()
no longer read memory past the limit if precision is specified.
Fix an assertion error in format() in debug build for floating point
formatting with "n" format, zero padding and small width. Release build is
not impacted. Patch by Karthikeyan Singaravelan.
Fix memory leak in PyUnicode_EncodeLocale() and
PyUnicode_EncodeFSDefault() on error handling.
Changes:
* Fix unicode_encode_locale() error handling
* Fix test_codecs.LocaleCodecTest
Fix invalid function cast warnings with gcc 8
for method conventions different from METH_NOARGS, METH_O and
METH_VARARGS excluding Argument Clinic generated code.
Fix str.format(), float.__format__() and complex.__format__() methods
for non-ASCII decimal point when using the "n" formatter.
Changes:
* Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires
a _PyUnicodeWriter object for the buffer and a Python str object
for digits.
* Rename FILL() macro to unicode_fill(), convert it to static inline function,
add "assert(0 <= start);" and rework its code.
Discovered using clang's MemorySanitizer when it ran python3's
test_fstring test_misformed_unicode_character_name.
An msan build will fail by simply executing: ./python -c 'u"\N"'
Use _PyObject_ASSERT() in:
* _PyDict_CheckConsistency()
* _PyType_CheckConsistency()
* _PyUnicode_CheckConsistency()
_PyObject_ASSERT() dumps the faulty object if the assertion fails
to help debugging.
* Add %T format to PyUnicode_FromFormatV(), and so to
PyUnicode_FromFormat() and PyErr_Format(), to format an object type
name: equivalent to "%s" with Py_TYPE(obj)->tp_name.
* Replace Py_TYPE(obj)->tp_name with %T format in unicodeobject.c.
* Add unit test on %T format.
* Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8(), to make the intent more explicit.
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.
Changes:
* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
the _Py_error_handler enum instead of "int surrogateescape" to pass
the error handler. These functions now return -3 if the error
handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
workaround anymore.
_PyCoreConfig_Read() is now responsible to choose the filesystem
encoding and error handler. Using Py_Main(), the encoding is now
chosen even before calling Py_Initialize().
_PyCoreConfig.filesystem_encoding is now the reference, instead of
Py_FileSystemDefaultEncoding, for the Python filesystem encoding.
Changes:
* Add filesystem_encoding and filesystem_errors to _PyCoreConfig
* _PyCoreConfig_Read() now reads the locale encoding for the file
system encoding.
* PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize()
now use the interpreter configuration rather than
Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
global configuration variables.
* Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding()
private functions to only modify Py_FileSystemDefaultEncoding and
Py_FileSystemDefaultEncodeErrors in coreconfig.c.
* _Py_CoerceLegacyLocale() now takes an int rather than
_PyCoreConfig for the warning.
`_PyUnicode_TransformDecimalAndSpaceToASCII()` missed trailing NUL char.
It caused buffer overflow in `_Py_string_to_number_with_underscores()`.
This bug is introduced in 9b6c60cb.
METH_NOARGS functions need only a single argument but they are cast
into a PyCFunction, which takes two arguments. This triggers an
invalid function cast warning in gcc8 due to the argument mismatch.
Fix this by adding a dummy unused argument.
Modify locale.localeconv(), time.tzname, os.strerror() and other
functions to ignore the UTF-8 Mode: always use the current locale
encoding.
Changes:
* Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or
encoding error, they return the position of the error and an error
message which are used to raise Unicode errors in
PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale().
* Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx().
* PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all
cases, especially for the strict error handler.
* Add _Py_DecodeUTF8Ex(): return more information on decoding error
and supports the strict error handler.
* Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex().
* Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx().
* Ignore the UTF-8 mode to encode/decode localeconv(), strerror()
and time zone name.
* Remove PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize()
and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use
the "current" locale.
* Remove _PyUnicode_DecodeCurrentLocale(),
_PyUnicode_DecodeCurrentLocaleAndSize() and
_PyUnicode_EncodeCurrentLocale().
Add new fuctions ignoring the UTF-8 mode:
* _Py_DecodeCurrentLocale()
* _Py_EncodeCurrentLocale()
* _PyUnicode_DecodeCurrentLocaleAndSize()
* _PyUnicode_EncodeCurrentLocale()
Modify the readline module to use these functions.
Re-enable test_readline.test_nonascii().
Replace Py_EncodeLocale() with _Py_EncodeLocaleRaw() in:
* _Py_wfopen()
* _Py_wreadlink()
* _Py_wrealpath()
* _Py_wstat()
* pymain_open_filename()
These functions are called early during Python intialization, only
the RAW memory allocator must be used.
Py_EncodeLocale() now uses _Py_EncodeUTF8_surrogateescape(), instead
of using temporary unicode and bytes objects. So Py_EncodeLocale()
doesn't use the Python C API anymore.
* Add -X utf8 command line option, PYTHONUTF8 environment variable
and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: enable automatically the
UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
_winapi.GetACP() to get the ANSI code page
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
mode. As a side effect, open() now uses the UTF-8 encoding by
default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8
* Skip some tests relying on the current locale if the UTF-8 mode is
enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
return also the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
always copy flag values, rather than only copying if the new value
is greater than the old value.