bpo-36443, bpo-36202: Since Python 3.7.0, calling Py_DecodeLocale()
before Py_Initialize() produces mojibake if the LC_CTYPE locale is
coerced and/or if the UTF-8 Mode is enabled by the user
configuration. This change fix the issue by disabling LC_CTYPE
coercion and UTF-8 Mode by default. They must now be enabled
explicitly (opt-in) using the new _Py_PreInitialize() API with
_PyPreConfig.
When embedding Python, set coerce_c_locale and utf8_mode attributes
of _PyPreConfig to -1 to enable automatically these parameters
depending on the LC_CTYPE locale, environment variables and command
line arguments
Alternative: Setting Py_UTF8Mode to 1 always explicitly enables the
UTF-8 Mode.
Changes:
* _PyPreConfig_INIT now sets coerce_c_locale and utf8_mode to 0 by
default.
* _Py_InitializeFromArgs() and _Py_InitializeFromWideArgs() can now
be called with config=NULL.
* _PyCoreConfig_Write() now updates _PyRuntime.preconfig
* Remove _PyPreCmdline_Copy()
* _PyPreCmdline_Read() now accepts _PyPreConfig and _PyCoreConfig
optional configurations.
* Rename _PyPreConfig_ReadFromArgv() to _PyPreConfig_Read(). Simplify
the code.
* Calling _PyCoreConfig_Read() no longer adds the warning options
twice: don't add a warning option if it's already in the list.
* Rename _PyCoreConfig_ReadFromArgv() to _PyCoreConfig_Read().
* Rename config_from_cmdline() to _PyCoreConfig_ReadFromArgv().
* Add more assertions on _PyCoreConfig in _PyCoreConfig_Read().
* Move some functions.
* Make some config functions private.
* Add _Py_GetConfigsAsDict() function to get all configurations as a
dict.
* dump_config() of _testembed.c now dumps preconfig as a separated
key: call _Py_GetConfigsAsDict().
* Make _PyMainInterpreterConfig_AsDict() private.
Replace messy _Py_wstrlist_xxx() functions with a new clean
_PyWstrList structure and new _PyWstrList_xxx() functions.
Changes:
* Add _PyCoreConfig.use_module_search_paths to decide if
_PyCoreConfig.module_search_paths should be computed or not, to
support empty search path list.
* _PyWstrList_Clear() sets length to 0 and items to NULL, whereas
_Py_wstrlist_clear() only freed memory.
* _PyWstrList_Append() returns an int, whereas _Py_wstrlist_append()
returned _PyInitError.
* _PyWstrList uses Py_ssize_t for the length, instead of int.
* Replace (int, wchar_t**) with _PyWstrList in:
* _PyPreConfig
* _PyCoreConfig
* _PyPreCmdline
* _PyCmdline
* Replace "int orig_argv; wchar_t **orig_argv;"
with "_PyWstrList orig_argv".
* _PyCmdline and _PyPreCmdline now also copy wchar_argv.
* Rename _PyArgv_Decode() to _PyArgv_AsWstrList().
* PySys_SetArgvEx() now pass the fixed (argc, argv) to
_PyPathConfig_ComputeArgv0() (don't pass negative argc or NULL
argv).
* _PyOS_GetOpt() uses Py_ssize_t
bpo-34247, bpo-36142: The PYTHONMALLOC environment variable has the
priority over PYTHONDEV env var and "-X dev" command line option.
For example, PYTHONMALLOC=malloc PYTHONDEVMODE=1 sets the memory
allocators to "malloc" (and not to "debug").
Add an unit test.
The development mode now uses the effective name of the debug memory
allocator ("pymalloc_debug" or "malloc_debug"). So the name doesn't
change after setting the memory allocator.
* Move 'allocator' and 'dev_mode' fields from _PyCoreConfig
to _PyPreConfig.
* Fix InitConfigTests of test_embed: dev_mode sets allocator to
"debug", add a new tests for env vars with dev mode enabled.
* Move following fields from _PyCoreConfig to _PyPreConfig:
* coerce_c_locale
* coerce_c_locale_warn
* legacy_windows_stdio
* utf8_mode
* _PyPreConfig_ReadFromArgv() is now responsible to choose the
filesystem encoding
* _PyPreConfig_Write() now sets the LC_CTYPE locale
* Add _PyPreConfig structure
* Move 'ignored' and 'use_environment' fields from _PyCoreConfig
to _PyPreConfig
* Add a new "_PyPreConfig preconfig;" field to _PyCoreConfig
Add a new _Py_INIT_EXIT() macro to be able to exit Python with an
exitcode using _PyInitError API. Rewrite function calls by
pymain_main() to use _PyInitError.
Changes:
* Remove _PyMain.err and _PyMain.status field
* Add _Py_INIT_EXIT() macro and _PyInitError.exitcode field.
* Rename _Py_FatalInitError() to _Py_ExitInitError().
test_embed.InitConfigTests tests more configuration variables.
Changes:
* InitConfigTests tests more core configuration variables:
* base_exec_prefix
* base_prefix
* exec_prefix
* home
* legacy_windows_fs_encoding
* legacy_windows_stdio
* module_search_path_env
* prefix
* "_testembed init_from_config" tests more variables:
* argv
* warnoptions
* xoptions
* InitConfigTests: add check_global_config(), check_core_config() and
check_main_config() subfunctions to cleanup the code. Move also
constants at the class level (ex: COPY_MAIN_CONFIG).
* Fix _PyCoreConfig_AsDict(): don't set stdio_encoding twice
* Use more macros in _PyCoreConfig_AsDict() and
_PyMainInterpreterConfig_AsDict() to reduce code duplication.
* Other minor cleanups.
* Fix _PyCoreConfig_SetGlobalConfig(): set also Py_FrozenFlag
* Fix _PyCoreConfig_AsDict(): export also xoptions
* Add _Py_GetGlobalVariablesAsDict() and _testcapi.get_global_config()
* test.pythoninfo: dump also global configuration variables
* _testembed now serializes global, core and main configurations
using JSON to reuse _Py_GetGlobalVariablesAsDict(),
_PyCoreConfig_AsDict() and _PyMainInterpreterConfig_AsDict(),
rather than duplicating code.
* test_embed.InitConfigTests now test much more configuration
variables
* Fix _PyMainInterpreterConfig_Copy():
copy 'install_signal_handlers' attribute
* Add _PyMainInterpreterConfig_AsDict()
* Add unit tests on the main interpreter configuration
to test_embed.InitConfigTests
* test.pythoninfo: log also main_config
* And pycore_lifecycle.h and pycore_pathconfig.h headers to
Include/internal/
* Move Py_BUILD_CORE specific code from coreconfig.h and
pylifecycle.h to pycore_pathconfig.h and pycore_lifecycle.h
* Move _Py_wstrlist_XXX() definitions and _PyPathConfig code
from pycore_state.h to pycore_pathconfig.h
* Move "Init" and "Fini" function definitions from pylifecycle.c to
pycore_lifecycle.h.
* Revert "bpo-34589: Add -X coerce_c_locale command line option (GH-9378)"
This reverts commit dbdee0073c.
* Revert "bpo-34589: C locale coercion off by default (GH-9073)"
This reverts commit 7a0791b699.
* Revert "bpo-34589: Make _PyCoreConfig.coerce_c_locale private (GH-9371)"
This reverts commit 188ebfa475.
Py_Initialize() and Py_Main() cannot enable the C locale coercion
(PEP 538) anymore: it is always disabled. It can now only be enabled
by the Python program ("python3).
test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8
nor PYTHONCOERCECLOCALE, these variables are already set in the
parent.
_PyCoreConfig:
* Rename coerce_c_locale to _coerce_c_locale
* Rename coerce_c_locale_warn to _coerce_c_locale_warn
These fields are now private (name prefixed by "_").
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.
Changes:
* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
the _Py_error_handler enum instead of "int surrogateescape" to pass
the error handler. These functions now return -3 if the error
handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
workaround anymore.
_PyCoreConfig_Read() is now responsible to choose the filesystem
encoding and error handler. Using Py_Main(), the encoding is now
chosen even before calling Py_Initialize().
_PyCoreConfig.filesystem_encoding is now the reference, instead of
Py_FileSystemDefaultEncoding, for the Python filesystem encoding.
Changes:
* Add filesystem_encoding and filesystem_errors to _PyCoreConfig
* _PyCoreConfig_Read() now reads the locale encoding for the file
system encoding.
* PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize()
now use the interpreter configuration rather than
Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
global configuration variables.
* Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding()
private functions to only modify Py_FileSystemDefaultEncoding and
Py_FileSystemDefaultEncodeErrors in coreconfig.c.
* _Py_CoerceLegacyLocale() now takes an int rather than
_PyCoreConfig for the warning.
Python now gets the locale encoding with C code to initialize the encoding
of standard streams like sys.stdout. Moreover, the encoding is now
initialized to the Python codec name to get a normalized encoding name and
to ensure that the codec is loaded. The change avoids importing
_bootlocale and _locale modules at startup by default.
When the PYTHONIOENCODING environment variable only contains an encoding,
the error handler is now is now set explicitly to "strict".
Rename also get_default_standard_stream_error_handler() to
get_stdio_errors().
Reduce the buffer to format the "cpXXX" string (Windows locale encoding).
bpo-31650, bpo-34170: Replace _Py_CheckHashBasedPycsMode with
_PyCoreConfig._check_hash_pycs_mode. Modify PyInit__imp() and
zipimport to get the parameter from the current interpreter core
configuration.
Remove Include/internal/import.h file.
* Config: Rename ignore_environment field to use_environment.
* _PyCoreConfig_Read(): if isolated is set, use_environment and
site_import are now always set to 0.
* Inline pymain_free_raw() into pymain_free()
* Move config_init_warnoptions() call into pymain_read_conf_impl()
* _PyCoreConfig_Read(): don't replace values if they are already set:
faulthandler, pycache_prefix, home.
* If _Py_InitializeCore() is called twice, the second call now copies
and apply (partially) the new configuration.
* Rename _Py_CommandLineDetails to _PyCmdline
* Move more code into pymain_init(). The core configuration created
by Py_Main() is new destroyed before running Python to reduce the
memory footprint.
* _Py_InitializeCore() now returns the created interpreter.
_Py_InitializeMainInterpreter() now expects an interpreter.
* Remove _Py_InitializeEx_Private(): _freeze_importlib now uses
_Py_InitializeFromConfig()
* _PyCoreConfig_InitPathConfig() now only computes the path
configuration if needed.
Py_Main() can again be called after Py_Initialize(), as in Python
3.6. The new configuration is ignored, except of
_PyMainInterpreterConfig.argv which is used to update sys.argv.
- new test case for pre-initialization of sys.warnoptions and sys._xoptions
- restored ability to call these APIs prior to Py_Initialize
- updated the docs for the affected APIs to make it clear they can be
called before Py_Initialize
- also enhanced the existing embedding test cases
to check for expected settings in the sys module
bpo-29240, bpo-32030: If the encoding change (C locale coerced or
UTF-8 Mode changed), Py_Main() now reads again the configuration with
the new encoding.
Changes:
* Add _Py_UnixMain() called by main().
* Rename pymain_free_pymain() to pymain_clear_pymain(), it can now be
called multipled times.
* Rename pymain_parse_cmdline_envvars() to pymain_read_conf().
* Py_Main() now clears orig_argc and orig_argv at exit.
* Remove argv_copy2, Py_Main() doesn't modify argv anymore. There is
no need anymore to get two copies of the wchar_t** argv.
* _PyCoreConfig: add coerce_c_locale and coerce_c_locale_warn.
* Py_UTF8Mode is now initialized to -1.
* Locale coercion (PEP 538) now respects -I and -E options.
* Add -X utf8 command line option, PYTHONUTF8 environment variable
and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: enable automatically the
UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
_winapi.GetACP() to get the ANSI code page
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
mode. As a side effect, open() now uses the UTF-8 encoding by
default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8
* Skip some tests relying on the current locale if the UTF-8 mode is
enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
return also the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
always copy flag values, rather than only copying if the new value
is greater than the old value.
When PyGILState_Ensure() is called in a non-Python thread before
PyEval_InitThreads(), only call PyEval_InitThreads() after calling
PyThreadState_New() to fix a crash.
Add an unit test in test_embed.
* Fix _PyMem_SetupAllocators("debug"): always restore allocators to
the defaults, rather than only caling _PyMem_SetupDebugHooks().
* Add _PyMem_SetDefaultAllocator() helper to set the "default"
allocator.
* Add _PyMem_GetAllocatorsName(): get the name of the allocators
* main() now uses debug hooks on memory allocators if Py_DEBUG is
defined, rather than calling directly malloc()
* Document default memory allocators in C API documentation
* _Py_InitializeCore() now fails with a fatal user error if
PYTHONMALLOC value is an unknown memory allocator, instead of
failing with a fatal internal error.
* Add new tests on the PYTHONMALLOC environment variable
* Add support.with_pymalloc()
* Add the _testcapi.WITH_PYMALLOC constant and expose it as
support.with_pymalloc().
* sysconfig.get_config_var('WITH_PYMALLOC') doesn't work on Windows, so
replace it with support.with_pymalloc().
* pythoninfo: add _testcapi collector for pymem
bpo-32096, bpo-30860: Partially revert the commit
2ebc5ce42a8a9e047e790aefbf9a94811569b2b6:
* Move structures back from Include/internal/mem.h to
Objects/obmalloc.c
* Remove _PyObject_Initialize() and _PyMem_Initialize()
* Remove Include/internal/pymalloc.h
* Add test_capi.test_pre_initialization_api():
Make sure that it's possible to call Py_DecodeLocale(), and then call
Py_SetProgramName() with the decoded string, before Py_Initialize().
PyMem_RawMalloc() and Py_DecodeLocale() can be called again before
_PyRuntimeState_Init().
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>