bpo-36443, bpo-36202: Since Python 3.7.0, calling Py_DecodeLocale()
before Py_Initialize() produces mojibake if the LC_CTYPE locale is
coerced and/or if the UTF-8 Mode is enabled by the user
configuration. This change fix the issue by disabling LC_CTYPE
coercion and UTF-8 Mode by default. They must now be enabled
explicitly (opt-in) using the new _Py_PreInitialize() API with
_PyPreConfig.
When embedding Python, set coerce_c_locale and utf8_mode attributes
of _PyPreConfig to -1 to enable automatically these parameters
depending on the LC_CTYPE locale, environment variables and command
line arguments
Alternative: Setting Py_UTF8Mode to 1 always explicitly enables the
UTF-8 Mode.
Changes:
* _PyPreConfig_INIT now sets coerce_c_locale and utf8_mode to 0 by
default.
* _Py_InitializeFromArgs() and _Py_InitializeFromWideArgs() can now
be called with config=NULL.
* _PyCoreConfig_Write() now updates _PyRuntime.preconfig
* Remove _PyPreCmdline_Copy()
* _PyPreCmdline_Read() now accepts _PyPreConfig and _PyCoreConfig
optional configurations.
* Rename _PyPreConfig_ReadFromArgv() to _PyPreConfig_Read(). Simplify
the code.
* Calling _PyCoreConfig_Read() no longer adds the warning options
twice: don't add a warning option if it's already in the list.
* Rename _PyCoreConfig_ReadFromArgv() to _PyCoreConfig_Read().
* Rename config_from_cmdline() to _PyCoreConfig_ReadFromArgv().
* Add more assertions on _PyCoreConfig in _PyCoreConfig_Read().
* Move some functions.
* Make some config functions private.
* Make _PyPreConfig_GetEnv(), _PyCoreConfig_GetEnv() and
_PyCoreConfig_GetEnvDup() private
* _Py_get_env_flag() first parameter becomes "int use_environment"
* Add _Py_GetConfigsAsDict() function to get all configurations as a
dict.
* dump_config() of _testembed.c now dumps preconfig as a separated
key: call _Py_GetConfigsAsDict().
* Make _PyMainInterpreterConfig_AsDict() private.
* Initialize _PyPreConfig.dev_mode to -1.
* _PyPreConfig_Read(): coreconfig has the priority over preconfig.
* _PyCoreConfig_Read() now calls _PyPreCmdline_Read() internally.
* config_from_cmdline() now pass _PyPreCmdline to config_read().
* Add _PyPreCmdline_Copy().
Prepare code to move some _PyPreConfig parameters into _PyPreCmdline.
Changes:
* _PyCoreConfig_ReadFromArgv(): remove preconfig parameter,
use _PyRuntime.preconfig.
* Add _PyPreCmdline_GetPreConfig() (called by _PyPreConfig_Read()).
* Rename _PyPreCmdline_Init() to _PyPreCmdline_SetArgv()
* Factorize _Py_PreInitializeFromPreConfig() code: add
pyinit_preinit().
* _PyPreConfig_Read() now sets coerce_c_locale to 2 if it must be
coerced.
* Remove _PyCoreConfig_ReadPreConfig().
* _PyCoreConfig_Write() now copies updated preconfig into _PyRuntime.
* Add _PyRuntime.pre_initialized: set to 1 when Python
is pre-initialized
* Add _Py_PreInitialize() and _Py_PreInitializeFromPreConfig().
* _PyCoreConfig_Read() now calls _Py_PreInitialize().
* Move _PyPreConfig_GetGlobalConfig() and
_PyCoreConfig_GetGlobalConfig() calls from main.c to preconfig.c
and coreconfig.c.
At Python initialization, the current directory is no longer
prepended to sys.path if it has been removed.
Rename _PyPathConfig_ComputeArgv0() to
_PyPathConfig_ComputeSysPath0() to avoid confusion between argv[0]
and sys.path[0].
When Python is compiled with Valgrind support, release Unicode
interned strings at exit in _PyUnicode_Fini().
* Rename _Py_ReleaseInternedUnicodeStrings() to
unicode_release_interned() and make it private.
* unicode_release_interned() is now called from _PyUnicode_Fini():
it must be called with a running Python thread state for TRASHCAN,
it cannot be called from pymain_free().
* Don't display statistics on interned strings at exit anymore
Python initialization now fails if decoding pybuilddir.txt
configuration file fails at startup.
_PyPathConfig_Calculate() now reports memory allocation failure and
decoding error on decoding pybuilddir.txt content from
UTF-8/surrogateescape.
The last parameter of _Py_wreadlink(), _Py_wrealpath() and
_Py_wgetcwd() is a length, not a size: number of characters including
the trailing NUL character.
Enhance also documentation of error conditions.
Replace messy _Py_wstrlist_xxx() functions with a new clean
_PyWstrList structure and new _PyWstrList_xxx() functions.
Changes:
* Add _PyCoreConfig.use_module_search_paths to decide if
_PyCoreConfig.module_search_paths should be computed or not, to
support empty search path list.
* _PyWstrList_Clear() sets length to 0 and items to NULL, whereas
_Py_wstrlist_clear() only freed memory.
* _PyWstrList_Append() returns an int, whereas _Py_wstrlist_append()
returned _PyInitError.
* _PyWstrList uses Py_ssize_t for the length, instead of int.
* Replace (int, wchar_t**) with _PyWstrList in:
* _PyPreConfig
* _PyCoreConfig
* _PyPreCmdline
* _PyCmdline
* Replace "int orig_argv; wchar_t **orig_argv;"
with "_PyWstrList orig_argv".
* _PyCmdline and _PyPreCmdline now also copy wchar_argv.
* Rename _PyArgv_Decode() to _PyArgv_AsWstrList().
* PySys_SetArgvEx() now pass the fixed (argc, argv) to
_PyPathConfig_ComputeArgv0() (don't pass negative argc or NULL
argv).
* _PyOS_GetOpt() uses Py_ssize_t
The value is a string for string and byte literals, None otherwise.
It is 'u' for u"..." literals, 'b' for b"..." literals, '' for "..." literals.
The 'r' (raw) prefix is ignored.
Does not apply to f-strings.
This appears sufficient to make mypy capable of using the stdlib ast module instead of typed_ast (assuming a mypy patch I'm working on).
WIP: I need to make the tests pass. @ilevkivskyi @serhiy-storchaka
https://bugs.python.org/issue36280
d_initial, the first state of a particular DFA in the parser has always been initialized to 0 in the old pgen as well as the new pgen. As this value is not used and the first state of each DFA is assumed to be the first element in the array representing it, remove d_initial from the parser to reduce complexity.
The macros were working only because our usage happened to parse correctly. Changing that usage (e.g. with pointers) would break the macros. This fixes that.
This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allow tweaking the parser to support older versions of the grammar. In particular if `feature_version` is 5 or 6, the hacks for the `async` and `await` keyword from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.)
https://bugs.python.org/issue35975
* _PyPreConfig_Read() now sets temporarily LC_CTYPE to the user
preferred locale, as _PyPreConfig_Write() will do permanentely.
* Fix _PyCoreConfig_Clear(): clear run_xxx attributes
* _Py_SetArgcArgv() doesn't have to be exported
* _PyCoreConfig_SetGlobalConfig() no longer applies preconfig
* _PyPreConfig_Write() now reallocates the pre-configuration with the
new memory allocator.
* It is no longer needed to force the "default raw memory allocator"
to clear pre-configuration and core configuration. Simplify the
code.
* _PyPreConfig_Write() now does nothing if called after
Py_Initialize(): no longer check if the allocator is the same.
* Remove _PyMem_GetDebugAllocatorsName(): dev mode sets again
allocator to "debug".
* _PyPreConfig_Write() now sets the memory allocator.
* _PyPreConfig_Write() gets a return type: _PyInitError.
* _Py_InitializeCore() now reads and writes the pre-configuration
(set the memory allocator, configure the locale) before reading and
writing the core configuration.
The development mode now uses the effective name of the debug memory
allocator ("pymalloc_debug" or "malloc_debug"). So the name doesn't
change after setting the memory allocator.
* Move 'allocator' and 'dev_mode' fields from _PyCoreConfig
to _PyPreConfig.
* Fix InitConfigTests of test_embed: dev_mode sets allocator to
"debug", add a new tests for env vars with dev mode enabled.
* Move following fields from _PyCoreConfig to _PyPreConfig:
* coerce_c_locale
* coerce_c_locale_warn
* legacy_windows_stdio
* utf8_mode
* _PyPreConfig_ReadFromArgv() is now responsible to choose the
filesystem encoding
* _PyPreConfig_Write() now sets the LC_CTYPE locale
* Add _PyPreConfig structure
* Move 'ignored' and 'use_environment' fields from _PyCoreConfig
to _PyPreConfig
* Add a new "_PyPreConfig preconfig;" field to _PyCoreConfig
* Revert "bpo-36097: Use only public C-API in the_xxsubinterpreters module (adding as necessary). (#12003)"
This reverts commit bcfa450f21.
* Revert "bpo-33608: Simplify ceval's DISPATCH by hoisting eval_breaker ahead of time. (gh-12062)"
This reverts commit bda918bf65.
* Revert "bpo-33608: Use _Py_AddPendingCall() in _PyCrossInterpreterData_Release(). (gh-12024)"
This reverts commit b05b711a2c.
* Revert "bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617)"
This reverts commit ef4ac967e2.
Use UTF-8 as the system encoding on VxWorks.
The main reason are:
1. The locale is frequently misconfigured.
2. Missing some functions to deal with locale in VxWorks C library.
Pgen is the oldest piece of technology in the CPython repository, building it requires various #if[n]def PGEN hacks in other parts of the code and it also depends more and more on CPython internals. This commit removes the old pgen C code and replaces it for a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatibles with the current parser.
This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN) that can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution that uses Python3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so is always kept in sync and avoids having to maintain duplicate token definitions.