The getpath.py file is frozen at build time and executed as code over a namespace. It is never imported, nor is it meant to be importable or reusable. However, it should be easier to read, modify, and patch than the previous code.
This commit attempts to preserve every previously tested quirk, but these may be changed in the future to better align platforms.
The recently added PyConfig.stdlib_dir was being set with ".." entries. When __file__ was added for from modules this caused a problem on out-of-tree builds. This PR fixes that by normalizing "stdlib_dir" when it is calculated in getpath.c.
https://bugs.python.org/issue45506
The default was "off". Switching it to "on" means users get the benefit of frozen stdlib modules without having to do anything. There's a special-case for running-in-source-tree, so contributors don't get surprised when their stdlib changes don't get used.
https://bugs.python.org/issue45020
This accomplishes 2 things:
* consolidates some common code between getpath.c and getpathp.c
* makes the helpers available to code in other files
FWIW, the signature of the join_relfile() function (in fileutils.c) intentionally mirrors that of Windows' PathCchCombineEx().
Note that this change is mostly moving code around. No behavior is meant to change.
https://bugs.python.org/issue45211
Release the GIL while performing isatty() system calls on arbitrary
file descriptors. In particular, this affects os.isatty(),
os.device_encoding() and io.TextIOWrapper. By extension,
io.open() in text mode is also affected.
Fix the os.set_inheritable() function on FreeBSD 14 for file
descriptor opened with the O_PATH flag: ignore the EBADF error on
ioctl(), fallback on the fcntl() implementation.
This works by not caching the handle and instead getting the handle from
the file descriptor each time, so that if the actual handle changes by
fd redirection closing/opening the console handle beneath our feet, we
will keep working correctly.
Python no longer fails at startup with a fatal error if a command
line argument contains an invalid Unicode character.
The Py_DecodeLocale() function now escapes byte sequences which would
be decoded as Unicode characters outside the [U+0000; U+10ffff]
range.
Use MAX_UNICODE constant in unicodeobject.c.
If the nl_langinfo(CODESET) function returns an empty string, Python
now uses UTF-8 as the filesystem encoding.
In May 2010 (commit b744ba1d14), I
modified Python to log a warning and use UTF-8 as the filesystem
encoding (instead of None) if nl_langinfo(CODESET) returns an empty
string.
In August 2020 (commit 94908bbc15), I
modified Python startup to fail with a fatal error and a specific
error message if nl_langinfo(CODESET) returns an empty string. The
intent was to prevent guessing the encoding and also investigate user
configuration where this case happens.
In 10 years (2010 to 2020), I saw zero user report about the error
message related to nl_langinfo(CODESET) returning an empty string.
Today, UTF-8 became the defacto standard and it's safe to make the
assumption that the user expects UTF-8. For example,
nl_langinfo(CODESET) can return an empty string on macOS if the
LC_CTYPE locale is not supported, and UTF-8 is the default encoding
on macOS.
While this change is likely to not affect anyone in practice, it
should make UTF-8 lover happy ;-)
Rewrite also the documentation explaining how Python selects the
filesystem encoding and error handler.
* Rename _Py_GetLocaleEncoding() to _Py_GetLocaleEncodingObject()
* Add _Py_GetLocaleEncoding() which returns a wchar_t* string to
share code between _Py_GetLocaleEncodingObject()
and config_get_locale_encoding().
* _Py_GetLocaleEncodingObject() now decodes nl_langinfo(CODESET)
from the current locale encoding with surrogateescape,
rather than using UTF-8.
_io.TextIOWrapper no longer calls getpreferredencoding(False) of
_bootlocale to get the locale encoding, but calls
_Py_GetLocaleEncoding() instead.
Add config_get_fs_encoding() sub-function. Reorganize also
config_get_locale_encoding() code.
This API is relatively lightweight and organizationally, given that it's
used by multiple modules, it makes sense to move it to fileutils.
Requires making sure that _posixsubprocess is compiled with the appropriate
Py_BUIILD_CORE_BUILTIN macro.
Following symbolic links is now limited to 40 attempts, just to
prevent loops.
Add subfunctions:
* Add resolve_symlinks()
* Add calculate_argv0_path_framework()
* Add calculate_which()
* Add calculate_program_macos()
Fix also _Py_wreadlink(): readlink() result type is Py_ssize_t, not
int.
bpo-37834: Normalise handling of reparse points on Windows
* ntpath.realpath() and nt.stat() will traverse all supported reparse points (previously was mixed)
* nt.lstat() will let the OS traverse reparse points that are not name surrogates (previously would not traverse any reparse point)
* nt.[l]stat() will only set S_IFLNK for symlinks (previous behaviour)
* nt.readlink() will read destinations for symlinks and junction points only
bpo-1311: os.path.exists('nul') now returns True on Windows
* nt.stat('nul').st_mode is now S_IFCHR (previously was an error)
Python now gets the absolute path of the script filename specified on
the command line (ex: "python3 script.py"): the __file__ attribute of
the __main__ module, sys.argv[0] and sys.path[0] become an absolute
path, rather than a relative path.
* Add _Py_isabs() and _Py_abspath() functions.
* _PyConfig_Read() now tries to get the absolute path of
run_filename, but keeps the relative path if _Py_abspath() fails.
* Reimplement os._getfullpathname() using _Py_abspath().
* Use _Py_isabs() in getpath.c.
Add _Py_FORCE_UTF8_LOCALE and _Py_FORCE_UTF8_FS_ENCODING macros to
avoid factorize "#if defined(__ANDROID__) || defined(__VXWORKS__)"
and "#if defined(__APPLE__)".
Cleanup also config_init_fs_encoding().
The last parameter of _Py_wreadlink(), _Py_wrealpath() and
_Py_wgetcwd() is a length, not a size: number of characters including
the trailing NUL character.
Enhance also documentation of error conditions.
Use UTF-8 as the system encoding on VxWorks.
The main reason are:
1. The locale is frequently misconfigured.
2. Missing some functions to deal with locale in VxWorks C library.
bpo-34523, bpo-35290: C locale coercion now resets the Python
internal "force ASCII" mode. This change fix the filesystem encoding
on FreeBSD CURRENT, which has a new "C.UTF-8" locale, when
the UTF-8 mode is disabled.
Add _Py_ResetForceASCII(): _Py_SetLocaleFromEnv() now calls it.
locale.localeconv() now sets temporarily the LC_CTYPE locale to the
LC_MONETARY locale if the two locales are different and monetary
strings are non-ASCII. This temporary change affects other threads.
Changes:
* locale.localeconv() can now set LC_CTYPE to LC_MONETARY to decode
monetary fields.
* Add LocaleInfo.grouping_buffer: copy localeconv() grouping string
since it can be replaced anytime if a different thread calls
localeconv().
* _Py_GetLocaleconvNumeric() now requires a "struct lconv *"
structure, so locale.localeconv() now longer calls localeconv()
twice. Moreover, the function now requires all arguments to be
non-NULL.
* Rename STATIC_LOCALE_INFO_INIT to LocaleInfo_STATIC_INIT.
* Move _Py_GetLocaleconvNumeric() definition from fileutils.h
to pycore_fileutils.h. pycore_fileutils.h now includes locale.h.
* The _locale module is now built with Py_BUILD_CORE defined.
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.
Changes:
* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
the _Py_error_handler enum instead of "int surrogateescape" to pass
the error handler. These functions now return -3 if the error
handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
workaround anymore.
Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding on
Windows if Py_LegacyWindowsFSEncodingFlag is zero.
pymain_read_conf() now sets Py_LegacyWindowsFSEncodingFlag in its
loop, but restore its value at exit.
On HP-UX with C or POSIX locale, sys.getfilesystemencoding() now returns
"ascii" instead of "roman8" (when the UTF-8 Mode is disabled and the C locale
is not coerced).
nl_langinfo(CODESET) announces "roman8" whereas it uses the Latin1
encoding in practice.
* The UTF-8 Mode is now also enabled by the "POSIX" locale, not only
by the "C" locale.
* On FreeBSD, Py_DecodeLocale() and Py_EncodeLocale() now also forces
the ASCII encoding if the LC_CTYPE locale is "POSIX", not only if
the LC_CTYPE locale is "C".
* test_utf8_mode.test_cmd_line() checks also that the command line
arguments are decoded from UTF-8 when the the UTF-8 Mode is enabled
with POSIX locale or C locale.
Fix a rare but potential pre-exec child process deadlock in subprocess on POSIX systems when marking file descriptors inheritable on exec in the child process. This bug appears to have been introduced in 3.4 with the inheritable file descriptors support.
This also changes Python/fileutils.c `set_inheritable` to use the "slow" two `fcntl` syscall path instead of the "fast" single `ioctl` syscall path when asked to be async signal safe (by way of being asked not to raise exceptions). `ioctl` is not a POSIX async-signal-safe approved function.
ref: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
PyUnicode_DecodeLocaleAndSize(), PyUnicode_DecodeLocale() and
PyUnicode_EncodeLocale() now use always use the UTF-8 encoding on
Android, instead of the current locale encoding.
On Android API 19, mbstowcs() and wcstombs() are broken and cannot be
used.
* Add _Py_GetLocaleconvNumeric() function: decode decimal_point and
thousands_sep fields of localeconv() from the LC_NUMERIC encoding,
rather than decoding from the LC_CTYPE encoding.
* Modify locale.localeconv() and "n" formatter of str.format() (for
int, float and complex to use _Py_GetLocaleconvNumeric()
internally.
Modify locale.localeconv(), time.tzname, os.strerror() and other
functions to ignore the UTF-8 Mode: always use the current locale
encoding.
Changes:
* Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or
encoding error, they return the position of the error and an error
message which are used to raise Unicode errors in
PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale().
* Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx().
* PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all
cases, especially for the strict error handler.
* Add _Py_DecodeUTF8Ex(): return more information on decoding error
and supports the strict error handler.
* Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex().
* Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx().
* Ignore the UTF-8 mode to encode/decode localeconv(), strerror()
and time zone name.
* Remove PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize()
and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use
the "current" locale.
* Remove _PyUnicode_DecodeCurrentLocale(),
_PyUnicode_DecodeCurrentLocaleAndSize() and
_PyUnicode_EncodeCurrentLocale().