cpython

Commit Graph

Author	SHA1	Message	Date
Victor Stinner	3529718925	bpo-42236: os.device_encoding() respects UTF-8 Mode (GH-23119) On Unix, the os.device_encoding() function now returns 'UTF-8' rather than the device encoding if the Python UTF-8 Mode is enabled.	2020-11-04 11:20:10 +01:00
Victor Stinner	e662c398d8	bpo-42236: Use UTF-8 encoding if nl_langinfo(CODESET) fails (GH-23086) If the nl_langinfo(CODESET) function returns an empty string, Python now uses UTF-8 as the filesystem encoding. In May 2010 (commit `b744ba1d14`), I modified Python to log a warning and use UTF-8 as the filesystem encoding (instead of None) if nl_langinfo(CODESET) returns an empty string. In August 2020 (commit `94908bbc15`), I modified Python startup to fail with a fatal error and a specific error message if nl_langinfo(CODESET) returns an empty string. The intent was to prevent guessing the encoding and also investigate user configuration where this case happens. In 10 years (2010 to 2020), I saw zero user reports about the error message related to nl_langinfo(CODESET) returning an empty string. Today, UTF-8 has become the de facto standard and it's safe to assume that the user expects UTF-8. For example, nl_langinfo(CODESET) can return an empty string on macOS if the LC_CTYPE locale is not supported, and UTF-8 is the default encoding on macOS. While this change is unlikely to affect anyone in practice, it should make UTF-8 lovers happy ;-) Also rewrite the documentation explaining how Python selects the filesystem encoding and error handler.	2020-11-01 23:07:23 +01:00
Victor Stinner	82458b6cdb	bpo-42236: Enhance _locale._get_locale_encoding() (GH-23083) * Rename _Py_GetLocaleEncoding() to _Py_GetLocaleEncodingObject() * Add _Py_GetLocaleEncoding() which returns a wchar_t* string to share code between _Py_GetLocaleEncodingObject() and config_get_locale_encoding(). * _Py_GetLocaleEncodingObject() now decodes nl_langinfo(CODESET) from the current locale encoding with surrogateescape, rather than using UTF-8.	2020-11-01 20:59:35 +01:00
Victor Stinner	710e826307	bpo-42208: Add _Py_GetLocaleEncoding() (GH-23050) _io.TextIOWrapper no longer calls getpreferredencoding(False) of _bootlocale to get the locale encoding, but calls _Py_GetLocaleEncoding() instead. Add config_get_fs_encoding() sub-function. Reorganize also config_get_locale_encoding() code.	2020-10-31 01:02:09 +01:00
TIGirardi	f2312037e3	bpo-38324: Fix test__locale.py Windows failures (GH-20529) Use wide-char _W_* fields of lconv structure on Windows Remove "ps_AF" from test__locale.known_numerics on Windows	2020-10-20 12:39:52 +01:00
Kyle Evans	7992579cd2	bpo-40422: Move _Py_closerange to fileutils.c (GH-22680) This API is relatively lightweight and organizationally, given that it's used by multiple modules, it makes sense to move it to fileutils. Requires making sure that _posixsubprocess is compiled with the appropriate Py_BUILD_CORE_BUILTIN macro.	2020-10-13 22:04:44 +02:00
Serhiy Storchaka	4c8f09d7ce	bpo-36346: Make using the legacy Unicode C API optional (GH-21437) Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0 makes the interpreter not use the wchar_t cache and the legacy Unicode C API.	2020-07-10 23:26:06 +03:00
Serhiy Storchaka	6c6810d989	bpo-41094: Fix decoding errors with audit when opening files. (GH-21095)	2020-06-24 08:46:05 +03:00
Christian Heimes	9672912e8f	bpo-40957: Fix refleak in _Py_fopen_obj() (GH-20827) Signed-off-by: Christian Heimes <christian@python.org>	2020-06-14 00:57:22 +09:00
Victor Stinner	361dcdcefc	bpo-40268: Remove unused osdefs.h includes (GH-19532) When the include is needed, add required symbol in a comment.	2020-04-15 03:24:57 +02:00
Serhiy Storchaka	8f87eefe7f	bpo-39943: Add the const qualifier to pointers on non-mutable PyBytes data. (GH-19472)	2020-04-12 14:58:27 +03:00
Victor Stinner	03a8a56fac	bpo-38353: Add subfunctions to getpath.c (GH-16572) Following symbolic links is now limited to 40 attempts, just to prevent loops. Add subfunctions: * Add resolve_symlinks() * Add calculate_argv0_path_framework() * Add calculate_which() * Add calculate_program_macos() Fix also _Py_wreadlink(): readlink() result type is Py_ssize_t, not int.	2019-10-04 02:22:39 +02:00
Zackery Spytz	5be666010e	bpo-37549: os.dup() fails for standard streams on Windows 7 (GH-15389)	2019-08-23 11:38:41 -07:00
Steve Dower	df2d4a6f3d	bpo-37834: Normalise handling of reparse points on Windows (GH-15231) * ntpath.realpath() and nt.stat() will traverse all supported reparse points (previously was mixed) * nt.lstat() will let the OS traverse reparse points that are not name surrogates (previously would not traverse any reparse point) * nt.[l]stat() will only set S_IFLNK for symlinks (previous behaviour) * nt.readlink() will read destinations for symlinks and junction points only bpo-1311: os.path.exists('nul') now returns True on Windows * nt.stat('nul').st_mode is now S_IFCHR (previously was an error)	2019-08-21 15:27:33 -07:00
Victor Stinner	3939c321c9	bpo-20443: _PyConfig_Read() gets the absolute path of run_filename (GH-14053) Python now gets the absolute path of the script filename specified on the command line (ex: "python3 script.py"): the __file__ attribute of the __main__ module, sys.argv[0] and sys.path[0] become an absolute path, rather than a relative path. * Add _Py_isabs() and _Py_abspath() functions. * _PyConfig_Read() now tries to get the absolute path of run_filename, but keeps the relative path if _Py_abspath() fails. * Reimplement os._getfullpathname() using _Py_abspath(). * Use _Py_isabs() in getpath.c.	2019-06-25 15:02:43 +02:00
Zackery Spytz	28fca0c422	bpo-37267: Do not check for FILE_TYPE_CHAR in os.dup() on Windows (GH-14051) On Windows, os.dup() no longer creates an inheritable fd when handling a character file.	2019-06-17 09:17:14 +02:00
Steve Dower	b82e17e626	bpo-36842: Implement PEP 578 (GH-12613) Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.	2019-05-23 08:45:22 -07:00
Victor Stinner	e251095a3f	bpo-36775: Add _Py_FORCE_UTF8_FS_ENCODING macro (GH-13056) Add _Py_FORCE_UTF8_LOCALE and _Py_FORCE_UTF8_FS_ENCODING macros to factor out the repeated "#if defined(__ANDROID__) || defined(__VXWORKS__)" and "#if defined(__APPLE__)" checks. Cleanup also config_init_fs_encoding().	2019-05-02 11:28:57 -04:00
Victor Stinner	faddaedd05	bpo-36352: Avoid hardcoded MAXPATHLEN size in getpath.c (GH-12423) * Use Py_ARRAY_LENGTH() rather than hardcoded MAXPATHLEN in getpath.c. * Pass string length to functions modifying strings.	2019-03-19 02:58:14 +01:00
Victor Stinner	1be0d1135f	bpo-36352: Clarify fileutils.h documentation (GH-12406) The last parameter of _Py_wreadlink(), _Py_wrealpath() and _Py_wgetcwd() is a length, not a size: number of characters including the trailing NUL character. Enhance also documentation of error conditions.	2019-03-18 17:47:26 +01:00
pxinwr	f4b0a1c0da	bpo-31904: Add encoding support for VxWorks RTOS (GH-12051) Use UTF-8 as the system encoding on VxWorks. The main reasons are: 1. The locale is frequently misconfigured. 2. The VxWorks C library is missing some functions to deal with locales.	2019-03-04 10:02:06 +01:00
Victor Stinner	353933e712	bpo-34523: Fix C locale coercion on FreeBSD CURRENT (GH-10672) bpo-34523, bpo-35290: C locale coercion now resets the Python internal "force ASCII" mode. This change fixes the filesystem encoding on FreeBSD CURRENT, which has a new "C.UTF-8" locale, when the UTF-8 mode is disabled. Add _Py_ResetForceASCII(): _Py_SetLocaleFromEnv() now calls it.	2018-11-23 13:08:26 +01:00
Victor Stinner	02e6bf7f20	bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606) locale.localeconv() now temporarily sets the LC_CTYPE locale to the LC_MONETARY locale if the two locales are different and monetary strings are non-ASCII. This temporary change affects other threads. Changes: * locale.localeconv() can now set LC_CTYPE to LC_MONETARY to decode monetary fields. * Add LocaleInfo.grouping_buffer: copy localeconv() grouping string since it can be replaced anytime if a different thread calls localeconv(). * _Py_GetLocaleconvNumeric() now requires a "struct lconv *" structure, so locale.localeconv() no longer calls localeconv() twice. Moreover, the function now requires all arguments to be non-NULL. Rename STATIC_LOCALE_INFO_INIT to LocaleInfo_STATIC_INIT. * Move _Py_GetLocaleconvNumeric() definition from fileutils.h to pycore_fileutils.h. pycore_fileutils.h now includes locale.h. * The _locale module is now built with Py_BUILD_CORE defined.	2018-11-20 16:20:16 +01:00
Victor Stinner	9fc57a3848	bpo-35081: Add pycore_fileutils.h (GH-10371) Move Py_BUILD_CORE code from Include/fileutils.h to a new Include/internal/pycore_fileutils.h file.	2018-11-07 00:44:03 +01:00
Stéphane Wirtel	74a8b6ea7e	bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) On macOS, fix reading from and writing into a file with a size larger than 2 GiB.	2018-10-18 01:05:04 +02:00
Victor Stinner	3d4226a832	bpo-34523: Support surrogatepass in locale codecs (GH-8995) Add support for the "surrogatepass" error handler in PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault() for the UTF-8 encoding. Changes: * _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the surrogatepass error handler (_Py_ERROR_SURROGATEPASS). * _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use the _Py_error_handler enum instead of "int surrogateescape" to pass the error handler. These functions now return -3 if the error handler is unknown. * Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() in test_codecs. * Rename get_error_handler() to _Py_GetErrorHandler() and expose it as a private function. * _freeze_importlib doesn't need config.filesystem_errors="strict" workaround anymore.	2018-08-29 22:21:32 +02:00
Victor Stinner	c5989cd876	bpo-34523: Py_DecodeLocale() use UTF-8 on Windows (GH-8998) Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding on Windows if Py_LegacyWindowsFSEncodingFlag is zero. pymain_read_conf() now sets Py_LegacyWindowsFSEncodingFlag in its loop, but restore its value at exit.	2018-08-29 19:32:47 +02:00
Victor Stinner	d500e5307a	bpo-34403: On HP-UX, force ASCII for C locale (GH-8969) On HP-UX with C or POSIX locale, sys.getfilesystemencoding() now returns "ascii" instead of "roman8" (when the UTF-8 Mode is disabled and the C locale is not coerced). nl_langinfo(CODESET) announces "roman8" whereas it uses the Latin1 encoding in practice.	2018-08-28 17:27:36 +02:00
Victor Stinner	5cb258950c	bpo-34527: POSIX locale enables the UTF-8 Mode (GH-8972) * The UTF-8 Mode is now also enabled by the "POSIX" locale, not only by the "C" locale. * On FreeBSD, Py_DecodeLocale() and Py_EncodeLocale() now also force the ASCII encoding if the LC_CTYPE locale is "POSIX", not only if the LC_CTYPE locale is "C". * test_utf8_mode.test_cmd_line() checks also that the command line arguments are decoded from UTF-8 when the UTF-8 Mode is enabled with POSIX locale or C locale.	2018-08-28 12:35:44 +02:00
Ville Skyttä	61f82e0e33	Spelling fixes to docs, docstrings, and comments (GH-6374)	2018-04-20 16:08:45 -04:00
Alexey Izbyshev	b3b4a9d300	bpo-32869: Fix incorrect dst buffer size for MultiByteToWideChar (#5739) This function expects the destination buffer size to be given in wide characters, not bytes.	2018-02-18 19:57:24 +02:00
Alexey Izbyshev	c1e46e94de	bpo-32777: Fix _Py_set_inheritable async-safety in subprocess (GH-5560) Fix a rare but potential pre-exec child process deadlock in subprocess on POSIX systems when marking file descriptors inheritable on exec in the child process. This bug appears to have been introduced in 3.4 with the inheritable file descriptors support. This also changes Python/fileutils.c `set_inheritable` to use the "slow" two `fcntl` syscall path instead of the "fast" single `ioctl` syscall path when asked to be async signal safe (by way of being asked not to raise exceptions). `ioctl` is not a POSIX async-signal-safe approved function. ref: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html	2018-02-05 22:09:34 -08:00
Victor Stinner	9089a26591	bpo-29240: PyUnicode_DecodeLocale() uses UTF-8 on Android (#5272) PyUnicode_DecodeLocaleAndSize(), PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale() now always use the UTF-8 encoding on Android, instead of the current locale encoding. On Android API 19, mbstowcs() and wcstombs() are broken and cannot be used.	2018-01-22 19:07:32 +01:00
Victor Stinner	cb064fc232	bpo-31900: Fix localeconv() encoding for LC_NUMERIC (#4174) * Add _Py_GetLocaleconvNumeric() function: decode decimal_point and thousands_sep fields of localeconv() from the LC_NUMERIC encoding, rather than decoding from the LC_CTYPE encoding. * Modify locale.localeconv() and "n" formatter of str.format() (for int, float and complex) to use _Py_GetLocaleconvNumeric() internally.	2018-01-15 15:58:02 +01:00
Victor Stinner	7ed7aead95	bpo-29240: Fix locale encodings in UTF-8 Mode (#5170) Modify locale.localeconv(), time.tzname, os.strerror() and other functions to ignore the UTF-8 Mode: always use the current locale encoding. Changes: * Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or encoding error, they return the position of the error and an error message which are used to raise Unicode errors in PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale(). * Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx(). * PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all cases, especially for the strict error handler. * Add _Py_DecodeUTF8Ex(): return more information on decoding error and supports the strict error handler. * Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex(). * Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx(). * Ignore the UTF-8 mode to encode/decode localeconv(), strerror() and time zone name. * PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use the "current" locale. * Remove _PyUnicode_DecodeCurrentLocale(), _PyUnicode_DecodeCurrentLocaleAndSize() and _PyUnicode_EncodeCurrentLocale().	2018-01-15 10:45:49 +01:00
Victor Stinner	2cba6b8579	bpo-29240: readline now ignores the UTF-8 Mode (#5145) Add new functions ignoring the UTF-8 mode: * _Py_DecodeCurrentLocale() * _Py_EncodeCurrentLocale() * _PyUnicode_DecodeCurrentLocaleAndSize() * _PyUnicode_EncodeCurrentLocale() Modify the readline module to use these functions. Re-enable test_readline.test_nonascii().	2018-01-10 22:46:15 +01:00
Victor Stinner	9bee329130	bpo-32030: Add _Py_FindEnvConfigValue() (#4963) Add a new _Py_FindEnvConfigValue() function: code shared between Windows and Unix implementations of _PyPathConfig_Calculate() to read the pyvenv.cfg file. _Py_FindEnvConfigValue() now uses _Py_DecodeUTF8_surrogateescape() instead of using a Python Unicode string: the Python API must not be used early during Python initialization. Same change in Unix search_for_exec_prefix(): use _Py_DecodeUTF8_surrogateescape(). Cleanup also encode_current_locale(): PyMem_RawFree/PyMem_Free can be called with NULL. Fix also "NUL byte" => "NULL byte" typo.	2017-12-21 16:49:13 +01:00
Victor Stinner	9dd762013f	bpo-32030: Add _Py_EncodeLocaleRaw() (#4961) Replace Py_EncodeLocale() with _Py_EncodeLocaleRaw() in: * _Py_wfopen() * _Py_wreadlink() * _Py_wrealpath() * _Py_wstat() * pymain_open_filename() These functions are called early during Python initialization, so only the RAW memory allocator must be used.	2017-12-21 16:20:32 +01:00
Victor Stinner	e47e698da6	bpo-32030: Add _Py_EncodeUTF8_surrogateescape() (#4960) Py_EncodeLocale() now uses _Py_EncodeUTF8_surrogateescape(), instead of using temporary unicode and bytes objects. So Py_EncodeLocale() doesn't use the Python C API anymore.	2017-12-21 15:45:16 +01:00
Victor Stinner	9454060e84	bpo-29240, bpo-32030: Py_Main() re-reads config if encoding changes (#4899) bpo-29240, bpo-32030: If the encoding changes (C locale coerced or UTF-8 Mode changed), Py_Main() now reads again the configuration with the new encoding. Changes: * Add _Py_UnixMain() called by main(). * Rename pymain_free_pymain() to pymain_clear_pymain(), it can now be called multiple times. * Rename pymain_parse_cmdline_envvars() to pymain_read_conf(). * Py_Main() now clears orig_argc and orig_argv at exit. * Remove argv_copy2, Py_Main() doesn't modify argv anymore. There is no need anymore to get two copies of the wchar_t** argv. * _PyCoreConfig: add coerce_c_locale and coerce_c_locale_warn. * Py_UTF8Mode is now initialized to -1. * Locale coercion (PEP 538) now respects -I and -E options.	2017-12-16 04:54:22 +01:00
Victor Stinner	d2b02310ac	bpo-29240: Don't define decode_locale() on macOS (#4895) Don't define decode_locale() nor encode_locale() on macOS or Android.	2017-12-15 23:06:17 +01:00
Victor Stinner	91106cd9ff	bpo-29240: PEP 540: Add a new UTF-8 Mode (#855) * Add -X utf8 command line option, PYTHONUTF8 environment variable and a new sys.flags.utf8_mode flag. * If the LC_CTYPE locale is "C" at startup: enable automatically the UTF-8 mode. * Add _winapi.GetACP(). encodings._alias_mbcs() now calls _winapi.GetACP() to get the ANSI code page. * locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 mode. As a side effect, open() now uses the UTF-8 encoding by default in this mode. * Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding in the UTF-8 Mode. * Update subprocess._args_from_interpreter_flags() to handle -X utf8. * Skip some tests relying on the current locale if the UTF-8 mode is enabled. * Add test_utf8mode.py. * _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to return also the length (number of wide characters). * pymain_get_global_config() and pymain_set_global_config() now always copy flag values, rather than only copying if the new value is greater than the old value.	2017-12-13 12:29:09 +01:00
Victor Stinner	8c663fd60e	Replace KB unit with KiB (#4293) kB (kilo byte) unit means 1000 bytes, whereas KiB ("kibibyte") means 1024 bytes. KB was misused: replace kB or KB with KiB when appropriate. Same change for MB and GB which become MiB and GiB. Change the output of Tools/iobench/iobench.py. Round also the size of the documentation from 5.5 MB to 5 MiB.	2017-11-08 14:44:44 -08:00
Antoine Pitrou	a6a4dc816d	bpo-31370: Remove support for threads-less builds (#3385) * Remove Setup.config * Always define WITH_THREAD for compatibility.	2017-09-07 18:56:24 +02:00
Serhiy Storchaka	f7eae0adfc	[security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302) Based on patch by Victor Stinner. Add private C API function _PyUnicode_AsUnicode() which is similar to PyUnicode_AsUnicode(), but checks for null characters.	2017-06-28 08:30:06 +03:00
Victor Stinner	0f6d73343d	bpo-29619: Convert st_ino using unsigned integer (#557) bpo-29619: os.stat() and os.DirEntry.inode() now convert inode (st_ino) using unsigned integers.	2017-03-09 17:34:28 +01:00
Xavier de Gaye	76febd0792	Issue #26919: On Android, operating system data is now always encoded/decoded to/from UTF-8, instead of the locale encoding to avoid inconsistencies with os.fsencode() and os.fsdecode() which are already using UTF-8.	2016-12-15 20:59:58 +01:00
Xavier de Gaye	ec5d3cd533	Issue #28746: Fix the set_inheritable() file descriptor method on platforms that do not have the ioctl FIOCLEX and FIONCLEX commands	2016-11-19 16:19:29 +01:00
Victor Stinner	54de2b1edd	Fix check_force_ascii() Issue #27938: Normalize aliases of the ASCII encoding, because _Py_normalize_encoding() now correctly normalizes encoding names.	2016-09-09 23:11:52 -07:00
Steve Dower	940f33a50f	Issue #23524: Finish removing _PyVerify_fd from sources	2016-09-08 11:21:54 -07:00


131 Commits