Commit Graph

106 Commits

Author SHA1 Message Date
Victor Stinner 3d4226a832
bpo-34523: Support surrogatepass in locale codecs (GH-8995)
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.

Changes:

* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
  surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
  the _Py_error_handler enum instead of "int surrogateescape" to pass
  the error handler. These functions now return -3 if the error
  handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
  in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
  as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
  workaround anymore.
2018-08-29 22:21:32 +02:00
Victor Stinner c5989cd876
bpo-34523: Py_DecodeLocale() use UTF-8 on Windows (GH-8998)
Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding on
Windows if Py_LegacyWindowsFSEncodingFlag is zero.

pymain_read_conf() now sets Py_LegacyWindowsFSEncodingFlag in its
loop, but restore its value at exit.
2018-08-29 19:32:47 +02:00
Victor Stinner d500e5307a
bpo-34403: On HP-UX, force ASCII for C locale (GH-8969)
On HP-UX with C or POSIX locale, sys.getfilesystemencoding() now returns
"ascii" instead of "roman8" (when the UTF-8 Mode is disabled and the C locale
is not coerced).

nl_langinfo(CODESET) announces "roman8" whereas it uses the Latin1
encoding in practice.
2018-08-28 17:27:36 +02:00
Victor Stinner 5cb258950c
bpo-34527: POSIX locale enables the UTF-8 Mode (GH-8972)
* The UTF-8 Mode is now also enabled by the "POSIX" locale, not only
  by the "C" locale.
* On FreeBSD, Py_DecodeLocale() and Py_EncodeLocale() now also forces
  the ASCII encoding if the LC_CTYPE locale is "POSIX", not only if
  the LC_CTYPE locale is "C".
* test_utf8_mode.test_cmd_line() checks also that the command line
  arguments are decoded from UTF-8 when the the UTF-8 Mode is enabled
  with POSIX locale or C locale.
2018-08-28 12:35:44 +02:00
Ville Skyttä 61f82e0e33 Spelling fixes to docs, docstrings, and comments (GH-6374) 2018-04-20 16:08:45 -04:00
Alexey Izbyshev b3b4a9d300 bpo-32869: Fix incorrect dst buffer size for MultiByteToWideChar (#5739)
This function expects the destination buffer size to be given
in wide characters, not bytes.
2018-02-18 19:57:24 +02:00
Alexey Izbyshev c1e46e94de bpo-32777: Fix _Py_set_inheritable async-safety in subprocess (GH-5560)
Fix a rare but potential pre-exec child process deadlock in subprocess on POSIX systems when marking file descriptors inheritable on exec in the child process.  This bug appears to have been introduced in 3.4 with the inheritable file descriptors support.

This also changes Python/fileutils.c `set_inheritable` to use the "slow" two `fcntl` syscall path instead of the "fast" single `ioctl` syscall path when asked to be async signal safe (by way of being asked not to raise exceptions).  `ioctl` is not a POSIX async-signal-safe approved function.

ref: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
2018-02-05 22:09:34 -08:00
Victor Stinner 9089a26591
bpo-29240: PyUnicode_DecodeLocale() uses UTF-8 on Android (#5272)
PyUnicode_DecodeLocaleAndSize(), PyUnicode_DecodeLocale() and
PyUnicode_EncodeLocale() now use always use the UTF-8 encoding on
Android, instead of the current locale encoding.

On Android API 19, mbstowcs() and wcstombs() are broken and cannot be
used.
2018-01-22 19:07:32 +01:00
Victor Stinner cb064fc232
bpo-31900: Fix localeconv() encoding for LC_NUMERIC (#4174)
* Add _Py_GetLocaleconvNumeric() function: decode decimal_point and
  thousands_sep fields of localeconv() from the LC_NUMERIC encoding,
  rather than decoding from the LC_CTYPE encoding.
* Modify locale.localeconv() and "n" formatter of str.format() (for
  int, float and complex to use _Py_GetLocaleconvNumeric()
  internally.
2018-01-15 15:58:02 +01:00
Victor Stinner 7ed7aead95
bpo-29240: Fix locale encodings in UTF-8 Mode (#5170)
Modify locale.localeconv(), time.tzname, os.strerror() and other
functions to ignore the UTF-8 Mode: always use the current locale
encoding.

Changes:

* Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or
  encoding error, they return the position of the error and an error
  message which are used to raise Unicode errors in
  PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale().
* Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx().
* PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all
  cases, especially for the strict error handler.
* Add _Py_DecodeUTF8Ex(): return more information on decoding error
  and supports the strict error handler.
* Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex().
* Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx().
* Ignore the UTF-8 mode to encode/decode localeconv(), strerror()
  and time zone name.
* Remove PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize()
  and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use
  the "current" locale.
* Remove _PyUnicode_DecodeCurrentLocale(),
  _PyUnicode_DecodeCurrentLocaleAndSize() and
  _PyUnicode_EncodeCurrentLocale().
2018-01-15 10:45:49 +01:00
Victor Stinner 2cba6b8579
bpo-29240: readline now ignores the UTF-8 Mode (#5145)
Add new fuctions ignoring the UTF-8 mode:

* _Py_DecodeCurrentLocale()
* _Py_EncodeCurrentLocale()
* _PyUnicode_DecodeCurrentLocaleAndSize()
* _PyUnicode_EncodeCurrentLocale()

Modify the readline module to use these functions.

Re-enable test_readline.test_nonascii().
2018-01-10 22:46:15 +01:00
Victor Stinner 9bee329130
bpo-32030: Add _Py_FindEnvConfigValue() (#4963)
Add a new _Py_FindEnvConfigValue() function: code shared between
Windows and Unix implementations of _PyPathConfig_Calculate() to read
the pyenv.cfg file.

_Py_FindEnvConfigValue() now uses _Py_DecodeUTF8_surrogateescape()
instead of using a Python Unicode string, the Python API must not be
used early during Python initialization. Same change in Unix
search_for_exec_prefix(): use _Py_DecodeUTF8_surrogateescape().

Cleanup also encode_current_locale(): PyMem_RawFree/PyMem_Free can be
called with NULL.

Fix also "NUL byte" => "NULL byte" typo.
2017-12-21 16:49:13 +01:00
Victor Stinner 9dd762013f
bpo-32030: Add _Py_EncodeLocaleRaw() (#4961)
Replace Py_EncodeLocale() with _Py_EncodeLocaleRaw() in:

* _Py_wfopen()
* _Py_wreadlink()
* _Py_wrealpath()
* _Py_wstat()
* pymain_open_filename()

These functions are called early during Python intialization, only
the RAW memory allocator must be used.
2017-12-21 16:20:32 +01:00
Victor Stinner e47e698da6
bpo-32030: Add _Py_EncodeUTF8_surrogateescape() (#4960)
Py_EncodeLocale() now uses _Py_EncodeUTF8_surrogateescape(), instead
of using temporary unicode and bytes objects. So Py_EncodeLocale()
doesn't use the Python C API anymore.
2017-12-21 15:45:16 +01:00
Victor Stinner 9454060e84
bpo-29240, bpo-32030: Py_Main() re-reads config if encoding changes (#4899)
bpo-29240, bpo-32030: If the encoding change (C locale coerced or
UTF-8 Mode changed), Py_Main() now reads again the configuration with
the new encoding.

Changes:

* Add _Py_UnixMain() called by main().
* Rename pymain_free_pymain() to pymain_clear_pymain(), it can now be
  called multipled times.
* Rename pymain_parse_cmdline_envvars() to pymain_read_conf().
* Py_Main() now clears orig_argc and orig_argv at exit.
* Remove argv_copy2, Py_Main() doesn't modify argv anymore. There is
  no need anymore to get two copies of the wchar_t** argv.
* _PyCoreConfig: add coerce_c_locale and coerce_c_locale_warn.
* Py_UTF8Mode is now initialized to -1.
* Locale coercion (PEP 538) now respects -I and -E options.
2017-12-16 04:54:22 +01:00
Victor Stinner d2b02310ac
bpo-29240: Don't define decode_locale() on macOS (#4895)
Don't define decode_locale() nor encode_locale() on macOS or Android.
2017-12-15 23:06:17 +01:00
Victor Stinner 91106cd9ff
bpo-29240: PEP 540: Add a new UTF-8 Mode (#855)
* Add -X utf8 command line option, PYTHONUTF8 environment variable
  and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: enable automatically the
  UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
  _winapi.GetACP() to get the ANSI code page
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
  mode. As a side effect, open() now uses the UTF-8 encoding by
  default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
  in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8
* Skip some tests relying on the current locale if the UTF-8 mode is
  enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
  return also the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
  always copy flag values, rather than only copying if the new value
  is greater than the old value.
2017-12-13 12:29:09 +01:00
Victor Stinner 8c663fd60e
Replace KB unit with KiB (#4293)
kB (*kilo* byte) unit means 1000 bytes, whereas KiB ("kibibyte")
means 1024 bytes. KB was misused: replace kB or KB with KiB when
appropriate.

Same change for MB and GB which become MiB and GiB.

Change the output of Tools/iobench/iobench.py.

Round also the size of the documentation from 5.5 MB to 5 MiB.
2017-11-08 14:44:44 -08:00
Antoine Pitrou a6a4dc816d bpo-31370: Remove support for threads-less builds (#3385)
* Remove Setup.config
* Always define WITH_THREAD for compatibility.
2017-09-07 18:56:24 +02:00
Serhiy Storchaka f7eae0adfc [security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302)
Based on patch by Victor Stinner.

Add private C API function _PyUnicode_AsUnicode() which is similar to
PyUnicode_AsUnicode(), but checks for null characters.
2017-06-28 08:30:06 +03:00
Victor Stinner 0f6d73343d bpo-29619: Convert st_ino using unsigned integer (#557)
bpo-29619: os.stat() and os.DirEntry.inodeo() now convert inode
(st_ino) using unsigned integers.
2017-03-09 17:34:28 +01:00
Xavier de Gaye 76febd0792 Issue #26919: On Android, operating system data is now always encoded/decoded
to/from UTF-8, instead of the locale encoding to avoid inconsistencies with
os.fsencode() and os.fsdecode() which are already using UTF-8.
2016-12-15 20:59:58 +01:00
Xavier de Gaye ec5d3cd533 Issue #28746: Fix the set_inheritable() file descriptor method on platforms
that do not have the ioctl FIOCLEX and FIONCLEX commands
2016-11-19 16:19:29 +01:00
Victor Stinner 54de2b1edd Fix check_force_ascii()
Issue #27938: Normalize aliases of the ASCII encoding, because
_Py_normalize_encoding() now correctly normalize encoding names.
2016-09-09 23:11:52 -07:00
Steve Dower 940f33a50f Issue #23524: Finish removing _PyVerify_fd from sources 2016-09-08 11:21:54 -07:00
Martin Panter 3e04d5b306 Issue #27076: Merge spelling from 3.5 2016-05-26 06:03:19 +00:00
Martin Panter 46f50726a0 Issue #27076: Doc, comment and tests spelling fixes
Most fixes to Doc/ and Lib/ directories by Ville Skyttä.
2016-05-26 05:35:26 +00:00
Victor Stinner 4a3443be43 Merge 3.5 (issue #27057) 2016-05-19 16:48:06 +02:00
Victor Stinner 3116cc44af Fix os.set_inheritable() on Android
Issue #27057: Fix os.set_inheritable() on Android, ioctl() is blocked by
SELinux and fails with EACCESS. The function now falls back to fcntl().

Patch written by Michał Bednarski.
2016-05-19 16:46:18 +02:00
Victor Stinner a858bbde03 Avoid fcntl() if possible in set_inheritable()
Issue #26770: set_inheritable() avoids calling fcntl() twice if the FD_CLOEXEC
is already set/cleared. This change only impacts platforms using the fcntl()
implementation of set_inheritable() (not Linux nor Windows).
2016-04-17 16:51:52 +02:00
Victor Stinner 8a1be61849 Add more checks on the GIL
Issue #10915, #15751, #26558:

* PyGILState_Check() now returns 1 (success) before the creation of the GIL and
  after the destruction of the GIL. It allows to use the function early in
  Python initialization and late in Python finalization.
* Add a flag to disable PyGILState_Check(). Disable PyGILState_Check() when
  Py_NewInterpreter() is called
* Add assert(PyGILState_Check()) to: _Py_dup(), _Py_fstat(), _Py_read()
  and _Py_write()
2016-03-14 22:07:55 +01:00
Martin Panter 6f9b010242 Fix a couple of typos in code comments 2015-12-17 10:18:28 +00:00
Victor Stinner bc5b80bac1 Close #24784: Fix compilation without thread support
Add "#ifdef WITH_THREAD" around cals to:

* PyGILState_Check()
* _PyImport_AcquireLock()
* _PyImport_ReleaseLock()
2015-10-11 09:54:42 +02:00
Steve Dower 8fc8980c96 Issue #23524: Replace _PyVerify_fd function with calls to _set_thread_local_invalid_parameter_handler. 2015-04-12 00:26:27 -04:00
Victor Stinner 6f4fae8a95 Issue #23836: Document functions releasing the GIL in fileutils.c 2015-04-01 18:34:32 +02:00
Victor Stinner 82c3e4599d Issue #23836: Add _Py_write_noraise() function
Helper to write() which retries write() if it is interrupted by a signal (fails
with EINTR).
2015-04-01 18:34:45 +02:00
Victor Stinner e134a7fe36 Issue #23752: _Py_fstat() is now responsible to raise the Python exception
Add _Py_fstat_noraise() function when a Python exception is not welcome.
2015-03-30 10:09:31 +02:00
Victor Stinner 91afbb6088 Issue #23753: Move _Py_wstat() from Python/fileutils.c to Modules/getpath.c
I expected more users of _Py_wstat(), but in practice it's only used by
Modules/getpath.c. Move the function because it's not needed on Windows.
Windows uses PC/getpathp.c which uses the Win32 API (ex: GetFileAttributesW())
not the POSIX API.
2015-03-24 12:16:28 +01:00
Victor Stinner f329878e74 Issue #23753: Python doesn't support anymore platforms without stat() or
fstat(), these functions are always required.

Remove HAVE_STAT and HAVE_FSTAT defines, and stop supporting DONT_HAVE_STAT and
DONT_HAVE_FSTAT.
2015-03-24 10:27:50 +01:00
Victor Stinner a3c0202eb5 Issue #23708: Save/restore errno in _Py_read() and _Py_write()
Save and then restore errno because PyErr_CheckSignals() and
PyErr_SetFromErrno() can modify it.
2015-03-20 11:58:18 +01:00
Victor Stinner 7f04d4d4b7 Issue #23708: Split assertion expression in two assertions in _Py_read() and
_Py_write() to know which test failed on the buildbot "AMD64 Snow Leop 3.x".
2015-03-20 11:21:41 +01:00
Victor Stinner c1cf4f7ef9 Issue #23708: Fix _Py_read() compilation error on Windows
Fix typo: self->fd => fd
2015-03-19 23:53:04 +01:00
Victor Stinner 66aab0c4b5 Issue #23708: Add _Py_read() and _Py_write() functions to factorize code handle
EINTR error and special cases for Windows.

These functions now truncate the length to PY_SSIZE_T_MAX to have a portable
and reliable behaviour. For example, read() result is undefined if counter is
greater than PY_SSIZE_T_MAX on Linux.
2015-03-19 22:53:20 +01:00
Victor Stinner a47fc5c2dd Issue #23694: Handle EINTR in _Py_open() and _Py_fopen_obj()
Retry open()/fopen() if it fails with EINTR and the Python signal handler
doesn't raise an exception.
2015-03-18 09:52:54 +01:00
Victor Stinner e42ccd2bfd Issue #23694: Enhance _Py_fopen(), it now raises an exception on error
* If fopen() fails, OSError is raised with the original filename object.
* The GIL is now released while calling fopen()
2015-03-18 01:39:23 +01:00
Victor Stinner a555cfcb73 Issue #23694: Enhance _Py_open(), it now raises exceptions
* _Py_open() now raises exceptions on error. If open() fails, it raises an
  OSError with the filename.
* _Py_open() now releases the GIL while calling open()
* Add _Py_open_noraise() when _Py_open() cannot be used because the GIL is not
  held
2015-03-18 00:22:14 +01:00
Steve Dower 9aa31d5479 Fixes incorrect use of GetLastError where errno should be used. 2015-03-14 11:39:18 -07:00
Steve Dower 41e7244c06 Fixes incorrect use of GetLastError where errno should be used. 2015-03-14 11:38:27 -07:00
Steve Dower 8acde7dcce Issue #23524: Change back to using Windows errors for _Py_fstat instead of the errno shim. 2015-03-07 18:14:07 -08:00
Steve Dower d81431f587 Issue #23524: Replace _PyVerify_fd function with calling _set_thread_local_invalid_parameter_handler on every thread. 2015-03-06 14:47:02 -08:00