Commit Graph

1463 Commits

Author SHA1 Message Date
Steve Dower 7ebdda0dbe
bpo-36311: Fixes decoding multibyte characters around chunk boundaries and improves decoding performance (GH-15083) 2019-08-21 16:22:33 -07:00
Jeroen Demeyer 196a530e00 bpo-37483: add _PyObject_CallOneArg() function (#14558) 2019-07-04 19:31:34 +09:00
Victor Stinner ed076ed467
bpo-37388: Add PyUnicode_Decode(str, 0) fast-path (GH-14385)
Add a fast-path to PyUnicode_Decode() for size equals to 0.
2019-06-26 01:49:32 +02:00
Victor Stinner 22eb689cf3
bpo-37388: Development mode check encoding and errors (GH-14341)
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().

By default, for best performances, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.
2019-06-26 00:51:05 +02:00
Serhiy Storchaka 894263ba80
bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
2019-06-25 11:54:18 +03:00
Inada Naoki 770847a7db
bpo-37348: optimize decoding ASCII string (GH-14283)
`_PyUnicode_Writer` is a relatively complex structure.  Initializing it is significant overhead when decoding short ASCII string.
2019-06-24 12:30:24 +09:00
Victor Stinner b45d259bdd
bpo-36710: Use tstate in pylifecycle.c (GH-14249)
In pylifecycle.c: pass tstate argument, rather than interp argument,
to functions.
2019-06-20 00:05:23 +02:00
Jeroen Demeyer 530f506ac9 bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async (GH-13464)
Automatically replace
tp_print -> tp_vectorcall_offset
tp_compare -> tp_as_async
tp_reserved -> tp_as_async
2019-05-30 19:13:39 -07:00
Inada Naoki 7d408697a9
remove unnecessary tp_dealloc (GH-13647) 2019-05-29 17:23:27 +09:00
Victor Stinner 331a6a56e9
bpo-36763: Implement the PEP 587 (GH-13592)
* Add a whole new documentation page:
  "Python Initialization Configuration"
* PyWideStringList_Append() return type is now PyStatus,
  instead of int
* PyInterpreterState_New() now calls PyConfig_Clear() if
  PyConfig_InitPythonConfig() fails.
* Rename files:

  * Python/coreconfig.c => Python/initconfig.c
  * Include/cpython/coreconfig.h => Include/cpython/initconfig.h
  * Include/internal/: pycore_coreconfig.h => pycore_initconfig.h

* Rename structures

  * _PyCoreConfig => PyConfig
  * _PyPreConfig => PyPreConfig
  * _PyInitError => PyStatus
  * _PyWstrList => PyWideStringList

* Rename PyConfig fields:

  * use_module_search_paths => module_search_paths_set
  * module_search_path_env => pythonpath_env

* Rename PyStatus field: _func => func
* PyInterpreterState: rename core_config field to config
* Rename macros and functions:

  * _PyCoreConfig_SetArgv() => PyConfig_SetBytesArgv()
  * _PyCoreConfig_SetWideArgv() => PyConfig_SetArgv()
  * _PyCoreConfig_DecodeLocale() => PyConfig_SetBytesString()
  * _PyInitError_Failed() => PyStatus_Exception()
  * _Py_INIT_ERROR_TYPE_xxx enums => _PyStatus_TYPE_xxx
  * _Py_UnixMain() => Py_BytesMain()
  * _Py_ExitInitError() => Py_ExitStatusException()
  * _Py_PreInitializeFromArgs() => Py_PreInitializeFromBytesArgs()
  * _Py_PreInitializeFromWideArgs() => Py_PreInitializeFromArgs()
  * _Py_PreInitialize() => Py_PreInitialize()
  * _Py_RunMain() => Py_RunMain()
  * _Py_InitializeFromConfig() => Py_InitializeFromConfig()
  * _Py_INIT_XXX() => _PyStatus_XXX()
  * _Py_INIT_FAILED() => _PyStatus_EXCEPTION()

* Rename 'err' PyStatus variables to 'status'
* Convert RUN_CODE() macro to config_run_code() static inline function
* Remove functions:

  * _Py_InitializeFromArgs()
  * _Py_InitializeFromWideArgs()
  * _PyInterpreterState_GetCoreConfig()
2019-05-27 16:39:22 +02:00
Zackery Spytz 14514d9084 bpo-36946: Fix possible signed integer overflow when handling slices. (GH-13375)
The final addition (cur += step) may overflow, so use size_t for "cur".
"cur" is always positive (even for negative steps), so it is safe to use
size_t here.

Co-Authored-By: Martin Panter <vadmium+py@gmail.com>
2019-05-17 10:13:03 +03:00
Zackery Spytz 1a2252ed39 bpo-36594: Fix incorrect use of %p in format strings (GH-12769)
In addition, fix some other minor violations of C99.
2019-05-06 12:56:50 -04:00
Victor Stinner 709d23dee6
bpo-36775: _PyCoreConfig only uses wchar_t* (GH-13062)
_PyCoreConfig: Change filesystem_encoding, filesystem_errors,
stdio_encoding and stdio_errors fields type from char* to wchar_t*.

Changes:

* PyInterpreterState: replace fscodec_initialized (int) with fs_codec
  structure.
* Add get_error_handler_wide() and unicode_encode_utf8() helper
  functions.
* Add error_handler parameter to unicode_encode_locale()
  and unicode_decode_locale().
* Remove _PyCoreConfig_SetString().
* Rename _PyCoreConfig_SetWideString() to _PyCoreConfig_SetString().
* Rename _PyCoreConfig_SetWideStringFromString()
  to _PyCoreConfig_DecodeLocale().
2019-05-02 14:56:30 -04:00
Victor Stinner 43fc3bb7cf
bpo-36775: Add _PyUnicode_InitEncodings() (GH-13057)
Move get_codec_name() and initfsencoding() from pylifecycle.c to
unicodeobject.c.

Rename also "init" functions in pylifecycle.c.
2019-05-02 11:54:20 -04:00
Victor Stinner e251095a3f
bpo-36775: Add _Py_FORCE_UTF8_FS_ENCODING macro (GH-13056)
Add _Py_FORCE_UTF8_LOCALE and _Py_FORCE_UTF8_FS_ENCODING macros to
avoid factorize "#if defined(__ANDROID__) || defined(__VXWORKS__)"
and "#if defined(__APPLE__)".

Cleanup also config_init_fs_encoding().
2019-05-02 11:28:57 -04:00
Victor Stinner 0fc91eef34
bpo-36389: Add _PyObject_CheckConsistency() function (GH-12803)
Add a new _PyObject_CheckConsistency() function which can be used to
help debugging. The function is available in release mode.

Add a 'check_content' parameter to _PyDict_CheckConsistency().
2019-04-12 21:51:34 +02:00
Kingsley M b015fc86f7 bpo-36549: str.capitalize now titlecases the first character instead of uppercasing it (GH-12804) 2019-04-12 08:35:39 -07:00
Serhiy Storchaka 7a465cb5ee
bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603)
The bug occurred when the encoded surrogate character is passed
to the incremental decoder in two chunks.
2019-03-30 08:23:38 +02:00
Serhiy Storchaka c1e2c288f4
bpo-36312: Fix decoders for some code pages. (GH-12369) 2019-03-20 21:45:18 +02:00
Victor Stinner fecc4f2b47
bpo-36356: Release Unicode interned strings on Valgrind (#12431)
When Python is compiled with Valgrind support, release Unicode
interned strings at exit in _PyUnicode_Fini().

* Rename _Py_ReleaseInternedUnicodeStrings() to
  unicode_release_interned() and make it private.
* unicode_release_interned() is now called from _PyUnicode_Fini():
  it must be called with a running Python thread state for TRASHCAN,
  it cannot be called from pymain_free().
* Don't display statistics on interned strings at exit anymore
2019-03-19 14:20:29 +01:00
Victor Stinner 5f9cf23502
bpo-36301: Error if decoding pybuilddir.txt fails (GH-12422)
Python initialization now fails if decoding pybuilddir.txt
configuration file fails at startup.

_PyPathConfig_Calculate() now reports memory allocation failure and
decoding error on decoding pybuilddir.txt content from
UTF-8/surrogateescape.
2019-03-19 01:46:25 +01:00
Inada Naoki 6a16b18224
bpo-36297: remove "unicode_internal" codec (GH-12342) 2019-03-18 15:44:11 +09:00
Victor Stinner 6d43f6f081
bpo-35713: Split _Py_InitializeCore into subfunctions (GH-11650)
* Split _Py_InitializeCore_impl() into subfunctions: add multiple pycore_init_xxx() functions
* Preliminary sys.stderr is now set earlier to get an usable
  sys.stderr ealier.
* Move code into _Py_Initialize_ReconfigureCore() to be able to call
  it from _Py_InitializeCore().
* Split _PyExc_Init(): create a new _PyBuiltins_AddExceptions()
  function.
* Call _PyExc_Init() earlier in _Py_InitializeCore_impl()
  and new_interpreter() to get working exceptions earlier.
* _Py_ReadyTypes() now returns _PyInitError rather than calling
  Py_FatalError().
* Misc code cleanup
2019-01-22 21:18:05 +01:00
Victor Stinner bf4ac2d2fd
bpo-35713: Rework Python initialization (GH-11647)
* The PyByteArray_Init() and PyByteArray_Fini() functions have been
  removed. They did nothing since Python 2.7.4 and Python 3.2.0, were
  excluded from the limited API (stable ABI), and were not
  documented.
* Move "_PyXXX_Init()" and "_PyXXX_Fini()" declarations from
  Include/cpython/pylifecycle.h to
  Include/internal/pycore_pylifecycle.h. Replace
  "PyAPI_FUNC(TYPE)" with "extern TYPE".
* _PyExc_Init() now returns an error on failure rather than calling
  Py_FatalError(). Move macros inside _PyExc_Init() and undefine them
  when done. Rewrite macros to make them look more like statement:
  add ";" when using them, add "do { ... } while (0)".
* _PyUnicode_Init() now returns a _PyInitError error rather than call
  Py_FatalError().
* Move stdin check from _PySys_BeginInit() to init_sys_streams().
* _Py_ReadyTypes() now returns a _PyInitError error rather than
  calling Py_FatalError().
2019-01-22 17:39:03 +01:00
Serhiy Storchaka d586ccb04f
bpo-35552: Fix reading past the end in PyUnicode_FromFormat() and PyBytes_FromFormat(). (GH-11276)
Format characters "%s" and "%V" in PyUnicode_FromFormat() and "%s" in PyBytes_FromFormat()
no longer read memory past the limit if precision is specified.
2019-01-12 10:30:35 +02:00
Xtreak 3f7983a25a bpo-35560: Remove assertion from format(float, "n") (GH-11288)
Fix an assertion error in format() in debug build for floating point
formatting with "n" format, zero padding and small width. Release build is
not impacted. Patch by Karthikeyan Singaravelan.
2019-01-07 16:09:14 +01:00
animalize a1d1425306 bpo-35636: Remove redundant check in unicode_hash(). (GH-11402)
_Py_HashBytes() does the check for empty string.
2019-01-02 14:16:06 +02:00
Serhiy Storchaka bb86bf4c4e
bpo-35444: Unify and optimize the helper for getting a builtin object. (GH-11047)
This speeds up pickling of some iterators.

This fixes also error handling in pickling methods when fail to
look up builtin "getattr".
2018-12-11 08:28:18 +02:00
Serhiy Storchaka eeb719eac6
bpo-35365: Use a wchar_t* buffer in the code page decoder. (GH-10837) 2018-12-04 10:25:50 +02:00
Serhiy Storchaka 4013c17911
bpo-35372: Fix the code page decoder for input > 2 GiB. (GH-10848) 2018-12-03 10:36:45 +02:00
Victor Stinner bde9d6bbb4
bpo-34523, bpo-35322: Fix unicode_encode_locale() (GH-10759)
Fix memory leak in PyUnicode_EncodeLocale() and
PyUnicode_EncodeFSDefault() on error handling.

Changes:

* Fix unicode_encode_locale() error handling
* Fix test_codecs.LocaleCodecTest
2018-11-28 10:26:20 +01:00
Victor Stinner 163403a63e
bpo-33954: Fix compiler warning in _PyUnicode_FastFill() (GH-10737)
'data' argument of unicode_fill() is modified, so it must not be
constant.

Add more assertions to unicode_fill(): check the maximum character
value.
2018-11-27 12:41:17 +01:00
Serhiy Storchaka 62be74290a
bpo-33012: Fix invalid function cast warnings with gcc 8. (GH-6749)
Fix invalid function cast warnings with gcc 8
for method conventions different from METH_NOARGS, METH_O and
METH_VARARGS excluding Argument Clinic generated code.
2018-11-27 13:27:31 +02:00
Victor Stinner 59423e3ddd
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)
Fix str.format(), float.__format__() and complex.__format__() methods
for non-ASCII decimal point when using the "n" formatter.

Changes:

* Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires
  a _PyUnicodeWriter object for the buffer and a Python str object
  for digits.
* Rename FILL() macro to unicode_fill(), convert it to static inline function,
  add "assert(0 <= start);" and rework its code.
2018-11-26 13:40:01 +01:00
Victor Stinner a42de742e7
bpo-35059: Cast void* to PyObject* (GH-10650)
Don't pass void* to Python macros: use _PyObject_CAST().
2018-11-22 10:25:22 +01:00
Victor Stinner bcda8f1d42
bpo-35081: Add Include/internal/pycore_object.h (GH-10640)
Move _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() from
Include/objimpl.h to Include/internal/pycore_object.h.
2018-11-21 22:27:47 +01:00
Gregory P. Smith 746b2d35ea
bpo-35214: Fix OOB memory access in unicode escape parser (GH-10506)
Discovered using clang's MemorySanitizer when it ran python3's
test_fstring test_misformed_unicode_character_name.

An msan build will fail by simply executing: ./python -c 'u"\N"'
2018-11-13 13:16:54 -08:00
Victor Stinner 621cebe81b
bpo-35081: Rename internal headers (GH-10275)
Rename Include/internal/ headers:

* pycore_hash.h -> pycore_pyhash.h
* pycore_lifecycle.h -> pycore_pylifecycle.h
* pycore_mem.h -> pycore_pymem.h
* pycore_state.h -> pycore_pystate.h

Add missing headers to Makefile.pre.in and PCbuild:

* pycore_condvar.h.
* pycore_hamt.h
* pycore_pyhash.h
2018-11-12 16:53:38 +01:00
Victor Stinner 9fc57a3848
bpo-35081: Add pycore_fileutils.h (GH-10371)
Move Py_BUILD_CORE code from Include/fileutils.h to a new
Include/internal/pycore_fileutils.h file.
2018-11-07 00:44:03 +01:00
Victor Stinner 27e2d1f219
bpo-35081: Add pycore_ prefix to internal header files (GH-10263)
* Rename Include/internal/ header files:

  * pyatomic.h -> pycore_atomic.h
  * ceval.h -> pycore_ceval.h
  * condvar.h -> pycore_condvar.h
  * context.h -> pycore_context.h
  * pygetopt.h -> pycore_getopt.h
  * gil.h -> pycore_gil.h
  * hamt.h -> pycore_hamt.h
  * hash.h -> pycore_hash.h
  * mem.h -> pycore_mem.h
  * pystate.h -> pycore_state.h
  * warnings.h -> pycore_warnings.h

* PCbuild project, Makefile.pre.in, Modules/Setup: add the
  Include/internal/ directory to the search paths of header files.
* Update includes. For example, replace #include "internal/mem.h"
  with #include "pycore_mem.h".
2018-11-01 00:52:28 +01:00
Victor Stinner 50fe3f8913
bpo-9263: _PyXXX_CheckConsistency() use _PyObject_ASSERT() (GH-10108)
Use _PyObject_ASSERT() in:

* _PyDict_CheckConsistency()
* _PyType_CheckConsistency()
* _PyUnicode_CheckConsistency()

_PyObject_ASSERT() dumps the faulty object if the assertion fails
to help debugging.
2018-10-26 18:47:15 +02:00
Serhiy Storchaka c46db9232f
bpo-30863: Rewrite PyUnicode_AsWideChar() and PyUnicode_AsWideCharString(). (GH-2599)
They no longer cache the wchar_t* representation of string objects.
2018-10-23 22:58:24 +03:00
Emanuele Gaifas fc8205cb4b Add missing closing quote and trailing period in str.isidentifier() docstring (GH-9756)
This rectifies commit ffc5a14d00.
2018-10-08 16:14:47 +05:30
Sanyam Khurana ffc5a14d00 bpo-33014: Clarify str.isidentifier docstring (GH-6088)
* bpo-33014: Clarify str.isidentifier docstring

* bpo-33014: Add code example in isidentifier documentation
2018-10-08 12:23:32 +05:30
Victor Stinner 998b806366
Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187)
This reverts commit 886483e2b9.
2018-09-12 00:23:25 +02:00
Victor Stinner 886483e2b9
bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)
* Add %T format to PyUnicode_FromFormatV(), and so to
  PyUnicode_FromFormat() and PyErr_Format(), to format an object type
  name: equivalent to "%s" with Py_TYPE(obj)->tp_name.
* Replace Py_TYPE(obj)->tp_name with %T format in unicodeobject.c.
* Add unit test on %T format.
* Rename unicode_fromformat_write_cstr() to
  unicode_fromformat_write_utf8(), to make the intent more explicit.
2018-09-07 18:00:58 +02:00
Victor Stinner 3d4226a832
bpo-34523: Support surrogatepass in locale codecs (GH-8995)
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.

Changes:

* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
  surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
  the _Py_error_handler enum instead of "int surrogateescape" to pass
  the error handler. These functions now return -3 if the error
  handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
  in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
  as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
  workaround anymore.
2018-08-29 22:21:32 +02:00
Victor Stinner b2457efc78
bpo-34523: Add _PyCoreConfig.filesystem_encoding (GH-8963)
_PyCoreConfig_Read() is now responsible to choose the filesystem
encoding and error handler. Using Py_Main(), the encoding is now
chosen even before calling Py_Initialize().

_PyCoreConfig.filesystem_encoding is now the reference, instead of
Py_FileSystemDefaultEncoding, for the Python filesystem encoding.

Changes:

* Add filesystem_encoding and filesystem_errors to _PyCoreConfig
* _PyCoreConfig_Read() now reads the locale encoding for the file
  system encoding.
* PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize()
  now use the interpreter configuration rather than
  Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
  global configuration variables.
* Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding()
  private functions to only modify Py_FileSystemDefaultEncoding and
  Py_FileSystemDefaultEncodeErrors in coreconfig.c.
* _Py_CoerceLegacyLocale() now takes an int rather than
  _PyCoreConfig for the warning.
2018-08-29 13:25:36 +02:00
Alexey Izbyshev 74a307d48e bpo-34435: Add missing NULL check to unicode_encode_ucs1(). (GH-8823)
Reported by Svace static analyzer.
2018-08-19 21:52:04 +03:00
Zackery Spytz e349bf2358 bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (GH-8741)
The UTF-7 decoder now raises UnicodeDecodeError for ill-formed
sequences starting with "+" (as specified in RFC 2152).
2018-08-19 07:43:38 +03:00