Commit Graph

1706 Commits

Author SHA1 Message Date
Victor Stinner 8620be99da
bpo-45061: Revert unicode_is_singleton() change (GH-28516)
Don't use a loop over 256 items, only checks for a single singleton.
2021-09-22 12:16:53 +02:00
Mohamad Mansour 8f943ca257
[codemod] Fix non-matching bracket pairs (GH-28473)
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2021-09-22 01:09:00 +02:00
Victor Stinner 86f28372b1
bpo-45061: Detect refcount bug on empty string singleton (GH-28504)
Detect refcount bugs in C extensions when the empty Unicode string
singleton is destroyed by mistake.

* Move forward declarations to the top of unicodeobject.c.
* Simplifiy unicode_is_singleton().
2021-09-21 23:43:09 +02:00
Miguel Brito ed1076428c
bpo-44110: Improve string's __getitem__ error message (GH-26042) 2021-06-27 15:04:57 +03:00
Serhiy Storchaka be8b631b7a
Add more const modifiers. (GH-26691) 2021-06-12 16:11:59 +03:00
Inada Naoki 9ad8f109ac
bpo-44029: Remove Py_UNICODE APIs (GH-25881)
Remove deprecated `Py_UNICODE` APIs: `PyUnicode_Encode`,
`PyUnicode_EncodeUTF7`, `PyUnicode_EncodeUTF8`,
`PyUnicode_EncodeUTF16`, `PyUnicode_EncodeUTF32`,
`PyUnicode_EncodeLatin1`, `PyUnicode_EncodeMBCS`,
`PyUnicode_EncodeDecimal`, `PyUnicode_EncodeRawUnicodeEscape`,
`PyUnicode_EncodeCharmap`, `PyUnicode_EncodeUnicodeEscape`,
`PyUnicode_TransformDecimalToASCII`, `PyUnicode_TranslateCharmap`,
`PyUnicodeEncodeError_Create`, `PyUnicodeTranslateError_Create`.

See :pep:`393` and :pep:`624` for reference.
2021-05-07 15:58:29 +09:00
Jakub Kulík 9032cf5cb1
bpo-43667: Fix broken Unicode encoding in non-UTF locales on Solaris (GH-25096) 2021-04-30 15:21:42 +02:00
Victor Stinner 442ad74fc2
bpo-43687: Py_Initialize() creates singletons earlier (GH-25147)
Reorganize pycore_interp_init() to initialize singletons before the
the first PyType_Ready() call. Fix an issue when Python is configured
using --without-doc-strings.
2021-04-02 15:28:13 +02:00
Jessica Clarke dec0757549
bpo-43179: Generalise alignment for optimised string routines (GH-24624)
* Remove m68k-specific hack from ascii_decode

On m68k, alignments of primitives is more relaxed, with 4-byte and
8-byte types only requiring 2-byte alignment, thus using sizeof(size_t)
does not work. Instead, use the portable alternative.

Note that this is a minimal fix that only relaxes the assertion and the
condition for when to use the optimised version remains overly strict.
Such issues will be fixed tree-wide in the next commit.

NB: In C11 we could use _Alignof(size_t) instead, but for compatibility
we use autoconf.

* Optimise string routines for architectures with non-natural alignment

C only requires that sizeof(x) is a multiple of alignof(x), not that the
two are equal. Thus anywhere where we optimise based on alignment we
should be using alignof(x) not sizeof(x).

This is more annoying than it would be in C11 where we could just use
_Alignof(x) (and alignof(x) in C++11), but since we still require only
C99 we must plumb the information all the way from autoconf through the
various typedefs and defines.
2021-03-31 12:12:39 +02:00
Victor Stinner 9976834f80
bpo-35883: Py_DecodeLocale() escapes invalid Unicode characters (GH-24843)
Python no longer fails at startup with a fatal error if a command
line argument contains an invalid Unicode character.

The Py_DecodeLocale() function now escapes byte sequences which would
be decoded as Unicode characters outside the [U+0000; U+10ffff]
range.

Use MAX_UNICODE constant in unicodeobject.c.
2021-03-17 21:46:53 +01:00
Brandt Bucher 145bf269df
bpo-42128: Structural Pattern Matching (PEP 634) (GH-22917)
Co-authored-by: Guido van Rossum <guido@python.org>
Co-authored-by: Talin <viridia@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2021-02-26 14:51:55 -08:00
Victor Stinner bcb094b41f
bpo-43268: Pass interp rather than tstate to internal functions (GH-24580)
Pass the current interpreter (interp) rather than the current Python
thread state (tstate) to internal functions which only use the
interpreter.

Modified functions:

* _PyXXX_Fini() and _PyXXX_ClearFreeList() functions
* _PyEval_SignalAsyncExc(), make_pending_calls()
* _PySys_GetObject(), sys_set_object(), sys_set_object_id(), sys_set_object_str()
* should_audit(), set_flags_from_config(), make_flags()
* _PyAtExit_Call()
* init_stdio_encoding()
* etc.
2021-02-19 15:10:45 +01:00
Victor Stinner 101bf69ff1
bpo-43268: _Py_IsMainInterpreter() now expects interp (GH-24577)
The _Py_IsMainInterpreter() function now expects interp rather than
tstate.
2021-02-19 13:33:31 +01:00
Pablo Galindo a6d63a20df
Fix compiler warnings regarding loss of data (GH-23983) 2020-12-29 00:28:09 +00:00
Victor Stinner f4507231e3
bpo-42745: finalize_interp_types() calls _PyType_Fini() (GH-23953)
Call _PyType_Fini() in subinterpreters.

Fix reference leaks in subinterpreters.
2020-12-26 20:26:08 +01:00
Victor Stinner ea251806b8
bpo-40521: Per-interpreter interned strings (GH-20085)
Make the Unicode dictionary of interned strings compatible with
subinterpreters.

Remove the INTERN_NAME_STRINGS macro in typeobject.c: names are
always now interned (even if EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
macro is defined).

_PyUnicode_ClearInterned() now uses PyDict_Next() to no longer
allocate memory, to ensure that the interned dictionary is cleared.
2020-12-26 02:58:33 +01:00
Victor Stinner ba3d67c2fb
bpo-39465: Fix _PyUnicode_FromId() for subinterpreters (GH-20058)
Make _PyUnicode_FromId() function compatible with subinterpreters.
Each interpreter now has an array of identifier objects (interned
strings decoded from UTF-8).

* Add PyInterpreterState.unicode.identifiers: array of identifiers
  objects.
* Add _PyRuntimeState.unicode_ids used to allocate unique indexes
  to _Py_Identifier.
* Rewrite the _Py_Identifier structure.

Microbenchmark on _PyUnicode_FromId(&PyId_a) with _Py_IDENTIFIER(a):

[ref] 2.42 ns +- 0.00 ns -> [atomic] 3.39 ns +- 0.00 ns: 1.40x slower

This change adds 1 ns per _PyUnicode_FromId() call in average.
2020-12-26 00:41:46 +01:00
Serhiy Storchaka 2ad93821a6
bpo-42431: Fix outdated bytes comments (GH-23458)
Also move definitions of internal macros F_LJUST etc to private header.
2020-12-03 12:46:16 +02:00
Victor Stinner 32bd68c839
bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)
No longer use deprecated aliases to functions:

* Replace PyObject_MALLOC() with PyObject_Malloc()
* Replace PyObject_REALLOC() with PyObject_Realloc()
* Replace PyObject_FREE() with PyObject_Free()
* Replace PyObject_Del() with PyObject_Free()
* Replace PyObject_DEL() with PyObject_Free()
2020-12-01 10:37:39 +01:00
Victor Stinner 00d7abd7ef
bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586)
No longer use deprecated aliases to functions:

* Replace PyMem_MALLOC() with PyMem_Malloc()
* Replace PyMem_REALLOC() with PyMem_Realloc()
* Replace PyMem_FREE() with PyMem_Free()
* Replace PyMem_Del() with PyMem_Free()
* Replace PyMem_DEL() with PyMem_Free()

Modify also the PyMem_DEL() macro to use directly PyMem_Free().
2020-12-01 09:56:42 +01:00
Christian Heimes 07f2adedf0
bpo-40998: Address compiler warnings found by ubsan (GH-20929)
Signed-off-by: Christian Heimes <christian@python.org>

Automerge-Triggered-By: GH:tiran
2020-11-18 07:38:53 -08:00
Ikko Ashimine 38811d68ca
Fix typo in unicodeobject.c (GH-23180)
exeeds -> exceeds

Automerge-Triggered-By: GH:Mariatta
2020-11-09 21:57:34 -08:00
Victor Stinner 920cb647ba
bpo-42157: unicodedata avoids references to UCD_Type (GH-22990)
* UCD_Check() uses PyModule_Check()
* Simplify the internal _PyUnicode_Name_CAPI structure:

  * Remove size and state members
  * Remove state and self parameters of getcode() and getname()
    functions

* Remove global_module_state
2020-10-26 19:19:36 +01:00
Victor Stinner 47e1afd2a1
bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713)
The private _PyUnicode_Name_CAPI structure of the PyCapsule API
unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the
structure gets a new state member which must be passed to the
getcode() and getname() functions.

* Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
* unicodedata module is now built with Py_BUILD_CORE_MODULE.
* unicodedata: move hashAPI variable into unicodedata_module_state.
2020-10-26 16:43:47 +01:00
Ma Lin a0c603cb9d
bpo-38252: Use 8-byte step to detect ASCII sequence in 64bit Windows build (GH-16334) 2020-10-18 17:48:38 +03:00
Max Bernstein 3635388f52
bpo-42065: Fix incorrectly formatted _codecs.charmap_decode error message (GH-19940) 2020-10-17 23:38:21 +03:00
Serhiy Storchaka e2ec0b27c0
bpo-41974: Remove complex.__float__, complex.__floordiv__, etc (GH-22593)
Remove complex special methods __int__, __float__, __floordiv__,
__mod__, __divmod__, __rfloordiv__, __rmod__ and __rdivmod__
which always raised a TypeError.
2020-10-09 14:14:37 +03:00
Serhiy Storchaka 9ece9cd65c
bpo-41909: Enable previously disabled recursion checks. (GH-22536)
Enable recursion checks which were disabled when get __bases__ of
non-type objects in issubclass() and isinstance() and when intern
strings. It fixes a stack overflow when getting __bases__ leads
to infinite recursion.

Originally recursion checks was disabled for PyDict_GetItem() which
silences all errors including the one raised in case of detected
recursion and can return incorrect result. But now the code uses
PyDict_GetItemWithError() and PyDict_SetDefault() instead.
2020-10-05 00:55:57 +03:00
Victor Stinner 583ee5a5b1
bpo-41692: Deprecate PyUnicode_InternImmortal() (GH-22486)
The PyUnicode_InternImmortal() function is now deprecated and will be
removed in Python 3.12: use PyUnicode_InternInPlace() instead.
2020-10-02 14:49:00 +02:00
Victor Stinner 7f413a5d95
bpo-40521: Fix PyUnicode_InternInPlace() (GH-22376)
Fix PyUnicode_InternInPlace() when the INTERNED_STRINGS macro is not
defined (when the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS macro is
defined).
2020-09-23 14:05:32 +02:00
Victor Stinner bb083d33f7
bpo-1635741: Port _string module to multi-phase init (GH-22148)
Port the _string extension module to the multi-phase initialization
API (PEP 489).
2020-09-08 15:33:08 +02:00
Serhiy Storchaka 12f433411b
bpo-41334: Convert constructors of str, bytes and bytearray to Argument Clinic (GH-21535) 2020-07-20 15:53:55 +03:00
Serhiy Storchaka 4c8f09d7ce
bpo-36346: Make using the legacy Unicode C API optional (GH-21437)
Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0
makes the interpreter not using the wchar_t cache and the legacy Unicode C API.
2020-07-10 23:26:06 +03:00
Victor Stinner 8182cc2e68
bpo-39573: Use the Py_TYPE() macro (GH-21433)
Replace obj->ob_type with Py_TYPE(obj).
2020-07-10 12:40:38 +02:00
Serhiy Storchaka b3dd5cd4a3
bpo-36346: Undeprecate private function _PyUnicode_AsUnicode(). (GH-21336) 2020-07-05 18:53:45 +03:00
Victor Stinner 3549ca313a
bpo-1635741: Fix unicode_dealloc() for mortal interned string (GH-21270)
When unicode_dealloc() is called on a mortal interned string, the
string reference counter is now reset at zero.
2020-07-03 16:59:12 +02:00
Victor Stinner 666ecfb095
bpo-1635741: Release Unicode interned strings at exit (GH-21269)
* PyUnicode_InternInPlace() now ensures that interned strings are
  ready.
* Add _PyUnicode_ClearInterned().
* Py_Finalize() now releases Unicode interned strings:
  call _PyUnicode_ClearInterned().
2020-07-02 01:19:57 +02:00
Inada Naoki 038dd0f79d
bpo-36346: Raise DeprecationWarning when creating legacy Unicode (GH-20933) 2020-06-30 15:26:56 +09:00
Serhiy Storchaka 349f76c6aa
bpo-36346: Prepare for removing the legacy Unicode C API (AC only). (GH-21223) 2020-06-30 09:03:15 +03:00
Inada Naoki b3332660ad
bpo-41123: Remove PyUnicode_AsUnicodeCopy (GH-21209) 2020-06-30 12:23:07 +09:00
Serhiy Storchaka e67f7db3c3
bpo-37999: Simplify the conversion code for %c, %d, %x, etc. (GH-20437)
Since PyLong_AsLong() no longer use __int__, explicit call
of PyNumber_Index() before it is no longer needed.
2020-06-29 22:36:41 +03:00
Inada Naoki d9f2a13106
bpo-41123: Remove PyUnicode_GetMax() (GH-21192) 2020-06-29 10:46:51 +09:00
Inada Naoki 20a7902175
bpo-41123: Remove Py_UNICODE_str* functions (GH-21164)
They are undocumented and deprecated since Python 3.3.
2020-06-27 18:22:09 +09:00
Victor Stinner 91698d8caa
bpo-40521: Optimize PyBytes_FromStringAndSize(str, 0) (GH-21142)
Always create the empty bytes string singleton.

Optimize PyBytes_FromStringAndSize(str, 0): it no longer has to check
if the empty string singleton was created or not, it is always
available.

Add functions:

* _PyBytes_Init()
* bytes_get_empty(), bytes_new_empty()
* bytes_create_empty_string_singleton()
* unicode_create_empty_string_singleton()

_Py_unicode_state: rename empty structure member to empty_string.
2020-06-25 14:07:40 +02:00
Victor Stinner 2f9ada96e0
bpo-40521: Make Unicode latin1 singletons per interpreter (GH-21101)
Each interpreter now has its own Unicode latin1 singletons.

Remove "ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS"
and "ifdef LATIN1_SINGLETONS": always enable latin1 singletons.

Optimize unicode_result_ready(): only attempt to get a latin1
singleton for PyUnicode_1BYTE_KIND.
2020-06-24 02:22:21 +02:00
Victor Stinner 90ed8a6d71
bpo-40521: Optimize PyUnicode_New(0, maxchar) (GH-21099)
Functions of unicodeobject.c, like PyUnicode_New(), no longer check
if the empty Unicode singleton has been initialized or not. Consider
that it is always initialized. The Unicode API must not be used
before _PyUnicode_Init() or after _PyUnicode_Fini().
2020-06-24 00:34:07 +02:00
Victor Stinner f363d0a6e9
bpo-40521: Make empty Unicode string per interpreter (GH-21096)
Each interpreter now has its own empty Unicode string singleton.
2020-06-24 00:10:40 +02:00
Inada Naoki 2c4928d37e
bpo-36346: Add Py_DEPRECATED to deprecated unicode APIs (GH-20878)
Co-authored-by: Kyle Stanley <aeros167@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
2020-06-17 20:09:44 +09:00
Victor Stinner 04fc4f2a46
bpo-40989: PyObject_INIT() becomes an alias to PyObject_Init() (GH-20901)
The PyObject_INIT() and PyObject_INIT_VAR() macros become aliases to,
respectively, PyObject_Init() and PyObject_InitVar() functions.

Rename _PyObject_INIT() and _PyObject_INIT_VAR() static inline
functions to, respectively, _PyObject_Init() and _PyObject_InitVar(),
and move them to pycore_object.h. Remove their return value:
their return type becomes void.

The _datetime module is now built with the Py_BUILD_CORE_MODULE macro
defined.

Remove an outdated comment on _Py_tracemalloc_config.
2020-06-16 01:28:07 +02:00
Victor Stinner d36cf5f1d2
bpo-40943: Replace PY_FORMAT_SIZE_T with "z" (GH-20781)
The PEP 353, written in 2005, introduced PY_FORMAT_SIZE_T. Python no
longer supports macOS 10.4 and Visual Studio 2010, but requires more
recent macOS and Visual Studio versions. In 2020 with Python 3.10, it
is now safe to use directly "%zu" to format size_t and "%zi" to
format Py_ssize_t.
2020-06-10 18:38:05 +02:00
Victor Stinner c96a61e816
bpo-40881: Fix unicode_release_interned() (GH-20699)
Use Py_SET_REFCNT() in unicode_release_interned().
2020-06-08 01:39:47 +02:00
Victor Stinner 297257f7bc
bpo-39465: Cleanup _PyUnicode_FromId() code (GH-20595)
Work on a local variable before filling _Py_Identifier members.
2020-06-02 14:39:45 +02:00
Serhiy Storchaka 5f4b229df7
bpo-40792: Make the result of PyNumber_Index() always having exact type int. (GH-20443)
Previously, the result could have been an instance of a subclass of int.

Also revert bpo-26202 and make attributes start, stop and step of the range
object having exact type int.

Add private function _PyNumber_Index() which preserves the old behavior
of PyNumber_Index() for performance to use it in the conversion functions
like PyLong_AsLong().
2020-05-28 10:33:45 +03:00
Victor Stinner 3d17c045b4
bpo-40521: Add PyInterpreterState.unicode (GH-20081)
Move PyInterpreterState.fs_codec into a new
PyInterpreterState.unicode structure.

Give a name to the fs_codec structure and use this structure in
unicodeobject.c.
2020-05-14 01:48:38 +02:00
Victor Stinner d6fb53fe42
bpo-39465: Remove _PyUnicode_ClearStaticStrings() from C API (GH-20078)
Remove the _PyUnicode_ClearStaticStrings() function from the C API.
Make the function fully private (declare it with "static").
2020-05-14 01:11:54 +02:00
Serhiy Storchaka 5650e76f63
bpo-40596: Fix str.isidentifier() for non-canonicalized strings containing non-BMP characters on Windows. (GH-20053) 2020-05-12 16:18:00 +03:00
Serhiy Storchaka 74ea6b5a75
bpo-40593: Improve syntax errors for invalid characters in source code. (GH-20033) 2020-05-12 12:42:04 +03:00
Victor Stinner 607b1027fe
bpo-40521: Disable Unicode caches in isolated subinterpreters (GH-19933)
When Python is built in the experimental isolated subinterpreters
mode, disable Unicode singletons and Unicode interned strings since
they are shared by all interpreters.

Temporary workaround until these caches are made per-interpreter.
2020-05-05 18:50:30 +02:00
sweeneyde a81849b031
bpo-39939: Add str.removeprefix and str.removesuffix (GH-18939)
Added str.removeprefix and str.removesuffix methods and corresponding
bytes, bytearray, and collections.UserString methods to remove affixes
from a string if present. See PEP 616 for a full description.
2020-04-22 23:05:48 +02:00
Victor Stinner e5014be049
bpo-40268: Remove a few pycore_pystate.h includes (GH-19510) 2020-04-14 17:52:15 +02:00
Victor Stinner 81a7be3fa2
bpo-40268: Rename _PyInterpreterState_GET_UNSAFE() (GH-19509)
Rename _PyInterpreterState_GET_UNSAFE() to _PyInterpreterState_GET()
for consistency with _PyThreadState_GET() and to have a shorter name
(help to fit into 80 columns).

Add also "assert(tstate != NULL);" to the function.
2020-04-14 15:14:01 +02:00
Victor Stinner da7933ecc3
bpo-40268: Add _PyInterpreterState_GetConfig() (GH-19492)
Don't access PyInterpreterState.config member directly anymore, but
use new functions:

* _PyInterpreterState_GetConfig()
* _PyInterpreterState_SetConfig()
* _Py_GetConfig()
2020-04-13 03:04:28 +02:00
Serhiy Storchaka 8f87eefe7f
bpo-39943: Add the const qualifier to pointers on non-mutable PyBytes data. (GH-19472) 2020-04-12 14:58:27 +03:00
Serhiy Storchaka cd8295ff75
bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345) 2020-04-11 10:48:40 +03:00
Victor Stinner a15e260b70
bpo-40170: Add _PyIndex_Check() internal function (GH-19426)
Add _PyIndex_Check() function to the internal C API: fast inlined
verson of PyIndex_Check().

Add Include/internal/pycore_abstract.h header file.

Replace PyIndex_Check() with _PyIndex_Check() in C files of Objects
and Python subdirectories.
2020-04-08 02:01:56 +02:00
Victor Stinner d8acf0d9aa
bpo-37388: Don't check encoding/errors during finalization (GH-19409)
str.encode() and str.decode() no longer check the encoding and errors
in development mode or in debug mode during Python finalization. The
codecs machinery can no longer work on very late calls to
str.encode() and str.decode().

This change should help to call _PyObject_Dump() to debug during late
Python finalization.
2020-04-07 16:07:42 +02:00
Serhiy Storchaka 17b4733f2f
bpo-40130: _PyUnicode_AsKind() should not be exported. (GH-19265)
Make it a static function, and pass known attributes
(kind, data, length) instead of the PyUnicode object.
2020-04-01 15:41:49 +03:00
Inada Naoki 3a8c56295d
Revert "bpo-39087: Add _PyUnicode_GetUTF8Buffer()" (GH-18985)
* Revert "bpo-39087: Add _PyUnicode_GetUTF8Buffer() (GH-17659)"

This reverts commit c7ad974d34.

* Update unicodeobject.h
2020-03-14 15:59:27 +09:00
Inada Naoki c7ad974d34
bpo-39087: Add _PyUnicode_GetUTF8Buffer() (GH-17659)
Co-authored-by: Victor Stinner <vstinner@python.org>
2020-03-14 12:43:18 +09:00
Andy Lester dffe4c0709
bpo-39573: Finish converting to new Py_IS_TYPE() macro (GH-18601) 2020-03-04 14:15:20 +01:00
Inada Naoki 02a4d57263
bpo-39087: Optimize PyUnicode_AsUTF8AndSize() (GH-18327)
Avoid using temporary bytes object.
2020-02-27 13:48:59 +09:00
Andy Lester 933fc53f3f
closes bpo-39684: Combine two if/thens and squash uninit var warning. (GH-18565) 2020-02-20 20:51:47 -08:00
Hai Shi 3d235f5c5c
bpo-39500: Fix compile warnings in unicodeobject.c (GH-18519) 2020-02-17 14:41:15 +01:00
Victor Stinner 45876a90e2
bpo-35081: Move bytes_methods.h to the internal C API (GH-18492)
Move the bytes_methods.h header file to the internal C API as
pycore_bytes_methods.h: it only contains private symbols (prefixed by
"_Py"), except of the PyDoc_STRVAR_shared() macro.
2020-02-12 22:32:34 +01:00
Benjamin Peterson 95905ce0f4
bpo-39605: Remove a cast that causes a warning. (GH-18473) 2020-02-11 19:36:14 -08:00
Andy Lester e6be9b59a9
closes bpo-39605: Fix some casts to not cast away const. (GH-18453)
gcc -Wcast-qual turns up a number of instances of casting away constness of pointers. Some of these can be safely modified, by either:

Adding the const to the type cast, as in:

-    return _PyUnicode_FromUCS1((unsigned char*)s, size);
+    return _PyUnicode_FromUCS1((const unsigned char*)s, size);

or, Removing the cast entirely, because it's not necessary (but probably was at one time), as in:

-    PyDTrace_FUNCTION_ENTRY((char *)filename, (char *)funcname, lineno);
+    PyDTrace_FUNCTION_ENTRY(filename, funcname, lineno);

These changes will not change code, but they will make it much easier to check for errors in consts
2020-02-11 18:28:35 -08:00
Petr Viktorin ffd9753a94
bpo-39245: Switch to public API for Vectorcall (GH-18460)
The bulk of this patch was generated automatically with:

    for name in \
        PyObject_Vectorcall \
        Py_TPFLAGS_HAVE_VECTORCALL \
        PyObject_VectorcallMethod \
        PyVectorcall_Function \
        PyObject_CallOneArg \
        PyObject_CallMethodNoArgs \
        PyObject_CallMethodOneArg \
    ;
    do
        echo $name
        git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
    done

    old=_PyObject_FastCallDict
    new=PyObject_VectorcallDict
    git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"

and then cleaned up:

- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
2020-02-11 17:46:57 +01:00
Victor Stinner f3e7ea5b8c
bpo-39500: Document PyUnicode_IsIdentifier() function (GH-18397)
PyUnicode_IsIdentifier() does not call Py_FatalError() anymore if the
string is not ready.
2020-02-11 14:29:33 +01:00
Victor Stinner 58ac700fb0
bpo-39573: Use Py_TYPE() macro in Objects directory (GH-18392)
Replace direct access to PyObject.ob_type with Py_TYPE().
2020-02-07 03:04:21 +01:00
Victor Stinner c86a11221d
bpo-39573: Add Py_SET_REFCNT() function (GH-18389)
Add a Py_SET_REFCNT() function to set the reference counter of an
object.
2020-02-07 01:24:29 +01:00
Victor Stinner bf305cc6f0
Add PyInterpreterState.fs_codec.utf8 (GH-18367)
Add a fast-path for UTF-8 encoding in PyUnicode_EncodeFSDefault()
and PyUnicode_DecodeFSDefaultAndSize().

Add _PyUnicode_FiniEncodings() helper function for _PyUnicode_Fini().
2020-02-05 17:39:57 +01:00
Victor Stinner 49932fec62
bpo-39542: Simplify _Py_NewReference() (GH-18332)
* Remove _Py_INC_REFTOTAL and _Py_DEC_REFTOTAL macros: modify
  directly _Py_RefTotal.
* _Py_ForgetReference() is no longer defined if the Py_TRACE_REFS
  macro is not defined.
* Remove _Py_NewReference() implementation from object.c:
  unify the two implementations in object.h inline function.
* Fix Py_TRACE_REFS build: _Py_INC_TPALLOCS() macro has been removed.
2020-02-03 17:55:04 +01:00
Victor Stinner ec3c99c8a7
bpo-38631: Avoid Py_FatalError() in unicodeobject.c (GH-18281)
Replace Py_FatalError() calls with _PyErr_WriteUnraisableMsg(),
_PyObject_ASSERT_FAILED_MSG() or Py_UNREACHABLE()
in unicode_dealloc() and unicode_release_interned().
2020-01-30 12:18:32 +01:00
Pablo Galindo 016b0280b8
Fix compiler warning in Objects/unicodeobject.c (GH-17440) 2019-12-02 18:09:43 +00:00
Victor Stinner d68b592dd6
bpo-38896: Remove PyUnicode_ClearFreeList() function (GH-17354)
Remove PyUnicode_ClearFreeList() function: the Unicode free list has
been removed in Python 3.3.
2019-11-23 02:30:32 +01:00
Victor Stinner 3d4833488a
bpo-38858: Call _PyUnicode_Fini() in Py_EndInterpreter() (GH-17330)
Py_EndInterpreter() now clears the filesystem codec.
2019-11-22 12:27:50 +01:00
Serhiy Storchaka 865c3b257f
bpo-28029: Make "".replace("", s, n) returning s for any n != 0. (GH-16981) 2019-10-30 12:03:53 +02:00
Zachary Ware 09895c27cd
bpo-38409: Fix grammar in str.strip() docstring (GH-16682) 2019-10-09 16:09:00 -05:00
Victor Stinner 6876257eaa
bpo-36389: _PyObject_CheckConsistency() available in release mode (GH-16612)
bpo-36389, bpo-38376: The _PyObject_CheckConsistency() function is
now also available in release mode. For example, it can be used to
debug a crash in the visit_decref() function of the GC.

Modify the following functions to also work in release mode:

* _PyDict_CheckConsistency()
* _PyObject_CheckConsistency()
* _PyType_CheckConsistency()
* _PyUnicode_CheckConsistency()

Other changes:

* _PyMem_IsPtrFreed(ptr) now also returns 1 if ptr is NULL
  (equals to 0).
* _PyBytesWriter_CheckConsistency() now returns 1 and is only used
  with assert().
* Reorder _PyObject_Dump() to write safe fields first, and only
  attempt to render repr() at the end.
2019-10-07 18:42:01 +02:00
Victor Stinner 61691d8336
bpo-38353: Cleanup includes in the internal C API (GH-16548)
Use forward declaration of types to avoid includes in the internal C
API. Add also comment to justify other includes.
2019-10-02 23:51:20 +02:00
Victor Stinner fcdb027234
bpo-38236: Dump path config at first import error (GH-16300)
Python now dumps path configuration if it fails to import the Python
codecs of the filesystem and stdio encodings.
2019-09-23 14:45:47 +02:00
Serhiy Storchaka 279f44678c
bpo-37206: Unrepresentable default values no longer represented as None. (GH-13933)
In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values
(like in the optional third parameter of getattr). "None" should be used if None is accepted
as argument and passing None has the same effect as not passing the argument at all.
2019-09-14 12:24:05 +03:00
Raymond Hettinger 0138c4ceab
Fix unused variable and signed/unsigned warnings (GH-15537) 2019-08-27 09:55:13 -07:00
Steve Dower 7ebdda0dbe
bpo-36311: Fixes decoding multibyte characters around chunk boundaries and improves decoding performance (GH-15083) 2019-08-21 16:22:33 -07:00
Jeroen Demeyer 196a530e00 bpo-37483: add _PyObject_CallOneArg() function (#14558) 2019-07-04 19:31:34 +09:00
Victor Stinner ed076ed467
bpo-37388: Add PyUnicode_Decode(str, 0) fast-path (GH-14385)
Add a fast-path to PyUnicode_Decode() for size equals to 0.
2019-06-26 01:49:32 +02:00
Victor Stinner 22eb689cf3
bpo-37388: Development mode check encoding and errors (GH-14341)
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().

By default, for best performances, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.
2019-06-26 00:51:05 +02:00
Serhiy Storchaka 894263ba80
bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
2019-06-25 11:54:18 +03:00
Inada Naoki 770847a7db
bpo-37348: optimize decoding ASCII string (GH-14283)
`_PyUnicode_Writer` is a relatively complex structure.  Initializing it is significant overhead when decoding short ASCII string.
2019-06-24 12:30:24 +09:00
Victor Stinner b45d259bdd
bpo-36710: Use tstate in pylifecycle.c (GH-14249)
In pylifecycle.c: pass tstate argument, rather than interp argument,
to functions.
2019-06-20 00:05:23 +02:00
Jeroen Demeyer 530f506ac9 bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async (GH-13464)
Automatically replace
tp_print -> tp_vectorcall_offset
tp_compare -> tp_as_async
tp_reserved -> tp_as_async
2019-05-30 19:13:39 -07:00
Inada Naoki 7d408697a9
remove unnecessary tp_dealloc (GH-13647) 2019-05-29 17:23:27 +09:00
Victor Stinner 331a6a56e9
bpo-36763: Implement the PEP 587 (GH-13592)
* Add a whole new documentation page:
  "Python Initialization Configuration"
* PyWideStringList_Append() return type is now PyStatus,
  instead of int
* PyInterpreterState_New() now calls PyConfig_Clear() if
  PyConfig_InitPythonConfig() fails.
* Rename files:

  * Python/coreconfig.c => Python/initconfig.c
  * Include/cpython/coreconfig.h => Include/cpython/initconfig.h
  * Include/internal/: pycore_coreconfig.h => pycore_initconfig.h

* Rename structures

  * _PyCoreConfig => PyConfig
  * _PyPreConfig => PyPreConfig
  * _PyInitError => PyStatus
  * _PyWstrList => PyWideStringList

* Rename PyConfig fields:

  * use_module_search_paths => module_search_paths_set
  * module_search_path_env => pythonpath_env

* Rename PyStatus field: _func => func
* PyInterpreterState: rename core_config field to config
* Rename macros and functions:

  * _PyCoreConfig_SetArgv() => PyConfig_SetBytesArgv()
  * _PyCoreConfig_SetWideArgv() => PyConfig_SetArgv()
  * _PyCoreConfig_DecodeLocale() => PyConfig_SetBytesString()
  * _PyInitError_Failed() => PyStatus_Exception()
  * _Py_INIT_ERROR_TYPE_xxx enums => _PyStatus_TYPE_xxx
  * _Py_UnixMain() => Py_BytesMain()
  * _Py_ExitInitError() => Py_ExitStatusException()
  * _Py_PreInitializeFromArgs() => Py_PreInitializeFromBytesArgs()
  * _Py_PreInitializeFromWideArgs() => Py_PreInitializeFromArgs()
  * _Py_PreInitialize() => Py_PreInitialize()
  * _Py_RunMain() => Py_RunMain()
  * _Py_InitializeFromConfig() => Py_InitializeFromConfig()
  * _Py_INIT_XXX() => _PyStatus_XXX()
  * _Py_INIT_FAILED() => _PyStatus_EXCEPTION()

* Rename 'err' PyStatus variables to 'status'
* Convert RUN_CODE() macro to config_run_code() static inline function
* Remove functions:

  * _Py_InitializeFromArgs()
  * _Py_InitializeFromWideArgs()
  * _PyInterpreterState_GetCoreConfig()
2019-05-27 16:39:22 +02:00
Zackery Spytz 14514d9084 bpo-36946: Fix possible signed integer overflow when handling slices. (GH-13375)
The final addition (cur += step) may overflow, so use size_t for "cur".
"cur" is always positive (even for negative steps), so it is safe to use
size_t here.

Co-Authored-By: Martin Panter <vadmium+py@gmail.com>
2019-05-17 10:13:03 +03:00
Zackery Spytz 1a2252ed39 bpo-36594: Fix incorrect use of %p in format strings (GH-12769)
In addition, fix some other minor violations of C99.
2019-05-06 12:56:50 -04:00
Victor Stinner 709d23dee6
bpo-36775: _PyCoreConfig only uses wchar_t* (GH-13062)
_PyCoreConfig: Change filesystem_encoding, filesystem_errors,
stdio_encoding and stdio_errors fields type from char* to wchar_t*.

Changes:

* PyInterpreterState: replace fscodec_initialized (int) with fs_codec
  structure.
* Add get_error_handler_wide() and unicode_encode_utf8() helper
  functions.
* Add error_handler parameter to unicode_encode_locale()
  and unicode_decode_locale().
* Remove _PyCoreConfig_SetString().
* Rename _PyCoreConfig_SetWideString() to _PyCoreConfig_SetString().
* Rename _PyCoreConfig_SetWideStringFromString()
  to _PyCoreConfig_DecodeLocale().
2019-05-02 14:56:30 -04:00
Victor Stinner 43fc3bb7cf
bpo-36775: Add _PyUnicode_InitEncodings() (GH-13057)
Move get_codec_name() and initfsencoding() from pylifecycle.c to
unicodeobject.c.

Rename also "init" functions in pylifecycle.c.
2019-05-02 11:54:20 -04:00
Victor Stinner e251095a3f
bpo-36775: Add _Py_FORCE_UTF8_FS_ENCODING macro (GH-13056)
Add _Py_FORCE_UTF8_LOCALE and _Py_FORCE_UTF8_FS_ENCODING macros to
avoid factorize "#if defined(__ANDROID__) || defined(__VXWORKS__)"
and "#if defined(__APPLE__)".

Cleanup also config_init_fs_encoding().
2019-05-02 11:28:57 -04:00
Victor Stinner 0fc91eef34
bpo-36389: Add _PyObject_CheckConsistency() function (GH-12803)
Add a new _PyObject_CheckConsistency() function which can be used to
help debugging. The function is available in release mode.

Add a 'check_content' parameter to _PyDict_CheckConsistency().
2019-04-12 21:51:34 +02:00
Kingsley M b015fc86f7 bpo-36549: str.capitalize now titlecases the first character instead of uppercasing it (GH-12804) 2019-04-12 08:35:39 -07:00
Serhiy Storchaka 7a465cb5ee
bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603)
The bug occurred when the encoded surrogate character is passed
to the incremental decoder in two chunks.
2019-03-30 08:23:38 +02:00
Serhiy Storchaka c1e2c288f4
bpo-36312: Fix decoders for some code pages. (GH-12369) 2019-03-20 21:45:18 +02:00
Victor Stinner fecc4f2b47
bpo-36356: Release Unicode interned strings on Valgrind (#12431)
When Python is compiled with Valgrind support, release Unicode
interned strings at exit in _PyUnicode_Fini().

* Rename _Py_ReleaseInternedUnicodeStrings() to
  unicode_release_interned() and make it private.
* unicode_release_interned() is now called from _PyUnicode_Fini():
  it must be called with a running Python thread state for TRASHCAN,
  it cannot be called from pymain_free().
* Don't display statistics on interned strings at exit anymore
2019-03-19 14:20:29 +01:00
Victor Stinner 5f9cf23502
bpo-36301: Error if decoding pybuilddir.txt fails (GH-12422)
Python initialization now fails if decoding pybuilddir.txt
configuration file fails at startup.

_PyPathConfig_Calculate() now reports memory allocation failure and
decoding error on decoding pybuilddir.txt content from
UTF-8/surrogateescape.
2019-03-19 01:46:25 +01:00
Inada Naoki 6a16b18224
bpo-36297: remove "unicode_internal" codec (GH-12342) 2019-03-18 15:44:11 +09:00
Victor Stinner 6d43f6f081
bpo-35713: Split _Py_InitializeCore into subfunctions (GH-11650)
* Split _Py_InitializeCore_impl() into subfunctions: add multiple pycore_init_xxx() functions
* Preliminary sys.stderr is now set earlier to get an usable
  sys.stderr ealier.
* Move code into _Py_Initialize_ReconfigureCore() to be able to call
  it from _Py_InitializeCore().
* Split _PyExc_Init(): create a new _PyBuiltins_AddExceptions()
  function.
* Call _PyExc_Init() earlier in _Py_InitializeCore_impl()
  and new_interpreter() to get working exceptions earlier.
* _Py_ReadyTypes() now returns _PyInitError rather than calling
  Py_FatalError().
* Misc code cleanup
2019-01-22 21:18:05 +01:00
Victor Stinner bf4ac2d2fd
bpo-35713: Rework Python initialization (GH-11647)
* The PyByteArray_Init() and PyByteArray_Fini() functions have been
  removed. They did nothing since Python 2.7.4 and Python 3.2.0, were
  excluded from the limited API (stable ABI), and were not
  documented.
* Move "_PyXXX_Init()" and "_PyXXX_Fini()" declarations from
  Include/cpython/pylifecycle.h to
  Include/internal/pycore_pylifecycle.h. Replace
  "PyAPI_FUNC(TYPE)" with "extern TYPE".
* _PyExc_Init() now returns an error on failure rather than calling
  Py_FatalError(). Move macros inside _PyExc_Init() and undefine them
  when done. Rewrite macros to make them look more like statement:
  add ";" when using them, add "do { ... } while (0)".
* _PyUnicode_Init() now returns a _PyInitError error rather than call
  Py_FatalError().
* Move stdin check from _PySys_BeginInit() to init_sys_streams().
* _Py_ReadyTypes() now returns a _PyInitError error rather than
  calling Py_FatalError().
2019-01-22 17:39:03 +01:00
Serhiy Storchaka d586ccb04f
bpo-35552: Fix reading past the end in PyUnicode_FromFormat() and PyBytes_FromFormat(). (GH-11276)
Format characters "%s" and "%V" in PyUnicode_FromFormat() and "%s" in PyBytes_FromFormat()
no longer read memory past the limit if precision is specified.
2019-01-12 10:30:35 +02:00
Xtreak 3f7983a25a bpo-35560: Remove assertion from format(float, "n") (GH-11288)
Fix an assertion error in format() in debug build for floating point
formatting with "n" format, zero padding and small width. Release build is
not impacted. Patch by Karthikeyan Singaravelan.
2019-01-07 16:09:14 +01:00
animalize a1d1425306 bpo-35636: Remove redundant check in unicode_hash(). (GH-11402)
_Py_HashBytes() does the check for empty string.
2019-01-02 14:16:06 +02:00
Serhiy Storchaka bb86bf4c4e
bpo-35444: Unify and optimize the helper for getting a builtin object. (GH-11047)
This speeds up pickling of some iterators.

This fixes also error handling in pickling methods when fail to
look up builtin "getattr".
2018-12-11 08:28:18 +02:00
Serhiy Storchaka eeb719eac6
bpo-35365: Use a wchar_t* buffer in the code page decoder. (GH-10837) 2018-12-04 10:25:50 +02:00
Serhiy Storchaka 4013c17911
bpo-35372: Fix the code page decoder for input > 2 GiB. (GH-10848) 2018-12-03 10:36:45 +02:00
Victor Stinner bde9d6bbb4
bpo-34523, bpo-35322: Fix unicode_encode_locale() (GH-10759)
Fix memory leak in PyUnicode_EncodeLocale() and
PyUnicode_EncodeFSDefault() on error handling.

Changes:

* Fix unicode_encode_locale() error handling
* Fix test_codecs.LocaleCodecTest
2018-11-28 10:26:20 +01:00
Victor Stinner 163403a63e
bpo-33954: Fix compiler warning in _PyUnicode_FastFill() (GH-10737)
'data' argument of unicode_fill() is modified, so it must not be
constant.

Add more assertions to unicode_fill(): check the maximum character
value.
2018-11-27 12:41:17 +01:00
Serhiy Storchaka 62be74290a
bpo-33012: Fix invalid function cast warnings with gcc 8. (GH-6749)
Fix invalid function cast warnings with gcc 8
for method conventions different from METH_NOARGS, METH_O and
METH_VARARGS excluding Argument Clinic generated code.
2018-11-27 13:27:31 +02:00
Victor Stinner 59423e3ddd
bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623)
Fix str.format(), float.__format__() and complex.__format__() methods
for non-ASCII decimal point when using the "n" formatter.

Changes:

* Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires
  a _PyUnicodeWriter object for the buffer and a Python str object
  for digits.
* Rename FILL() macro to unicode_fill(), convert it to static inline function,
  add "assert(0 <= start);" and rework its code.
2018-11-26 13:40:01 +01:00
Victor Stinner a42de742e7
bpo-35059: Cast void* to PyObject* (GH-10650)
Don't pass void* to Python macros: use _PyObject_CAST().
2018-11-22 10:25:22 +01:00
Victor Stinner bcda8f1d42
bpo-35081: Add Include/internal/pycore_object.h (GH-10640)
Move _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() from
Include/objimpl.h to Include/internal/pycore_object.h.
2018-11-21 22:27:47 +01:00
Gregory P. Smith 746b2d35ea
bpo-35214: Fix OOB memory access in unicode escape parser (GH-10506)
Discovered using clang's MemorySanitizer when it ran python3's
test_fstring test_misformed_unicode_character_name.

An msan build will fail by simply executing: ./python -c 'u"\N"'
2018-11-13 13:16:54 -08:00
Victor Stinner 621cebe81b
bpo-35081: Rename internal headers (GH-10275)
Rename Include/internal/ headers:

* pycore_hash.h -> pycore_pyhash.h
* pycore_lifecycle.h -> pycore_pylifecycle.h
* pycore_mem.h -> pycore_pymem.h
* pycore_state.h -> pycore_pystate.h

Add missing headers to Makefile.pre.in and PCbuild:

* pycore_condvar.h.
* pycore_hamt.h
* pycore_pyhash.h
2018-11-12 16:53:38 +01:00
Victor Stinner 9fc57a3848
bpo-35081: Add pycore_fileutils.h (GH-10371)
Move Py_BUILD_CORE code from Include/fileutils.h to a new
Include/internal/pycore_fileutils.h file.
2018-11-07 00:44:03 +01:00
Victor Stinner 27e2d1f219
bpo-35081: Add pycore_ prefix to internal header files (GH-10263)
* Rename Include/internal/ header files:

  * pyatomic.h -> pycore_atomic.h
  * ceval.h -> pycore_ceval.h
  * condvar.h -> pycore_condvar.h
  * context.h -> pycore_context.h
  * pygetopt.h -> pycore_getopt.h
  * gil.h -> pycore_gil.h
  * hamt.h -> pycore_hamt.h
  * hash.h -> pycore_hash.h
  * mem.h -> pycore_mem.h
  * pystate.h -> pycore_state.h
  * warnings.h -> pycore_warnings.h

* PCbuild project, Makefile.pre.in, Modules/Setup: add the
  Include/internal/ directory to the search paths of header files.
* Update includes. For example, replace #include "internal/mem.h"
  with #include "pycore_mem.h".
2018-11-01 00:52:28 +01:00
Victor Stinner 50fe3f8913
bpo-9263: _PyXXX_CheckConsistency() use _PyObject_ASSERT() (GH-10108)
Use _PyObject_ASSERT() in:

* _PyDict_CheckConsistency()
* _PyType_CheckConsistency()
* _PyUnicode_CheckConsistency()

_PyObject_ASSERT() dumps the faulty object if the assertion fails
to help debugging.
2018-10-26 18:47:15 +02:00
Serhiy Storchaka c46db9232f
bpo-30863: Rewrite PyUnicode_AsWideChar() and PyUnicode_AsWideCharString(). (GH-2599)
They no longer cache the wchar_t* representation of string objects.
2018-10-23 22:58:24 +03:00
Emanuele Gaifas fc8205cb4b Add missing closing quote and trailing period in str.isidentifier() docstring (GH-9756)
This rectifies commit ffc5a14d00.
2018-10-08 16:14:47 +05:30
Sanyam Khurana ffc5a14d00 bpo-33014: Clarify str.isidentifier docstring (GH-6088)
* bpo-33014: Clarify str.isidentifier docstring

* bpo-33014: Add code example in isidentifier documentation
2018-10-08 12:23:32 +05:30
Victor Stinner 998b806366
Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187)
This reverts commit 886483e2b9.
2018-09-12 00:23:25 +02:00
Victor Stinner 886483e2b9
bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)
* Add %T format to PyUnicode_FromFormatV(), and so to
  PyUnicode_FromFormat() and PyErr_Format(), to format an object type
  name: equivalent to "%s" with Py_TYPE(obj)->tp_name.
* Replace Py_TYPE(obj)->tp_name with %T format in unicodeobject.c.
* Add unit test on %T format.
* Rename unicode_fromformat_write_cstr() to
  unicode_fromformat_write_utf8(), to make the intent more explicit.
2018-09-07 18:00:58 +02:00
Victor Stinner 3d4226a832
bpo-34523: Support surrogatepass in locale codecs (GH-8995)
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.

Changes:

* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
  surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
  the _Py_error_handler enum instead of "int surrogateescape" to pass
  the error handler. These functions now return -3 if the error
  handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
  in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
  as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
  workaround anymore.
2018-08-29 22:21:32 +02:00
Victor Stinner b2457efc78
bpo-34523: Add _PyCoreConfig.filesystem_encoding (GH-8963)
_PyCoreConfig_Read() is now responsible to choose the filesystem
encoding and error handler. Using Py_Main(), the encoding is now
chosen even before calling Py_Initialize().

_PyCoreConfig.filesystem_encoding is now the reference, instead of
Py_FileSystemDefaultEncoding, for the Python filesystem encoding.

Changes:

* Add filesystem_encoding and filesystem_errors to _PyCoreConfig
* _PyCoreConfig_Read() now reads the locale encoding for the file
  system encoding.
* PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize()
  now use the interpreter configuration rather than
  Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
  global configuration variables.
* Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding()
  private functions to only modify Py_FileSystemDefaultEncoding and
  Py_FileSystemDefaultEncodeErrors in coreconfig.c.
* _Py_CoerceLegacyLocale() now takes an int rather than
  _PyCoreConfig for the warning.
2018-08-29 13:25:36 +02:00
Alexey Izbyshev 74a307d48e bpo-34435: Add missing NULL check to unicode_encode_ucs1(). (GH-8823)
Reported by Svace static analyzer.
2018-08-19 21:52:04 +03:00
Zackery Spytz e349bf2358 bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (GH-8741)
The UTF-7 decoder now raises UnicodeDecodeError for ill-formed
sequences starting with "+" (as specified in RFC 2152).
2018-08-19 07:43:38 +03:00
Victor Stinner caba55b3b7
bpo-34301: Add _PyInterpreterState_Get() helper function (GH-8592)
sys_setcheckinterval() now uses a local variable to parse arguments,
before writing into interp->check_interval.
2018-08-03 15:33:52 +02:00
INADA Naoki 16dfca4d82
bpo-34087: Fix buffer overflow in int(s) and similar functions (GH-8274)
`_PyUnicode_TransformDecimalAndSpaceToASCII()` missed trailing NUL char.
It caused buffer overflow in `_Py_string_to_number_with_underscores()`.

This bug is introduced in 9b6c60cb.
2018-07-14 12:06:43 +09:00
Bup fc93bd467e Change tp_size to tp_basicsize in comment and realign the comments (GH-6775) 2018-06-19 16:59:55 +08:00
Siddhesh Poyarekar 55edd0c185 bpo-33012: Fix invalid function cast warnings with gcc 8 for METH_NOARGS. (GH-6030)
METH_NOARGS functions need only a single argument but they are cast
into a PyCFunction, which takes two arguments.  This triggers an
invalid function cast warning in gcc8 due to the argument mismatch.
Fix this by adding a dummy unused argument.
2018-04-29 21:59:33 +03:00
Xiang Zhang 2b77a921e6
bpo-29803: remove a redandunt op and fix a comment in unicodeobject.c (#660) 2018-02-13 18:33:32 +08:00
Serhiy Storchaka b7e2d67f7c
bpo-32827: Fix usage of _PyUnicodeWriter_Prepare() in decoding errors handler. (GH-5636) 2018-02-13 08:27:33 +02:00
oldk aa0735f597 bpo-32747: Remove trailing spaces in docstrings. (GH-5491) 2018-02-02 10:52:55 +02:00
Xiang Zhang 2c7fd46e11
bpo-32583: Fix possible crashing in builtin Unicode decoders (#5325)
When using customized decode error handlers, it is possible for builtin decoders
to write out-of-bounds and then crash.
2018-01-31 20:48:05 +08:00
INADA Naoki 7cc95f5069
Fix wrong assert in unicodeobject (GH-5340) 2018-01-28 02:07:09 +09:00
INADA Naoki a49ac99029
bpo-32677: Add .isascii() to str, bytes and bytearray (GH-5342) 2018-01-27 14:06:21 +09:00
Victor Stinner 7ed7aead95
bpo-29240: Fix locale encodings in UTF-8 Mode (#5170)
Modify locale.localeconv(), time.tzname, os.strerror() and other
functions to ignore the UTF-8 Mode: always use the current locale
encoding.

Changes:

* Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or
  encoding error, they return the position of the error and an error
  message which are used to raise Unicode errors in
  PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale().
* Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx().
* PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all
  cases, especially for the strict error handler.
* Add _Py_DecodeUTF8Ex(): return more information on decoding error
  and supports the strict error handler.
* Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex().
* Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx().
* Ignore the UTF-8 mode to encode/decode localeconv(), strerror()
  and time zone name.
* Remove PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize()
  and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use
  the "current" locale.
* Remove _PyUnicode_DecodeCurrentLocale(),
  _PyUnicode_DecodeCurrentLocaleAndSize() and
  _PyUnicode_EncodeCurrentLocale().
2018-01-15 10:45:49 +01:00
Victor Stinner cb3ae5588b
bpo-29240: Ignore UTF-8 Mode in time module (#5148)
time.strftime() must use the current LC_CTYPE encoding, not UTF-8
if the UTF-8 mode is enabled.

Add _PyUnicode_DecodeCurrentLocale() function.
2018-01-11 10:37:59 +01:00
Victor Stinner 2cba6b8579
bpo-29240: readline now ignores the UTF-8 Mode (#5145)
Add new fuctions ignoring the UTF-8 mode:

* _Py_DecodeCurrentLocale()
* _Py_EncodeCurrentLocale()
* _PyUnicode_DecodeCurrentLocaleAndSize()
* _PyUnicode_EncodeCurrentLocale()

Modify the readline module to use these functions.

Re-enable test_readline.test_nonascii().
2018-01-10 22:46:15 +01:00
Victor Stinner 9dd762013f
bpo-32030: Add _Py_EncodeLocaleRaw() (#4961)
Replace Py_EncodeLocale() with _Py_EncodeLocaleRaw() in:

* _Py_wfopen()
* _Py_wreadlink()
* _Py_wrealpath()
* _Py_wstat()
* pymain_open_filename()

These functions are called early during Python intialization, only
the RAW memory allocator must be used.
2017-12-21 16:20:32 +01:00
Victor Stinner e47e698da6
bpo-32030: Add _Py_EncodeUTF8_surrogateescape() (#4960)
Py_EncodeLocale() now uses _Py_EncodeUTF8_surrogateescape(), instead
of using temporary unicode and bytes objects. So Py_EncodeLocale()
doesn't use the Python C API anymore.
2017-12-21 15:45:16 +01:00
Serhiy Storchaka a5552f023e
bpo-32240: Add the const qualifier to declarations of PyObject* array arguments. (#4746) 2017-12-15 13:11:11 +02:00
Victor Stinner 91106cd9ff
bpo-29240: PEP 540: Add a new UTF-8 Mode (#855)
* Add -X utf8 command line option, PYTHONUTF8 environment variable
  and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: enable automatically the
  UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
  _winapi.GetACP() to get the ANSI code page
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
  mode. As a side effect, open() now uses the UTF-8 encoding by
  default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
  in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8
* Skip some tests relying on the current locale if the UTF-8 mode is
  enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
  return also the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
  always copy flag values, rather than only copying if the new value
  is greater than the old value.
2017-12-13 12:29:09 +01:00
Victor Stinner 6a54c676e6
bpo-31979: Remove unused align_maxchar() function (#4527) 2017-11-23 19:02:23 +01:00
Serhiy Storchaka 9b6c60cbce
bpo-31979: Simplify transforming decimals to ASCII (#4336)
in int(), float() and complex() parsers.

This also speeds up parsing non-ASCII numbers by around 20%.
2017-11-13 21:23:48 +02:00
Serhiy Storchaka e2f92de6a9
Add the const qualifier to "char *" variables that refer to literal strings. (#4370) 2017-11-11 13:06:26 +02:00
stratakis e8b1965639 bpo-23699: Use a macro to reduce boilerplate code in rich comparison functions (GH-793) 2017-11-02 20:32:54 +10:00
Serhiy Storchaka a2314283ff
bpo-20047: Make bytearray methods partition() and rpartition() rejecting (#4158)
separators that are not bytes-like objects.
2017-10-29 02:11:54 +03:00
Serhiy Storchaka 56cb465cc9 bpo-31825: Fixed OverflowError in the 'unicode-escape' codec (#4058)
and in codecs.escape_decode() when decode an escaped non-ascii byte.
2017-10-20 17:08:15 +03:00
Barry Warsaw b2e5794870 bpo-31338 (#3374)
* Add Py_UNREACHABLE() as an alias to abort().
* Use Py_UNREACHABLE() instead of assert(0)
* Convert more unreachable code to use Py_UNREACHABLE()
* Document Py_UNREACHABLE() and a few other macros.
2017-09-14 18:13:16 -07:00
Serhiy Storchaka e3b2b4b8d9 bpo-31393: Fix the use of PyUnicode_READY(). (#3451) 2017-09-08 09:58:51 +03:00
Eric Snow 2ebc5ce42a bpo-30860: Consolidate stateful runtime globals. (#3397)
* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals

Other globals are excluded (see globals.txt and check-c-globals.py).
2017-09-07 23:51:28 -06:00
Stefan Krah f432a3234f bpo-30923: Silence fall-through warnings included in -Wextra since gcc-7.0. (#3157) 2017-08-21 13:09:59 +02:00
Serhiy Storchaka 64e461be09 bpo-22207: Add checks for possible integer overflows in unicodeobject.c. (#2623)
Based on patch by Victor Stinner.
2017-07-11 06:55:25 +03:00
Serhiy Storchaka f7eae0adfc [security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302)
Based on patch by Victor Stinner.

Add private C API function _PyUnicode_AsUnicode() which is similar to
PyUnicode_AsUnicode(), but checks for null characters.
2017-06-28 08:30:06 +03:00
Serhiy Storchaka e613e6add5 bpo-30708: Check for null characters in PyUnicode_AsWideCharString(). (#2285)
Raise a ValueError if the second argument is NULL and the wchar_t\*
string contains null characters.
2017-06-27 16:03:14 +03:00
Serhiy Storchaka 40db90c1ce bpo-29802: Fix reference counting in module-level struct functions (#1213)
when pass arguments of wrong type.
2017-04-20 21:19:31 +03:00
Serhiy Storchaka b879fe82e7 Expand the PySlice_GetIndicesEx macro. (#1023) 2017-04-08 09:53:51 +03:00
Lisa Roach 43ba8861e0 bpo-29549: Fixes docstring for str.index (#256)
* Updates B.index documentation.

* Updates str.index documentation, makes it Argument Clinic compatible.

* Removes ArgumentClinic code.

* Finishes string.index documentation.

* Updates string.rindex documentation.

* Documents B.rindex.
2017-04-04 22:36:22 -07:00
Serhiy Storchaka fff9a31a91 bpo-29865: Use PyXXX_GET_SIZE macros rather than Py_SIZE for concrete types. (#748) 2017-03-21 08:53:25 +02:00
Serhiy Storchaka 004e03fb0c bpo-29116: Improve error message for concatenating str with non-str. (#710) 2017-03-19 19:38:42 +02:00
Serhiy Storchaka 202fda55c2 bpo-24037: Add Argument Clinic converter `bool(accept={int})`. (#485) 2017-03-12 10:10:47 +02:00
Serhiy Storchaka 370fd202f1 Use Py_RETURN_FALSE/Py_RETURN_TRUE rather than PyBool_FromLong(0)/PyBool_FromLong(1). (#567) 2017-03-08 20:47:48 +02:00
Serhiy Storchaka 9f8ad3f39e bpo-29568: Disable any characters between two percents for escaped percent "%%" in the format string for classic string formatting. (GH-513) 2017-03-08 11:51:19 +08:00
Martin Panter 91a8866dc1 Fix grammar in doc string, RST markup 2017-01-24 00:30:06 +00:00
Serhiy Storchaka 228b12edcc Issue #28999: Use Py_RETURN_NONE, Py_RETURN_TRUE and Py_RETURN_FALSE wherever
possible.  Patch is writen with Coccinelle.
2017-01-23 09:47:21 +02:00
Serhiy Storchaka 2a404b63d4 Issue #28769: The result of PyUnicode_AsUTF8AndSize() and PyUnicode_AsUTF8()
is now of type "const char *" rather of "char *".
2017-01-22 23:07:07 +02:00
Victor Stinner 0c4a828cad Run Argument Clinic: METH_VARARGS=>METH_FASTCALL
Issue #29286. Run Argument Clinic to get the new faster METH_FASTCALL calling
convention for functions using "boring" positional arguments.

Manually fix _elementtree: _elementtree_XMLParser_doctype() must remain
consistent with the clinic code.
2017-01-17 02:21:47 +01:00
INADA Naoki 15f94596b6 Issue #20180: forgot to update AC output. 2017-01-16 21:49:13 +09:00
INADA Naoki 3ae2056512 Issue #20180: convert unicode methods to AC. 2017-01-16 20:41:20 +09:00
Xiang Zhang 7a4da324dc Issue #29145: Merge 3.6. 2017-01-10 10:56:38 +08:00
Xiang Zhang 95403d74d7 Issue #29145: Merge 3.5. 2017-01-10 10:54:19 +08:00
Xiang Zhang b0541f4cdf Issue #29145: Fix overflow checks in str.replace() and str.join().
Based on patch by Martin Panter.
2017-01-10 10:52:00 +08:00
Xiang Zhang 62497d52d9 Issue #29044: Merge 3.6. 2016-12-22 15:31:55 +08:00
Xiang Zhang 437a5d2c25 Issue #29044: Merge 3.5. 2016-12-22 15:31:22 +08:00
Xiang Zhang ea1cf87030 Issue #29044: Fix a use-after-free in string '%c' formatter. 2016-12-22 15:30:47 +08:00
Xiang Zhang b211068f5c Issue #28822: Adjust indices handling of PyUnicode_FindChar(). 2016-12-20 22:52:33 +08:00
Xavier de Gaye 31eaf49ed9 Merge 3.6. 2016-12-15 21:01:52 +01:00
Xavier de Gaye 76febd0792 Issue #26919: On Android, operating system data is now always encoded/decoded
to/from UTF-8, instead of the locale encoding to avoid inconsistencies with
os.fsencode() and os.fsdecode() which are already using UTF-8.
2016-12-15 20:59:58 +01:00
Serhiy Storchaka fb3134f4d4 Issue #28808: PyUnicode_CompareWithASCIIString() now never raises exceptions. 2016-12-06 00:20:26 +02:00
Serhiy Storchaka 9a953dbb34 Issue #28808: PyUnicode_CompareWithASCIIString() now never raises exceptions. 2016-12-06 00:17:45 +02:00
Serhiy Storchaka 419967b832 Issue #28808: PyUnicode_CompareWithASCIIString() now never raises exceptions. 2016-12-06 00:13:34 +02:00
Victor Stinner de4ae3d486 Backed out changeset b9c9691c72c5
Issue #28858: The change b9c9691c72c5 introduced a regression. It seems like
_PyObject_CallArg1() uses more stack memory than
PyObject_CallFunctionObjArgs().
2016-12-04 22:59:09 +01:00