Commit Graph

1668 Commits

Author SHA1 Message Date
Eric Snow 89e67ada69
gh-100227: Revert gh-102925 "gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters" (gh-103063)
This reverts commit 87be8d9.

This approach to keeping the interned strings safe is turning out to be too complex for my taste (due to obmalloc isolation). For now I'm going with the simpler solution, making the dict per-interpreter. We can revisit that later if we want a sharing solution.
2023-03-27 16:53:05 -06:00
Eric Snow 87be8d9522
gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters (gh-102925)
This is effectively two changes.  The first (the bulk of the change) is where we add _Py_AddToGlobalDict() (and _PyRuntime.cached_objects.main_tstate, etc.).  The second (much smaller) change is where we update PyUnicode_InternInPlace() to use _Py_AddToGlobalDict() instead of calling PyDict_SetDefault() directly.

Basically, _Py_AddToGlobalDict() is a wrapper around PyDict_SetDefault() that should be used whenever we need to add a value to a runtime-global dict object (in the few cases where we are leaving the container global rather than moving it to PyInterpreterState, e.g. the interned strings dict).  _Py_AddToGlobalDict() does all the necessary work to make sure the target global dict is shared safely between isolated interpreters.  This is especially important as we move the obmalloc state to each interpreter (gh-101660), as well as, potentially, the GIL (PEP 684).

https://github.com/python/cpython/issues/100227
2023-03-22 18:30:04 -06:00
Kumar Aditya 3d872a74c8
GH-100227: cleanup initialization of global interned dict (#102682) 2023-03-14 14:22:21 +05:30
Eric Snow cbb0aa71d0
gh-102304: Consolidate Direct Usage of _Py_RefTotal (gh-102514)
This simplifies further changes to _Py_RefTotal (e.g. make it atomic or move it to PyInterpreterState).

https://github.com/python/cpython/issues/102304
2023-03-08 12:03:50 -07:00
Jelle Zijlstra 8d0f09b1be
gh-101765: unicodeobject: use Py_XDECREF correctly (#102283) 2023-02-26 14:45:37 -08:00
Jelle Zijlstra d71edbd1b7
gh-101765: Fix refcount issues in list and unicode pickling (#102265)
Followup from #101769.
2023-02-25 16:01:58 -08:00
Ionite 54dfa14c5a
gh-101765: Fix SystemError / segmentation fault in iter `__reduce__` when internal access of `builtins.__dict__` exhausts the iterator (#101769) 2023-02-24 15:02:04 -08:00
Eric Snow aa8591e9ca
gh-90111: Minor Cleanup for Runtime-Global Objects (gh-100254)
* move _PyRuntime.global_objects.interned to _PyRuntime.cached_objects.interned_strings (and use _Py_CACHED_OBJECT())
* rename _PyRuntime.global_objects to _PyRuntime.static_objects

(This also relates to gh-96075.)

https://github.com/python/cpython/issues/90111
2022-12-14 11:53:57 -07:00
Eric Snow 91a8e002c2
gh-81057: Move More Globals to _PyRuntimeState (gh-100092)
https://github.com/python/cpython/issues/81057
2022-12-07 15:56:31 -07:00
Serhiy Storchaka a87c46eab3
bpo-15999: Accept arbitrary values for boolean parameters. (#15609)
builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
2022-12-03 11:52:21 -08:00
Serhiy Storchaka f08e52ccb0
gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (GH-99613)
Previously *consumed was not set in this case.
2022-12-01 14:54:51 +02:00
Victor Stinner 135ec7cefb
gh-99537: Use Py_SETREF() function in C code (#99657)
Fix potential race condition in code patterns:

* Replace "Py_DECREF(var); var = new;" with "Py_SETREF(var, new);"
* Replace "Py_XDECREF(var); var = new;" with "Py_XSETREF(var, new);"
* Replace "Py_CLEAR(var); var = new;" with "Py_XSETREF(var, new);"

Other changes:

* Replace "old = var; var = new; Py_DECREF(var)"
  with "Py_SETREF(var, new);"
* Replace "old = var; var = new; Py_XDECREF(var)"
  with "Py_XSETREF(var, new);"
* And remove the "old" variable.
2022-11-22 13:39:11 +01:00
Eric Snow 5f55067e23
gh-81057: Move More Globals in Core Code to _PyRuntimeState (gh-99516)
https://github.com/python/cpython/issues/81057
2022-11-16 09:37:14 -07:00
Victor Stinner 1960eb005e
gh-99300: Use Py_NewRef() in Objects/ directory (#99351)
Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and
Py_XNewRef() in C files of the Objects/ directory.
2022-11-10 23:40:31 +01:00
Eric Snow 52f91c642b
gh-90868: Adjust the Generated Objects (gh-99223)
We do the following:

* move the generated _PyUnicode_InitStaticStrings() to its own file
* move the generated _PyStaticObjects_CheckRefcnt() to its own file
* include pycore_global_objects.h in extension modules instead of pycore_runtime_init.h

These changes help us avoid including things that aren't needed.

https://github.com/python/cpython/issues/90868
2022-11-08 10:03:03 -07:00
Nikita Sobolev 76f989dc3e
gh-98783: Fix crashes when `str` subclasses are used in `_PyUnicode_Equal` (#98806) 2022-10-30 02:23:20 -04:00
Victor Stinner db03c8066a
gh-98393: os module reject bytes-like, only accept bytes (#98394)
The os module and the PyUnicode_FSDecoder() function no longer accept
bytes-like paths, like bytearray and memoryview types: only the exact
bytes type is accepted for bytes strings.
2022-10-18 17:52:31 +02:00
Nikita Sobolev ccab67ba79
gh-97982: Factorize PyUnicode_Count() and unicode_count() code (#98025)
Add unicode_count_impl() to factorize PyUnicode_Count()
and unicode_count() code.
2022-10-12 18:27:53 +02:00
Victor Stinner df3a6d9beb
gh-97982: Remove asciilib_count() (#98164)
asciilib_count() is the same than ucs1lib_count(): the code is not
specialized for ASCII strings, so it's not worth it to have a
separated function. Remove asciilib_count() function.
2022-10-11 17:59:58 +02:00
Kumar Aditya 6dab8c95bd
GH-96458: Statically initialize utf8 representation of static strings (#96481) 2022-09-02 23:43:08 -07:00
Kumar Aditya 129998bd7b
GH-96075: move interned dict under runtime state (GH-96077) 2022-08-22 12:05:21 -07:00
Petr Viktorin 71c3d649b5
gh-95504: Fix negative numbers in PyUnicode_FromFormat (GH-95848)
Co-authored-by: philg314 <110174000+philg314@users.noreply.github.com>
2022-08-10 13:12:40 +02:00
Serhiy Storchaka 62f06508e7
gh-95781: More strict format string checking in PyUnicode_FromFormatV() (GH-95784)
An unrecognized format character in PyUnicode_FromFormat() and
PyUnicode_FromFormatV() now sets a SystemError.
In previous versions it caused all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
2022-08-08 19:21:07 +03:00
Dong-hee Na fb75d015f4
gh-91146: More reduce allocation size of list from str.split/rsplit (gh-95493)
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2022-08-01 22:15:07 +09:00
Dong-hee Na 50b2261bda
gh-91146: Reduce allocation size of list from str.split()/rsplit() (gh-95473) 2022-07-31 12:14:53 +09:00
Pamela Fox 70068b9336
Fix Unicode doc and replace use of macro with PyMem_New function (GH-94088) 2022-07-28 23:32:16 +01:00
Eric Snow 4a1dd73431
gh-94673: Add _PyStaticType_InitBuiltin() (#95152)
This is the first of several precursors to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types.

We do the following:

* add `_PyStaticType_InitBuiltin()`
* add `_Py_TPFLAGS_STATIC_BUILTIN`
* set it on all static builtin types in `_PyStaticType_InitBuiltin()`
* shuffle some code around to be able to use _PyStaticType_InitBuiltin()
    * rename `_PyStructSequence_InitType()` to `_PyStructSequence_InitBuiltinWithFlags()`
    * add `_PyStructSequence_InitBuiltin()`.
2022-07-25 12:47:31 -06:00
Kumar Aditya 9dff9f4814
GH-90699: Intern statically allocated strings (GH-93597)
This is similar to how strings are interned for deepfreeze.
2022-07-08 10:47:37 -07:00
Eric Snow caa279d6fd
bpo-40514: Drop EXPERIMENTAL_ISOLATED_SUBINTERPRETERS (gh-93185)
This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing:

* the configure option
* the macro
* the code enabled by the macro
2022-05-27 17:38:01 -06:00
Kumar Aditya cb04a09d2d
GH-93207: Remove HAVE_STDARG_PROTOTYPES configure check for stdarg.h (#93215) 2022-05-27 13:30:45 +02:00
Victor Stinner 5f8c3fb997
gh-91924: Optimize unicode_check_encoding_errors() (#93200)
Avoid _PyCodec_Lookup() and PyCodec_LookupError() for most common
built-in encodings and error handlers to avoid creating a temporary
Unicode string object, whereas these encodings and error handlers are
known to be valid.
2022-05-27 00:39:49 +02:00
Victor Stinner 059b5baf98
gh-85858: Remove PyUnicode_InternImmortal() function (#92579)
Remove the PyUnicode_InternImmortal() function and the
SSTATE_INTERNED_IMMORTAL macro.

The PyUnicode_InternImmortal() function is still exported in the
stable ABI. The function is removed from the API.

PyASCIIObject.state.interned size is now a single bit, rather than 2
bits.

Keep SSTATE_NOT_INTERNED and SSTATE_INTERNED_MORTAL macros for
backward compatibility, but no longer use them internally since the
interned member is now a single bit and so can only have two values
(interned or not interned).

Update stats of _PyUnicode_ClearInterned().
2022-05-13 13:40:22 +02:00
Victor Stinner f62ad4f2c4
gh-89653: Use int type for Unicode kind (#92704)
Use the same type that PyUnicode_FromKindAndData() kind parameter
type (public C API): int.
2022-05-13 12:41:05 +02:00
Inada Naoki f9c9354a7a
gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537) 2022-05-12 14:48:38 +09:00
Victor Stinner 804f2529d8
gh-91320: Use _PyCFunction_CAST() (#92251)
Replace "(PyCFunction)(void(*)(void))func" cast with
_PyCFunction_CAST(func).

Change generated by the command:

sed -i -e \
  's!(PyCFunction)(void(\*)(void)) *\([A-Za-z0-9_]\+\)!_PyCFunction_CAST(\1)!g' \
  $(find -name "*.c")
2022-05-03 21:42:14 +02:00
Serhiy Storchaka 18b07d773e
bpo-36819: Fix crashes in built-in encoders with weird error handlers (GH-28593)
If the error handler returns position less or equal than the starting
position of non-encodable characters, most of built-in encoders didn't
properly re-size the output buffer. This led to out-of-bounds writes,
and segfaults.
2022-05-02 12:37:48 +03:00
Serhiy Storchaka 3483299a24
gh-81548: Deprecate octal escape sequences with value larger than 0o377 (GH-91668) 2022-04-30 13:16:27 +03:00
Dennis Sweeney da6c78584b
gh-90667: Add specializations of Py_DECREF when types are known (GH-30872) 2022-04-19 19:02:19 +01:00
Kumar Aditya ab0d35d70d
bpo-46712: share more global strings in deepfreeze (gh-32152)
(for gh-90868)
2022-04-19 11:41:36 -06:00
Oleg Iarygin 2f0fc521f4
gh-91102: Use Argument Clinic for EncodingMap (#31725)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
2022-04-18 13:43:56 -07:00
Kumar Aditya 8c54c3dacc
gh-91576: Speed up iteration of strings (#91574) 2022-04-18 07:18:27 -07:00
Tobias Stoeckmann 0859368335
gh-91421: Use constant value check during runtime (GH-91422)
The left-hand side expression of the if-check can be converted to a
constant by the compiler, but the addition on the right-hand side is
performed during runtime.

Move the addition from the right-hand side to the left-hand side by
turning it into a subtraction there. Since the values are known to
be large enough to not turn negative, this is a safe operation.

Prevents a very unlikely integer overflow on 32 bit systems.

Fixes GH-91421.
2022-04-12 20:01:02 -07:00
John Belmonte b0b836b20c
bpo-45995: add "z" format specifer to coerce negative 0 to zero (GH-30049)
Add "z" format specifier to coerce negative 0 to zero.

See https://github.com/python/cpython/issues/90153 (originally https://bugs.python.org/issue45995) for discussion.
This covers `str.format()` and f-strings.  Old-style string interpolation is not supported.

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
2022-04-11 15:34:18 +01:00
Raymond Hettinger d6fb104690
Fix bad grammar and import docstring for split/rsplit (GH-32381) 2022-04-08 08:36:20 -05:00
Christian Heimes 44e915028d
bpo-47182: Fix crash by named unicode characters after interpreter reinitialization (GH-32212)
Automerge-Triggered-By: GH:tiran
2022-03-31 08:14:50 -07:00
Victor Stinner c14d7e4b81
bpo-47164: Add _PyASCIIObject_CAST() macro (GH-32191)
Add macros to cast objects to PyASCIIObject*, PyCompactUnicodeObject*
and PyUnicodeObject*: _PyASCIIObject_CAST(),
_PyCompactUnicodeObject_CAST() and _PyUnicodeObject_CAST(). Using
these new macros make the code more readable and check their argument
with: assert(PyUnicode_Check(op)).

Remove redundant assert(PyUnicode_Check(op)) in macros using directly
or indirectly these new CAST macros.

Replacing existing casts with these macros.
2022-03-31 09:59:27 +02:00
Pieter Eendebak 850687df47
bpo-47070: Add _PyBytes_Repeat() (GH-31999)
Use it where appropriate: the repeat functions of `array.array`, `bytes`, `bytearray`, and `str`.
2022-03-28 04:43:45 -04:00
Jeremy Kloth 88872a29f1
bpo-47084: Clear Unicode cached representations on finalization (GH-32032) 2022-03-22 13:53:51 +01:00
Oleg Iarygin a52f82baf2
bpo-46920: Remove disabled debug code added decades ago and likely unnecessary (GH-31812) 2022-03-14 17:03:21 +01:00
Jelle Zijlstra 54ab9ad312
bpo-46881: Fix refleak from GH-31616 (GH-31805) 2022-03-11 17:05:08 +08:00
Kumar Aditya 8714b6fa27
bpo-46881: Statically allocate and initialize the latin1 characters. (GH-31616) 2022-03-09 15:02:00 -08:00
Eric Snow 81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Victor Stinner b556f53785
bpo-46670: Test if a macro is defined, not its value (GH-31178)
* audioop.c: #ifdef WORDS_BIGENDIAN
* ctypes.h: #ifdef USING_MALLOC_CLOSURE_DOT_C
* _ctypes/malloc_closure.c: #ifdef HAVE_FFI_CLOSURE_ALLOC
  and #ifdef USING_APPLE_OS_LIBFFI
* pytime.c: #ifdef __APPLE__
* unicodeobject.c: #ifdef HAVE_NON_UNICODE_WCHAR_T_REPRESENTATION
2022-02-07 01:46:51 +01:00
Victor Stinner 1626bf4ac7
bpo-46417: Clear Unicode static types at exit (GH-30806)
Add _PyUnicode_FiniTypes() function, called by
finalize_interp_types(). It clears these static types:

* EncodingMapType
* PyFieldNameIter_Type
* PyFormatterIter_Type

_PyStaticType_Dealloc() now does nothing if tp_subclasses
is not NULL.
2022-01-22 22:55:39 +01:00
Victor Stinner a1bf329bca
bpo-46417: Add missing types of _PyTypes_InitTypes() (GH-30749)
Add types removed by mistake by the commit adding
_PyTypes_FiniTypes().

Move also PyBool_Type at the end, since it depends on PyLong_Type.

PyBytes_Type and PyUnicode_Type no longer depend explicitly on
PyBaseObject_Type: it's the default of PyType_Ready().
2022-01-21 17:53:13 +01:00
Victor Stinner 35d6540c90
bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422)
This reverts commit ea251806b8.

Keep "assert(interned == NULL);" in _PyUnicode_Fini(), but only for
the main interpreter.

Keep _PyUnicode_ClearInterned() changes avoiding the creation of a
temporary Python list object.
2022-01-06 08:53:44 +01:00
Eric Snow c8749b5783
bpo-46008: Make runtime-global object/type lifecycle functions and state consistent. (gh-29998)
This change is strictly renames and moving code around.  It helps in the following ways:

* ensures type-related init functions focus strictly on one of the three aspects (state, objects, types)
* passes in PyInterpreterState * to all those functions, simplifying work on moving types/objects/state to the interpreter
* consistent naming conventions help make what's going on more clear
* keeping API related to a type in the corresponding header file makes it more obvious where to look for it

https://bugs.python.org/issue46008
2021-12-09 12:59:26 -07:00
Dennis Sweeney 03768c4d13
bpo-45885: Specialize COMPARE_OP (GH-29734)
* Add COMPARE_OP_ADAPTIVE adaptive instruction.

* Add COMPARE_OP_FLOAT_JUMP, COMPARE_OP_INT_JUMP and COMPARE_OP_STR_JUMP specialized instructions.

* Introduce and use _PyUnicode_Equal
2021-12-03 11:29:12 +00:00
Victor Stinner 5f09bb021a
bpo-35134: Add Include/cpython/longobject.h (GH-29044)
Move Include/longobject.h non-limited API to a new
Include/cpython/longobject.h header file.

Move the following definitions to the internal C API:

* _PyLong_DigitValue
* _PyLong_FormatAdvancedWriter()
* _PyLong_FormatWriter()
2021-10-19 02:04:52 +02:00
Serhiy Storchaka 39aa98346d
bpo-45467: Fix IncrementalDecoder and StreamReader in the "raw-unicode-escape" codec (GH-28944)
They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
2021-10-14 20:04:19 +03:00
Serhiy Storchaka c96d1546b1
bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939)
They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.unicode_escape_decode().
It is True by default to match the former behavior.
2021-10-14 13:17:00 +03:00
Christian Clauss 5f401f1040
Fix typos in the Objects directory (GH-28766) 2021-10-06 16:57:10 -07:00
Victor Stinner 8620be99da
bpo-45061: Revert unicode_is_singleton() change (GH-28516)
Don't use a loop over 256 items, only checks for a single singleton.
2021-09-22 12:16:53 +02:00
Mohamad Mansour 8f943ca257
[codemod] Fix non-matching bracket pairs (GH-28473)
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2021-09-22 01:09:00 +02:00
Victor Stinner 86f28372b1
bpo-45061: Detect refcount bug on empty string singleton (GH-28504)
Detect refcount bugs in C extensions when the empty Unicode string
singleton is destroyed by mistake.

* Move forward declarations to the top of unicodeobject.c.
* Simplifiy unicode_is_singleton().
2021-09-21 23:43:09 +02:00
Miguel Brito ed1076428c
bpo-44110: Improve string's __getitem__ error message (GH-26042) 2021-06-27 15:04:57 +03:00
Serhiy Storchaka be8b631b7a
Add more const modifiers. (GH-26691) 2021-06-12 16:11:59 +03:00
Inada Naoki 9ad8f109ac
bpo-44029: Remove Py_UNICODE APIs (GH-25881)
Remove deprecated `Py_UNICODE` APIs: `PyUnicode_Encode`,
`PyUnicode_EncodeUTF7`, `PyUnicode_EncodeUTF8`,
`PyUnicode_EncodeUTF16`, `PyUnicode_EncodeUTF32`,
`PyUnicode_EncodeLatin1`, `PyUnicode_EncodeMBCS`,
`PyUnicode_EncodeDecimal`, `PyUnicode_EncodeRawUnicodeEscape`,
`PyUnicode_EncodeCharmap`, `PyUnicode_EncodeUnicodeEscape`,
`PyUnicode_TransformDecimalToASCII`, `PyUnicode_TranslateCharmap`,
`PyUnicodeEncodeError_Create`, `PyUnicodeTranslateError_Create`.

See :pep:`393` and :pep:`624` for reference.
2021-05-07 15:58:29 +09:00
Jakub Kulík 9032cf5cb1
bpo-43667: Fix broken Unicode encoding in non-UTF locales on Solaris (GH-25096) 2021-04-30 15:21:42 +02:00
Victor Stinner 442ad74fc2
bpo-43687: Py_Initialize() creates singletons earlier (GH-25147)
Reorganize pycore_interp_init() to initialize singletons before the
the first PyType_Ready() call. Fix an issue when Python is configured
using --without-doc-strings.
2021-04-02 15:28:13 +02:00
Jessica Clarke dec0757549
bpo-43179: Generalise alignment for optimised string routines (GH-24624)
* Remove m68k-specific hack from ascii_decode

On m68k, alignments of primitives is more relaxed, with 4-byte and
8-byte types only requiring 2-byte alignment, thus using sizeof(size_t)
does not work. Instead, use the portable alternative.

Note that this is a minimal fix that only relaxes the assertion and the
condition for when to use the optimised version remains overly strict.
Such issues will be fixed tree-wide in the next commit.

NB: In C11 we could use _Alignof(size_t) instead, but for compatibility
we use autoconf.

* Optimise string routines for architectures with non-natural alignment

C only requires that sizeof(x) is a multiple of alignof(x), not that the
two are equal. Thus anywhere where we optimise based on alignment we
should be using alignof(x) not sizeof(x).

This is more annoying than it would be in C11 where we could just use
_Alignof(x) (and alignof(x) in C++11), but since we still require only
C99 we must plumb the information all the way from autoconf through the
various typedefs and defines.
2021-03-31 12:12:39 +02:00
Victor Stinner 9976834f80
bpo-35883: Py_DecodeLocale() escapes invalid Unicode characters (GH-24843)
Python no longer fails at startup with a fatal error if a command
line argument contains an invalid Unicode character.

The Py_DecodeLocale() function now escapes byte sequences which would
be decoded as Unicode characters outside the [U+0000; U+10ffff]
range.

Use MAX_UNICODE constant in unicodeobject.c.
2021-03-17 21:46:53 +01:00
Brandt Bucher 145bf269df
bpo-42128: Structural Pattern Matching (PEP 634) (GH-22917)
Co-authored-by: Guido van Rossum <guido@python.org>
Co-authored-by: Talin <viridia@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2021-02-26 14:51:55 -08:00
Victor Stinner bcb094b41f
bpo-43268: Pass interp rather than tstate to internal functions (GH-24580)
Pass the current interpreter (interp) rather than the current Python
thread state (tstate) to internal functions which only use the
interpreter.

Modified functions:

* _PyXXX_Fini() and _PyXXX_ClearFreeList() functions
* _PyEval_SignalAsyncExc(), make_pending_calls()
* _PySys_GetObject(), sys_set_object(), sys_set_object_id(), sys_set_object_str()
* should_audit(), set_flags_from_config(), make_flags()
* _PyAtExit_Call()
* init_stdio_encoding()
* etc.
2021-02-19 15:10:45 +01:00
Victor Stinner 101bf69ff1
bpo-43268: _Py_IsMainInterpreter() now expects interp (GH-24577)
The _Py_IsMainInterpreter() function now expects interp rather than
tstate.
2021-02-19 13:33:31 +01:00
Pablo Galindo a6d63a20df
Fix compiler warnings regarding loss of data (GH-23983) 2020-12-29 00:28:09 +00:00
Victor Stinner f4507231e3
bpo-42745: finalize_interp_types() calls _PyType_Fini() (GH-23953)
Call _PyType_Fini() in subinterpreters.

Fix reference leaks in subinterpreters.
2020-12-26 20:26:08 +01:00
Victor Stinner ea251806b8
bpo-40521: Per-interpreter interned strings (GH-20085)
Make the Unicode dictionary of interned strings compatible with
subinterpreters.

Remove the INTERN_NAME_STRINGS macro in typeobject.c: names are
always now interned (even if EXPERIMENTAL_ISOLATED_SUBINTERPRETERS
macro is defined).

_PyUnicode_ClearInterned() now uses PyDict_Next() to no longer
allocate memory, to ensure that the interned dictionary is cleared.
2020-12-26 02:58:33 +01:00
Victor Stinner ba3d67c2fb
bpo-39465: Fix _PyUnicode_FromId() for subinterpreters (GH-20058)
Make _PyUnicode_FromId() function compatible with subinterpreters.
Each interpreter now has an array of identifier objects (interned
strings decoded from UTF-8).

* Add PyInterpreterState.unicode.identifiers: array of identifiers
  objects.
* Add _PyRuntimeState.unicode_ids used to allocate unique indexes
  to _Py_Identifier.
* Rewrite the _Py_Identifier structure.

Microbenchmark on _PyUnicode_FromId(&PyId_a) with _Py_IDENTIFIER(a):

[ref] 2.42 ns +- 0.00 ns -> [atomic] 3.39 ns +- 0.00 ns: 1.40x slower

This change adds 1 ns per _PyUnicode_FromId() call in average.
2020-12-26 00:41:46 +01:00
Serhiy Storchaka 2ad93821a6
bpo-42431: Fix outdated bytes comments (GH-23458)
Also move definitions of internal macros F_LJUST etc to private header.
2020-12-03 12:46:16 +02:00
Victor Stinner 32bd68c839
bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)
No longer use deprecated aliases to functions:

* Replace PyObject_MALLOC() with PyObject_Malloc()
* Replace PyObject_REALLOC() with PyObject_Realloc()
* Replace PyObject_FREE() with PyObject_Free()
* Replace PyObject_Del() with PyObject_Free()
* Replace PyObject_DEL() with PyObject_Free()
2020-12-01 10:37:39 +01:00
Victor Stinner 00d7abd7ef
bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586)
No longer use deprecated aliases to functions:

* Replace PyMem_MALLOC() with PyMem_Malloc()
* Replace PyMem_REALLOC() with PyMem_Realloc()
* Replace PyMem_FREE() with PyMem_Free()
* Replace PyMem_Del() with PyMem_Free()
* Replace PyMem_DEL() with PyMem_Free()

Modify also the PyMem_DEL() macro to use directly PyMem_Free().
2020-12-01 09:56:42 +01:00
Christian Heimes 07f2adedf0
bpo-40998: Address compiler warnings found by ubsan (GH-20929)
Signed-off-by: Christian Heimes <christian@python.org>

Automerge-Triggered-By: GH:tiran
2020-11-18 07:38:53 -08:00
Ikko Ashimine 38811d68ca
Fix typo in unicodeobject.c (GH-23180)
exeeds -> exceeds

Automerge-Triggered-By: GH:Mariatta
2020-11-09 21:57:34 -08:00
Victor Stinner 920cb647ba
bpo-42157: unicodedata avoids references to UCD_Type (GH-22990)
* UCD_Check() uses PyModule_Check()
* Simplify the internal _PyUnicode_Name_CAPI structure:

  * Remove size and state members
  * Remove state and self parameters of getcode() and getname()
    functions

* Remove global_module_state
2020-10-26 19:19:36 +01:00
Victor Stinner 47e1afd2a1
bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713)
The private _PyUnicode_Name_CAPI structure of the PyCapsule API
unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the
structure gets a new state member which must be passed to the
getcode() and getname() functions.

* Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h
* unicodedata module is now built with Py_BUILD_CORE_MODULE.
* unicodedata: move hashAPI variable into unicodedata_module_state.
2020-10-26 16:43:47 +01:00
Ma Lin a0c603cb9d
bpo-38252: Use 8-byte step to detect ASCII sequence in 64bit Windows build (GH-16334) 2020-10-18 17:48:38 +03:00
Max Bernstein 3635388f52
bpo-42065: Fix incorrectly formatted _codecs.charmap_decode error message (GH-19940) 2020-10-17 23:38:21 +03:00
Serhiy Storchaka e2ec0b27c0
bpo-41974: Remove complex.__float__, complex.__floordiv__, etc (GH-22593)
Remove complex special methods __int__, __float__, __floordiv__,
__mod__, __divmod__, __rfloordiv__, __rmod__ and __rdivmod__
which always raised a TypeError.
2020-10-09 14:14:37 +03:00
Serhiy Storchaka 9ece9cd65c
bpo-41909: Enable previously disabled recursion checks. (GH-22536)
Enable recursion checks which were disabled when get __bases__ of
non-type objects in issubclass() and isinstance() and when intern
strings. It fixes a stack overflow when getting __bases__ leads
to infinite recursion.

Originally recursion checks was disabled for PyDict_GetItem() which
silences all errors including the one raised in case of detected
recursion and can return incorrect result. But now the code uses
PyDict_GetItemWithError() and PyDict_SetDefault() instead.
2020-10-05 00:55:57 +03:00
Victor Stinner 583ee5a5b1
bpo-41692: Deprecate PyUnicode_InternImmortal() (GH-22486)
The PyUnicode_InternImmortal() function is now deprecated and will be
removed in Python 3.12: use PyUnicode_InternInPlace() instead.
2020-10-02 14:49:00 +02:00
Victor Stinner 7f413a5d95
bpo-40521: Fix PyUnicode_InternInPlace() (GH-22376)
Fix PyUnicode_InternInPlace() when the INTERNED_STRINGS macro is not
defined (when the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS macro is
defined).
2020-09-23 14:05:32 +02:00
Victor Stinner bb083d33f7
bpo-1635741: Port _string module to multi-phase init (GH-22148)
Port the _string extension module to the multi-phase initialization
API (PEP 489).
2020-09-08 15:33:08 +02:00
Serhiy Storchaka 12f433411b
bpo-41334: Convert constructors of str, bytes and bytearray to Argument Clinic (GH-21535) 2020-07-20 15:53:55 +03:00
Serhiy Storchaka 4c8f09d7ce
bpo-36346: Make using the legacy Unicode C API optional (GH-21437)
Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0
makes the interpreter not using the wchar_t cache and the legacy Unicode C API.
2020-07-10 23:26:06 +03:00
Victor Stinner 8182cc2e68
bpo-39573: Use the Py_TYPE() macro (GH-21433)
Replace obj->ob_type with Py_TYPE(obj).
2020-07-10 12:40:38 +02:00
Serhiy Storchaka b3dd5cd4a3
bpo-36346: Undeprecate private function _PyUnicode_AsUnicode(). (GH-21336) 2020-07-05 18:53:45 +03:00
Victor Stinner 3549ca313a
bpo-1635741: Fix unicode_dealloc() for mortal interned string (GH-21270)
When unicode_dealloc() is called on a mortal interned string, the
string reference counter is now reset at zero.
2020-07-03 16:59:12 +02:00
Victor Stinner 666ecfb095
bpo-1635741: Release Unicode interned strings at exit (GH-21269)
* PyUnicode_InternInPlace() now ensures that interned strings are
  ready.
* Add _PyUnicode_ClearInterned().
* Py_Finalize() now releases Unicode interned strings:
  call _PyUnicode_ClearInterned().
2020-07-02 01:19:57 +02:00
Inada Naoki 038dd0f79d
bpo-36346: Raise DeprecationWarning when creating legacy Unicode (GH-20933) 2020-06-30 15:26:56 +09:00