Commit Graph

1626 Commits

Author SHA1 Message Date
Inada Naoki e92ac0a741
Fix compiler warning in unicodeobject.c (#105050) 2023-05-29 17:31:03 +09:00
Serhiy Storchaka f3466bc040
gh-98836: Extend PyUnicode_FromFormat() (GH-98838)
* Support for conversion specifiers o (octal) and X (uppercase hexadecimal).
* Support for length modifiers j (intmax_t) and t (ptrdiff_t).
* Length modifiers are now applied to all integer conversions.
* Support for wchar_t C strings (%ls and %lV).
* Support for variable width and precision (*).
* Support for flag - (left alignment).
2023-05-22 00:32:39 +03:00
John Belmonte 69621d1b09
gh-104018: remove unused format "z" handling in string formatfloat() (#104107)
This is a cleanup overlooked in PR #104033.
2023-05-07 10:11:42 +05:30
Eric Snow a9c6e0618f
gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205)
Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules.  We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
2023-05-05 21:11:27 +00:00
Eric Snow fdd878650d
gh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (gh-104072)
Until now, we haven't been initializing nor finalizing the per-interpreter state properly.
2023-05-01 19:36:00 -06:00
Eric Snow d2e2e53f73
gh-94673: Ensure Builtin Static Types are Readied Properly (gh-103940)
There were cases where we do unnecessary work for builtin static types. This also simplifies some work necessary for a per-interpreter GIL.
2023-04-27 16:19:43 -06:00
Eddie Elizondo ea2c001650
gh-84436: Implement Immortal Objects (gh-19474)
This is the implementation of PEP683

Motivation:

The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.

Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
2023-04-22 13:39:37 -06:00
Eric Snow ba65a065cf
gh-100227: Move the Dict of Interned Strings to PyInterpreterState (gh-102339)
We can revisit the options for keeping it global later, if desired.  For now the approach seems quite complex, so we've gone with the simpler isolation solution in the meantime.

https://github.com/python/cpython/issues/100227
2023-03-28 12:52:28 -06:00
Eric Snow 89e67ada69
gh-100227: Revert gh-102925 "gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters" (gh-103063)
This reverts commit 87be8d9.

This approach to keeping the interned strings safe is turning out to be too complex for my taste (due to obmalloc isolation). For now I'm going with the simpler solution, making the dict per-interpreter. We can revisit that later if we want a sharing solution.
2023-03-27 16:53:05 -06:00
Eric Snow 87be8d9522
gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters (gh-102925)
This is effectively two changes.  The first (the bulk of the change) is where we add _Py_AddToGlobalDict() (and _PyRuntime.cached_objects.main_tstate, etc.).  The second (much smaller) change is where we update PyUnicode_InternInPlace() to use _Py_AddToGlobalDict() instead of calling PyDict_SetDefault() directly.

Basically, _Py_AddToGlobalDict() is a wrapper around PyDict_SetDefault() that should be used whenever we need to add a value to a runtime-global dict object (in the few cases where we are leaving the container global rather than moving it to PyInterpreterState, e.g. the interned strings dict).  _Py_AddToGlobalDict() does all the necessary work to make sure the target global dict is shared safely between isolated interpreters.  This is especially important as we move the obmalloc state to each interpreter (gh-101660), as well as, potentially, the GIL (PEP 684).

https://github.com/python/cpython/issues/100227
2023-03-22 18:30:04 -06:00
Kumar Aditya 3d872a74c8
GH-100227: cleanup initialization of global interned dict (#102682) 2023-03-14 14:22:21 +05:30
Eric Snow cbb0aa71d0
gh-102304: Consolidate Direct Usage of _Py_RefTotal (gh-102514)
This simplifies further changes to _Py_RefTotal (e.g. make it atomic or move it to PyInterpreterState).

https://github.com/python/cpython/issues/102304
2023-03-08 12:03:50 -07:00
Jelle Zijlstra 8d0f09b1be
gh-101765: unicodeobject: use Py_XDECREF correctly (#102283) 2023-02-26 14:45:37 -08:00
Jelle Zijlstra d71edbd1b7
gh-101765: Fix refcount issues in list and unicode pickling (#102265)
Followup from #101769.
2023-02-25 16:01:58 -08:00
Ionite 54dfa14c5a
gh-101765: Fix SystemError / segmentation fault in iter `__reduce__` when internal access of `builtins.__dict__` exhausts the iterator (#101769) 2023-02-24 15:02:04 -08:00
Eric Snow aa8591e9ca
gh-90111: Minor Cleanup for Runtime-Global Objects (gh-100254)
* move _PyRuntime.global_objects.interned to _PyRuntime.cached_objects.interned_strings (and use _Py_CACHED_OBJECT())
* rename _PyRuntime.global_objects to _PyRuntime.static_objects

(This also relates to gh-96075.)

https://github.com/python/cpython/issues/90111
2022-12-14 11:53:57 -07:00
Eric Snow 91a8e002c2
gh-81057: Move More Globals to _PyRuntimeState (gh-100092)
https://github.com/python/cpython/issues/81057
2022-12-07 15:56:31 -07:00
Serhiy Storchaka a87c46eab3
bpo-15999: Accept arbitrary values for boolean parameters. (#15609)
builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
2022-12-03 11:52:21 -08:00
Serhiy Storchaka f08e52ccb0
gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (GH-99613)
Previously *consumed was not set in this case.
2022-12-01 14:54:51 +02:00
Victor Stinner 135ec7cefb
gh-99537: Use Py_SETREF() function in C code (#99657)
Fix potential race condition in code patterns:

* Replace "Py_DECREF(var); var = new;" with "Py_SETREF(var, new);"
* Replace "Py_XDECREF(var); var = new;" with "Py_XSETREF(var, new);"
* Replace "Py_CLEAR(var); var = new;" with "Py_XSETREF(var, new);"

Other changes:

* Replace "old = var; var = new; Py_DECREF(var)"
  with "Py_SETREF(var, new);"
* Replace "old = var; var = new; Py_XDECREF(var)"
  with "Py_XSETREF(var, new);"
* And remove the "old" variable.
2022-11-22 13:39:11 +01:00
Eric Snow 5f55067e23
gh-81057: Move More Globals in Core Code to _PyRuntimeState (gh-99516)
https://github.com/python/cpython/issues/81057
2022-11-16 09:37:14 -07:00
Victor Stinner 1960eb005e
gh-99300: Use Py_NewRef() in Objects/ directory (#99351)
Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and
Py_XNewRef() in C files of the Objects/ directory.
2022-11-10 23:40:31 +01:00
Eric Snow 52f91c642b
gh-90868: Adjust the Generated Objects (gh-99223)
We do the following:

* move the generated _PyUnicode_InitStaticStrings() to its own file
* move the generated _PyStaticObjects_CheckRefcnt() to its own file
* include pycore_global_objects.h in extension modules instead of pycore_runtime_init.h

These changes help us avoid including things that aren't needed.

https://github.com/python/cpython/issues/90868
2022-11-08 10:03:03 -07:00
Nikita Sobolev 76f989dc3e
gh-98783: Fix crashes when `str` subclasses are used in `_PyUnicode_Equal` (#98806) 2022-10-30 02:23:20 -04:00
Victor Stinner db03c8066a
gh-98393: os module reject bytes-like, only accept bytes (#98394)
The os module and the PyUnicode_FSDecoder() function no longer accept
bytes-like paths, like bytearray and memoryview types: only the exact
bytes type is accepted for bytes strings.
2022-10-18 17:52:31 +02:00
Nikita Sobolev ccab67ba79
gh-97982: Factorize PyUnicode_Count() and unicode_count() code (#98025)
Add unicode_count_impl() to factorize PyUnicode_Count()
and unicode_count() code.
2022-10-12 18:27:53 +02:00
Victor Stinner df3a6d9beb
gh-97982: Remove asciilib_count() (#98164)
asciilib_count() is the same than ucs1lib_count(): the code is not
specialized for ASCII strings, so it's not worth it to have a
separated function. Remove asciilib_count() function.
2022-10-11 17:59:58 +02:00
Kumar Aditya 6dab8c95bd
GH-96458: Statically initialize utf8 representation of static strings (#96481) 2022-09-02 23:43:08 -07:00
Kumar Aditya 129998bd7b
GH-96075: move interned dict under runtime state (GH-96077) 2022-08-22 12:05:21 -07:00
Petr Viktorin 71c3d649b5
gh-95504: Fix negative numbers in PyUnicode_FromFormat (GH-95848)
Co-authored-by: philg314 <110174000+philg314@users.noreply.github.com>
2022-08-10 13:12:40 +02:00
Serhiy Storchaka 62f06508e7
gh-95781: More strict format string checking in PyUnicode_FromFormatV() (GH-95784)
An unrecognized format character in PyUnicode_FromFormat() and
PyUnicode_FromFormatV() now sets a SystemError.
In previous versions it caused all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
2022-08-08 19:21:07 +03:00
Dong-hee Na fb75d015f4
gh-91146: More reduce allocation size of list from str.split/rsplit (gh-95493)
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2022-08-01 22:15:07 +09:00
Dong-hee Na 50b2261bda
gh-91146: Reduce allocation size of list from str.split()/rsplit() (gh-95473) 2022-07-31 12:14:53 +09:00
Pamela Fox 70068b9336
Fix Unicode doc and replace use of macro with PyMem_New function (GH-94088) 2022-07-28 23:32:16 +01:00
Eric Snow 4a1dd73431
gh-94673: Add _PyStaticType_InitBuiltin() (#95152)
This is the first of several precursors to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types.

We do the following:

* add `_PyStaticType_InitBuiltin()`
* add `_Py_TPFLAGS_STATIC_BUILTIN`
* set it on all static builtin types in `_PyStaticType_InitBuiltin()`
* shuffle some code around to be able to use _PyStaticType_InitBuiltin()
    * rename `_PyStructSequence_InitType()` to `_PyStructSequence_InitBuiltinWithFlags()`
    * add `_PyStructSequence_InitBuiltin()`.
2022-07-25 12:47:31 -06:00
Kumar Aditya 9dff9f4814
GH-90699: Intern statically allocated strings (GH-93597)
This is similar to how strings are interned for deepfreeze.
2022-07-08 10:47:37 -07:00
Eric Snow caa279d6fd
bpo-40514: Drop EXPERIMENTAL_ISOLATED_SUBINTERPRETERS (gh-93185)
This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing:

* the configure option
* the macro
* the code enabled by the macro
2022-05-27 17:38:01 -06:00
Kumar Aditya cb04a09d2d
GH-93207: Remove HAVE_STDARG_PROTOTYPES configure check for stdarg.h (#93215) 2022-05-27 13:30:45 +02:00
Victor Stinner 5f8c3fb997
gh-91924: Optimize unicode_check_encoding_errors() (#93200)
Avoid _PyCodec_Lookup() and PyCodec_LookupError() for most common
built-in encodings and error handlers to avoid creating a temporary
Unicode string object, whereas these encodings and error handlers are
known to be valid.
2022-05-27 00:39:49 +02:00
Victor Stinner 059b5baf98
gh-85858: Remove PyUnicode_InternImmortal() function (#92579)
Remove the PyUnicode_InternImmortal() function and the
SSTATE_INTERNED_IMMORTAL macro.

The PyUnicode_InternImmortal() function is still exported in the
stable ABI. The function is removed from the API.

PyASCIIObject.state.interned size is now a single bit, rather than 2
bits.

Keep SSTATE_NOT_INTERNED and SSTATE_INTERNED_MORTAL macros for
backward compatibility, but no longer use them internally since the
interned member is now a single bit and so can only have two values
(interned or not interned).

Update stats of _PyUnicode_ClearInterned().
2022-05-13 13:40:22 +02:00
Victor Stinner f62ad4f2c4
gh-89653: Use int type for Unicode kind (#92704)
Use the same type that PyUnicode_FromKindAndData() kind parameter
type (public C API): int.
2022-05-13 12:41:05 +02:00
Inada Naoki f9c9354a7a
gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537) 2022-05-12 14:48:38 +09:00
Victor Stinner 804f2529d8
gh-91320: Use _PyCFunction_CAST() (#92251)
Replace "(PyCFunction)(void(*)(void))func" cast with
_PyCFunction_CAST(func).

Change generated by the command:

sed -i -e \
  's!(PyCFunction)(void(\*)(void)) *\([A-Za-z0-9_]\+\)!_PyCFunction_CAST(\1)!g' \
  $(find -name "*.c")
2022-05-03 21:42:14 +02:00
Serhiy Storchaka 18b07d773e
bpo-36819: Fix crashes in built-in encoders with weird error handlers (GH-28593)
If the error handler returns position less or equal than the starting
position of non-encodable characters, most of built-in encoders didn't
properly re-size the output buffer. This led to out-of-bounds writes,
and segfaults.
2022-05-02 12:37:48 +03:00
Serhiy Storchaka 3483299a24
gh-81548: Deprecate octal escape sequences with value larger than 0o377 (GH-91668) 2022-04-30 13:16:27 +03:00
Dennis Sweeney da6c78584b
gh-90667: Add specializations of Py_DECREF when types are known (GH-30872) 2022-04-19 19:02:19 +01:00
Kumar Aditya ab0d35d70d
bpo-46712: share more global strings in deepfreeze (gh-32152)
(for gh-90868)
2022-04-19 11:41:36 -06:00
Oleg Iarygin 2f0fc521f4
gh-91102: Use Argument Clinic for EncodingMap (#31725)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
2022-04-18 13:43:56 -07:00
Kumar Aditya 8c54c3dacc
gh-91576: Speed up iteration of strings (#91574) 2022-04-18 07:18:27 -07:00
Tobias Stoeckmann 0859368335
gh-91421: Use constant value check during runtime (GH-91422)
The left-hand side expression of the if-check can be converted to a
constant by the compiler, but the addition on the right-hand side is
performed during runtime.

Move the addition from the right-hand side to the left-hand side by
turning it into a subtraction there. Since the values are known to
be large enough to not turn negative, this is a safe operation.

Prevents a very unlikely integer overflow on 32 bit systems.

Fixes GH-91421.
2022-04-12 20:01:02 -07:00