Commit Graph

542 Commits

Author SHA1 Message Date
Oscar Benjamin 817fa28f81
gh-90716: Refactor PyLong_FromString to separate concerns (GH-96808)
This is a preliminary PR to refactor `PyLong_FromString` which is currently quite messy and has spaghetti like code that mixes up different concerns as well as duplicating logic.

In particular:

- `PyLong_FromString` now only handles sign, base and prefix detection and calls a new function `long_from_string_base` to parse the main body of the string.
- The `long_from_string_base` function handles all string validation and then calls `long_from_binary_base` or a new function `long_from_non_binary_base` to construct the actual `PyLong`.
- The existing `long_from_binary_base` function is simplified by factoring duplicated logic to `long_from_string_base`.
- The new function `long_from_non_binary_base` factors out much of the code from `PyLong_FromString` including in particular the quadratic algorithm reffered to in gh-95778 so that this can be seen separately from unrelated concerns such as string validation.
2022-09-25 10:09:50 +01:00
Victor Stinner e841ffc915
gh-95778: Mention sys.set_int_max_str_digits() in error message (#96874)
When ValueError is raised if an integer is larger than the limit,
mention sys.set_int_max_str_digits() in the error message.
2022-09-16 20:04:37 +02:00
Mark Dickinson b126196838
gh-95778: Correctly pre-check for int-to-str conversion (#96537)
Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```

In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
2022-09-04 09:21:18 -07:00
Gregory P. Smith 511ca94520
gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96499)
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
2022-09-02 09:35:08 -07:00
Eric Snow 4a1dd73431
gh-94673: Add _PyStaticType_InitBuiltin() (#95152)
This is the first of several precursors to storing tp_subclasses (and tp_weaklist) on the interpreter state for static builtin types.

We do the following:

* add `_PyStaticType_InitBuiltin()`
* add `_Py_TPFLAGS_STATIC_BUILTIN`
* set it on all static builtin types in `_PyStaticType_InitBuiltin()`
* shuffle some code around to be able to use _PyStaticType_InitBuiltin()
    * rename `_PyStructSequence_InitType()` to `_PyStructSequence_InitBuiltinWithFlags()`
    * add `_PyStructSequence_InitBuiltin()`.
2022-07-25 12:47:31 -06:00
Dennis Sweeney 5fcfdd87c9
GH-91432: Specialize FOR_ITER (GH-91713)
* Adds FOR_ITER_LIST and FOR_ITER_RANGE specializations.

* Adds _PyLong_AssignValue() internal function to avoid temporary boxing of ints.
2022-06-21 11:19:26 +01:00
oda-gitso 854db1a606
Remove unnecessary for loop initializer in long_lshift1() (GH-93071)
* Remove unnecessary for loop initialization.
2022-05-25 18:57:33 +01:00
Victor Stinner f62ad4f2c4
gh-89653: Use int type for Unicode kind (#92704)
Use the same type that PyUnicode_FromKindAndData() kind parameter
type (public C API): int.
2022-05-13 12:41:05 +02:00
Mark Dickinson 0ed91a26fe
gh-90213: Speed up right shifts of negative integers (GH-30277) 2022-05-02 11:19:03 -06:00
Victor Stinner 7cdaf87ec5
gh-91731: Replace Py_BUILD_ASSERT() with static_assert() (#91730)
Python 3.11 now uses C11 standard which adds static_assert()
to <assert.h>.

* In pytime.c, replace Py_BUILD_ASSERT() with preprocessor checks on
  SIZEOF_TIME_T with #error.
* On macOS, py_mach_timebase_info() now accepts timebase members with
  the same size than _PyTime_t.
* py_get_monotonic_clock() now saturates GetTickCount64() to
  _PyTime_MAX: GetTickCount64() is unsigned, whereas _PyTime_t is
  signed.
2022-04-20 19:26:40 +02:00
Dennis Sweeney d7d7e6c007
Cast to (destructor) to fix compiler warnings (GH-91711) 2022-04-20 16:15:45 +01:00
Dennis Sweeney da6c78584b
gh-90667: Add specializations of Py_DECREF when types are known (GH-30872) 2022-04-19 19:02:19 +01:00
Dennis Sweeney 8be8949116
gh-91117: Ensure integer mod and pow operations use cached small ints (GH-31843) 2022-04-11 16:07:09 -04:00
Mark Dickinson c60e6b6ad7
bpo-46311: Clean up PyLong_FromLong and PyLong_FromLongLong (GH-30496) 2022-03-01 14:20:52 +00:00
Eric Snow 81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Ken Jin 768569325a
bpo-46407: Fix long_mod refleak (GH-31025) 2022-01-31 11:41:14 +01:00
Crowthebird f10dafc430
bpo-46407: Optimizing some modulo operations (GH-30653)
Added new internal functions to compute mod without also computing the quotient.

The loops can be leaner then, which leads to modestly but reliably faster execution in contexts that know they don't need the quotient.

Code by Jeremiah Vivian (Pascual).
2022-01-27 18:46:45 -06:00
Tim Peters 7c26472d09
bpo-46504: faster code for trial quotient in x_divrem() (GH-30856)
* bpo-46504: faster code for trial quotient in x_divrem()

This brings x_divrem() back into synch with x_divrem1(), which was changed
in bpo-46406 to generate faster code to find machine-word division
quotients and remainders. Modern processors compute both with a single
machine instruction, but convincing C to exploit that requires writing
_less_ "clever" C code.
2022-01-24 19:06:00 -06:00
Gregory P. Smith c7f20f1cc8
bpo-46406: Faster single digit int division. (#30626)
* bpo-46406: Faster single digit int division.

This expresses the algorithm in a more basic manner resulting in better
instruction generation by todays compilers.

See https://mail.python.org/archives/list/python-dev@python.org/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5
2022-01-23 10:00:41 +00:00
Victor Stinner 1781d55eb3
bpo-46417: _curses uses PyStructSequence_NewType() (GH-30736)
The _curses module now creates its ncurses_version type as a heap
type using PyStructSequence_NewType(), rather than using a static
type.

* Move _PyStructSequence_FiniType() definition to pycore_structseq.h.
* test.pythoninfo: log curses.ncurses_version.
2022-01-21 03:30:20 +01:00
Victor Stinner e9e3eab0b8
bpo-46417: Finalize structseq types at exit (GH-30645)
Add _PyStructSequence_FiniType() and _PyStaticType_Dealloc()
functions to finalize a structseq static type in Py_Finalize().
Currrently, these functions do nothing if Python is built in release
mode.

Clear static types:

* AsyncGenHooksType: sys.set_asyncgen_hooks()
* FlagsType: sys.flags
* FloatInfoType: sys.float_info
* Hash_InfoType: sys.hash_info
* Int_InfoType: sys.int_info
* ThreadInfoType: sys.thread_info
* UnraisableHookArgsType: sys.unraisablehook
* VersionInfoType: sys.version
* WindowsVersionType: sys.getwindowsversion()
2022-01-21 01:42:25 +01:00
Brandt Bucher 5cd9a162cd
bpo-46361: Fix "small" `int` caching (GH-30583) 2022-01-16 16:06:37 +00:00
Tim Peters fc05e6bfce
bpo-46020: Optimize long_pow for the common case (GH-30555)
This cuts a bit of overhead by not initializing the table of small
odd powers unless it's needed for a large exponent.
2022-01-12 12:55:02 -06:00
Tim Peters 3aa5242b54
bpo-46233: Minor speedup for bigint squaring (GH-30345)
x_mul()'s squaring code can do some redundant and/or useless
work at the end of each digit pass. A more careful analysis
of worst-case carries at various digit positions allows
making that code leaner.
2022-01-03 20:41:16 -06:00
Tim Peters 863729e9c6
bpo-46218: Change long_pow() to sliding window algorithm (GH-30319)
* bpo-46218: Change long_pow() to sliding window algorithm

The primary motivation is to eliminate long_pow's reliance on that the number of bits in a long "digit" is a multiple of 5. Now it no longer cares how many bits are in a digit.

But the sliding window approach also allows cutting the precomputed table of small powers in half, which reduces initialization overhead enough that the approach pays off for smaller exponents too. Depending on exponent bit patterns, a sliding window may also be able to save some bigint multiplies (sometimes when at least 5 consecutive exponent bits are 0, regardless of their starting bit position modulo 5).

Note: boosting the window width to 6 didn't work well overall. It give marginal speed improvements for huge exponents, but the increased overhead (the small-power table needs twice as many entries) made it a loss for smaller exponents.

Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
2022-01-02 13:18:20 -06:00
Xinhang Xu 3581c7abbe
bpo-46055: Speed up binary shifting operators (GH-30044)
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
2021-12-27 18:36:55 +00:00
Mark Dickinson 360fedc2d2
bpo-46055: Streamline inner loop for right shifts (#30243) 2021-12-27 18:04:36 +00:00
Eric Snow 121f1f893a
bpo-45953: Statically initialize the small ints. (gh-30092)
The array of small PyLong objects has been statically declared. Here I also statically initialize them. Consequently they are no longer initialized dynamically during runtime init.

I've also moved them under a new sub-struct in _PyRuntimeState, in preparation for static allocation and initialization of other global objects.

https://bugs.python.org/issue45953
2021-12-13 18:04:05 -07:00
Eric Snow c8749b5783
bpo-46008: Make runtime-global object/type lifecycle functions and state consistent. (gh-29998)
This change is strictly renames and moving code around.  It helps in the following ways:

* ensures type-related init functions focus strictly on one of the three aspects (state, objects, types)
* passes in PyInterpreterState * to all those functions, simplifying work on moving types/objects/state to the interpreter
* consistent naming conventions help make what's going on more clear
* keeping API related to a type in the corresponding header file makes it more obvious where to look for it

https://bugs.python.org/issue46008
2021-12-09 12:59:26 -07:00
Dong-hee Na 345ba3f080
bpo-45510: Specialize BINARY_SUBTRACT (GH-29523) 2021-11-18 09:19:58 +00:00
Mark Shannon acc89db923
bpo-45691: Make array of small ints static to fix use-after-free error. (GH-29366) 2021-11-03 16:22:32 +00:00
Mark Shannon 4fc68560ea
Store actual ints, not pointers to them in the interpreter state. (GH-29274) 2021-10-28 17:35:43 +01:00
Victor Stinner 8e5de40f90
bpo-35134: Move classobject.h to Include/cpython/ (GH-28968)
Move classobject.h, context.h, genobject.h and longintrepr.h header
files from Include/ to Include/cpython/.

Remove redundant "#ifndef Py_LIMITED_API" in context.h.

Remove explicit #include "longintrepr.h" in C files. It's not needed,
Python.h already includes it.
2021-10-15 09:46:29 +02:00
Dennis Sweeney 3b3d30e8f7
bpo-45367: Specialize BINARY_MULTIPLY (GH-28727) 2021-10-14 15:56:33 +01:00
Victor Stinner aac29af678
bpo-45434: pyport.h no longer includes <stdlib.h> (GH-28914)
Include <stdlib.h> explicitly in C files.

Python.h includes <wchar.h>.
2021-10-13 19:25:53 +02:00
Barry Warsaw 07e737d002
bpo-45155 : Default arguments for int.to_bytes(length=1, byteorder=sys.byteorder) (#28265)
Add default arguments for int.to_bytes() and int.from_bytes()

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
2021-09-15 19:55:24 -07:00
Mark Shannon d3eaf0cc5b
bpo-44945: Specialize BINARY_ADD (GH-27967) 2021-08-27 09:21:01 +01:00
Mark Shannon 15d50d7ed8
bpo-44946: Streamline operators and creation of ints for common case of single 'digit'. (GH-27832) 2021-08-25 14:28:43 +01:00
Mark Dickinson 5924243199
Fix a potential reference-counting bug in long_pow (GH-26690) 2021-06-13 08:19:29 +01:00
Tim Peters 9d8dd8f08a
bpo-44376 - reduce pow() overhead for small exponents (GH-26662)
Greatly reduce pow() overhead for small exponents.
2021-06-12 11:29:56 -05:00
Victor Stinner 442ad74fc2
bpo-43687: Py_Initialize() creates singletons earlier (GH-25147)
Reorganize pycore_interp_init() to initialize singletons before the
the first PyType_Ready() call. Fix an issue when Python is configured
using --without-doc-strings.
2021-04-02 15:28:13 +02:00
Brandt Bucher 145bf269df
bpo-42128: Structural Pattern Matching (PEP 634) (GH-22917)
Co-authored-by: Guido van Rossum <guido@python.org>
Co-authored-by: Talin <viridia@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2021-02-26 14:51:55 -08:00
Victor Stinner bcb094b41f
bpo-43268: Pass interp rather than tstate to internal functions (GH-24580)
Pass the current interpreter (interp) rather than the current Python
thread state (tstate) to internal functions which only use the
interpreter.

Modified functions:

* _PyXXX_Fini() and _PyXXX_ClearFreeList() functions
* _PyEval_SignalAsyncExc(), make_pending_calls()
* _PySys_GetObject(), sys_set_object(), sys_set_object_id(), sys_set_object_str()
* should_audit(), set_flags_from_config(), make_flags()
* _PyAtExit_Call()
* init_stdio_encoding()
* etc.
2021-02-19 15:10:45 +01:00
Victor Stinner 101bf69ff1
bpo-43268: _Py_IsMainInterpreter() now expects interp (GH-24577)
The _Py_IsMainInterpreter() function now expects interp rather than
tstate.
2021-02-19 13:33:31 +01:00
Victor Stinner 32bd68c839
bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)
No longer use deprecated aliases to functions:

* Replace PyObject_MALLOC() with PyObject_Malloc()
* Replace PyObject_REALLOC() with PyObject_Realloc()
* Replace PyObject_FREE() with PyObject_Free()
* Replace PyObject_Del() with PyObject_Free()
* Replace PyObject_DEL() with PyObject_Free()
2020-12-01 10:37:39 +01:00
Victor Stinner c310185c08
bpo-42161: Remove private _PyLong_Zero and _PyLong_One (GH-23003)
Use PyLong_FromLong(0) and PyLong_FromLong(1) of the public C API
instead. For Python internals, _PyLong_GetZero() and _PyLong_GetOne()
of pycore_long.h can be used.
2020-10-27 21:34:33 +01:00
Victor Stinner 8e3b9f9283
bpo-42161: Add _PyLong_GetZero() and _PyLong_GetOne() (GH-22993)
Add _PyLong_GetZero() and _PyLong_GetOne() functions and a new
internal pycore_long.h header file.

Python cannot be built without small integer singletons anymore.
2020-10-27 00:00:03 +01:00
Raymond Hettinger 4e0ce82058
Revert "bpo-26680: Incorporate is_integer in all built-in and standard library numeric types (GH-6121)" (GH-22584)
This reverts commit 58a7da9e12.
2020-10-07 16:43:44 -07:00
Robert Smallshire 58a7da9e12
bpo-26680: Incorporate is_integer in all built-in and standard library numeric types (GH-6121)
* bpo-26680: Adds support for int.is_integer() for compatibility with float.is_integer().

The int.is_integer() method always returns True.

* bpo-26680: Adds a test to ensure that False.is_integer() and True.is_integer() are always True.

* bpo-26680: Adds Real.is_integer() with a trivial implementation using conversion to int.

This default implementation is intended to reduce the workload for subclass
implementers. It is not robust in the presence of infinities or NaNs and
may have suboptimal performance for other types.

* bpo-26680: Adds Rational.is_integer which returns True if the denominator is one.

This implementation assumes the Rational is represented in it's
lowest form, as required by the class docstring.

* bpo-26680: Adds Integral.is_integer which always returns True.

* bpo-26680: Adds tests for Fraction.is_integer called as an instance method.

The tests for the Rational abstract base class use an unbound
method to sidestep the inability to directly instantiate Rational.
These tests check that everything works correct as an instance method.

* bpo-26680: Updates documentation for Real.is_integer and built-ins int and float.

The call x.is_integer() is now listed in the table of operations
which apply to all numeric types except complex, with a reference
to the full documentation for Real.is_integer().  Mention of
is_integer() has been removed from the section 'Additional Methods
on Float'.

The documentation for Real.is_integer() describes its purpose, and
mentions that it should be overridden for performance reasons, or
to handle special values like NaN.

* bpo-26680: Adds Decimal.is_integer to the Python and C implementations.

The C implementation of Decimal already implements and uses
mpd_isinteger internally, we just expose the existing function to
Python.

The Python implementation uses internal conversion to integer
using to_integral_value().

In both cases, the corresponding context methods are also
implemented.

Tests and documentation are included.

* bpo-26680: Updates the ACKS file.

* bpo-26680: NEWS entries for int, the numeric ABCs and Decimal.

Co-authored-by: Robert Smallshire <rob@sixty-north.com>
2020-10-01 17:30:08 +01:00
Serhiy Storchaka 5a2bac7fe0
bpo-41342: Convert int.__round__ to Argument Clinic (GH-21549) 2020-07-20 15:57:37 +03:00