mirror of https://github.com/python/cpython
PyLong_FromString(): Continued fraction analysis (explained in a new comment) suggests there are almost certainly large input integers in all non-binary input bases for which one Python digit too few is initially allocated to hold the final result. Instead of assert-failing when that happens, allocate more space.

Alas, I estimate it would take a few days to find a specific such case, so this isn't backed up by a new test. Such a case may also take hours to run, since conversion time is quadratic in the number of digits, and preliminary attempts suggested that the smallest such inputs contain at least a million digits.
parent 69fe4055a3
commit 9faa3eda6b
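The continued-fraction analysis mentioned in the commit message can be reproduced roughly as follows. This is a minimal sketch, not the author's script: `convergents` is a hypothetical helper, and it expands the double-precision value of log(10)/log(2**15) rather than the true irrational number, which should be adequate only for denominators of modest size. It lists the denominators `j` for which `j*log(10)/log(2**15)` lands close to an integer `i`, including the 76573 cited in the new comment.

```python
import math
from fractions import Fraction


def convergents(x, max_den):
    """Yield (i, j) for each continued-fraction convergent i/j of x, j <= max_den.

    x is a Fraction, so the expansion itself is exact; any error comes only
    from how x was obtained.
    """
    h_prev, h = 0, 1   # numerators   h[-2], h[-1]
    k_prev, k = 1, 0   # denominators k[-2], k[-1]
    while True:
        a = x.numerator // x.denominator   # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        if k > max_den:
            return
        yield h, k
        frac = x - a
        if frac == 0:
            return
        x = 1 / frac


# Assumption: the double-precision quotient stands in for the true ratio.
ratio = Fraction(math.log(10) / math.log(2**15))

for i, j in convergents(ratio, 10**6):
    print(f"{j:>7} * log(10)/log(2**15) = {float(j * ratio):.9f}  (near {i})")
```

Denominators where the product sits just above an integer are the input lengths at which a small rounding error in the size estimate could matter.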
@@ -1509,6 +1509,57 @@ convmultmax_base[base], the result is "simply"
     (((c0*B + c1)*B + c2)*B + c3)*B + ... ))) + c_n-1
 
 where B = convmultmax_base[base].
+
+Error analysis: as above, the number of Python digits `n` needed is worst-
+case
+
+    n >= N * log(B)/log(BASE)
+
+where `N` is the number of input digits in base `B`. This is computed via
+
+    size_z = (Py_ssize_t)((scan - str) * log_base_BASE[base]) + 1;
+
+below. Two numeric concerns are how much space this can waste, and whether
+the computed result can be too small. To be concrete, assume BASE = 2**15,
+which is the default (and it's unlikely anyone changes that).
+
+Waste isn't a problem: provided the first input digit isn't 0, the difference
+between the worst-case input with N digits and the smallest input with N
+digits is about a factor of B, but B is small compared to BASE so at most
+one allocated Python digit can remain unused on that count. If
+N*log(B)/log(BASE) is mathematically an exact integer, then truncating that
+and adding 1 returns a result 1 larger than necessary. However, that can't
+happen: whenever B is a power of 2, long_from_binary_base() is called
+instead, and it's impossible for B**i to be an integer power of 2**15 when
+B is not a power of 2 (i.e., it's impossible for N*log(B)/log(BASE) to be
+an exact integer when B is not a power of 2, since B**i has a prime factor
+other than 2 in that case, but (2**15)**j's only prime factor is 2).
+
+The computed result can be too small if the true value of N*log(B)/log(BASE)
+is a little bit larger than an exact integer, but due to roundoff errors (in
+computing log(B), log(BASE), their quotient, and/or multiplying that by N)
+yields a numeric result a little less than that integer. Unfortunately, "how
+close can a transcendental function get to an integer over some range?"
+questions are generally theoretically intractable. Computer analysis via
+continued fractions is practical: expand log(B)/log(BASE) via continued
+fractions, giving a sequence i/j of "the best" rational approximations. Then
+j*log(B)/log(BASE) is approximately equal to (the integer) i. This shows that
+we can get very close to being in trouble, but very rarely. For example,
+76573 is a denominator in one of the continued-fraction approximations to
+log(10)/log(2**15), and indeed:
+
+    >>> log(10)/log(2**15)*76573
+    16958.000000654003
+
+is very close to an integer. If we were working with IEEE single-precision,
+rounding errors could kill us. Finding worst cases in IEEE double-precision
+requires better-than-double-precision log() functions, and Tim didn't bother.
+Instead the code checks to see whether the allocated space is enough as each
+new Python digit is added, and copies the whole thing to a larger long if not.
+This should happen extremely rarely, and in fact I don't have a test case
+that triggers it(!). Instead the code was tested by artificially allocating
+just 1 digit at the start, so that the copying code was exercised for every
+digit beyond the first.
 ***/
     register twodigits c; /* current input character */
     Py_ssize_t size_z;
@@ -1551,6 +1602,8 @@ where B = convmultmax_base[base].
      * being stored into.
      */
     size_z = (Py_ssize_t)((scan - str) * log_base_BASE[base]) + 1;
+    /* Uncomment next line to test exceedingly rare copy code */
+    /* size_z = 1; */
     assert(size_z > 0);
     z = _PyLong_New(size_z);
     if (z == NULL)
@@ -1594,9 +1647,27 @@ where B = convmultmax_base[base].
         /* carry off the current end? */
         if (c) {
             assert(c < BASE);
-            assert(z->ob_size < size_z);
-            *pz = (digit)c;
-            ++z->ob_size;
+            if (z->ob_size < size_z) {
+                *pz = (digit)c;
+                ++z->ob_size;
+            }
+            else {
+                PyLongObject *tmp;
+                /* Extremely rare. Get more space. */
+                assert(z->ob_size == size_z);
+                tmp = _PyLong_New(size_z + 1);
+                if (tmp == NULL) {
+                    Py_DECREF(z);
+                    return NULL;
+                }
+                memcpy(tmp->ob_digit,
+                       z->ob_digit,
+                       sizeof(digit) * size_z);
+                Py_DECREF(z);
+                z = tmp;
+                z->ob_digit[size_z] = (digit)c;
+                ++size_z;
+            }
         }
     }
 }
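The size estimate and the new grow-on-overflow fallback can be mimicked in Python to see how the two interact. This is a sketch under stated assumptions, not the C implementation: `digits_from_decimal` is a hypothetical name, a Python list stands in for `ob_digit`, `CHUNK = 4` is chosen only so that `10**CHUNK < BASE`, and growing the list by one slot stands in for the `_PyLong_New` plus `memcpy` step. Passing `size_z=1` mirrors the testing trick described in the comment, forcing the fallback for every digit beyond the first.

```python
import math

SHIFT = 15
BASE = 1 << SHIFT
CHUNK = 4  # decimal digits folded in per step (assumption: 10**CHUNK < BASE)
LOG_BASE_BASE_10 = math.log(10) / math.log(BASE)


def digits_from_decimal(text, size_z=None):
    """Return (little-endian base-2**15 digits, number of times storage grew)."""
    if size_z is None:
        # Same shape as the estimate in the diff: truncate, then add 1.
        size_z = int(len(text) * LOG_BASE_BASE_10) + 1
    z = [0] * size_z   # preallocated digit storage
    used = 0           # plays the role of z->ob_size
    grows = 0
    for start in range(0, len(text), CHUNK):
        chunk = text[start:start + CHUNK]
        c = int(chunk)              # value of this chunk, < 10**CHUNK
        mult = 10 ** len(chunk)
        for k in range(used):       # z = z * mult + c, one digit at a time
            c += z[k] * mult
            z[k] = c & (BASE - 1)
            c >>= SHIFT
        if c:                       # carry off the current end?
            if used == size_z:      # estimate was too small: get more space
                z.append(0)
                size_z += 1
                grows += 1
            z[used] = c
            used += 1
    return z[:used], grows


def to_int(digits):
    """Rebuild the integer from little-endian base-2**15 digits."""
    return sum(d << (SHIFT * i) for i, d in enumerate(digits))


s = "123456789" * 50
digits, grows = digits_from_decimal(s)                 # normal estimate
forced, forced_grows = digits_from_decimal(s, size_z=1)  # force the rare path
print(len(digits), grows, forced_grows)
```

For inputs of ordinary length the estimate should suffice and the fallback never runs, which is why forcing a one-digit allocation is a practical way to exercise it.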