bpo-41513: Save unnecessary steps in the hypot() calculation (#21994)

This commit is contained in:
Raymond Hettinger 2020-08-29 09:11:04 -07:00 committed by GitHub
parent e6dcd371b2
commit 27de28607a
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 10 additions and 5 deletions

View File

@ -2447,6 +2447,14 @@ precision. When a value at or below 1.0 is correctly rounded, it
never goes above 1.0. And when values at or below 1.0 are squared, never goes above 1.0. And when values at or below 1.0 are squared,
they remain at or below 1.0, thus preserving the summation invariant. they remain at or below 1.0, thus preserving the summation invariant.
Another interesting assertion is that csum+lo*lo == csum. In the loop,
each scaled vector element has a magnitude less than 1.0. After the
Veltkamp split, *lo* has a maximum value of 2**-27. So the maximum
value of *lo* squared is 2**-54. The value of ulp(1.0)/2.0 is 2**-53.
Given that csum >= 1.0, we have:
lo**2 <= 2**-54 < 2**-53 == 1/2*ulp(1.0) <= ulp(csum)/2
Since lo**2 is less than 1/2 ulp(csum), we have csum+lo*lo == csum.
The square root differential correction is needed because a The square root differential correction is needed because a
correctly rounded square root of a correctly rounded sum of correctly rounded square root of a correctly rounded sum of
squares can still be off by as much as one ulp. squares can still be off by as much as one ulp.
@ -2519,11 +2527,8 @@ vector_norm(Py_ssize_t n, double *vec, double max, int found_nan)
csum += x; csum += x;
frac += (oldcsum - csum) + x; frac += (oldcsum - csum) + x;
x = lo * lo; assert(csum + lo * lo == csum);
assert(fabs(csum) >= fabs(x)); frac += lo * lo;
oldcsum = csum;
csum += x;
frac += (oldcsum - csum) + x;
} }
h = sqrt(csum - 1.0 + frac); h = sqrt(csum - 1.0 + frac);