gh-100833: Remove 'volatile' qualifiers in fsum algorithm (#100845)

This PR removes the `volatile` qualifier on various intermediate quantities
in the `math.fsum` implementation, and updates the notes preceding the
algorithm accordingly (as well as fixing some of the exsting notes). See
the linked issue #100833 for discussion.
This commit is contained in:
Mark Dickinson 2023-01-08 19:40:15 +00:00 committed by GitHub
parent b139bcd892
commit 87d3bd0e02
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 22 additions and 21 deletions

View File

@ -0,0 +1 @@
Speed up :func:`math.fsum` by removing defensive ``volatile`` qualifiers.

View File

@ -1358,30 +1358,30 @@ FUNC1(tanh, tanh, 0,
Dickinson's post at <http://bugs.python.org/file10357/msum4.py>.
See those links for more details, proofs and other references.
Note 1: IEEE 754R floating point semantics are assumed,
but the current implementation does not re-establish special
value semantics across iterations (i.e. handling -Inf + Inf).
Note 1: IEEE 754 floating-point semantics with a rounding mode of
roundTiesToEven are assumed.
Note 2: No provision is made for intermediate overflow handling;
therefore, sum([1e+308, 1e-308, 1e+308]) returns 1e+308 while
sum([1e+308, 1e+308, 1e-308]) raises an OverflowError due to the
Note 2: No provision is made for intermediate overflow handling;
therefore, fsum([1e+308, -1e+308, 1e+308]) returns 1e+308 while
fsum([1e+308, 1e+308, -1e+308]) raises an OverflowError due to the
overflow of the first partial sum.
Note 3: The intermediate values lo, yr, and hi are declared volatile so
aggressive compilers won't algebraically reduce lo to always be exactly 0.0.
Also, the volatile declaration forces the values to be stored in memory as
regular doubles instead of extended long precision (80-bit) values. This
prevents double rounding because any addition or subtraction of two doubles
can be resolved exactly into double-sized hi and lo values. As long as the
hi value gets forced into a double before yr and lo are computed, the extra
bits in downstream extended precision operations (x87 for example) will be
exactly zero and therefore can be losslessly stored back into a double,
thereby preventing double rounding.
Note 3: The algorithm has two potential sources of fragility. First, C
permits arithmetic operations on `double`s to be performed in an
intermediate format whose range and precision may be greater than those of
`double` (see for example C99 §5.2.4.2.2, paragraph 8). This can happen for
example on machines using the now largely historical x87 FPUs. In this case,
`fsum` can produce incorrect results. If `FLT_EVAL_METHOD` is `0` or `1`, or
`FLT_EVAL_METHOD` is `2` and `long double` is identical to `double`, then we
should be safe from this source of errors. Second, an aggressively
optimizing compiler can re-associate operations so that (for example) the
statement `yr = hi - x;` is treated as `yr = (x + y) - x` and then
re-associated as `yr = y + (x - x)`, giving `y = yr` and `lo = 0.0`. That
re-association would be in violation of the C standard, and should not occur
except possibly in the presence of unsafe optimizations (e.g., -ffast-math,
-fassociative-math). Such optimizations should be avoided for this module.
Note 4: A similar implementation is in Modules/cmathmodule.c.
Be sure to update both when making changes.
Note 5: The signature of math.fsum() differs from builtins.sum()
Note 4: The signature of math.fsum() differs from builtins.sum()
because the start argument doesn't make sense in the context of
accurate summation. Since the partials table is collapsed before
returning a result, sum(seq2, start=sum(seq1)) may not equal the
@ -1467,7 +1467,7 @@ math_fsum(PyObject *module, PyObject *seq)
Py_ssize_t i, j, n = 0, m = NUM_PARTIALS;
double x, y, t, ps[NUM_PARTIALS], *p = ps;
double xsave, special_sum = 0.0, inf_sum = 0.0;
volatile double hi, yr, lo;
double hi, yr, lo;
iter = PyObject_GetIter(seq);
if (iter == NULL)