required for a corner case in dec_hash() in the following commit and also
usually faster. dec_hash() needs some extra precision above MPD_MAX_PREC,
and _mpd_base_ndivmod() is not audited for that.
2) Use _mpd_basemul() if the length of the smaller operand is less than
or equal to 256. While this is technically an optimization, it is
required for *testing* corner cases in dec_hash() in reasonable time.
None of the _mpd_shortadd() or _mpd_shortmul() functions read uninitialized
values. Previously zeroing was required since _mpd_real_size() was called
on the output array.
use log10() to calculate the size of the output array. The current code
has been tested on x86/amd64 (and to a lesser extent on qemu-mips qemu-sparc)
and produces sufficiently large values for all inputs tested so far (coefficient
sizes of 10**18 - 1 are hard to test exhaustively).
The new code does not rely on the correctness of log10() and resizes
the output arrays if the allocated space is insufficient.
1) _mpd_qdivmod() uses the context precision only in two places, and
the new code should be exactly equivalent to the previous code.
2) Remove misleading comment.
3) The quotient *is* an integer with exponent 0, so calling mpd_qtrunc()
is pointless.
4) Replace two instances of identical code by a single one.
5) Use _mpd_cmp_abs() instead of mpd_cmp_total_mag(): the operands
are not special.
6) Don't clear MPD_Rounded in the status (with the current code it should
not be set within the function).
2) _mpd_qpow_mpd(): Abort the loop for all specials, not only infinity.
3) _mpd_qpow_mpd(): Make the function more general and distinguish between
zero clamping and folding down the exponent. The latter case is currently
handled by setting context->clamp to 0 before calling the function.
4) _mpd_qpow_int(): Add one to the work precision in case of a negative
exponent. This is to get the same relative error (0.1 * 10**-prec)
for both positive and negative exponents. The previous relative
error for negative exponents was (0.2 * 10**-prec).
Both errors are _before_ the final rounding to the context precision.
2) Abort the loop for all specials, not only infinity.
3) Make the function more general and distinguish between zero clamping
and folding down the exponent. The latter case is currently handled
by setting context->clamp to 0 before calling the function.
2) Add rigorous error analysis to _mpd_qlog10 (ACL2 proofs exist).
3) Use the relative error as a basis for the interval generation in the
correction loop (same as in _mpd_qln()).
List all of them in the comment.
2) Use the recently stated relative error of _mpd_qln() to generate the
interval for the exact value of ln(x). See also the comment in mpd_qexp().
Underflow to zero hasn't changed: _mpd_qexp() internally uses MIN_EMIN,
so the result would also underflow to zero for all emin > MIN_EMIN.
In case digits are left, the informal argument is as follows: Underflow can
occur only once in the last multiplication of the power stage (in the Horner
stage Underflow provably cannot occur, and if Underflow occurred twice in
the power stage, the result would underflow to zero on the second occasion).
Since there is no double rounding during Underflow, the effective work
precision is now 1 <= result->digits < prec. It can be shown by a somewhat
tedious argument that abs(result - e**x) < ulp(result, result->digits).
Therefore the correct rounding loop now uses ulp(result, result->digits)
to generate the bounds for e**x in case of Underflow.
-----------------------
1) Reduce the number of iterations in the Horner scheme for operands with
a negative adjusted exponent. Previously the number was overestimated
quite generously.
2) The function _mpd_get_exp_iterations() now has an ACL2 proof and
is rewritten accordingly.
3) The proof relies on abs(op) > 9 * 10**(-prec-1), so operands without
that property are now handled by the new function _mpd_qexp_check_one().
4) The error analysis for the evaluation of the truncated Taylor series
in Hull&Abrham's paper relies on the fact that the reduced operand
'r' has fewer than context.prec digits.
Since the operands may have more than context.prec digits, a new ACL2
proof covers the case that r.digits > context.prec. To facilitate the
proof, the Horner step now uses fma instead of rounding twice in
multiply/add.
Changes in mpd_qexp():
----------------------
1) Fix a bound in the correct rounding loop that was too optimistic. In
practice results were always correctly rounded, because it is unlikely
that the error in _mpd_qexp() ever reaches the theoretical maximum.
1) Rename _mpd_qbarrett_divmod into _mpd_base_ndivmod: The function is
only marginally related to either Barrett's algorithm or to the version
in Hasselstrom's paper.
2) In places where the proof assumes exact operations, use new versions of
add/sub/multiply that set NaN/Invalid_operation if this condition is
not met. According to the proof this cannot happen, so this should be
regarded as an extra safety net.
3) Raise Division_impossible for operands with a number of digits greater
than MPD_MAX_PREC. This facilitates the audit of the function and can
practically only occur in the 32-bit version under conditions where
a MemoryError is already imminent.
4) Use _mpd_qmul() in places where the result can exceed MPD_MAX_PREC in
a well defined manner.
5) Test for mpd_isspecial(qq) in a place where the addition of one
can theoretically trigger a Malloc_error.
6) Remove redundant code in _mpd_qdivmod().
7) Add many comments.
rightfully states that an mpd_t with a coefficient flagged as MPD_CONST_DATA
must not be in the position of the result operand. In this particular case
several assumptions guarantee that a resize will never occur in all possible
code paths, which was the reason for using MPD_CONST_DATA and saving an
instruction by omitting the initialization of tmp.alloc.
For readability, tmp is now flagged as MPD_STATIC_DATA and tmp.alloc
is initialized.
Resizing is used _inside_ libmpdec functions, and it is permitted to
change x->alloc several times while setting x->len at the end of the
function. Therefore, for dynamic mpd_t x->alloc can _temporarily_ drop
below x->len. Of course the final result always has x->len <= x->alloc.
For static mpd_t this cannot happen, since resizing to a smaller
coefficient is a no-op.
2) Remove micro optimization in mpd_switch_to_dyn(): Previously only the
valid initialized part of the existing coefficient up to x->len was
copied to the new dynamic memory area. Now copying does the same as
realloc() and the entire old memory area is copied.
The rationale for this change is that it is no longer needed to memorize
the explanation given in 1).