cpython/Modules/_decimal/libmpdec/literature/mulmod-64.txt



(* Copyright (c) 2011 Stefan Krah. All rights reserved. *)


==========================================================================
                Calculate (a * b) % p using special primes
==========================================================================

A description of the algorithm can be found in the apfloat manual by
Tommila [1].


Definitions:
------------

In the whole document, "==" stands for "is congruent with".

Result of a * b in terms of high/low words:

   (1) hi * 2**64 + lo = a * b

Special primes:

   (2) p = 2**64 - z + 1, where z = 2**n

Single step modular reduction:

   (3) R(hi, lo) = hi * z - hi + lo


Strategy:
---------

   a) Set (hi, lo) to the result of a * b.

   b) Set (hi', lo') to the result of R(hi, lo).

   c) Repeat step b) until 0 <= hi' * 2**64 + lo' < 2*p.

   d) If the result is less than p, return lo'. Otherwise return lo' - p.


The reduction step b) preserves congruence:
-------------------------------------------

    hi * 2**64 + lo == hi * z - hi + lo   (mod p)

    Proof:
    ~~~~~~

       hi * 2**64 + lo = (2**64 - z + 1) * hi + z * hi - hi + lo

                       = p * hi               + z * hi - hi + lo

                       == z * hi - hi + lo   (mod p)


Maximum numbers of step b):
---------------------------

# To avoid unnecessary formalism, define:

def R(hi, lo, z):
     return divmod(hi * z - hi + lo, 2**64)

# For simplicity, assume hi=2**64-1, lo=2**64-1 after the
# initial multiplication a * b. This is of course impossible
# but certainly covers all cases.

# Then, for p1:
hi=2**64-1; lo=2**64-1; z=2**32
p1 = 2**64 - z + 1

hi, lo = R(hi, lo, z)    # First reduction
hi, lo = R(hi, lo, z)    # Second reduction
hi * 2**64 + lo < 2 * p1 # True

# For p2:
hi=2**64-1; lo=2**64-1; z=2**34
p2 = 2**64 - z + 1

hi, lo = R(hi, lo, z)    # First reduction
hi, lo = R(hi, lo, z)    # Second reduction
hi, lo = R(hi, lo, z)    # Third reduction
hi * 2**64 + lo < 2 * p2 # True

# For p3:
hi=2**64-1; lo=2**64-1; z=2**40
p3 = 2**64 - z + 1

hi, lo = R(hi, lo, z)    # First reduction
hi, lo = R(hi, lo, z)    # Second reduction
hi, lo = R(hi, lo, z)    # Third reduction
hi * 2**64 + lo < 2 * p3 # True


Step d) preserves congruence and yields a result < p:
-----------------------------------------------------

   Case hi = 0:

       Case lo < p: trivial.

       Case lo >= p:

          lo == lo - p   (mod p)             # result is congruent

          p <= lo < 2*p  ->  0 <= lo - p < p # result is in the correct range

   Case hi = 1:

       p < 2**64 /\ 2**64 + lo < 2*p  ->  lo < p  # lo is always less than p

       2**64 + lo == 2**64 + (lo - p)   (mod p)   # result is congruent

                  = lo - p   # exactly the same value as the previous RHS
                             # in uint64_t arithmetic.

       p < 2**64 + lo < 2*p  ->  0 < 2**64 + (lo - p) < p  # correct range


[1]  http://www.apfloat.org/apfloat/2.40/apfloat.pdf
Issue #7652: Integrate the decimal floating point libmpdec library to speed up the decimal module. Performance gains of the new C implementation are between 12x and 80x, depending on the application. 2012-03-21 14:25:23 -03:00

			`(* Copyright (c) 2011 Stefan Krah. All rights reserved. *)`


			`==========================================================================`
			`Calculate (a * b) % p using special primes`
			`==========================================================================`

			`A description of the algorithm can be found in the apfloat manual by`
			`Tommila [1].`


			`Definitions:`
			`------------`

			`In the whole document, "==" stands for "is congruent with".`

			`Result of a * b in terms of high/low words:`

			`(1) hi * 2*64 + lo = a b`

			`Special primes:`

			`(2) p = 264 - z + 1, where z = 2n`

			`Single step modular reduction:`

			`(3) R(hi, lo) = hi * z - hi + lo`


			`Strategy:`
			`---------`

			`a) Set (hi, lo) to the result of a * b.`

			`b) Set (hi', lo') to the result of R(hi, lo).`

			`c) Repeat step b) until 0 <= hi' * 2*64 + lo' < 2p.`

			`d) If the result is less than p, return lo'. Otherwise return lo' - p.`


			`The reduction step b) preserves congruence:`
			`-------------------------------------------`

			`hi * 2*64 + lo == hi z - hi + lo (mod p)`

			`Proof:`
			`~~~~~~`

			`hi * 264 + lo = (264 - z + 1) * hi + z * hi - hi + lo`

			`= p * hi + z * hi - hi + lo`

			`== z * hi - hi + lo (mod p)`


			`Maximum numbers of step b):`
			`---------------------------`

Fix two typos. 2013-12-08 15:08:32 -04:00			`# To avoid unnecessary formalism, define:`
Issue #7652: Integrate the decimal floating point libmpdec library to speed up the decimal module. Performance gains of the new C implementation are between 12x and 80x, depending on the application. 2012-03-21 14:25:23 -03:00
			`def R(hi, lo, z):`
			`return divmod(hi * z - hi + lo, 2**64)`

			`# For simplicity, assume hi=264-1, lo=264-1 after the`
			`# initial multiplication a * b. This is of course impossible`
			`# but certainly covers all cases.`

			`# Then, for p1:`
			`hi=264-1; lo=264-1; z=2**32`
			`p1 = 2**64 - z + 1`

			`hi, lo = R(hi, lo, z) # First reduction`
			`hi, lo = R(hi, lo, z) # Second reduction`
			`hi * 2*64 + lo < 2 p1 # True`

			`# For p2:`
			`hi=264-1; lo=264-1; z=2**34`
			`p2 = 2**64 - z + 1`

			`hi, lo = R(hi, lo, z) # First reduction`
			`hi, lo = R(hi, lo, z) # Second reduction`
			`hi, lo = R(hi, lo, z) # Third reduction`
			`hi * 2*64 + lo < 2 p2 # True`

			`# For p3:`
			`hi=264-1; lo=264-1; z=2**40`
			`p3 = 2**64 - z + 1`

			`hi, lo = R(hi, lo, z) # First reduction`
			`hi, lo = R(hi, lo, z) # Second reduction`
			`hi, lo = R(hi, lo, z) # Third reduction`
			`hi * 2*64 + lo < 2 p3 # True`


			`Step d) preserves congruence and yields a result < p:`
			`-----------------------------------------------------`

			`Case hi = 0:`

			`Case lo < p: trivial.`

			`Case lo >= p:`

			`lo == lo - p (mod p) # result is congruent`

			`p <= lo < 2*p -> 0 <= lo - p < p # result is in the correct range`

			`Case hi = 1:`

			`p < 264 /\ 264 + lo < 2*p -> lo < p # lo is always less than p`

			`264 + lo == 264 + (lo - p) (mod p) # result is congruent`

			`= lo - p # exactly the same value as the previous RHS`
			`# in uint64_t arithmetic.`

			`p < 2*64 + lo < 2p -> 0 < 2**64 + (lo - p) < p # correct range`



			`[1] http://www.apfloat.org/apfloat/2.40/apfloat.pdf`