k_mul() comments: In honor of Dijkstra, made the proof that "t3 fits"
rigorous instead of hoping for testing not to turn up counterexamples.
Call me heretical, but despite that I'm wholly confident in the proof,
and have done it two different ways now, I still put more faith in
testing ...
Tim Peters 2002-08-15 20:06:00 +00:00
parent 9973d74b2d
commit ab86c2be24
1 changed files with 30 additions and 27 deletions

@@ -1757,40 +1757,43 @@ k_mul(PyLongObject *a, PyLongObject *b)
/* (*) Why adding t3 can't "run out of room" above.

-We allocated space for asize + bsize result digits. We're adding t3 at an
-offset of shift digits, so there are asize + bsize - shift allocated digits
-remaining. Because degenerate shifts of "a" were weeded out, asize is at
-least shift + 1. If bsize is odd then bsize == 2*shift + 1, else bsize ==
-2*shift. Therefore there are at least shift+1 + 2*shift - shift =
+Let f(x) mean the floor of x and c(x) mean the ceiling of x. Some facts
+to start with:

-2*shift+1 allocated digits remaining when bsize is even, or at least
-2*shift+2 allocated digits remaining when bsize is odd.
+1. For any integer i, i = c(i/2) + f(i/2). In particular,
+   bsize = c(bsize/2) + f(bsize/2).
+2. shift = f(bsize/2)
+3. asize <= bsize
+4. Since we call k_lopsided_mul if asize*2 <= bsize, asize*2 > bsize in this
+   routine, so asize > bsize/2 >= f(bsize/2) in this routine.

-Now in bh+bl, if bsize is even bh has at most shift digits, while if bsize
-is odd bh has at most shift+1 digits. The sum bh+bl has at most
+We allocated asize + bsize result digits, and add t3 into them at an offset
+of shift. This leaves asize+bsize-shift allocated digit positions for t3
+to fit into, = (by #1 and #2) asize + f(bsize/2) + c(bsize/2) - f(bsize/2) =
+asize + c(bsize/2) available digit positions.

-    shift digits plus 1 bit when bsize is even
-    shift+1 digits plus 1 bit when bsize is odd
+bh has c(bsize/2) digits, and bl at most f(bsize/2) digits. So bh+bl has
+at most c(bsize/2) digits + 1 bit.

-The same is true of ah+al, so (ah+al)(bh+bl) has at most
+If asize == bsize, ah has c(bsize/2) digits, else ah has at most f(bsize/2)
+digits, and al has at most f(bsize/2) digits in any case. So ah+al has at
+most (asize == bsize ? c(bsize/2) : f(bsize/2)) digits + 1 bit.

-    2*shift digits + 2 bits when bsize is even
-    2*shift+2 digits + 2 bits when bsize is odd
+The product (ah+al)*(bh+bl) therefore has at most

-If bsize is even, we have at most 2*shift digits + 2 bits to fit into at
-least 2*shift+1 digits. Since a digit has SHIFT bits, and SHIFT >= 2,
-there's always enough room to fit the 2 bits into the "spare" digit.
+    c(bsize/2) + (asize == bsize ? c(bsize/2) : f(bsize/2)) digits + 2 bits

-If bsize is odd, we have at most 2*shift+2 digits + 2 bits to fit into at
-least 2*shift+2 digits, and there's not obviously enough room for the
-extra two bits. We need a sharper analysis in this case. The major
-laziness was in the "the same is true of ah+al" clause: ah+al can't actually
-have shift+1 digits + 1 bit unless bsize is odd and asize == bsize. In that
-case, we actually have (2*shift+1)*2 - shift = 3*shift+2 allocated digits
-remaining, and that's obviously plenty to hold 2*shift+2 digits + 2 bits.
-Else (bsize is odd and asize < bsize) ah and al each have at most shift digits,
-so ah+al has at most shift digits + 1 bit, and (ah+al)*(bh+bl) has at most
-2*shift+1 digits + 2 bits, and again 2*shift+2 digits is enough to hold it.
+and we have asize + c(bsize/2) available digit positions. We need to show
+this is always enough. An instance of c(bsize/2) cancels out in both, so
+the question reduces to whether asize digits is enough to hold
+(asize == bsize ? c(bsize/2) : f(bsize/2)) digits + 2 bits. If asize < bsize,
+then we're asking whether asize digits >= f(bsize/2) digits + 2 bits. By #4,
+asize is at least f(bsize/2)+1 digits, so this in turn reduces to whether 1
+digit is enough to hold 2 bits. This is so since SHIFT=15 >= 2. If
+asize == bsize, then we're asking whether bsize digits is enough to hold
+c(bsize/2) digits + 2 bits, or equivalently (by #1) whether f(bsize/2) digits
+is enough to hold 2 bits. This is so if bsize >= 2, which holds because
+bsize >= KARATSUBA_CUTOFF >= 2.

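In the spirit of the commit message's faith in testing, the bound argued above can also be checked by brute force. The following is only a sketch, written in Python rather than C so that arbitrary-precision integers come for free; it is not part of the commit. The names `t3_fits` and `counterexamples` are hypothetical, SHIFT = 15 is taken from the comment, and the sketch assumes that operands whose digits are all ones give the largest possible ah, al, bh and bl for a given asize and bsize.

```python
SHIFT = 15  # bits per digit, the value cited in the comment


def t3_fits(asize: int, bsize: int) -> bool:
    """Return True if the largest possible t3 fits the remaining digits.

    Assumption: all-ones operands maximize ah, al, bh and bl, and so
    maximize t3 = (ah+al)*(bh+bl) for the given digit counts.
    """
    shift = bsize // 2                        # fact 2: shift = f(bsize/2)
    a = (1 << (SHIFT * asize)) - 1            # asize digits, all ones
    b = (1 << (SHIFT * bsize)) - 1            # bsize digits, all ones
    low_mask = (1 << (SHIFT * shift)) - 1     # selects the low `shift` digits
    ah, al = a >> (SHIFT * shift), a & low_mask
    bh, bl = b >> (SHIFT * shift), b & low_mask
    t3 = (ah + al) * (bh + bl)
    # t3 is added at an offset of `shift` digits into asize + bsize digits.
    available_bits = SHIFT * (asize + bsize - shift)
    return t3.bit_length() <= available_bits


def counterexamples(max_bsize: int) -> list[tuple[int, int]]:
    """List every (asize, bsize) pair, 2 <= bsize <= max_bsize, that fails."""
    failures = []
    for bsize in range(2, max_bsize + 1):
        # Facts 3 and 4: bsize/2 < asize <= bsize in this routine.
        for asize in range(bsize // 2 + 1, bsize + 1):
            if not t3_fits(asize, bsize):
                failures.append((asize, bsize))
    return failures


if __name__ == "__main__":
    print(counterexamples(200))
```

The check starts at bsize = 2 to match the lower bound the asize == bsize case relies on; the real routine only runs with far larger operands.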
Note that since there's always enough room for (ah+al)*(bh+bl), and that's
clearly >= each of ah*bh and al*bl, there's always enough room to subtract