bpo-46504: faster code for trial quotient in x_divrem() (GH-30856)

* bpo-46504: faster code for trial quotient in x_divrem()

This brings x_divrem() back into synch with x_divrem1(), which was changed
in bpo-46406 to generate faster code to find machine-word division
quotients and remainders. Modern processors compute both with a single
machine instruction, but convincing C to exploit that requires writing
_less_ "clever" C code.
Tim Peters 2022-01-24 19:06:00 -06:00 committed by GitHub
parent b18fd54f8c
commit 7c26472d09
1 changed file with 8 additions and 1 deletion

@@ -2767,8 +2767,15 @@ x_divrem(PyLongObject *v1, PyLongObject *w1, PyLongObject **prem)
         vtop = vk[size_w];
         assert(vtop <= wm1);
         vv = ((twodigits)vtop << PyLong_SHIFT) | vk[size_w-1];
+        /* The code used to compute the remainder via
+         *     r = (digit)(vv - (twodigits)wm1 * q);
+         * and compilers generally generated code to do the * and -.
+         * But modern processors generally compute q and r with a single
+         * instruction, and modern optimizing compilers exploit that if we
+         * _don't_ try to optimize it.
+         */
         q = (digit)(vv / wm1);
-        r = (digit)(vv - (twodigits)wm1 * q); /* r = vv % wm1 */
+        r = (digit)(vv % wm1);
         while ((twodigits)wm2 * q > (((twodigits)r << PyLong_SHIFT)
                                      | vk[size_w-2])) {
             --q;
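
The two ways of getting the remainder in the hunk above can be compared in isolation. This is a minimal sketch, not CPython's code: the `digit` and `twodigits` typedefs and the `SHIFT` macro are assumed stand-ins for a 30-bit-digit build, and `rem_mulsub()` and `rem_mod()` are hypothetical helper names. Both return the same value; the difference is only in what the compiler is likely to emit. Whether `/` and `%` on the same operands are fused into one divide instruction depends on the compiler, target, and optimization flags.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for CPython's types in a 30-bit-digit build. */
typedef uint32_t digit;
typedef uint64_t twodigits;
#define SHIFT 30  /* stand-in for PyLong_SHIFT */

/* Old form: derive the remainder from the quotient with an explicit
 * multiply and subtract. */
static digit
rem_mulsub(twodigits vv, digit wm1, digit *q)
{
    assert(wm1 != 0);
    *q = (digit)(vv / wm1);
    return (digit)(vv - (twodigits)wm1 * *q);
}

/* New form: ask for the quotient and the remainder directly, so an
 * optimizing compiler is free to take both from one divide instruction. */
static digit
rem_mod(twodigits vv, digit wm1, digit *q)
{
    assert(wm1 != 0);
    *q = (digit)(vv / wm1);
    return (digit)(vv % wm1);
}
```

Compiling both helpers with optimization enabled and comparing the generated assembly (for example with `cc -O2 -S`) is one way to check what a given toolchain does with each form.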