mirror of https://github.com/python/cpython
bpo-46504: faster code for trial quotient in x_divrem() (GH-30856)
* bpo-46504: faster code for trial quotient in x_divrem()

This brings x_divrem() back into synch with x_divrem1(), which was changed in bpo-46406 to generate faster code to find machine-word division quotients and remainders. Modern processors compute both with a single machine instruction, but convincing C to exploit that requires writing _less_ "clever" C code.
parent b18fd54f8c
commit 7c26472d09
@@ -2767,8 +2767,15 @@ x_divrem(PyLongObject *v1, PyLongObject *w1, PyLongObject **prem)
         vtop = vk[size_w];
         assert(vtop <= wm1);
         vv = ((twodigits)vtop << PyLong_SHIFT) | vk[size_w-1];
+        /* The code used to compute the remainder via
+         *     r = (digit)(vv - (twodigits)wm1 * q);
+         * and compilers generally generated code to do the * and -.
+         * But modern processors generally compute q and r with a single
+         * instruction, and modern optimizing compilers exploit that if we
+         * _don't_ try to optimize it.
+         */
         q = (digit)(vv / wm1);
-        r = (digit)(vv - (twodigits)wm1 * q); /* r = vv % wm1 */
+        r = (digit)(vv % wm1);
         while ((twodigits)wm2 * q > (((twodigits)r << PyLong_SHIFT)
                                      | vk[size_w-2])) {
             --q;