with a single entry point, named exit points at the bottom, more self-evident
refcount adjustments, and a comment describing why the pre-increment was
necessary at all.
The real benefit of the unicode specialized function comes from
bypassing the overhead of PyObject_RichCompareBool() and not
from being in-lined (especially since there was almost no shared
data between the caller and callee). Also, the in-lining was
having a negative effect on code generation for the callee.
Simplifies the code a little bit and does the resize check
only when a new key is added (giving a small speed up in
the case where the key already exists).
Fixes possible bug in set_merge() where the set_insert_key()
call relies on a big resize at the start to make enough room
for the keys but is vulnerable to a comparision callback that
could cause the table to shrink in the middle of the merge.
Also, changed the resize threshold from two-thirds of the
mask+1 to just two-thirds. The plus one offset gave no
real benefit (afterall, the two-thirds mark is just a
heuristic and isn't a precise cut-off).
Elsewhere in the setobject.c code we do a bitwise-and with the mask
instead of using a conditional to reset to zero on wrap-around.
Using that same technique here use gives cleaner, faster, and more
consistent code.
* Move the test for an exact key match to after a hash match
* Use "used" as a loop counter instead of "fill"
* Minor improvements to variable names and code consistency
The setobject freelist was consuming memory but not providing much value.
Even when a freelisted setobject was available, most of the setobject
fields still needed to be initialized and the small table still required
a memset(). This meant that the custom freelisting scheme for sets was
providing almost no incremental benefit over the default Python freelist
scheme used by _PyObject_Malloc() in Objects/obmalloc.c.
Modern processors tend to make consecutive memory accesses cheaper than
random probes into memory.
Small sets can fit into L1 cache, so they get less benefit. But they do
come out ahead because the consecutive probes don't probe the same key
more than once and because the randomization step occurs less frequently
(or not at all).
For the open addressing step, putting the perturb shift before the index
calculation gets the upper bits into play sooner.
The Gdb prettyprint plugin depended on the dummy object being displayable.
Other solutions besides a unicode object are possible. For now, get it
back up and running.
The identity checks in lookkey() need to be there to prevent the dummy
object from leaking through Py_RichCompareBool() into user code in the
rare circumstance where the dummy's hash value exactly matches the hash
value of the actual key being looked up.
Letting the compiler decide how to optimize the multiply by five
gives it the freedom to make better choices for the best technique
for a given target machine.
For example, GCC on x86_64 produces a little bit better code:
Old-way (3 steps with a data dependency between each step):
shrq $5, %r13
leaq 1(%rbx,%r13), %rax
leaq (%rax,%rbx,4), %rbx
New-way (3 steps with no dependency between the first two steps
which can be run in parallel):
leaq (%rbx,%rbx,4), %rax # i*5
shrq $5, %r13 # perturb >>= PERTURB_SHIFT
leaq 1(%r13,%rax), %rbx # 1 + perturb + i*5
computation as the overflow behavior of signed integers is undefined.
NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done. I added the comment that my change in 3.2 added so that the
code would match up. Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
computation as the overflow behavior of signed integers is undefined.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.