Bump the blocksize up from 62 to 64 to speed up the modulo calculation.

Remove the old comment suggesting that it was desirable to have blocksize+2 be a multiple of the cache-line length. That would have made sense only if the start of the block structure were always aligned to a cache-line boundary. However, memory allocations are only 16-byte aligned, so we don't really have control over whether the struct spills across cache-line boundaries.
2015-02-26 23:21:29 -08:00 · daf57f25e5
parent b1e6e57a17
commit daf57f25e5
2 changed files with 4 additions and 7 deletions
--- a/Lib/test/test_deque.py
+++ b/Lib/test/test_deque.py
@@ -542,7 +542,7 @@ class TestBasic(unittest.TestCase):

    @support.cpython_only
    def test_sizeof(self):
-        BLOCKLEN = 62
+        BLOCKLEN = 64
        basesize = support.calcobjsize('2P4nlP')
        blocksize = struct.calcsize('2P%dP' % BLOCKLEN)
        self.assertEqual(object.__sizeof__(deque()), basesize)
--- a/Modules/_collectionsmodule.c
+++ b/Modules/_collectionsmodule.c
@@ -10,14 +10,11 @@
 /* The block length may be set to any number over 1.  Larger numbers
 * reduce the number of calls to the memory allocator, give faster
 * indexing and rotation, and reduce the link::data overhead ratio.
- *
- * Ideally, the block length will be set to two less than some
- * multiple of the cache-line length (so that the full block
- * including the leftlink and rightlink will fit neatly into
- * cache lines).
+ * Making the block length a power of two speeds-up the modulo
+ * calculation in deque_item().
 */

-#define BLOCKLEN 62
+#define BLOCKLEN 64
 #define CENTER ((BLOCKLEN - 1) / 2)

 /* A `dequeobject` is composed of a doubly-linked list of `block` nodes.