Commit Graph

101 Commits

Author SHA1 Message Date
Raymond Hettinger 87e691240b Minor neatening-up. Make assignments in same order as struct fields. Line-up comments. 2015-03-02 23:32:02 -08:00
Raymond Hettinger f9d9c79aa8 Switch the state variable to unsigned for defined wrap-around behavior. 2015-03-02 22:47:46 -08:00
Raymond Hettinger 30c9074b96 Minor beautification. Move struct definitions to the top. Fix-up a comment. 2015-03-02 22:31:35 -08:00
Raymond Hettinger f30f5b9ba6 Minor code beautification. Replace macro with in-lineable functions. 2015-03-02 22:23:37 -08:00
Raymond Hettinger 3c186ba441 Beautify and better document the use of the size_t cast for bounds checking. 2015-03-02 21:45:02 -08:00
Raymond Hettinger 7f9ea7543e Issue #23553: Use an unsigned cast to tighten-up the bounds checking logic. 2015-03-01 00:38:00 -08:00
Raymond Hettinger c20830804d Need a (size_t) cast instead of (unsigned) to be big enough for a Py_ssize_t. 2015-02-28 23:29:16 -08:00
Raymond Hettinger a473b9da15 Use unsigned division and modulo for item assignment as well. 2015-02-28 17:49:47 -08:00
Raymond Hettinger 63d1ff2a0b Convert one more division to unsigned arithmetic to speed-up deque_item(). 2015-02-28 07:41:30 -08:00
Raymond Hettinger 7e8c7956a7 Line missed in last checkin 2015-02-27 16:59:29 -08:00
Raymond Hettinger da2850f932 Since the index is always non-negative, use faster unsigned division and modulo. 2015-02-27 12:42:54 -08:00
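The unsigned-cast trick behind the cluster of commits above (issue #23553) folds the two-sided check `0 <= i && i < limit` into a single comparison: casting a signed index to `size_t` makes negative values wrap around to very large numbers, which then fail the upper-bound test. A minimal sketch, assuming a 64-bit `size_t` and modeling the C cast with modular arithmetic:

```python
WORD = 2 ** 64  # assumed size_t width on a 64-bit build

def in_bounds_unsigned(i, limit):
    # models the single C comparison: (size_t)i < (size_t)limit
    return (i % WORD) < (limit % WORD)

def in_bounds_signed(i, limit):
    # the two-comparison form the cast replaces
    return 0 <= i < limit

# the two forms agree whenever limit is a valid non-negative Py_ssize_t
for i in (-3, -1, 0, 1, 9, 10, 11, 2 ** 62):
    for limit in (0, 1, 10, 2 ** 63 - 1):
        assert in_bounds_unsigned(i, limit) == in_bounds_signed(i, limit)
```

This also motivates the earlier fix in c20830804d: the cast must be to `size_t`, not `unsigned int`, so the wrapped value is wide enough to hold any `Py_ssize_t`.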
Raymond Hettinger daf57f25e5 Bump the blocksize up from 62 to 64 to speed up the modulo calculation.
Remove the old comment suggesting that it was desirable to have
blocksize+2 as a multiple of the cache line length.  That would
have made sense only if the block structure start point was always
aligned to a cache line boundary.  However, the memory allocations
are 16 byte aligned, so we don't really have control over whether
the struct spills across cache line boundaries.
2015-02-26 23:21:29 -08:00
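The cache-line arithmetic that the removed comment relied on is easy to check. Assuming 8-byte pointers and 64-byte cache lines (both assumptions, not stated in the log), a block of BLOCKLEN item slots plus two link pointers comes out as:

```python
PTR = 8          # assumed pointer size on a 64-bit build
CACHE_LINE = 64  # assumed cache-line size

def block_bytes(blocklen):
    # BLOCKLEN item pointers plus the leftlink/rightlink pointers
    return (blocklen + 2) * PTR

assert block_bytes(62) == 512              # exactly 8 cache lines
assert block_bytes(62) % CACHE_LINE == 0
assert block_bytes(64) % CACHE_LINE != 0   # 528 bytes: no longer a multiple
```

As the message notes, with allocations only 16-byte aligned a block can straddle cache lines either way, so the exact-fit property of 62 stopped being a reason to keep the slower modulo.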
Raymond Hettinger eb6b554fbc Update copyright. 2015-02-10 22:37:22 -06:00
Raymond Hettinger 0e259f18f7 Optimization guides suggest copying memory in an ascending direction when possible. 2015-02-01 22:53:41 -08:00
Raymond Hettinger 507d997714 Add comment and make minor code clean-up to improve clarity. 2014-05-18 21:32:40 +01:00
Raymond Hettinger 4b0b1accb5 Issue #21101: Eliminate double hashing in the C code for collections.Counter(). 2014-05-03 16:41:19 -07:00
Raymond Hettinger 5402315626 Add implementation notes 2014-04-23 00:58:48 -07:00
Benjamin Peterson e19d9d1467 merge 3.3 (#20250) 2014-01-13 23:56:30 -05:00
Benjamin Peterson 9cb33b7d03 correct defaultdict signature in docstring (closes #20250)
Patch from Andrew Barnert.
2014-01-13 23:56:05 -05:00
Victor Stinner e7f516cbb8 Issue #19512: _count_elements() of _collections reuses PyId_get identifier
instead of literal "get" string
2013-11-06 23:52:55 +01:00
Raymond Hettinger 07573d7b24 merge 2013-10-04 16:52:39 -07:00
Raymond Hettinger cb1d96f782 Issue #18594: Make the C code more closely match the pure python code. 2013-10-04 16:51:02 -07:00
Raymond Hettinger 75f65e368e merge 2013-10-01 21:38:37 -07:00
Raymond Hettinger 224c87d60c Issue #18594: Fix the fallback path in collections.Counter(). 2013-10-01 21:36:09 -07:00
Raymond Hettinger c13516b0a0 merge 2013-10-01 01:00:59 -07:00
Raymond Hettinger 2ff2190b62 Issue #18594: Fix the fast path for collections.Counter().
The path wasn't being taken due to an over-restrictive type check.
2013-10-01 00:55:43 -07:00
Raymond Hettinger 77578204d6 Restore the data block size to 62.
The former block size traded away good fit within cache lines in
order to gain faster division in deque_item().  However, compilers
are getting smarter and can now replace the slow division operation
with a fast integer multiply and right shift.  Accordingly, it makes
sense to go back to a size that lets blocks neatly fill entire
cache-lines.

GCC-4.8 and CLANG 4.0 both compute "x // 62" with something
roughly equivalent to "x * 9520900167075897609 >> 69".
2013-07-28 02:39:49 -07:00
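The `x * 9520900167075897609 >> 69` rewrite quoted above is the standard divide-by-constant strength reduction: the compiler replaces the division with one widening multiply and a right shift. The identity can be verified in exact integer arithmetic; it holds for every non-negative `Py_ssize_t`, which is the range the compiler is optimizing for here:

```python
MAGIC = 9520900167075897609  # magic constant from the commit message
SHIFT = 69

def div62(x):
    # what the compiler emits in place of the division: multiply + shift
    return (x * MAGIC) >> SHIFT

# agrees with true floor division across the non-negative 63-bit range
for x in (0, 1, 61, 62, 63, 12345, 10 ** 18, 2 ** 63 - 1):
    assert div62(x) == x // 62
```

The matching modulo falls out as `x - 62 * div62(x)`, which is why division speed was the only concern in the size-62-versus-power-of-two trade-off.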
Raymond Hettinger 3223dd5c22 Assertions key off NDEBUG 2013-07-26 23:14:22 -07:00
Raymond Hettinger b97cc49c3a Minor code simplification by eliminating an unnecessary temporary variable. 2013-07-21 01:51:07 -07:00
Raymond Hettinger 90dea4ce43 Tweak the deque struct by moving the least used fields (maxlen and weakref) to the end. 2013-07-13 22:30:25 -07:00
Raymond Hettinger 840533bf1c Use a do-while loop in the inner loop for rotate (m is always greater than zero). 2013-07-13 17:03:58 -07:00
Raymond Hettinger 3959af9b2a Move the freeblock() call outside the main loop to speed-up and simplify the block re-use logic. 2013-07-13 02:34:08 -07:00
Raymond Hettinger d9c116ca40 Add a space-saving heuristic to deque's extend methods 2013-07-09 00:13:21 -07:00
Raymond Hettinger b385529ddf Fix #ifdef 2013-07-07 02:07:23 -10:00
Raymond Hettinger 82df925451 Use macros for marking and checking endpoints in the doubly-linked list of blocks.
* Add comment explaining the endpoint checks
* Only do the checks in a debug build
* Simplify newblock() to only require a length argument
  and leave the link updates to the calling code.
* Also add comment for the freelisting logic.
2013-07-07 01:43:42 -10:00
Raymond Hettinger f3a67b7e57 Improve variable names in deque_count() 2013-07-06 17:49:06 -10:00
Raymond Hettinger df715ba54d Apply the PyObject_VAR_HEAD and Py_SIZE macros
to be consistent with practices in other modules.
2013-07-06 13:01:13 -10:00
Raymond Hettinger 5bfa8671bc Refactor deque_traverse().
Hoist conditional expression out of the loop.
Use rightblock as the guard instead of checking for NULL.
2013-07-06 11:58:09 -10:00
Raymond Hettinger 98054b4c1b Remove unnecessary branches from count() and reverse(). 2013-07-06 09:07:06 -10:00
Raymond Hettinger de68e0cf0e Speed-up deque indexing by changing the deque block length to a power of two.
The division and modulo calculation in deque_item() can be compiled
to fast bitwise operations when the BLOCKLEN is a power of two.

Timing before:

~/cpython $ py -m timeit -r7 -s 'from collections import deque' -s 'd=deque(range(10))' 'd[5]'
10000000 loops, best of 7: 0.0627 usec per loop

Timing after:

~/cpython $ py -m timeit -r7 -s 'from collections import deque' -s 'd=deque(range(10))' 'd[5]'
10000000 loops, best of 7: 0.0581 usec per loop
2013-07-05 18:05:29 -10:00
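With BLOCKLEN a power of two, the divide/modulo pair in deque_item() reduces to a shift and a mask for any non-negative index (which the earlier bounds-checking commits guarantee). A quick check of the identities, assuming a BLOCKLEN of 64:

```python
BLOCKLEN = 64                       # power of two, as in the commit
MASK = BLOCKLEN - 1
SHIFT = BLOCKLEN.bit_length() - 1   # log2(64) == 6

for i in range(1000):
    assert i // BLOCKLEN == i >> SHIFT      # division becomes a right shift
    assert i % BLOCKLEN == i & MASK         # modulo becomes a bitwise AND
```

Note the identities fail for negative `i` under C truncating division, which is another reason the index is kept non-negative before the block/offset split.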
Raymond Hettinger 20b0f87e1d Misc improvements to collections.deque()
* Clarified comment on the impact of BLOCKLEN on deque_index
  (with a power-of-two, the division and modulo
   computations are done with a right-shift and bitwise-and).

* Clarified comment on the overflow check to note that
  it is general and not just applicable to 64-bit builds.

* In deque._rotate(), the "deque->" indirections are
  factored-out of the loop (loop invariant code motion),
  leaving the code cleaner looking and slightly faster.

* In deque._rotate(), replaced the memcpy() with an
  equivalent loop.  That saved the memcpy setup time
  and allowed the pointers to move in their natural
  leftward and rightward directions.

See comparative timings at:  http://pastebin.com/p0RJnT5N
2013-06-23 15:44:33 -07:00
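The rotate commits above rework the C copy loops without changing observable behavior. For reference, deque.rotate() semantics (positive n rotates to the right, negative to the left) can be modeled on a plain list; the helper name here is illustrative, not CPython's:

```python
def rotate(items, n):
    """List-based model of collections.deque.rotate() semantics."""
    if not items:
        return []
    n %= len(items)           # rotating by the length is a no-op
    if n == 0:
        return items[:]
    return items[-n:] + items[:-n]

assert rotate([1, 2, 3, 4, 5], 2) == [4, 5, 1, 2, 3]    # rotate right
assert rotate([1, 2, 3, 4, 5], -1) == [2, 3, 4, 5, 1]   # rotate left
```

The C version moves pointers between blocks in place rather than building a new sequence, which is what makes the copy direction and loop-invariant hoisting above worth tuning.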
Raymond Hettinger 986bbfc079 Backport deque.rotate() improvements. 2013-02-09 20:00:55 -05:00
Raymond Hettinger 59cf23ab07 Minor tweaks to varnames, declarations, and comments. 2013-02-07 00:57:19 -05:00
Raymond Hettinger 1f0044c473 Minor variable access clean-ups for deque.rotate(). 2013-02-05 01:30:46 -05:00
Raymond Hettinger a4409c18eb Minor edits: Tighten-up the halflen logic and touch-up the assertions and comments. 2013-02-04 00:08:12 -05:00
Raymond Hettinger 3a9ae7fd98 Issue 16398: One more assertion for good measure. 2013-02-02 12:26:37 -08:00
Raymond Hettinger 231ee4dc9d Issue 16398: Add assertions to show why memcmp is safe. 2013-02-02 11:24:43 -08:00
Raymond Hettinger 21777acd68 Issue 16398: Use memcpy() in deque.rotate(). 2013-02-02 09:56:08 -08:00
Benjamin Peterson fa3965ab76 merge 3.3 2013-01-12 21:22:33 -05:00
Benjamin Peterson 0e5c48a917 make deque_clear void, since it's infallible 2013-01-12 21:22:18 -05:00