:mod:`collections` --- High-performance container datatypes =========================================================== .. module:: collections :synopsis: High-performance datatypes .. moduleauthor:: Raymond Hettinger .. sectionauthor:: Raymond Hettinger .. versionadded:: 2.4 This module implements high-performance container datatypes. Currently, there are two datatypes, :class:`deque` and :class:`defaultdict`, and one datatype factory function, :func:`namedtuple`. Python already includes built-in containers, :class:`dict`, :class:`list`, :class:`set`, and :class:`tuple`. In addition, the optional :mod:`bsddb` module has a :meth:`bsddb.btopen` method that can be used to create in-memory or file based ordered dictionaries with string keys. Future editions of the standard library may include balanced trees and ordered dictionaries. .. versionchanged:: 2.5 Added :class:`defaultdict`. .. versionchanged:: 2.6 Added :func:`namedtuple`. .. _deque-objects: :class:`deque` objects ---------------------- .. class:: deque([iterable[, maxlen]]) Returns a new deque object initialized left-to-right (using :meth:`append`) with data from *iterable*. If *iterable* is not specified, the new deque is empty. Deques are a generalization of stacks and queues (the name is pronounced "deck" and is short for "double-ended queue"). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction. Though :class:`list` objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for ``pop(0)`` and ``insert(0, v)`` operations which change both the size and position of the underlying data representation. .. versionadded:: 2.4 If *maxlen* is not specified or is *None*, deques may grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end. Bounded length deques provide functionality similar to the ``tail`` filter in Unix. They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest. .. versionchanged:: 2.6 Added *maxlen* Deque objects support the following methods: .. method:: deque.append(x) Add *x* to the right side of the deque. .. method:: deque.appendleft(x) Add *x* to the left side of the deque. .. method:: deque.clear() Remove all elements from the deque leaving it with length 0. .. method:: deque.extend(iterable) Extend the right side of the deque by appending elements from the iterable argument. .. method:: deque.extendleft(iterable) Extend the left side of the deque by appending elements from *iterable*. Note, the series of left appends results in reversing the order of elements in the iterable argument. .. method:: deque.pop() Remove and return an element from the right side of the deque. If no elements are present, raises an :exc:`IndexError`. .. method:: deque.popleft() Remove and return an element from the left side of the deque. If no elements are present, raises an :exc:`IndexError`. .. method:: deque.remove(value) Removed the first occurrence of *value*. If not found, raises a :exc:`ValueError`. .. versionadded:: 2.5 .. method:: deque.rotate(n) Rotate the deque *n* steps to the right. If *n* is negative, rotate to the left. Rotating one step to the right is equivalent to: ``d.appendleft(d.pop())``. In addition to the above, deques support iteration, pickling, ``len(d)``, ``reversed(d)``, ``copy.copy(d)``, ``copy.deepcopy(d)``, membership testing with the :keyword:`in` operator, and subscript references such as ``d[-1]``. Example:: >>> from collections import deque >>> d = deque('ghi') # make a new deque with three items >>> for elem in d: # iterate over the deque's elements ... print elem.upper() G H I >>> d.append('j') # add a new entry to the right side >>> d.appendleft('f') # add a new entry to the left side >>> d # show the representation of the deque deque(['f', 'g', 'h', 'i', 'j']) >>> d.pop() # return and remove the rightmost item 'j' >>> d.popleft() # return and remove the leftmost item 'f' >>> list(d) # list the contents of the deque ['g', 'h', 'i'] >>> d[0] # peek at leftmost item 'g' >>> d[-1] # peek at rightmost item 'i' >>> list(reversed(d)) # list the contents of a deque in reverse ['i', 'h', 'g'] >>> 'h' in d # search the deque True >>> d.extend('jkl') # add multiple elements at once >>> d deque(['g', 'h', 'i', 'j', 'k', 'l']) >>> d.rotate(1) # right rotation >>> d deque(['l', 'g', 'h', 'i', 'j', 'k']) >>> d.rotate(-1) # left rotation >>> d deque(['g', 'h', 'i', 'j', 'k', 'l']) >>> deque(reversed(d)) # make a new deque in reverse order deque(['l', 'k', 'j', 'i', 'h', 'g']) >>> d.clear() # empty the deque >>> d.pop() # cannot pop from an empty deque Traceback (most recent call last): File "", line 1, in -toplevel- d.pop() IndexError: pop from an empty deque >>> d.extendleft('abc') # extendleft() reverses the input order >>> d deque(['c', 'b', 'a']) .. _deque-recipes: :class:`deque` Recipes ^^^^^^^^^^^^^^^^^^^^^^ This section shows various approaches to working with deques. The :meth:`rotate` method provides a way to implement :class:`deque` slicing and deletion. For example, a pure python implementation of ``del d[n]`` relies on the :meth:`rotate` method to position elements to be popped:: def delete_nth(d, n): d.rotate(-n) d.popleft() d.rotate(n) To implement :class:`deque` slicing, use a similar approach applying :meth:`rotate` to bring a target element to the left side of the deque. Remove old entries with :meth:`popleft`, add new entries with :meth:`extend`, and then reverse the rotation. With minor variations on that approach, it is easy to implement Forth style stack manipulations such as ``dup``, ``drop``, ``swap``, ``over``, ``pick``, ``rot``, and ``roll``. Multi-pass data reduction algorithms can be succinctly expressed and efficiently coded by extracting elements with multiple calls to :meth:`popleft`, applying a reduction function, and calling :meth:`append` to add the result back to the deque. For example, building a balanced binary tree of nested lists entails reducing two adjacent nodes into one by grouping them in a list:: >>> def maketree(iterable): ... d = deque(iterable) ... while len(d) > 1: ... pair = [d.popleft(), d.popleft()] ... d.append(pair) ... return list(d) ... >>> print maketree('abcdefgh') [[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]] Bounded length deques provide functionality similar to the ``tail`` filter in Unix:: def tail(filename, n=10): 'Return the last n lines of a file' return deque(open(filename), n) .. _defaultdict-objects: :class:`defaultdict` objects ---------------------------- .. class:: defaultdict([default_factory[, ...]]) Returns a new dictionary-like object. :class:`defaultdict` is a subclass of the builtin :class:`dict` class. It overrides one method and adds one writable instance variable. The remaining functionality is the same as for the :class:`dict` class and is not documented here. The first argument provides the initial value for the :attr:`default_factory` attribute; it defaults to ``None``. All remaining arguments are treated the same as if they were passed to the :class:`dict` constructor, including keyword arguments. .. versionadded:: 2.5 :class:`defaultdict` objects support the following method in addition to the standard :class:`dict` operations: .. method:: defaultdict.__missing__(key) If the :attr:`default_factory` attribute is ``None``, this raises an :exc:`KeyError` exception with the *key* as argument. If :attr:`default_factory` is not ``None``, it is called without arguments to provide a default value for the given *key*, this value is inserted in the dictionary for the *key*, and returned. If calling :attr:`default_factory` raises an exception this exception is propagated unchanged. This method is called by the :meth:`__getitem__` method of the :class:`dict` class when the requested key is not found; whatever it returns or raises is then returned or raised by :meth:`__getitem__`. :class:`defaultdict` objects support the following instance variable: .. attribute:: defaultdict.default_factory This attribute is used by the :meth:`__missing__` method; it is initialized from the first argument to the constructor, if present, or to ``None``, if absent. .. _defaultdict-examples: :class:`defaultdict` Examples ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Using :class:`list` as the :attr:`default_factory`, it is easy to group a sequence of key-value pairs into a dictionary of lists:: >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)] >>> d = defaultdict(list) >>> for k, v in s: ... d[k].append(v) ... >>> d.items() [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])] When each key is encountered for the first time, it is not already in the mapping; so an entry is automatically created using the :attr:`default_factory` function which returns an empty :class:`list`. The :meth:`list.append` operation then attaches the value to the new list. When keys are encountered again, the look-up proceeds normally (returning the list for that key) and the :meth:`list.append` operation adds another value to the list. This technique is simpler and faster than an equivalent technique using :meth:`dict.setdefault`:: >>> d = {} >>> for k, v in s: ... d.setdefault(k, []).append(v) ... >>> d.items() [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])] Setting the :attr:`default_factory` to :class:`int` makes the :class:`defaultdict` useful for counting (like a bag or multiset in other languages):: >>> s = 'mississippi' >>> d = defaultdict(int) >>> for k in s: ... d[k] += 1 ... >>> d.items() [('i', 4), ('p', 2), ('s', 4), ('m', 1)] When a letter is first encountered, it is missing from the mapping, so the :attr:`default_factory` function calls :func:`int` to supply a default count of zero. The increment operation then builds up the count for each letter. The function :func:`int` which always returns zero is just a special case of constant functions. A faster and more flexible way to create constant functions is to use :func:`itertools.repeat` which can supply any constant value (not just zero):: >>> def constant_factory(value): ... return itertools.repeat(value).next >>> d = defaultdict(constant_factory('')) >>> d.update(name='John', action='ran') >>> '%(name)s %(action)s to %(object)s' % d 'John ran to ' Setting the :attr:`default_factory` to :class:`set` makes the :class:`defaultdict` useful for building a dictionary of sets:: >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)] >>> d = defaultdict(set) >>> for k, v in s: ... d[k].add(v) ... >>> d.items() [('blue', set([2, 4])), ('red', set([1, 3]))] .. _named-tuple-factory: :func:`namedtuple` Factory Function for Tuples with Named Fields ----------------------------------------------------------------- Named tuples assign meaning to each position in a tuple and allow for more readable, self-documenting code. They can be used wherever regular tuples are used, and they add the ability to access fields by name instead of position index. .. function:: namedtuple(typename, fieldnames, [verbose]) Returns a new tuple subclass named *typename*. The new subclass is used to create tuple-like objects that have fields accessable by attribute lookup as well as being indexable and iterable. Instances of the subclass also have a helpful docstring (with typename and fieldnames) and a helpful :meth:`__repr__` method which lists the tuple contents in a ``name=value`` format. The *fieldnames* are a single string with each fieldname separated by whitespace and/or commas (for example 'x y' or 'x, y'). Alternatively, the *fieldnames* can be specified as a list of strings (such as ['x', 'y']). Any valid Python identifier may be used for a fieldname except for names starting and ending with double underscores. Valid identifiers consist of letters, digits, and underscores but do not start with a digit and cannot be a :mod:`keyword` such as *class*, *for*, *return*, *global*, *pass*, *print*, or *raise*. If *verbose* is true, will print the class definition. Named tuple instances do not have per-instance dictionaries, so they are lightweight and require no more memory than regular tuples. .. versionadded:: 2.6 Example:: >>> Point = namedtuple('Point', 'x y', verbose=True) class Point(tuple): 'Point(x, y)' __slots__ = () __fields__ = ('x', 'y') def __new__(cls, x, y): return tuple.__new__(cls, (x, y)) def __repr__(self): return 'Point(x=%r, y=%r)' % self def __asdict__(self): 'Return a new dict mapping field names to their values' return dict(zip(('x', 'y'), self)) def __replace__(self, field, value): 'Return a new Point object replacing specified fields with new values' return Point(**dict(zip(('x', 'y'), self) + kwds.items())) x = property(itemgetter(0)) y = property(itemgetter(1)) >>> p = Point(11, y=22) # instantiate with positional or keyword arguments >>> p[0] + p[1] # indexable like the regular tuple (11, 22) 33 >>> x, y = p # unpack like a regular tuple >>> x, y (11, 22) >>> p.x + p.y # fields also accessable by name 33 >>> p # readable __repr__ with a name=value style Point(x=11, y=22) Named tuples are especially useful for assigning field names to result tuples returned by the :mod:`csv` or :mod:`sqlite3` modules:: EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade') from itertools import starmap import csv for emp in starmap(EmployeeRecord, csv.reader(open("employees.csv", "rb"))): print emp.name, emp.title import sqlite3 conn = sqlite3.connect('/companydata') cursor = conn.cursor() cursor.execute('SELECT name, age, title, department, paygrade FROM employees') for emp in starmap(EmployeeRecord, cursor.fetchall()): print emp.name, emp.title When casting a single record to a named tuple, use the star-operator [#]_ to unpack the values:: >>> t = [11, 22] >>> Point(*t) # the star-operator unpacks any iterable object Point(x=11, y=22) When casting a dictionary to a named tuple, use the double-star-operator:: >>> d = {'x': 11, 'y': 22} >>> Point(**d) Point(x=11, y=22) In addition to the methods inherited from tuples, named tuples support two additonal methods and a read-only attribute. .. method:: somenamedtuple.__asdict__() Return a new dict which maps field names to their corresponding values: :: >>> p.__asdict__() {'x': 11, 'y': 22} .. method:: somenamedtuple.__replace__(kwargs) Return a new instance of the named tuple replacing specified fields with new values: :: >>> p = Point(x=11, y=22) >>> p.__replace__(x=33) Point(x=33, y=22) >>> for partnum, record in inventory.items(): ... inventory[partnum] = record.__replace__(price=newprices[partnum], updated=time.now()) .. attribute:: somenamedtuple.__fields__ Return a tuple of strings listing the field names. This is useful for introspection and for creating new named tuple types from existing named tuples. :: >>> p.__fields__ # view the field names ('x', 'y') >>> Color = namedtuple('Color', 'red green blue') >>> Pixel = namedtuple('Pixel', Point.__fields__ + Color.__fields__) >>> Pixel(11, 22, 128, 255, 0) Pixel(x=11, y=22, red=128, green=255, blue=0)' Since a named tuple is a regular Python class, it is easy to add or change functionality. For example, the display format can be changed by overriding the :meth:`__repr__` method: :: >>> Point = namedtuple('Point', 'x y') >>> Point.__repr__ = lambda self: 'Point(%.3f, %.3f)' % self >>> Point(x=10, y=20) Point(10.000, 20.000) Default values can be implemented by starting with a prototype instance and customizing it with :meth:`__replace__`: :: >>> Account = namedtuple('Account', 'owner balance transaction_count') >>> model_account = Account('', 0.0, 0) >>> johns_account = model_account.__replace__(owner='John') .. rubric:: Footnotes .. [#] For information on the star-operator see :ref:`tut-unpacking-arguments` and :ref:`calls`.