:mod:`bisect` --- Array bisection algorithm
===========================================

.. module:: bisect
   :synopsis: Array bisection algorithms for binary searching.
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Raymond Hettinger <python at rcn.com>
.. example based on the PyModules FAQ entry by Aaron Watters <arw@pythonpros.com>

**Source code:** :source:`Lib/bisect.py`

--------------

This module provides support for maintaining a list in sorted order without
having to sort the list after each insertion. For long lists of items with
expensive comparison operations, this can be an improvement over
linear searches or frequent resorting.

The module is called :mod:`bisect` because it uses a basic bisection
algorithm to do its work. Unlike other bisection tools that search for a
specific value, the functions in this module are designed to locate an
insertion point. Accordingly, the functions never call an :meth:`__eq__`
method to determine whether a value has been found. Instead, the
functions only call the :meth:`__lt__` method and will return an insertion
point between values in an array.

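As a minimal sketch of the point about :meth:`__lt__`, the following uses a
hypothetical ``Version`` class (not part of this module) that defines only
``__lt__``. Under that assumption, both searching and inserting still work:

```python
from bisect import bisect_left, insort


class Version:
    """Hypothetical record type that defines __lt__ but no custom __eq__."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __lt__(self, other):
        # The only comparison the bisect functions rely on.
        return (self.major, self.minor) < (other.major, other.minor)

    def __repr__(self):
        return f"Version({self.major}, {self.minor})"


versions = [Version(1, 0), Version(1, 4), Version(2, 1)]

# Insert while keeping the list sorted.
insort(versions, Version(1, 9))

# Locate where Version(2, 0) would go; no equality test is involved.
position = bisect_left(versions, Version(2, 0))
print(versions, position)
```

Because no equality test is made, the result is always an insertion point,
whether or not an equal item is already present.
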
The following functions are provided:


.. function:: bisect_left(a, x, lo=0, hi=len(a), *, key=None)

   Locate the insertion point for *x* in *a* to maintain sorted order.
   The parameters *lo* and *hi* may be used to specify a subset of the list
   which should be considered; by default the entire list is used. If *x* is
   already present in *a*, the insertion point will be before (to the left of)
   any existing entries. The return value is suitable for use as the first
   parameter to ``list.insert()`` assuming that *a* is already sorted.

   The returned insertion point *ip* partitions the array *a* into two
   slices such that ``all(elem < x for elem in a[lo : ip])`` is true for the
   left slice and ``all(elem >= x for elem in a[ip : hi])`` is true for the
   right slice.

   *key* specifies a :term:`key function` of one argument that is used to
   extract a comparison key from each element in the array. To support
   searching complex records, the key function is not applied to the *x* value.

   If *key* is ``None``, the elements are compared directly and
   no key function is called.

   .. versionchanged:: 3.10
      Added the *key* parameter.

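The partition property described for :func:`bisect_left` can be checked
directly. This is an illustrative sketch only; the list contents and the
variable names are made up for the example:

```python
from bisect import bisect_left

a = [10, 20, 20, 20, 30, 40]
x = 20
lo, hi = 0, len(a)

ip = bisect_left(a, x, lo, hi)

# Everything left of ip is strictly less than x ...
left_ok = all(elem < x for elem in a[lo:ip])
# ... and everything from ip onward is greater than or equal to x.
right_ok = all(elem >= x for elem in a[ip:hi])

# Restricting the search to a[2:5] still returns an index into a itself.
ip_sub = bisect_left(a, x, 2, 5)

print(ip, left_ok, right_ok, ip_sub)
```

Note that with *lo* and *hi* given, the returned index is still relative to
the whole list, not to the slice.
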
.. function:: bisect_right(a, x, lo=0, hi=len(a), *, key=None)
              bisect(a, x, lo=0, hi=len(a), *, key=None)

   Similar to :func:`bisect_left`, but returns an insertion point which comes
   after (to the right of) any existing entries of *x* in *a*.

   The returned insertion point *ip* partitions the array *a* into two slices
   such that ``all(elem <= x for elem in a[lo : ip])`` is true for the left slice and
   ``all(elem > x for elem in a[ip : hi])`` is true for the right slice.

   .. versionchanged:: 3.10
      Added the *key* parameter.

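The difference between the left and right variants only shows up when *x* is
already present. A small sketch, using made-up sample data, that also uses the
two insertion points to count duplicates:

```python
from bisect import bisect, bisect_left, bisect_right

scores = [55, 70, 70, 70, 85, 90]

first = bisect_left(scores, 70)    # index of the first 70
after = bisect_right(scores, 70)   # index just past the last 70
count = after - first              # number of entries equal to 70

# For a value that is absent, both functions agree.
missing_left = bisect_left(scores, 80)
missing_right = bisect_right(scores, 80)

# bisect() behaves like bisect_right().
same_as_right = bisect(scores, 70) == after

print(first, after, count, missing_left, missing_right, same_as_right)
```
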
.. function:: insort_left(a, x, lo=0, hi=len(a), *, key=None)

   Insert *x* in *a* in sorted order.

   This function first runs :func:`bisect_left` to locate an insertion point.
   Next, it runs the :meth:`insert` method on *a* to insert *x* at the
   appropriate position to maintain sort order.

   To support inserting records in a table, the *key* function (if any) is
   applied to *x* for the search step but not for the insertion step.

   Keep in mind that the ``O(log n)`` search is dominated by the slow ``O(n)``
   insertion step.

   .. versionchanged:: 3.10
      Added the *key* parameter.

.. function:: insort_right(a, x, lo=0, hi=len(a), *, key=None)
              insort(a, x, lo=0, hi=len(a), *, key=None)

   Similar to :func:`insort_left`, but inserting *x* in *a* after any existing
   entries of *x*.

   This function first runs :func:`bisect_right` to locate an insertion point.
   Next, it runs the :meth:`insert` method on *a* to insert *x* at the
   appropriate position to maintain sort order.

   To support inserting records in a table, the *key* function (if any) is
   applied to *x* for the search step but not for the insertion step.

   Keep in mind that the ``O(log n)`` search is dominated by the slow ``O(n)``
   insertion step.

   .. versionchanged:: 3.10
      Added the *key* parameter.

Performance Notes
-----------------

When writing time-sensitive code using *bisect()* and *insort()*, keep these
thoughts in mind:

* Bisection is effective for searching ranges of values.
  For locating specific values, dictionaries are more performant.

* The *insort()* functions are ``O(n)`` because the logarithmic search step
  is dominated by the linear time insertion step.

* The search functions are stateless and discard key function results after
  they are used. Consequently, if the search functions are used in a loop,
  the key function may be called again and again on the same array elements.
  If the key function isn't fast, consider wrapping it with
  :func:`functools.cache` to avoid duplicate computations. Alternatively,
  consider searching an array of precomputed keys to locate the insertion
  point (as shown in the examples section below).

.. seealso::

   * `Sorted Collections
     <http://www.grantjenks.com/docs/sortedcollections/>`_ is a high-performance
     module that uses *bisect* to manage sorted collections of data.

   * The `SortedCollection recipe
     <https://code.activestate.com/recipes/577197-sortedcollection/>`_ uses
     bisect to build a full-featured collection class with straightforward search
     methods and support for a key function. The keys are precomputed to save
     unnecessary calls to the key function during searches.


Searching Sorted Lists
----------------------

The above :func:`bisect` functions are useful for finding insertion points but
can be tricky or awkward to use for common searching tasks. The following five
functions show how to transform them into the standard lookups for sorted
lists::

    from bisect import bisect_left, bisect_right

    def index(a, x):
        'Locate the leftmost value exactly equal to x'
        i = bisect_left(a, x)
        if i != len(a) and a[i] == x:
            return i
        raise ValueError

    def find_lt(a, x):
        'Find rightmost value less than x'
        i = bisect_left(a, x)
        if i:
            return a[i-1]
        raise ValueError

    def find_le(a, x):
        'Find rightmost value less than or equal to x'
        i = bisect_right(a, x)
        if i:
            return a[i-1]
        raise ValueError

    def find_gt(a, x):
        'Find leftmost value greater than x'
        i = bisect_right(a, x)
        if i != len(a):
            return a[i]
        raise ValueError

    def find_ge(a, x):
        'Find leftmost item greater than or equal to x'
        i = bisect_left(a, x)
        if i != len(a):
            return a[i]
        raise ValueError

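The same pattern extends to other lookups. As a sketch, the two helpers below
(``contains`` and ``count_in_range``, both hypothetical names that this module
does not provide) test membership and count the items in a closed range:

```python
from bisect import bisect_left, bisect_right


def contains(a, x):
    """Return True if x occurs in the sorted list a (hypothetical helper)."""
    i = bisect_left(a, x)
    return i != len(a) and a[i] == x


def count_in_range(a, low, high):
    """Count items with low <= item <= high in the sorted list a."""
    return bisect_right(a, high) - bisect_left(a, low)


primes = [2, 3, 5, 7, 11, 13, 17, 19]

print(contains(primes, 7))             # present
print(contains(primes, 8))             # absent, falls between 7 and 11
print(contains(primes, 23))            # absent, past the end of the list
print(count_in_range(primes, 4, 13))   # counts 5, 7, 11 and 13
```
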
Examples
--------

.. _bisect-example:

The :func:`bisect` function can be useful for numeric table lookups. This
example uses :func:`bisect` to look up a letter grade for an exam score (say)
based on a set of ordered numeric breakpoints: 90 and up is an 'A', 80 to 89 is
a 'B', and so on::

   >>> from bisect import bisect
   >>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
   ...     i = bisect(breakpoints, score)
   ...     return grades[i]
   ...
   >>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
   ['F', 'A', 'C', 'C', 'B', 'A', 'A']

The :func:`bisect` and :func:`insort` functions also work with lists of
tuples. The *key* argument can serve to extract the field used for ordering
records in a table::

    >>> from collections import namedtuple
    >>> from operator import attrgetter
    >>> from bisect import bisect, insort
    >>> from pprint import pprint

    >>> Movie = namedtuple('Movie', ('name', 'released', 'director'))

    >>> movies = [
    ...     Movie('Jaws', 1975, 'Spielberg'),
    ...     Movie('Titanic', 1997, 'Cameron'),
    ...     Movie('The Birds', 1963, 'Hitchcock'),
    ...     Movie('Aliens', 1986, 'Scott')
    ... ]

    >>> # Find the first movie released after 1960
    >>> by_year = attrgetter('released')
    >>> movies.sort(key=by_year)
    >>> movies[bisect(movies, 1960, key=by_year)]
    Movie(name='The Birds', released=1963, director='Hitchcock')

    >>> # Insert a movie while maintaining sort order
    >>> romance = Movie('Love Story', 1970, 'Hiller')
    >>> insort(movies, romance, key=by_year)
    >>> pprint(movies)
    [Movie(name='The Birds', released=1963, director='Hitchcock'),
     Movie(name='Love Story', released=1970, director='Hiller'),
     Movie(name='Jaws', released=1975, director='Spielberg'),
     Movie(name='Aliens', released=1986, director='Scott'),
     Movie(name='Titanic', released=1997, director='Cameron')]

If the key function is expensive, it is possible to avoid repeated function
calls by searching a list of precomputed keys to find the index of a record::

    >>> from bisect import bisect_left
    >>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
    >>> data.sort(key=lambda r: r[1])       # Or use operator.itemgetter(1).
    >>> keys = [r[1] for r in data]         # Precompute a list of keys.
    >>> data[bisect_left(keys, 0)]
    ('black', 0)
    >>> data[bisect_left(keys, 1)]
    ('blue', 1)
    >>> data[bisect_left(keys, 5)]
    ('red', 5)
    >>> data[bisect_left(keys, 8)]
    ('yellow', 8)

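When records are also inserted, the precomputed key list has to be updated in
step with the data. One possible way to do that is sketched below;
``insert_record`` is a hypothetical helper, not something this module provides:

```python
from bisect import bisect_right

data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
keys = [r[1] for r in data]         # Precomputed keys, parallel to data.


def insert_record(record):
    """Insert record into data and its key into keys at the same index."""
    k = record[1]
    i = bisect_right(keys, k)       # Search the keys, not the records.
    keys.insert(i, k)
    data.insert(i, record)


insert_record(('green', 3))
print(data)
print(keys)
```

Both lists must always be modified together; if they drift apart, later
searches on ``keys`` return indexes that no longer match ``data``.
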