SF patch 658251: Install a C implementation of the Mersenne Twister as the

core generator for random.py.
This commit is contained in:
Raymond Hettinger 2002-12-29 23:03:38 +00:00
parent 5e65ce671c
commit 40f6217092
6 changed files with 981 additions and 320 deletions

View File

@ -8,30 +8,26 @@
This module implements pseudo-random number generators for various
distributions.
For integers, uniform selection from a range.
For sequences, uniform selection of a random element, and a function to
generate a random permutation of a list in-place.
For sequences, uniform selection of a random element, a function to
generate a random permutation of a list in-place, and a function for
random sampling without replacement.
On the real line, there are functions to compute uniform, normal (Gaussian),
lognormal, negative exponential, gamma, and beta distributions.
For generating distribution of angles, the circular uniform and
von Mises distributions are available.
For generating distributions of angles, the von Mises distribution
is available.
Almost all module functions depend on the basic function
\function{random()}, which generates a random float uniformly in
the semi-open range [0.0, 1.0). Python uses the standard Wichmann-Hill
generator, combining three pure multiplicative congruential
generators of modulus 30269, 30307 and 30323. Its period (how many
numbers it generates before repeating the sequence exactly) is
6,953,607,871,644. While of much higher quality than the \function{rand()}
function supplied by most C libraries, the theoretical properties
are much the same as for a single linear congruential generator of
large modulus. It is not suitable for all purposes, and is completely
unsuitable for cryptographic purposes.
The functions in this module are not threadsafe: if you want to call these
functions from multiple threads, you should explicitly serialize the calls.
Else, because no critical sections are implemented internally, calls
from different threads may see the same return values.
the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as
the core generator. It produces 53-bit precision floats and has a
period of 2**19937-1. The underlying implementation in C
is both fast and threadsafe. The Mersenne Twister is one of the most
extensively tested random number generators in existence. However, being
completely deterministic, it is not suitable for all purposes, and is
completely unsuitable for cryptographic purposes.
The functions supplied by this module are actually bound methods of a
hidden instance of the \class{random.Random} class. You can
@ -39,58 +35,19 @@ instantiate your own instances of \class{Random} to get generators
that don't share state. This is especially useful for multi-threaded
programs, creating a different instance of \class{Random} for each
thread, and using the \method{jumpahead()} method to ensure that the
generated sequences seen by each thread don't overlap (see example
below).
generated sequences seen by each thread don't overlap.
Class \class{Random} can also be subclassed if you want to use a
different basic generator of your own devising: in that case, override
the \method{random()}, \method{seed()}, \method{getstate()},
\method{setstate()} and \method{jumpahead()} methods.
Here's one way to create threadsafe distinct and non-overlapping generators:
\begin{verbatim}
def create_generators(num, delta, firstseed=None):
"""Return list of num distinct generators.
Each generator has its own unique segment of delta elements
from Random.random()'s full period.
Seed the first generator with optional arg firstseed (default
is None, to seed from current time).
"""
from random import Random
g = Random(firstseed)
result = [g]
for i in range(num - 1):
laststate = g.getstate()
g = Random()
g.setstate(laststate)
g.jumpahead(delta)
result.append(g)
return result
gens = create_generators(10, 1000000)
\end{verbatim}
That creates 10 distinct generators, which can be passed out to 10
distinct threads. The generators don't share state so can be called
safely in parallel. So long as no thread calls its \code{g.random()}
more than a million times (the second argument to
\function{create_generators()}, the sequences seen by each thread will
not overlap. The period of the underlying Wichmann-Hill generator
limits how far this technique can be pushed.
Just for fun, note that since we know the period, \method{jumpahead()}
can also be used to ``move backward in time:''
\begin{verbatim}
>>> g = Random(42) # arbitrary
>>> g.random()
0.25420336316883324
>>> g.jumpahead(6953607871644L - 1) # move *back* one
>>> g.random()
0.25420336316883324
\end{verbatim}
As an example of subclassing, the \module{random} module provides
the \class{WichmannHill} class which implements an alternative generator
in pure Python. The class provides a backward compatible way to
reproduce results from earlier versions of Python which used the
Wichmann-Hill algorithm as the core generator.
\versionchanged[Substituted MersenneTwister for Wichmann-Hill]{2.3}
Bookkeeping functions:
@ -104,18 +61,6 @@ Bookkeeping functions:
If \var{x} is not \code{None} or an int or long,
\code{hash(\var{x})} is used instead.
If \var{x} is an int or long, \var{x} is used directly.
Distinct values between 0 and 27814431486575L inclusive are guaranteed
to yield distinct internal states (this guarantee is specific to the
default Wichmann-Hill generator, and may not apply to subclasses
supplying their own basic generator).
\end{funcdesc}
\begin{funcdesc}{whseed}{\optional{x}}
This is obsolete, supplied for bit-level compatibility with versions
of Python prior to 2.1.
See \function{seed} for details. \function{whseed} does not guarantee
that distinct integer arguments yield distinct internal states, and can
yield no more than about 2**24 distinct internal states in all.
\end{funcdesc}
\begin{funcdesc}{getstate}{}
@ -134,17 +79,20 @@ Bookkeeping functions:
\end{funcdesc}
\begin{funcdesc}{jumpahead}{n}
Change the internal state to what it would be if \function{random()}
were called \var{n} times, but do so quickly. \var{n} is a
non-negative integer. This is most useful in multi-threaded
Change the internal state to one different from and likely far away from
the current state. \var{n} is a non-negative integer which is used to
scramble the current state vector. This is most useful in multi-threaded
programs, in conjuction with multiple instances of the \class{Random}
class: \method{setstate()} or \method{seed()} can be used to force
all instances into the same internal state, and then
\method{jumpahead()} can be used to force the instances' states as
far apart as you like (up to the period of the generator).
class: \method{setstate()} or \method{seed()} can be used to force all
instances into the same internal state, and then \method{jumpahead()}
can be used to force the instances' states far apart.
\versionadded{2.1}
\versionchanged[Instead of jumping to a specific state, \var{n} steps
ahead, \method{jumpahead(\var{n})} jumps to another state likely to be
separated by many steps.]{2.3}
\end{funcdesc}
Functions for integers:
\begin{funcdesc}{randrange}{\optional{start,} stop\optional{, step}}
@ -279,8 +227,32 @@ these equations can be found in any statistics text.
\var{beta} is the shape parameter.
\end{funcdesc}
Alternative Generator
\begin{classdesc}{WichmannHill}{\optional{seed}}
Class that implements the Wichmann-Hill algorithm as the core generator.
Has all of the same methods as \class{Random} plus the \method{whseed}
method described below. Because this class is implemented in pure
Python, it is not threadsafe and may require locks between calls. The
period of the generator is 6,953,607,871,644 which is small enough to
require care that two independent random sequences do not overlap.
\end{classdesc}
\begin{funcdesc}{whseed}{\optional{x}}
This is obsolete, supplied for bit-level compatibility with versions
of Python prior to 2.1.
See \function{seed} for details. \function{whseed} does not guarantee
that distinct integer arguments yield distinct internal states, and can
yield no more than about 2**24 distinct internal states in all.
\end{funcdesc}
\begin{seealso}
\seetext{M. Matsumoto and T. Nishimura, ``Mersenne Twister: A
623-dimensionally equidistributed uniform pseudorandom
number generator'',
\citetitle{ACM Transactions on Modeling and Computer Simulation}
Vol. 8, No. 1, January pp.3-30 1998.}
\seetext{Wichmann, B. A. \& Hill, I. D., ``Algorithm AS 183:
An efficient and portable pseudo-random number generator'',
\citetitle{Applied Statistics} 31 (1982) 188-190.}

View File

@ -18,61 +18,26 @@
negative exponential
gamma
beta
pareto
Weibull
distributions on the circle (angles 0 to 2pi)
---------------------------------------------
circular uniform
von Mises
Translated from anonymously contributed C/C++ source.
General notes on the underlying Mersenne Twister core generator:
Multi-threading note: the random number generator used here is not thread-
safe; it is possible that two calls return the same random value. However,
you can instantiate a different instance of Random() in each thread to get
generators that don't share state, then use .setstate() and .jumpahead() to
move the generators to disjoint segments of the full period. For example,
* The period is 2**19937-1.
* It is one of the most extensively tested generators in existence
* Without a direct way to compute N steps forward, the
semantics of jumpahead(n) are weakened to simply jump
to another distant state and rely on the large period
to avoid overlapping sequences.
* The random() method is implemented in C, executes in
a single Python step, and is, therefore, threadsafe.
def create_generators(num, delta, firstseed=None):
""\"Return list of num distinct generators.
Each generator has its own unique segment of delta elements from
Random.random()'s full period.
Seed the first generator with optional arg firstseed (default is
None, to seed from current time).
""\"
from random import Random
g = Random(firstseed)
result = [g]
for i in range(num - 1):
laststate = g.getstate()
g = Random()
g.setstate(laststate)
g.jumpahead(delta)
result.append(g)
return result
gens = create_generators(10, 1000000)
That creates 10 distinct generators, which can be passed out to 10 distinct
threads. The generators don't share state so can be called safely in
parallel. So long as no thread calls its g.random() more than a million
times (the second argument to create_generators), the sequences seen by
each thread will not overlap.
The period of the underlying Wichmann-Hill generator is 6,953,607,871,644,
and that limits how far this technique can be pushed.
Just for fun, note that since we know the period, .jumpahead() can also be
used to "move backward in time":
>>> g = Random(42) # arbitrary
>>> g.random()
0.25420336316883324
>>> g.jumpahead(6953607871644L - 1) # move *back* one
>>> g.random()
0.25420336316883324
"""
# XXX The docstring sucks.
from math import log as _log, exp as _exp, pi as _pi, e as _e
from math import sqrt as _sqrt, acos as _acos, cos as _cos, sin as _sin
@ -82,32 +47,20 @@ __all__ = ["Random","seed","random","uniform","randint","choice","sample",
"randrange","shuffle","normalvariate","lognormvariate",
"cunifvariate","expovariate","vonmisesvariate","gammavariate",
"stdgamma","gauss","betavariate","paretovariate","weibullvariate",
"getstate","setstate","jumpahead","whseed"]
def _verify(name, computed, expected):
if abs(computed - expected) > 1e-7:
raise ValueError(
"computed value for %s deviates too much "
"(computed %g, expected %g)" % (name, computed, expected))
"getstate","setstate","jumpahead"]
NV_MAGICCONST = 4 * _exp(-0.5)/_sqrt(2.0)
_verify('NV_MAGICCONST', NV_MAGICCONST, 1.71552776992141)
TWOPI = 2.0*_pi
_verify('TWOPI', TWOPI, 6.28318530718)
LOG4 = _log(4.0)
_verify('LOG4', LOG4, 1.38629436111989)
SG_MAGICCONST = 1.0 + _log(4.5)
_verify('SG_MAGICCONST', SG_MAGICCONST, 2.50407739677627)
del _verify
# Translated by Guido van Rossum from C source provided by
# Adrian Baddeley.
# Adrian Baddeley. Adapted by Raymond Hettinger for use with
# the Mersenne Twister core generator.
class Random:
from _random import Random as CoreGenerator
class Random(CoreGenerator):
"""Random number generator base class used by bound module functions.
Used to instantiate instances of Random to get generators that don't
@ -122,7 +75,7 @@ class Random:
"""
VERSION = 1 # used by getstate/setstate
VERSION = 2 # used by getstate/setstate
def __init__(self, x=None):
"""Initialize an instance.
@ -131,12 +84,7 @@ class Random:
"""
self.seed(x)
## -------------------- core generator -------------------
# Specific to Wichmann-Hill generator. Subclasses wishing to use a
# different core generator should override the seed(), random(),
# getstate(), setstate() and jumpahead() methods.
self.gauss_next = None
def seed(self, a=None):
"""Initialize internal state from hashable object.
@ -144,141 +92,26 @@ class Random:
None or no argument seeds from current time.
If a is not None or an int or long, hash(a) is used instead.
If a is an int or long, a is used directly. Distinct values between
0 and 27814431486575L inclusive are guaranteed to yield distinct
internal states (this guarantee is specific to the default
Wichmann-Hill generator).
"""
if a is None:
# Initialize from current time
import time
a = long(time.time() * 256)
if type(a) not in (type(3), type(3L)):
a = hash(a)
a, x = divmod(a, 30268)
a, y = divmod(a, 30306)
a, z = divmod(a, 30322)
self._seed = int(x)+1, int(y)+1, int(z)+1
CoreGenerator.seed(self, a)
self.gauss_next = None
def random(self):
"""Get the next random number in the range [0.0, 1.0)."""
# Wichman-Hill random number generator.
#
# Wichmann, B. A. & Hill, I. D. (1982)
# Algorithm AS 183:
# An efficient and portable pseudo-random number generator
# Applied Statistics 31 (1982) 188-190
#
# see also:
# Correction to Algorithm AS 183
# Applied Statistics 33 (1984) 123
#
# McLeod, A. I. (1985)
# A remark on Algorithm AS 183
# Applied Statistics 34 (1985),198-200
# This part is thread-unsafe:
# BEGIN CRITICAL SECTION
x, y, z = self._seed
x = (171 * x) % 30269
y = (172 * y) % 30307
z = (170 * z) % 30323
self._seed = x, y, z
# END CRITICAL SECTION
# Note: on a platform using IEEE-754 double arithmetic, this can
# never return 0.0 (asserted by Tim; proof too long for a comment).
return (x/30269.0 + y/30307.0 + z/30323.0) % 1.0
def getstate(self):
"""Return internal state; can be passed to setstate() later."""
return self.VERSION, self._seed, self.gauss_next
return self.VERSION, CoreGenerator.getstate(self), self.gauss_next
def setstate(self, state):
"""Restore internal state from object returned by getstate()."""
version = state[0]
if version == 1:
version, self._seed, self.gauss_next = state
if version == 2:
version, internalstate, self.gauss_next = state
CoreGenerator.setstate(self, internalstate)
else:
raise ValueError("state with version %s passed to "
"Random.setstate() of version %s" %
(version, self.VERSION))
def jumpahead(self, n):
"""Act as if n calls to random() were made, but quickly.
n is an int, greater than or equal to 0.
Example use: If you have 2 threads and know that each will
consume no more than a million random numbers, create two Random
objects r1 and r2, then do
r2.setstate(r1.getstate())
r2.jumpahead(1000000)
Then r1 and r2 will use guaranteed-disjoint segments of the full
period.
"""
if not n >= 0:
raise ValueError("n must be >= 0")
x, y, z = self._seed
x = int(x * pow(171, n, 30269)) % 30269
y = int(y * pow(172, n, 30307)) % 30307
z = int(z * pow(170, n, 30323)) % 30323
self._seed = x, y, z
def __whseed(self, x=0, y=0, z=0):
"""Set the Wichmann-Hill seed from (x, y, z).
These must be integers in the range [0, 256).
"""
if not type(x) == type(y) == type(z) == int:
raise TypeError('seeds must be integers')
if not (0 <= x < 256 and 0 <= y < 256 and 0 <= z < 256):
raise ValueError('seeds must be in range(0, 256)')
if 0 == x == y == z:
# Initialize from current time
import time
t = long(time.time() * 256)
t = int((t&0xffffff) ^ (t>>24))
t, x = divmod(t, 256)
t, y = divmod(t, 256)
t, z = divmod(t, 256)
# Zero is a poor seed, so substitute 1
self._seed = (x or 1, y or 1, z or 1)
self.gauss_next = None
def whseed(self, a=None):
"""Seed from hashable object's hash code.
None or no argument seeds from current time. It is not guaranteed
that objects with distinct hash codes lead to distinct internal
states.
This is obsolete, provided for compatibility with the seed routine
used prior to Python 2.1. Use the .seed() method instead.
"""
if a is None:
self.__whseed()
return
a = hash(a)
a, x = divmod(a, 256)
a, y = divmod(a, 256)
a, z = divmod(a, 256)
x = (x + a) % 256 or 1
y = (y + a) % 256 or 1
z = (z + a) % 256 or 1
self.__whseed(x, y, z)
## ---- Methods below this point do not need to be overridden when
## ---- subclassing for the purpose of using a different core generator.
@ -744,6 +577,153 @@ class Random:
u = self.random()
return alpha * pow(-_log(u), 1.0/beta)
## -------------------- Wichmann-Hill -------------------
class WichmannHill(Random):
VERSION = 1 # used by getstate/setstate
def seed(self, a=None):
"""Initialize internal state from hashable object.
None or no argument seeds from current time.
If a is not None or an int or long, hash(a) is used instead.
If a is an int or long, a is used directly. Distinct values between
0 and 27814431486575L inclusive are guaranteed to yield distinct
internal states (this guarantee is specific to the default
Wichmann-Hill generator).
"""
if a is None:
# Initialize from current time
import time
a = long(time.time() * 256)
if not isinstance(a, (int, long)):
a = hash(a)
a, x = divmod(a, 30268)
a, y = divmod(a, 30306)
a, z = divmod(a, 30322)
self._seed = int(x)+1, int(y)+1, int(z)+1
self.gauss_next = None
def random(self):
"""Get the next random number in the range [0.0, 1.0)."""
# Wichman-Hill random number generator.
#
# Wichmann, B. A. & Hill, I. D. (1982)
# Algorithm AS 183:
# An efficient and portable pseudo-random number generator
# Applied Statistics 31 (1982) 188-190
#
# see also:
# Correction to Algorithm AS 183
# Applied Statistics 33 (1984) 123
#
# McLeod, A. I. (1985)
# A remark on Algorithm AS 183
# Applied Statistics 34 (1985),198-200
# This part is thread-unsafe:
# BEGIN CRITICAL SECTION
x, y, z = self._seed
x = (171 * x) % 30269
y = (172 * y) % 30307
z = (170 * z) % 30323
self._seed = x, y, z
# END CRITICAL SECTION
# Note: on a platform using IEEE-754 double arithmetic, this can
# never return 0.0 (asserted by Tim; proof too long for a comment).
return (x/30269.0 + y/30307.0 + z/30323.0) % 1.0
def getstate(self):
"""Return internal state; can be passed to setstate() later."""
return self.VERSION, self._seed, self.gauss_next
def setstate(self, state):
"""Restore internal state from object returned by getstate()."""
version = state[0]
if version == 1:
version, self._seed, self.gauss_next = state
else:
raise ValueError("state with version %s passed to "
"Random.setstate() of version %s" %
(version, self.VERSION))
def jumpahead(self, n):
"""Act as if n calls to random() were made, but quickly.
n is an int, greater than or equal to 0.
Example use: If you have 2 threads and know that each will
consume no more than a million random numbers, create two Random
objects r1 and r2, then do
r2.setstate(r1.getstate())
r2.jumpahead(1000000)
Then r1 and r2 will use guaranteed-disjoint segments of the full
period.
"""
if not n >= 0:
raise ValueError("n must be >= 0")
x, y, z = self._seed
x = int(x * pow(171, n, 30269)) % 30269
y = int(y * pow(172, n, 30307)) % 30307
z = int(z * pow(170, n, 30323)) % 30323
self._seed = x, y, z
def __whseed(self, x=0, y=0, z=0):
"""Set the Wichmann-Hill seed from (x, y, z).
These must be integers in the range [0, 256).
"""
if not type(x) == type(y) == type(z) == int:
raise TypeError('seeds must be integers')
if not (0 <= x < 256 and 0 <= y < 256 and 0 <= z < 256):
raise ValueError('seeds must be in range(0, 256)')
if 0 == x == y == z:
# Initialize from current time
import time
t = long(time.time() * 256)
t = int((t&0xffffff) ^ (t>>24))
t, x = divmod(t, 256)
t, y = divmod(t, 256)
t, z = divmod(t, 256)
# Zero is a poor seed, so substitute 1
self._seed = (x or 1, y or 1, z or 1)
self.gauss_next = None
def whseed(self, a=None):
"""Seed from hashable object's hash code.
None or no argument seeds from current time. It is not guaranteed
that objects with distinct hash codes lead to distinct internal
states.
This is obsolete, provided for compatibility with the seed routine
used prior to Python 2.1. Use the .seed() method instead.
"""
if a is None:
self.__whseed()
return
a = hash(a)
a, x = divmod(a, 256)
a, y = divmod(a, 256)
a, z = divmod(a, 256)
x = (x + a) % 256 or 1
y = (y + a) % 256 or 1
z = (z + a) % 256 or 1
self.__whseed(x, y, z)
## -------------------- test program --------------------
def _test_generator(n, funccall):
@ -768,25 +748,11 @@ def _test_generator(n, funccall):
print 'avg %g, stddev %g, min %g, max %g' % \
(avg, stddev, smallest, largest)
def _test_sample(n):
# For the entire allowable range of 0 <= k <= n, validate that
# the sample is of the correct length and contains only unique items
population = xrange(n)
for k in xrange(n+1):
s = sample(population, k)
uniq = dict.fromkeys(s)
assert len(uniq) == len(s) == k
assert None not in uniq
def _sample_generator(n, k):
# Return a fixed element from the sample. Validates random ordering.
return sample(xrange(n), k)[k//2]
def _test(N=2000):
print 'TWOPI =', TWOPI
print 'LOG4 =', LOG4
print 'NV_MAGICCONST =', NV_MAGICCONST
print 'SG_MAGICCONST =', SG_MAGICCONST
_test_generator(N, 'random()')
_test_generator(N, 'normalvariate(0.0, 1.0)')
_test_generator(N, 'lognormvariate(0.0, 1.0)')
@ -808,25 +774,13 @@ def _test(N=2000):
_test_generator(N, 'weibullvariate(1.0, 1.0)')
_test_generator(N, '_sample_generator(50, 5)') # expected s.d.: 14.4
_test_generator(N, '_sample_generator(50, 45)') # expected s.d.: 14.4
_test_sample(500)
# Test jumpahead.
s = getstate()
jumpahead(N)
r1 = random()
# now do it the slow way
setstate(s)
for i in range(N):
random()
r2 = random()
if r1 != r2:
raise ValueError("jumpahead test failed " + `(N, r1, r2)`)
# Create one instance, seeded from current time, and export its methods
# as module-level functions. The functions are not threadsafe, and state
# is shared across all uses (both in the user's code and in the Python
# libraries), but that's fine for most programs and is easier for the
# casual user than making them instantiate their own Random() instance.
# as module-level functions. The functions share state across all uses
#(both in the user's code and in the Python libraries), but that's fine
# for most programs and is easier for the casual user than making them
# instantiate their own Random() instance.
_inst = Random()
seed = _inst.seed
random = _inst.random
@ -850,7 +804,6 @@ weibullvariate = _inst.weibullvariate
getstate = _inst.getstate
setstate = _inst.setstate
jumpahead = _inst.jumpahead
whseed = _inst.whseed
if __name__ == '__main__':
_test()

View File

@ -1,19 +1,206 @@
from test import test_support
#!/usr/bin/env python
import unittest
import random
import time
from test import test_support
# Ensure that the seed() method initializes all the hidden state. In
# particular, through 2.2.1 it failed to reset a piece of state used by
# (and only by) the .gauss() method.
class TestBasicOps(unittest.TestCase):
# Superclass with tests common to all generators.
# Subclasses must arrange for self.gen to retrieve the Random instance
# to be tested.
for seed in 1, 12, 123, 1234, 12345, 123456, 654321:
for seeder in random.seed, random.whseed:
seeder(seed)
x1 = random.random()
y1 = random.gauss(0, 1)
def randomlist(self, n):
"""Helper function to make a list of random numbers"""
return [self.gen.random() for i in xrange(n)]
seeder(seed)
x2 = random.random()
y2 = random.gauss(0, 1)
def test_autoseed(self):
self.gen.seed()
state1 = self.gen.getstate()
time.sleep(1)
self.gen.seed() # diffent seeds at different times
state2 = self.gen.getstate()
self.assertNotEqual(state1, state2)
test_support.vereq(x1, x2)
test_support.vereq(y1, y2)
def test_saverestore(self):
N = 1000
self.gen.seed()
state = self.gen.getstate()
randseq = self.randomlist(N)
self.gen.setstate(state) # should regenerate the same sequence
self.assertEqual(randseq, self.randomlist(N))
def test_seedargs(self):
for arg in [None, 0, 0L, 1, 1L, -1, -1L, 10**20, -(10**20),
3.14, 1+2j, 'a', tuple('abc')]:
self.gen.seed(arg)
for arg in [range(3), dict(one=1)]:
self.assertRaises(TypeError, self.gen.seed, arg)
def test_jumpahead(self):
self.gen.seed()
state1 = self.gen.getstate()
self.gen.jumpahead(100)
state2 = self.gen.getstate() # s/b distinct from state1
self.assertNotEqual(state1, state2)
self.gen.jumpahead(100)
state3 = self.gen.getstate() # s/b distinct from state2
self.assertNotEqual(state2, state3)
self.assertRaises(TypeError, self.gen.jumpahead) # needs an arg
self.assertRaises(TypeError, self.gen.jumpahead, "ick") # wrong type
self.assertRaises(TypeError, self.gen.jumpahead, 2.3) # wrong type
self.assertRaises(TypeError, self.gen.jumpahead, 2, 3) # too many
def test_sample(self):
# For the entire allowable range of 0 <= k <= N, validate that
# the sample is of the correct length and contains only unique items
N = 100
population = xrange(N)
for k in xrange(N+1):
s = self.gen.sample(population, k)
self.assertEqual(len(s), k)
uniq = dict.fromkeys(s)
self.assertEqual(len(uniq), k)
self.failIf(None in uniq)
def test_gauss(self):
# Ensure that the seed() method initializes all the hidden state. In
# particular, through 2.2.1 it failed to reset a piece of state used
# by (and only by) the .gauss() method.
for seed in 1, 12, 123, 1234, 12345, 123456, 654321:
self.gen.seed(seed)
x1 = self.gen.random()
y1 = self.gen.gauss(0, 1)
self.gen.seed(seed)
x2 = self.gen.random()
y2 = self.gen.gauss(0, 1)
self.assertEqual(x1, x2)
self.assertEqual(y1, y2)
class WichmannHill_TestBasicOps(TestBasicOps):
gen = random.WichmannHill()
def test_strong_jumpahead(self):
# tests that jumpahead(n) semantics correspond to n calls to random()
N = 1000
s = self.gen.getstate()
self.gen.jumpahead(N)
r1 = self.gen.random()
# now do it the slow way
self.gen.setstate(s)
for i in xrange(N):
self.gen.random()
r2 = self.gen.random()
self.assertEqual(r1, r2)
def test_gauss_with_whseed(self):
# Ensure that the seed() method initializes all the hidden state. In
# particular, through 2.2.1 it failed to reset a piece of state used
# by (and only by) the .gauss() method.
for seed in 1, 12, 123, 1234, 12345, 123456, 654321:
self.gen.whseed(seed)
x1 = self.gen.random()
y1 = self.gen.gauss(0, 1)
self.gen.whseed(seed)
x2 = self.gen.random()
y2 = self.gen.gauss(0, 1)
self.assertEqual(x1, x2)
self.assertEqual(y1, y2)
class MersenneTwister_TestBasicOps(TestBasicOps):
gen = random.Random()
def test_referenceImplementation(self):
# Compare the python implementation with results from the original
# code. Create 2000 53-bit precision random floats. Compare only
# the last ten entries to show that the independent implementations
# are tracking. Here is the main() function needed to create the
# list of expected random numbers:
# void main(void){
# int i;
# unsigned long init[4]={61731, 24903, 614, 42143}, length=4;
# init_by_array(init, length);
# for (i=0; i<2000; i++) {
# printf("%.15f ", genrand_res53());
# if (i%5==4) printf("\n");
# }
# }
expected = [0.45839803073713259,
0.86057815201978782,
0.92848331726782152,
0.35932681119782461,
0.081823493762449573,
0.14332226470169329,
0.084297823823520024,
0.53814864671831453,
0.089215024911993401,
0.78486196105372907]
self.gen.seed(61731L + (24903L<<32) + (614L<<64) + (42143L<<96))
actual = self.randomlist(2000)[-10:]
for a, e in zip(actual, expected):
self.assertAlmostEqual(a,e,places=14)
def test_strong_reference_implementation(self):
# Like test_referenceImplementation, but checks for exact bit-level
# equality. This should pass on any box where C double contains
# at least 53 bits of precision (the underlying algorithm suffers
# no rounding errors -- all results are exact).
from math import ldexp
expected = [0x0eab3258d2231fL,
0x1b89db315277a5L,
0x1db622a5518016L,
0x0b7f9af0d575bfL,
0x029e4c4db82240L,
0x04961892f5d673L,
0x02b291598e4589L,
0x11388382c15694L,
0x02dad977c9e1feL,
0x191d96d4d334c6L]
self.gen.seed(61731L + (24903L<<32) + (614L<<64) + (42143L<<96))
actual = self.randomlist(2000)[-10:]
for a, e in zip(actual, expected):
self.assertEqual(long(ldexp(a, 53)), e)
def test_long_seed(self):
# This is most interesting to run in debug mode, just to make sure
# nothing blows up. Under the covers, a dynamically resized array
# is allocated, consuming space proportional to the number of bits
# in the seed. Unfortunately, that's a quadratic-time algorithm,
# so don't make this horribly big.
seed = (1L << (10000 * 8)) - 1 # about 10K bytes
self.gen.seed(seed)
class TestModule(unittest.TestCase):
def testMagicConstants(self):
self.assertAlmostEqual(random.NV_MAGICCONST, 1.71552776992141)
self.assertAlmostEqual(random.TWOPI, 6.28318530718)
self.assertAlmostEqual(random.LOG4, 1.38629436111989)
self.assertAlmostEqual(random.SG_MAGICCONST, 2.50407739677627)
def test__all__(self):
# tests validity but not completeness of the __all__ list
defined = dict.fromkeys(dir(random))
for entry in random.__all__:
self.failUnless(entry in defined)
def test_main():
suite = unittest.TestSuite()
for testclass in (WichmannHill_TestBasicOps,
MersenneTwister_TestBasicOps,
TestModule):
suite.addTest(unittest.makeSuite(testclass))
test_support.run_suite(suite)
if __name__ == "__main__":
test_main()

View File

@ -545,6 +545,25 @@ Library
and 'stop' arguments so long as each is in the range of Python's
bounded integers.
- Thanks to Raymond Hettinger, random.random() now uses a new core
generator. The Mersenne Twister algorithm is implemented in C,
threadsafe, faster than the previous generator, has an astronomically
large period (2**19937-1), creates random floats to full 53-bit
precision, and may be the most widely tested random number generator
in existence.
The random.jumpahead(n) method has different semantics for the new
generator. Instead of jumping n steps ahead, it uses n and the
existing state to create a new state. This means that jumpahead()
continues to support multi-threaded code needing generators of
non-overlapping sequences. However, it will break code which relies
on jumpahead moving a specific number of steps forward.
The attributes random.whseed and random.__whseed have no meaning for
the new generator. Code using these attributes should switch to a
new class, random.WichmannHill which is provided for backward
compatibility and to make an alternate generator available.
- New "algorithms" module: heapq, implements a heap queue. Thanks to
Kevin O'Connor for the code and François Pinard for an entertaining
write-up explaining the theory and practical uses of heaps.

528
Modules/_randommodule.c Normal file
View File

@ -0,0 +1,528 @@
/* Random objects */
/* ------------------------------------------------------------------
The code in this module was based on a download from:
http://www.math.keio.ac.jp/~matumoto/MT2002/emt19937ar.html
It was modified in 2002 by Raymond Hettinger as follows:
* the principal computational lines untouched except for tabbing.
* renamed genrand_res53() to random_random() and wrapped
in python calling/return code.
* genrand_int32() and the helper functions, init_genrand()
and init_by_array(), were declared static, wrapped in
Python calling/return code. also, their global data
references were replaced with structure references.
* unused functions from the original were deleted.
new, original C python code was added to implement the
Random() interface.
The following are the verbatim comments from the original code:
A C-program for MT19937, with initialization improved 2002/1/26.
Coded by Takuji Nishimura and Makoto Matsumoto.
Before using, initialize the state by using init_genrand(seed)
or init_by_array(init_key, key_length).
Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The names of its contributors may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Any feedback is very welcome.
http://www.math.keio.ac.jp/matumoto/emt.html
email: matumoto@math.keio.ac.jp
*/
/* ---------------------------------------------------------------*/
#include "Python.h"
#include <time.h> // for seeding to current time
/* Period parameters -- These are all magic. Don't change. */
#define N 624
#define M 397
#define MATRIX_A 0x9908b0dfUL /* constant vector a */
#define UPPER_MASK 0x80000000UL /* most significant w-r bits */
#define LOWER_MASK 0x7fffffffUL /* least significant r bits */
typedef struct {
PyObject_HEAD
unsigned long state[N];
int index;
} RandomObject;
static PyTypeObject Random_Type;
#define RandomObject_Check(v) ((v)->ob_type == &Random_Type)
/* Random methods */
/* generates a random number on [0,0xffffffff]-interval */
static unsigned long
genrand_int32(RandomObject *self)
{
unsigned long y;
static unsigned long mag01[2]={0x0UL, MATRIX_A};
/* mag01[x] = x * MATRIX_A for x=0,1 */
unsigned long *mt;
mt = self->state;
if (self->index >= N) { /* generate N words at one time */
int kk;
for (kk=0;kk<N-M;kk++) {
y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
mt[kk] = mt[kk+M] ^ (y >> 1) ^ mag01[y & 0x1UL];
}
for (;kk<N-1;kk++) {
y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1UL];
}
y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);
mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1UL];
self->index = 0;
}
y = mt[self->index++];
y ^= (y >> 11);
y ^= (y << 7) & 0x9d2c5680UL;
y ^= (y << 15) & 0xefc60000UL;
y ^= (y >> 18);
return y;
}
/* random_random is the function named genrand_res53 in the original code;
* generates a random number on [0,1) with 53-bit resolution; note that
* 9007199254740992 == 2**53; I assume they're spelling "/2**53" as
* multiply-by-reciprocal in the (likely vain) hope that the compiler will
* optimize the division away at compile-time. 67108864 is 2**26. In
* effect, a contains 27 random bits shifted left 26, and b fills in the
* lower 26 bits of the 53-bit numerator.
* The orginal code credited Isaku Wada for this algorithm, 2002/01/09.
*/
static PyObject *
random_random(RandomObject *self)
{
unsigned long a=genrand_int32(self)>>5, b=genrand_int32(self)>>6;
return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}
/* initializes mt[N] with a seed */
static void
init_genrand(RandomObject *self, unsigned long s)
{
int mti;
unsigned long *mt;
mt = self->state;
mt[0]= s & 0xffffffffUL;
for (mti=1; mti<N; mti++) {
mt[mti] =
(1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> 30)) + mti);
/* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */
/* In the previous versions, MSBs of the seed affect */
/* only MSBs of the array mt[]. */
/* 2002/01/09 modified by Makoto Matsumoto */
mt[mti] &= 0xffffffffUL;
/* for >32 bit machines */
}
self->index = mti;
return;
}
/* initialize by an array with array-length */
/* init_key is the array for initializing keys */
/* key_length is its length */
static PyObject *
init_by_array(RandomObject *self, unsigned long init_key[], unsigned long key_length)
{
unsigned int i, j, k; /* was signed in the original code. RDH 12/16/2002 */
unsigned long *mt;
mt = self->state;
init_genrand(self, 19650218UL);
i=1; j=0;
k = (N>key_length ? N : key_length);
for (; k; k--) {
mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1664525UL))
+ init_key[j] + j; /* non linear */
mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
i++; j++;
if (i>=N) { mt[0] = mt[N-1]; i=1; }
if (j>=key_length) j=0;
}
for (k=N-1; k; k--) {
mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL))
- i; /* non linear */
mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
i++;
if (i>=N) { mt[0] = mt[N-1]; i=1; }
}
mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
Py_INCREF(Py_None);
return Py_None;
}
/*
* The rest is Python-specific code, neither part of, nor derived from, the
* Twister download.
*/
static PyObject *
random_seed(RandomObject *self, PyObject *args)
{
PyObject *result = NULL; /* guilty until proved innocent */
PyObject *masklower = NULL;
PyObject *thirtytwo = NULL;
PyObject *n = NULL;
unsigned long *key = NULL;
unsigned long keymax; /* # of allocated slots in key */
unsigned long keyused; /* # of used slots in key */
PyObject *arg = NULL;
if (!PyArg_UnpackTuple(args, "seed", 0, 1, &arg))
return NULL;
if (arg == NULL || arg == Py_None) {
time_t now;
time(&now);
init_genrand(self, (unsigned long)now);
Py_INCREF(Py_None);
return Py_None;
}
/* If the arg is an int or long, use its absolute value; else use
* the absolute value of its hash code.
*/
if (PyInt_Check(arg) || PyLong_Check(arg))
n = PyNumber_Absolute(arg);
else {
long hash = PyObject_Hash(arg);
if (hash == -1)
goto Done;
n = PyLong_FromUnsignedLong((unsigned long)hash);
}
if (n == NULL)
goto Done;
/* Now split n into 32-bit chunks, from the right. Each piece is
* stored into key, which has a capacity of keymax chunks, of which
* keyused are filled. Alas, the repeated shifting makes this a
* quadratic-time algorithm; we'd really like to use
* _PyLong_AsByteArray here, but then we'd have to break into the
* long representation to figure out how big an array was needed
* in advance.
*/
keymax = 8; /* arbitrary; grows later if needed */
keyused = 0;
key = (unsigned long *)PyMem_Malloc(keymax * sizeof(*key));
if (key == NULL)
goto Done;
masklower = PyLong_FromUnsignedLong(0xffffffffU);
if (masklower == NULL)
goto Done;
thirtytwo = PyInt_FromLong(32L);
if (thirtytwo == NULL)
goto Done;
while (PyObject_IsTrue(n)) {
PyObject *newn;
PyObject *pychunk;
unsigned long chunk;
pychunk = PyNumber_And(n, masklower);
if (pychunk == NULL)
goto Done;
chunk = PyLong_AsUnsignedLong(pychunk);
Py_DECREF(pychunk);
if (chunk == (unsigned long)-1 && PyErr_Occurred())
goto Done;
newn = PyNumber_Rshift(n, thirtytwo);
if (newn == NULL)
goto Done;
Py_DECREF(n);
n = newn;
if (keyused >= keymax) {
unsigned long bigger = keymax << 1;
if ((bigger >> 1) != keymax) {
PyErr_NoMemory();
goto Done;
}
key = (unsigned long *)PyMem_Realloc(key,
bigger * sizeof(*key));
if (key == NULL)
goto Done;
keymax = bigger;
}
assert(keyused < keymax);
key[keyused++] = chunk;
}
if (keyused == 0)
key[keyused++] = 0UL;
result = init_by_array(self, key, keyused);
Done:
Py_XDECREF(masklower);
Py_XDECREF(thirtytwo);
Py_XDECREF(n);
PyMem_Free(key);
return result;
}
static PyObject *
random_getstate(RandomObject *self)
{
PyObject *state;
PyObject *element;
int i;
state = PyTuple_New(N+1);
if (state == NULL)
return NULL;
for (i=0; i<N ; i++) {
element = PyInt_FromLong((long)(self->state[i]));
if (element == NULL)
goto Fail;
PyTuple_SET_ITEM(state, i, element);
}
element = PyInt_FromLong((long)(self->index));
if (element == NULL)
goto Fail;
PyTuple_SET_ITEM(state, i, element);
return state;
Fail:
Py_DECREF(state);
return NULL;
}
static PyObject *
random_setstate(RandomObject *self, PyObject *state)
{
int i;
long element;
if (!PyTuple_Check(state)) {
PyErr_SetString(PyExc_TypeError,
"state vector must be a tuple");
return NULL;
}
if (PyTuple_Size(state) != N+1) {
PyErr_SetString(PyExc_ValueError,
"state vector is the wrong size");
return NULL;
}
for (i=0; i<N ; i++) {
element = PyInt_AsLong(PyTuple_GET_ITEM(state, i));
if (element == -1 && PyErr_Occurred())
return NULL;
self->state[i] = (unsigned long)element;
}
element = PyInt_AsLong(PyTuple_GET_ITEM(state, i));
if (element == -1 && PyErr_Occurred())
return NULL;
self->index = (int)element;
Py_INCREF(Py_None);
return Py_None;
}
/*
Jumpahead should be a fast way advance the generator n-steps ahead, but
lacking a formula for that, the next best is to use n and the existing
state to create a new state far away from the original.
The generator uses constant spaced additive feedback, so shuffling the
state elements ought to produce a state which would not be encountered
(in the near term) by calls to random(). Shuffling is normally
implemented by swapping the ith element with another element ranging
from 0 to i inclusive. That allows the element to have the possibility
of not being moved. Since the goal is to produce a new, different
state, the swap element is ranged from 0 to i-1 inclusive. This assures
that each element gets moved at least once.
To make sure that consecutive calls to jumpahead(n) produce different
states (even in the rare case of involutory shuffles), i+1 is added to
each element at position i. Successive calls are then guaranteed to
have changing (growing) values as well as shuffled positions.
Finally, the self->index value is set to N so that the generator itself
kicks in on the next call to random(). This assures that all results
have been through the generator and do not just reflect alterations to
the underlying state.
*/
static PyObject *
random_jumpahead(RandomObject *self, PyObject *n)
{
long i, j;
PyObject *iobj;
PyObject *remobj;
unsigned long *mt, tmp;
if (!PyInt_Check(n) && !PyLong_Check(n)) {
PyErr_Format(PyExc_TypeError, "jumpahead requires an "
"integer, not '%s'",
n->ob_type->tp_name);
return NULL;
}
mt = self->state;
for (i = N-1; i > 1; i--) {
iobj = PyInt_FromLong(i);
if (iobj == NULL)
return NULL;
remobj = PyNumber_Remainder(n, iobj);
Py_DECREF(iobj);
if (remobj == NULL)
return NULL;
j = PyInt_AsLong(remobj);
Py_DECREF(remobj);
if (j == -1L && PyErr_Occurred())
return NULL;
tmp = mt[i];
mt[i] = mt[j];
mt[j] = tmp;
}
for (i = 0; i < N; i++)
mt[i] += i+1;
self->index = N;
Py_INCREF(Py_None);
return Py_None;
}
static PyObject *
random_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
RandomObject *self;
PyObject *tmp;
self = (RandomObject *)type->tp_alloc(type, 0);
if (self == NULL)
return NULL;
tmp = random_seed(self, args);
if (tmp == NULL) {
Py_DECREF(self);
return NULL;
}
Py_DECREF(tmp);
return (PyObject *)self;
}
static PyMethodDef random_methods[] = {
{"random", (PyCFunction)random_random, METH_NOARGS,
PyDoc_STR("random() -> x in the interval [0, 1).")},
{"seed", (PyCFunction)random_seed, METH_VARARGS,
PyDoc_STR("seed([n]) -> None. Defaults to current time.")},
{"getstate", (PyCFunction)random_getstate, METH_NOARGS,
PyDoc_STR("getstate() -> tuple containing the current state.")},
{"setstate", (PyCFunction)random_setstate, METH_O,
PyDoc_STR("setstate(state) -> None. Restores generator state.")},
{"jumpahead", (PyCFunction)random_jumpahead, METH_O,
PyDoc_STR("jumpahead(int) -> None. Create new state from "
"existing state and integer.")},
{NULL, NULL} /* sentinel */
};
PyDoc_STRVAR(random_doc,
"Random() -> create a random number generator with its own internal state.");
static PyTypeObject Random_Type = {
PyObject_HEAD_INIT(NULL)
0, /*ob_size*/
"_random.Random", /*tp_name*/
sizeof(RandomObject), /*tp_basicsize*/
0, /*tp_itemsize*/
/* methods */
0, /*tp_dealloc*/
0, /*tp_print*/
0, /*tp_getattr*/
0, /*tp_setattr*/
0, /*tp_compare*/
0, /*tp_repr*/
0, /*tp_as_number*/
0, /*tp_as_sequence*/
0, /*tp_as_mapping*/
0, /*tp_hash*/
0, /*tp_call*/
0, /*tp_str*/
PyObject_GenericGetAttr, /*tp_getattro*/
0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /*tp_flags*/
random_doc, /*tp_doc*/
0, /*tp_traverse*/
0, /*tp_clear*/
0, /*tp_richcompare*/
0, /*tp_weaklistoffset*/
0, /*tp_iter*/
0, /*tp_iternext*/
random_methods, /*tp_methods*/
0, /*tp_members*/
0, /*tp_getset*/
0, /*tp_base*/
0, /*tp_dict*/
0, /*tp_descr_get*/
0, /*tp_descr_set*/
0, /*tp_dictoffset*/
0, /*tp_init*/
PyType_GenericAlloc, /*tp_alloc*/
random_new, /*tp_new*/
_PyObject_Del, /*tp_free*/
0, /*tp_is_gc*/
};
PyDoc_STRVAR(module_doc,
"Module implements the Mersenne Twister random number generator.");
PyMODINIT_FUNC
init_random(void)
{
PyObject *m;
if (PyType_Ready(&Random_Type) < 0)
return;
m = Py_InitModule3("_random", NULL, module_doc);
Py_INCREF(&Random_Type);
PyModule_AddObject(m, "Random", (PyObject *)&Random_Type);
}

View File

@ -316,6 +316,8 @@ class PyBuildExt(build_ext):
libraries=math_libs) )
exts.append( Extension('datetime', ['datetimemodule.c'],
libraries=math_libs) )
# random number generator implemented in C
exts.append( Extension("_random", ["_randommodule.c"]) )
# operator.add() and similar goodies
exts.append( Extension('operator', ['operator.c']) )
# Python C API test module