Is Python's random.randint Statistically Random?
Solution 1:
From the random module documentation:
Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.
From the Wikipedia article on the Mersenne Twister:
It provides for fast generation of very high-quality pseudorandom numbers, having been designed specifically to rectify many of the flaws found in older algorithms.
If you have an OS-specific randomness source, available through os.urandom(), then you can use the random.SystemRandom() class instead. Most of the random module functions are available as methods on that class. It would perhaps be more suitable for cryptographic purposes; quoting the docs again:
The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation.
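As a quick illustration of the point above, here is a minimal sketch of swapping in random.SystemRandom, which draws from os.urandom() instead of the Mersenne Twister but exposes the same familiar API:

```python
import random

# SystemRandom uses the OS randomness source (os.urandom) rather than
# the deterministic Mersenne Twister, so it is usable for security work.
sysrand = random.SystemRandom()

n = sysrand.randint(1, 10)             # same signature as random.randint
pick = sysrand.choice(["a", "b", "c"])  # most random-module methods work

print(n, pick)
```

Note that SystemRandom ignores seeding, since the OS source cannot be reproduced deterministically.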
Python 3.6 adds a secrets module with convenience methods to produce random data suitable for cryptographic purposes:
The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets. In particular, secrets should be used in preference to the default pseudo-random number generator in the random module, which is designed for modelling and simulation, not security or cryptography.
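For completeness, a short sketch of the secrets conveniences referred to above (available from Python 3.6 onward):

```python
import secrets

# Cryptographically strong primitives from the secrets module:
token = secrets.token_hex(16)           # 32-char hex string, e.g. for session tokens
n = secrets.randbelow(10) + 1           # uniform integer in [1, 10]
pick = secrets.choice(["red", "blue"])  # secure random selection from a sequence

print(token, n, pick)
```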
Solution 2:
I reran the OP's exercise with one billion iterations:
from collections import Counter
import random

n = 1000000000
c = Counter(random.randint(1, 10) for _ in range(n))
for i in range(1, 11):
    print('%2s %02.10f%%' % (i, c[i] * 100.0 / n))
Here's the (reformatted) result:
1 9.9996500000%
2 10.0011089000%
3 10.0008568000%
4 10.0007495000%
5 9.9999089000%
6 9.9985344000%
7 9.9994913000%
8 9.9997877000%
9 10.0010818000%
10 9.9988307000%
See the other answers to this question for their excellent analysis.
Solution 3:
Martijn's answer is a pretty succinct review of the random number generators that Python has access to.
If you want to check out the properties of the generated pseudo-random data, download random.zip from http://www.fourmilab.ch/random/ and run it on a big sample of random data. The χ² (chi-squared) test in particular is very sensitive to deviations from randomness. For a sequence to be considered random, the percentage from the χ² test should be between 10% and 90%.
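The χ² statistic that ent reports can be sketched in a few lines of Python: bucket the byte values and compare the observed counts against the uniform expectation of n/256 per byte value. (This computes only the raw statistic; ent additionally converts it into the percentage quoted above.)

```python
import random
from collections import Counter

# Sketch of the χ² statistic over byte values: compare observed counts
# against the uniform expectation n/256.
n = 1_000_000
data = bytes(random.randint(0, 255) for _ in range(n))

counts = Counter(data)
expected = n / 256
chi2 = sum((counts[b] - expected) ** 2 / expected for b in range(256))

# With 255 degrees of freedom, the statistic should land near 255
# for genuinely uniform data.
print(chi2)
```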
For a game I'd guess that the Mersenne Twister that Python uses internally should be sufficiently random (unless you're building an online casino :-).
If you want pure randomness, and you are using Linux, you can read from /dev/random. This only produces random data from the kernel's entropy pool (which is gathered from the unpredictable times at which interrupts arrive), so it will block if you exhaust it. This entropy is used to initialize (seed) the PRNG used by /dev/urandom. On FreeBSD, the PRNG that supplies data for /dev/random uses the Yarrow algorithm, which is generally regarded as being cryptographically secure.
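A small sketch of reading the OS randomness source from Python. The portable route is os.urandom(); opening the device file directly, as shown second, assumes a Unix-like system where /dev/urandom exists:

```python
import os

# Portable: os.urandom asks the OS for cryptographic-quality random
# bytes without blocking.
portable = os.urandom(16)

# Unix-specific: the device files can also be opened directly.
# /dev/random may block until the kernel has gathered enough entropy;
# /dev/urandom will not.
with open("/dev/urandom", "rb") as f:
    raw = f.read(16)

print(portable.hex(), raw.hex())
```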
Edit: I ran some tests on bytes from random.randint. First creating a million random bytes:
import random

ba = bytearray([random.randint(0, 255) for n in range(1000000)])
with open('randint.dat', 'wb') as f:
    f.write(ba)
Then I ran the ent program from Fourmilab on it:
Entropy = 7.999840 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 221.87, and randomly
would exceed this value 93.40 percent of the times.
Arithmetic mean value of data bytes is 127.5136 (127.5 = random).
Monte Carlo value for Pi is 3.139644559 (error 0.06 percent).
Serial correlation coefficient is -0.000931 (totally uncorrelated = 0.0).
Now for the χ² test, the further the percentage gets from 50%, the more suspect the data is. If one is very fussy, values below 10% or above 90% are deemed unacceptable. John Walker, author of ent, calls the 93.40% value above "almost suspect".
As a contrast, here is the same analysis of 10 MiB from FreeBSD's Yarrow prng that I ran earlier:
Entropy = 7.999982 bits per byte.
Optimum compression would reduce the size
of this 10485760 byte file by 0 percent.
Chi square distribution for 10485760 samples is 259.03, and randomly
would exceed this value 41.80 percent of the times.
Arithmetic mean value of data bytes is 127.5116 (127.5 = random).
Monte Carlo value for Pi is 3.139877754 (error 0.05 percent).
Serial correlation coefficient is -0.000296 (totally uncorrelated = 0.0).
While there is not much difference in the other figures, the χ² percentage is much closer to 50%.
Solution 4:
Yes, it is statistically random for all practical purposes. The variation you saw is perfectly normal; in fact, it would be a poor RNG if it didn't show variation like that.
Since the period of the PRNG is 2**19937-1, you would need to generate more numbers than there are atoms in the universe before you could see a non-random distribution. Note that if you generate 623-dimensional vectors, the output becomes non-random much sooner.
Solution 5:
It is indeed normal for random numbers to come up imperfectly distributed with a good PRNG. However, the more numbers you generate, the less you should see that.
BTW, I'm getting a standard deviation of 0.03066, which is slightly lower than what you gave.
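A sketch of how such a standard deviation can be computed: draw a sample, bucket it as in the OP's exercise, and take the deviation of the per-bucket percentages. (The exact value will vary from run to run, and the sample size here is an assumption.)

```python
import random
from collections import Counter
from statistics import pstdev

# Bucket a sample of randint draws and measure how much the per-bucket
# percentages deviate from the ideal 10% each.
n = 1_000_000
c = Counter(random.randint(1, 10) for _ in range(n))
percentages = [c[i] * 100.0 / n for i in range(1, 11)]

print(pstdev(percentages))
```

For a fair generator this deviation shrinks roughly as 1/sqrt(n), which is why larger samples look flatter.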