In NumPy 1.17, the random module was given an overhaul. Among the new additions was SeedSequence. In the section about parallel random number generation in the docs, SeedSequence is said to "ensure that low-quality seeds are turned into high quality initial states", which supposedly means that one can safely use a seed of e.g. 1, as long as this is processed through a SeedSequence prior to seeding the PRNG. However, the rest of the documentation page goes on to describe how SeedSequence can turn one seed into several independent PRNGs. What if we only want one PRNG, but want to be able to safely make use of small/bad seeds?
I have written the test code below, which uses the Mersenne Twister MT19937 to draw normally distributed random numbers. The basic seeding (without using SeedSequence) is MT19937(seed), corresponding to f() below. I also try MT19937(SeedSequence(seed)), corresponding to g(), though this results in exactly the same stream. Lastly, I try using the spawn/spawn_key functionality of SeedSequence, which does alter the stream (corresponding to h() and i(), which produce identical streams).
import numpy as np
import matplotlib.pyplot as plt

def f():
    # Plain integer seed
    return np.random.Generator(np.random.MT19937(seed))

def g():
    # Integer seed wrapped in an explicit SeedSequence
    return np.random.Generator(np.random.MT19937(np.random.SeedSequence(seed)))

def h():
    # First child spawned from the SeedSequence
    return np.random.Generator(np.random.MT19937(np.random.SeedSequence(seed).spawn(1)[0]))

def i():
    # SeedSequence constructed directly with spawn_key=(0,)
    return np.random.Generator(np.random.MT19937(np.random.SeedSequence(seed, spawn_key=(0,))))

seed = 42  # low seed, contains many 0s in binary representation
n = 100
for func, ls in zip([f, g, h, i], ['-', '--', '-', '--']):
    generator = func()
    plt.plot([generator.normal(0, 1) for _ in range(n)], ls)
plt.show()
Question
Are h() and i() really superior to f() and g()? If so, why is it necessary to invoke the spawn (parallel) functionality just to convert a (possibly bad) seed into a good seed? To me these seem like they ought to be disjoint features.
Answer
The reason nothing changed when you used an explicit SeedSequence is that the new randomness APIs already pass seeds through SeedSequence by default. It's not a sign that something went wrong, or that you need to explicitly call spawn. Calling spawn doesn't produce better output; it just produces different output.
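To check this without plotting, here is a minimal sketch that compares the first few draws from each seeding style in the question. The helper name draws and the choice of five uniform draws are only for illustration. The last line uses SeedSequence.generate_state to look at the mixed 32-bit words that a small seed such as 1 is expanded into before it reaches the bit generator.

```python
import numpy as np

seed = 42

def draws(seed_like, n=5):
    # Illustrative helper: build a Generator on MT19937 and take n uniform draws.
    return np.random.Generator(np.random.MT19937(seed_like)).random(n)

plain = draws(seed)                                          # like f()
explicit = draws(np.random.SeedSequence(seed))               # like g()
spawned = draws(np.random.SeedSequence(seed).spawn(1)[0])    # like h()
keyed = draws(np.random.SeedSequence(seed, spawn_key=(0,)))  # like i()

# An integer seed is already routed through SeedSequence, so f() and g() agree.
same_default = np.array_equal(plain, explicit)
# The first spawned child carries spawn_key=(0,), so h() and i() agree.
same_spawn = np.array_equal(spawned, keyed)
# The spawned stream is a different stream, not an improved one.
differs = not np.array_equal(plain, spawned)

print(same_default, same_spawn, differs)

# The mixed state words derived from a very small seed.
state_words = np.random.SeedSequence(1).generate_state(4)
print(state_words)
```

So for a single PRNG, f() (or equivalently g()) is already the seeding that goes through SeedSequence; spawn is only needed when you want several independent streams from one seed.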