How Does the NumPy Random Seed Change?

I'm working on a project that uses the Monte Carlo method, and I was studying the importance of the seed for pseudo-random number generation.

While experimenting with numpy.random, I was trying to understand how a change in the seed affects the randomness, but I found something peculiar, at least to me. Using numpy.random.get_state(), I saw that every time I run the script the seed starts out different, changes once, and then keeps the same value for the rest of the script, as shown in this code, which compares the state from two consecutive samples:

import numpy as np

rand_state = [0]
for i in range(5):
    rand_state_i = np.random.get_state()[1]
    # printing only 3 state numbers, but comparing all of them
    print(np.random.rand(), rand_state_i[:3], all(rand_state_i==rand_state))
    rand_state = rand_state_i

# Print:
# 0.9721364306537633 [2147483648 2240777606 2786125948] False
# 0.0470329351113805 [3868808884  608863200 2913530561] False
# 0.4471038484385019 [3868808884  608863200 2913530561] True
# 0.2690477632739811 [3868808884  608863200 2913530561] True
# 0.7279016433547768 [3868808884  608863200 2913530561] True

So, my question is: how does the seed keep the same value while returning a different random value for each sample? Does NumPy use other or additional data to generate random numbers, beyond what is present in numpy.random.get_state()?

CodePudding user response:

You're only looking at part of the state. The big array of 624 integers isn't the whole story.

The Mersenne Twister only regenerates its giant internal state array once every 624 32-bit outputs. The rest of the time, it just reads an element of that array, feeds it through a "tempering" pass, and outputs the tempered result. It only regenerates the array on the first call after seeding, or once it has read every element.
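To see how often the array is actually regenerated, here is a minimal sketch that counts np.random.rand() calls until the 624-integer array changes again. The seed value 2024 is arbitrary and only there for reproducibility; the sketch assumes the legacy np.random global generator (MT19937), where each rand() call consumes two 32-bit outputs.

```python
import numpy as np

np.random.seed(2024)  # arbitrary seed, an assumption for reproducibility
np.random.rand()      # first call after seeding regenerates the array

# Snapshot the 624-integer array right after that first regeneration.
key_before = np.random.get_state()[1].copy()

# Keep sampling until the array differs from the snapshot.
calls = 0
while np.array_equal(np.random.get_state()[1], key_before):
    np.random.rand()
    calls += 1

print(calls)
```

Since each rand() call reads two of the 624 elements, the count should come out to 312 calls between regenerations, which is why the array looks frozen in a five-iteration loop.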

To keep track of the last element it read, the Mersenne Twister has an additional position variable that you didn't account for. It's at index 2 in the get_state() tuple. You'll see it increment in steps of 2 in your loop, because np.random.rand() has to fetch two 32-bit integers to build a single double-precision floating-point output.
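A small variation on the loop from the question makes this visible. This is only a sketch: the seed 12345 is an arbitrary choice, and the variable names are mine, not part of any NumPy API.

```python
import numpy as np

np.random.seed(12345)  # arbitrary seed, an assumption for reproducibility

positions = []   # position variable after each sample
key_same = []    # whether the 624-integer array matches the previous one
prev_key = None
for _ in range(5):
    np.random.rand()
    # Legacy state tuple: (name, key array, position, has_gauss, cached_gaussian)
    name, key, pos, has_gauss, cached = np.random.get_state()
    positions.append(int(pos))
    key_same.append(prev_key is not None and bool(np.array_equal(key, prev_key)))
    prev_key = key.copy()

print(positions)  # each entry is 2 larger than the one before it
print(key_same)   # the array itself stays the same between these samples
```

So the array alone does not identify where the generator is; the array together with the position does.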

(NumPy also maintains some additional state that's not really part of the Mersenne Twister state, to generate normally distributed values more efficiently. You'll find this in indexes 3 and 4 of the get_state() tuple.)
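The extra normal-distribution state can be observed the same way. The sketch below assumes the legacy np.random generator, which produces normal values in pairs and caches the second one; the seed 7 is arbitrary.

```python
import numpy as np

np.random.seed(7)  # arbitrary seed, an assumption for reproducibility

first = np.random.standard_normal()
# After one normal draw, index 3 flags a cached value and index 4 holds it.
state = np.random.get_state()
has_gauss, cached = state[3], state[4]
print(has_gauss, cached)

# The next normal draw returns the cached value and clears the flag.
second = np.random.standard_normal()
print(second == cached, np.random.get_state()[3])
```

That second draw does not touch the Mersenne Twister at all, so the position at index 2 is unchanged by it.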