Suppose I want to sample from a normal distribution. This is straightforward with rng = numpy.random.default_rng() followed by rng.normal(mean, std, size). It is just as easy if I want to change the standard deviation, e.g. rng.normal(mean, std*2, size).
However, executing the two commands gives "different" results. As I understand it, one possible way to sample is inverse transform sampling: I draw a random value r from a uniform distribution and plug it into the inverse of the cumulative distribution function of the normal distribution, F_X^(-1)(r). Now, I think that if I call rng.normal(mean, std, size) and rng.normal(mean, std*2, size), each call will produce a*b*c samples (for size = (a, b, c)) which will NOT correspond to the same draws from the uniform distribution (i.e., they will use different values of r). This is not how it is implemented in numpy, but I think the samples I actually obtain will differ in this sense (based on the actual implementation here, it is as if I were using a different state, but I am not sure).
In other words, I am looking for a method that gives me a sample from rng.normal(mean, std, size) together with the sample that would be obtained in the same way from rng.normal(mean, 2*std, size), i.e. with a different standard deviation.
I thought about manipulating the sample produced by rng.normal(mean, std, size) in order to obtain the desired result:
import numpy as np

def alter_value(value, new_std, prev_mean=0, prev_std=1):
    # use the PDF expression to obtain p(x)
    gaussian_value = 1/(np.sqrt(2)*prev_std) * np.exp(-(value-prev_mean)**2/2/prev_std**2)
    # fixed p(x), obtain the new x
    new_x = prev_mean - np.log(np.sqrt(2)*new_std*gaussian_value)*2*new_std**2
    return new_x
However, I do not know if this is the correct method at all.
Answer:
The easiest way to get repeatable or closely analogous random samples is to set the "seed" or "state" of the random number generator directly. The result isn't guaranteed to obey your alter_value() function, but as long as you always set the state to the same value before calling your function, and as long as you don't change the function itself, the output of the RNG is guaranteed to be the same. (See also comments from @slothrop and @warren-weckesser on your question.)
This probably works because, under the hood, numpy is likely just drawing standard random numbers, scaling them by the standard deviation, and shifting them by the mean, which is the effect your alter_value() is after. It's possible for numpy to do something more complex, but there's no reason for it to do so.
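The scaling idea can also be written out by hand, so that nothing depends on what numpy does internally. This is only a minimal sketch: the seed and the values of mean, std and size are illustrative choices, not taken from the question.

```python
import numpy as np

mean, std, size = 1.0, 0.5, (2, 3, 4)  # illustrative values

rng = np.random.default_rng(12345)  # arbitrary seed

# Draw the standard-normal values once, then reuse them for both scales.
z = rng.standard_normal(size)
narrow = mean + std * z
wide = mean + 2 * std * z

# Each "wide" value sits twice as far from the mean as its "narrow" partner.
print(np.allclose(wide - mean, 2 * (narrow - mean)))
```

Because both arrays are built from the same z, they are paired element by element, which seems to be what the question is after.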
An "RNG" is just a fixed "pseudorandom" sequence of numbers, so as long as that sequence is being consumed in the same way, starting at the same position will yield the same sequence every time.
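A small sketch of re-seeding, assuming numpy.random.default_rng with an integer seed; the seed value and the parameters are arbitrary. The second check relies on Generator.normal consuming the stream the same way for both standard deviations, so it is worth running rather than taking on trust.

```python
import numpy as np

mean, std, size = 0.0, 1.0, (2, 3)  # illustrative values
seed = 2024  # arbitrary; any fixed integer works

# Two generators created from the same seed replay the same sequence.
first = np.random.default_rng(seed).normal(mean, std, size)
second = np.random.default_rng(seed).normal(mean, std, size)
print(np.array_equal(first, second))

# Re-seed before the call with the doubled standard deviation and
# check whether the result is a rescaled copy of `first`.
doubled = np.random.default_rng(seed).normal(mean, 2 * std, size)
print(np.allclose(doubled - mean, 2 * (first - mean)))
```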
Of course, if you absolutely need the behavior of alter_value(), just use that function... but if you just want to be able to reproduce the behavior at a high level, setting the seed will do it.
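For completeness, the inverse-transform picture from the question can be sketched directly: draw the uniform values r once and push them through the inverse CDF for each standard deviation. This is not how numpy generates normals (the question says as much); it uses statistics.NormalDist from the Python standard library, and the helper name inverse_transform, the seed, and the parameters are made up for illustration.

```python
from statistics import NormalDist

import numpy as np

mean, std = 0.0, 1.0  # illustrative values
rng = np.random.default_rng(7)  # arbitrary seed

# Uniform draws on [0, 1); these play the role of r in the question.
r = rng.random(5)

def inverse_transform(r_values, mu, sigma):
    """Map uniform draws through the inverse normal CDF, F_X^(-1)(r)."""
    dist = NormalDist(mu, sigma)
    return np.array([dist.inv_cdf(float(p)) for p in r_values])

# The same r values are reused, so the two samples are paired.
narrow = inverse_transform(r, mean, std)
wide = inverse_transform(r, mean, 2 * std)
print(np.allclose(wide - mean, 2 * (narrow - mean)))
```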