Home > Mobile >  Data generated from Scipy truncnorm.rvs does not match specified standard deviation
Data generated from Scipy truncnorm.rvs does not match specified standard deviation

Time:08-10

I am trying to generate data which follow specified truncated normal distribution. Based on answers enter image description here

showcases the truncated normal density alluding to the fact that the narrower the interval [lower, upper] are chosen, the smaller the standard deviation will be (even approaching 0 asymptotically when lower and upper get infinitesimally close).

Let's make this rigorous to really be sure. Given the age-old equations for the expected value and variance of our (truncated normal random variable X) we have

enter image description here

Then, defining the helper functions

def xfx(x, lower=lower, upper=upper, mu=mu, sigma=sigma):
    '''helper function returning x*f(x) for the truncated normal density f'''
    return x*scipy.stats.truncnorm.pdf(x, (lower-mu)/sigma, (upper-mu)/sigma, loc=mu, scale=sigma)

def x_EX_fx(x, lower=lower, upper=upper, mu=mu, sigma=sigma):
    '''helper function returning (x - E[X])**2 * f(x) for the truncated normal density f'''
    EX = quad(func=xfx,a=lower,b=upper)[0]
    return ((x - EX)**2) * scipy.stats.truncnorm.pdf(x, (lower-mu)/sigma, (upper-mu)/sigma, loc=mu, scale=sigma)

allows us to compute the values exact

# E[X], expected value of X
quad(func=xfx,a=lower,b=upper)[0]
> 10.0

# (Var(X))^(1/2), standard deviation of X
np.sqrt(quad(func=x_EX_fx,a=lower,b=upper)[0])
> 2.697

This looks eerily similar to your observed value 2.673. Let's see if the difference is merely based on the finite sample size by running a simulation study to observe if the empirical standard deviation approaches the theoretical one.

# simulation study
np.random.seed(7447)
stdList = [scipy.stats.truncnorm.rvs((lower-mu)/sigma, (upper-mu)/sigma, loc=mu, scale=sigma, size=round(10**N)).std() for N in range(2,8)]

# plot 
plt.title("Convergence behaviour of $\hat{σ}_{n}$ to σ", fontsize=18)
plt.plot(range(2,8), stdList)
plt.axhline(2.697800468774485, color='red', lw=0.85)
plt.legend({'emprical' : 'blue', 'theoretical' : 'red'}, fontsize=14)
plt.xlabel("$log_{10}(N)$", fontsize=14)
plt.show()

yielding

enter image description here

This confirms that your output is sound,

CodePudding user response:

This is generating a clipped normal distribution between [5,15]. This is /- 1 s.d, so the s.d. measured across this sample will not be equal to the input.

If you clip the range of outputs, you necessarily reduce the s.d. observed.

As lower/upper -> /-infinity, the sample std -> 5. As lower/upper -> 10, the sample std -> 0.

  • Related