Home > Software engineering >  Generating random numbers with upper and lower standard deviation of the scatter
Generating random numbers with upper and lower standard deviation of the scatter

Time:09-16

I have to produce randomly generated, normally distributed numbers, based on an astronomical table containing a most probable value and a standard deviation. The peculiar thing is that the standard deviation is not given by one, but two numbers - an upper standard deviation of the error and a lower, something like this:

mass_object, error_up, error_down
7.33, 0.12, 0.07
9.40, 0.04, 0.02
6.01, 0.11, 0.09
...

For example, this means for the first object that if a random mass m gets generated with m<7.33, then probably it will be further away from 7.33 than in the case that m>7.33. So I am looking now for a way to randomly generate the numbers and this way has to include the 2 possible standard deviations. If I was dealing with just one standard deviation per object, I would create the random number (mass) of the first object like that:

mass_random = np.random.normal(loc=7.33, scale=0.12)

Do you have ideas how to create these random numbers with upper and lower standard deviation of the scatter? Tnx

CodePudding user response:

As we discussed in the comments, a normal distribution has the same standard deviation in each direction (it's symmetric around the mean). So we know our distribution won't be normal. We can try a lognormal approach, since this allows us to introduce the idea of skewness. To do this in Python, you'll need Scipy. Here's a crude approach, assuming that 68% of data is on the mean, 16% is at the high point, and 16% is at the low point. We fit the distribution to that crude dataset, then we can calculate new points from the distribution:

import scipy.stats

#  Choose one of the rows
mean, high, low = 7.33, 0.12, 0.07

#  Create a dummy dataset to fit the distribution
values = [mean] * 68   [high   top] * 16   [mean - low ] * 16

# Print the fit distribution
fit_dist = scipy.stats.lognorm.fit(values)
print(fit_dist)

#  Calculate 10 new random values based on the fit
scipy.stats.lognorm.rvs(*fit_dist, size=10)

array([7.25541865, 7.34873107, 7.33831589, 7.36387121, 7.26912469,
       7.33084677, 7.35626689, 7.33907124, 7.32522422, 7.31688687])
  • Related