I have the mean and the SD of a log-normal distribution. However, in order to sample from a log-normal distribution in Python, I need to convert these values into the mean and SD of the underlying normal distribution.
import numpy as np
import pandas as pd
import seaborn as sns
# mean and SD of the log-normal distribution itself
mu = 25.2
sigma = 10.5
#pd.reset_option('display.float_format')
# note: np.random.lognormal expects the mean and SD of the underlying normal
r = np.random.lognormal(mu, sigma, 1000)
# redraw every value that falls outside [4, 64]
for i in range(1000):
    while r[i] > 64 or r[i] < 4:
        y = np.random.lognormal(mu, sigma, 1)
        r[i] = y[0]
df = pd.DataFrame(r, columns = ['Column_A'])
print(df)
sns.set_style("whitegrid", {'axes.grid' : False})
sns.set(rc={"figure.figsize": (8, 4)})
sns.distplot(df['Column_A'], bins = 70)
However, I don't know how to convert these values.
CodePudding user response:
If I understand your post correctly, you want to recover the underlying (mu, sigma^2)
parametrization of the normal distribution that produced your log-normal observations?
TL;DR
Assuming your log-normal observations are stored in r:
mu = np.log(np.median(r))
var = 2*(np.log(np.mean(r)) - np.log(np.median(r)))
sd = np.sqrt(var)
Theoretical part
Start by reading some statistics about the log-normal distribution (ref). It takes some work to retrieve (mu, sigma^2)
from the empirical mean and variance of a log-normal sample ...
Let X be a log-normal random variable and let Y = ln(X). Then Y follows a normal distribution with parameters (mu, sigma^2). Let M and S be the mean and variance of X. It turns out that:
M = exp(mu + sigma^2/2)
S = (exp(sigma^2) - 1) * exp(2*mu + sigma^2)
Inverting this system for (mu, sigma^2) takes a little algebra.
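For completeness, the (M, S) system does have a closed-form solution: dividing S by M^2 gives exp(sigma^2) - 1, so sigma^2 = log(1 + S/M^2) and mu = log(M) - sigma^2/2. Here is a minimal sketch, assuming the 25.2 and 10.5 from the question are the mean and SD of the log-normal variable itself. The function name is hypothetical, not part of NumPy.

```python
import numpy as np

def normal_params_from_lognormal_moments(mean, sd):
    """Hypothetical helper: (mean, SD) of a log-normal -> (mu, sigma) of the underlying normal."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)  # from S / M^2 = exp(sigma^2) - 1
    mu = np.log(mean) - sigma2 / 2.0         # from M = exp(mu + sigma^2/2)
    return mu, np.sqrt(sigma2)

# Assumption: these are the log-normal's own mean and SD, as in the question
mu, sigma = normal_params_from_lognormal_moments(25.2, 10.5)

# Round trip: plugging (mu, sigma) back into the formulas should recover M and sqrt(S)
M = np.exp(mu + sigma**2 / 2)
S = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
print(mu, sigma, M, np.sqrt(S))
```

This route needs only the two moments, so it applies when no sample is available.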
However, according to ref, inverting your (M, S) system becomes easier if you replace the variance S by either the median Med or the mode Mode, since both have a much simpler expression with respect to (mu, sigma^2):
Med = exp(mu)
Mode = exp(mu - sigma^2)
The empirical median is easy to compute with NumPy, so let's use it in our computations. Since log(Med) = mu and log(M) = mu + sigma^2/2, inverting the system leads to the following estimators for (mu, sigma^2):
mu = log(Med)
sigma2 = 2*(log(M) - log(Med))
Pythonic part
Supposing your log-normal observations are stored in your r array:
mu = np.log(np.median(r))
var = 2*(np.log(np.mean(r)) - np.log(np.median(r)))
sd = np.sqrt(var)
And a quick check shows it's likely to be right:
# random log-normal sample with (mu, sigma)=(1, 2)
r = np.random.lognormal(1, 2, size=(1000000))
# estimators
mu = np.log(np.median(r))
var = 2*(np.log(np.mean(r)) - np.log(np.median(r)))
sd = np.sqrt(var)
$> mu = 1.001368782773
$> sigma = 2.0024723139
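Applied to the numbers in the question, a similar check can be run in the other direction: convert the mean and SD to (mu, sigma), draw a sample, and compare its moments with the targets. This is only a sketch, assuming 25.2 and 10.5 are the mean and SD of the log-normal variable itself; the seed and sample size are arbitrary choices.

```python
import numpy as np

# Assumption: these are the mean and SD of the log-normal variable itself
target_mean, target_sd = 25.2, 10.5

# Closed-form conversion to the parameters of the underlying normal
sigma2 = np.log(1.0 + (target_sd / target_mean) ** 2)
mu = np.log(target_mean) - sigma2 / 2.0
sigma = np.sqrt(sigma2)

rng = np.random.default_rng(0)  # seeded only for reproducibility
sample = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

# The sample moments should land close to the targets
print(sample.mean(), sample.std())

# Keep only values inside [4, 64], the range the question's loop enforces
kept = sample[(sample >= 4) & (sample <= 64)]
print(kept.size / sample.size, kept.mean(), kept.std())
```

Filtering with a boolean mask replaces the element-by-element redraw loop, at the cost of returning slightly fewer values than were drawn. Note that truncating to [4, 64] shifts the mean and SD of the retained values somewhat away from the targets.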