Parameters for numpy.random.lognormal function

I need to create a fictitious log-normal distribution of household income in a particular area. The data I have are: mean 13,600 and standard deviation 7,900.

What should the parameters of numpy.random.lognormal be? When I pass the mean and the standard deviation as they are, most of the values in the distribution are "inf", and the values also don't make sense when I pass the logs of the mean and standard deviation instead.
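A minimal sketch reproducing both symptoms is shown here. It assumes NumPy's newer `default_rng` Generator, whose `lognormal(mean, sigma, size)` method takes the same arguments as `np.random.lognormal`; the seed is an arbitrary choice. The key point is that `mean` and `sigma` describe the underlying normal distribution, i.e. log(X), not X itself.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

# Attempt 1: raw income figures passed as mean/sigma.
# They are interpreted as parameters of log(X), so each draw is
# exp() of a Normal(mean=13600, sd=7900) draw. exp() overflows float64
# to inf once its argument exceeds log(max float64), about 709.78.
raw = rng.lognormal(mean=13600, sigma=7900, size=100_000)
frac_inf = np.mean(np.isinf(raw))
print("overflow threshold:", np.log(np.finfo(np.float64).max))
print("fraction of inf draws:", frac_inf)

# Attempt 2: logs of the figures passed as mean/sigma.
# The mean of the resulting distribution is exp(mu + sigma**2 / 2);
# with sigma = log(7900), roughly 9, this is astronomically larger than 13,600.
mu_wrong = np.log(13600)
sigma_wrong = np.log(7900)
implied_mean = np.exp(mu_wrong + sigma_wrong**2 / 2)
print("implied mean of X:", implied_mean)
```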

If someone could help me figure out what the parameters should be, that would be great. Thanks!

Answer:

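This is indeed a nontrivial task, because the moment equations of the log-normal distribution have to be solved for the unknown parameters. As the Wikipedia article on the log-normal distribution shows, if log(X) has mean $\mu$ and standard deviation $\sigma$ (these are exactly the mean and sigma arguments of numpy.random.lognormal), then the mean and variance of X are $\exp(\mu + \sigma^2/2)$ and $[\exp(\sigma^2) - 1]\exp(2\mu + \sigma^2)$, respectively.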
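The two log-normal moment formulas can be checked numerically. The sketch below codes them as a small helper (`lognormal_moments` is a hypothetical name, not a NumPy function) and compares them with the sample mean and variance of a large simulated draw. The parameter values mu = 0.5, sigma = 0.4 and the seed are arbitrary illustrative choices.

```python
import numpy as np

def lognormal_moments(mu, sigma):
    """Mean and variance of X = exp(Y), where Y ~ Normal(mu, sigma**2)."""
    mean = np.exp(mu + sigma**2 / 2)
    variance = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
    return mean, variance

mu, sigma = 0.5, 0.4  # arbitrary parameters of log(X)
analytic_mean, analytic_var = lognormal_moments(mu, sigma)

# Large simulated sample; its moments should be close to the analytic ones.
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=mu, sigma=sigma, size=1_000_000)

print(analytic_mean, sample.mean())
print(analytic_var, sample.var())
```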

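The choice of $\mu$ and $\sigma$ must therefore solve $\exp(\mu + \sigma^2/2) = 13600$ and $[\exp(\sigma^2) - 1]\exp(2\mu + \sigma^2) = 7900^2$. This can be solved analytically: squaring the first equation gives exactly $\exp(2\mu + \sigma^2) = 13600^2$, which eliminates $\mu$ from the second equation and leaves $\exp(\sigma^2) - 1 = (7900/13600)^2$. Hence $\sigma^2 = \ln(1 + (7900/13600)^2)$, and substituting back into the first equation gives $\mu = \ln(13600) - \sigma^2/2$.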
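As a worked evaluation, plugging the question's numbers into $\sigma^2 = \ln(1 + (7900/13600)^2)$ and $\mu = \ln(13600) - \sigma^2/2$ gives (values rounded):

```latex
\begin{aligned}
\left(\tfrac{7900}{13600}\right)^2 &\approx 0.3374, \\
\sigma^2 &= \ln(1 + 0.3374) \approx 0.2907, \qquad \sigma \approx 0.539, \\
\mu &= \ln(13600) - \tfrac{\sigma^2}{2} \approx 9.518 - 0.145 \approx 9.372.
\end{aligned}
```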

Sample code is provided below. I used a large sample size to show explicitly that the mean and standard deviation of the simulated data are close to the target values.

import numpy as np

# Input characteristics
DataAverage = 13600
DataStdDev = 7900

# Sample size
SampleSize = 100000

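# Mean and standard deviation of the underlying normal distribution, i.e. of log(X);
# these are what numpy.random.lognormal's mean and sigma arguments refer to
SigmaLogNormal = np.sqrt(np.log(1 + (DataStdDev/DataAverage)**2))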
MeanLogNormal = np.log( DataAverage ) - SigmaLogNormal**2/2
print(MeanLogNormal, SigmaLogNormal)

# Obtain draw from log-normal distribution
Draw = np.random.lognormal(mean=MeanLogNormal, sigma=SigmaLogNormal, size=SampleSize)

# Check
print( np.mean(Draw), np.std(Draw))
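If the conversion is needed more than once, it can be wrapped in a function. The sketch below is one way to do it: `lognormal_params_from_moments` is a hypothetical helper name, and it uses NumPy's newer `default_rng` Generator rather than the legacy `np.random.lognormal` (both accept `mean`, `sigma` and `size`). It also verifies the parameters without any sampling error, by plugging them back into the closed-form moment formulas.

```python
import numpy as np

def lognormal_params_from_moments(mean, std):
    """Return (mu, sigma) of log(X) so that the log-normal X has the given mean and std."""
    sigma2 = np.log1p((std / mean) ** 2)  # log1p(x) computes log(1 + x)
    mu = np.log(mean) - sigma2 / 2
    return mu, np.sqrt(sigma2)

mu, sigma = lognormal_params_from_moments(13600, 7900)

# Exact round trip: the closed-form moments should reproduce the targets.
mean_back = np.exp(mu + sigma**2 / 2)
std_back = np.sqrt((np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2))
print(mu, sigma, mean_back, std_back)

# Simulated incomes via the Generator API (seed chosen arbitrarily).
rng = np.random.default_rng(42)
incomes = rng.lognormal(mean=mu, sigma=sigma, size=100_000)
print(incomes.mean(), incomes.std())
```

Using `np.log1p` instead of `np.log(1 + ...)` is only a numerical nicety; for these inputs both give the same parameters.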