Following a log-normal distribution or not using Python-CodePudding

I have a matrix and I plotted all the elements of the matrix as shown below. How do I check if the elements follow a log-normal distribution?

import numpy as np
import matplotlib.pyplot as plt


inv_r=np.array([[0.64823518, 0.83720202, 0.93563635, 0.94477343, 1.05253592,
        1.28367679, 1.22086603, 1.3869073 , 1.63835474, 1.3472428 ],
       [0.67779819, 0.89049499, 1.87163307, 0.76616821, 0.63859816,
        0.88970981, 1.24933222, 1.70244884, 1.41288773, 0.4352665 ],
       [0.87478114, 1.61466754, 1.76209121, 1.18354069, 1.98489809,
        0.68434697, 0.89164925, 1.01179243, 0.55344252, 0.67770014],
       [1.57398937, 0.73402648, 1.50853046, 3.39180481, 1.12097252,
        1.44690537, 2.71361645, 1.35833663, 2.8387416 , 1.63599491],
       [0.98352121, 0.8034909 , 0.43440047, 1.15819716, 1.72482182,
        2.29668575, 1.79702295, 0.48908485, 2.3722099 , 1.46943279],
       [1.185387  , 2.36026176, 1.08328055, 1.18599108, 0.97528197,
        0.8820969 , 2.08205908, 1.34425083, 0.4146258 , 1.95035103],
       [2.19711571, 1.58786973, 0.87048445, 1.24324117, 1.55456239,
        1.34691756, 1.00360813, 1.77622545, 2.25403217, 1.63541101],
       [0.77615636, 0.58180174, 0.74028442, 1.39470309, 0.76768667,
        0.54203086, 1.89078838, 0.82748047, 1.39862745, 2.39547037],
       [2.53200407, 0.23182676, 1.91493715, 1.69375212, 1.02297358,
        1.15553262, 0.86335816, 0.93200029, 0.43445186, 1.47100631],
       [1.88479544, 1.21261474, 0.53477358, 1.51359341, 1.79260722,
        0.64794657, 1.17978372, 0.80116473, 1.46847931, 3.24720226]]).flatten()

nodes=np.arange(len(inv_r))

plt.bar(nodes, inv_r)
plt.xlabel('Node')
plt.ylabel('1/r')
plt.show()

CodePudding user response：

Assuming you have a concrete theoretical distribution to check against (i.e., the parameters of the log-normal distribution), you can use a one-sample Kolmogorov-Smirnov test.

scipy.stats.kstest can do that, something like:

stats.kstest(data, 'lognorm')

If the test results in a p-value of greater than your accepted level of significance, you can reject the hypothesis of the data being generated from a log-normal.

CodePudding user response：

You can generate a random probability which follows the required distribution, e.g., the log-normal as in your case, and then use one of two options:

calculate the Kullback-Liebler divergencefrom the scipy.special module, which basically checks how similar values from one some random distribution follows some known distribution. I.e., if they are similar - the value will be small, and as they differ - the output value will be larger.
Alternatively, you can use the Kolmogorov-Smirnov test, which computes the mean difference between two distributions, and outputs the probability that the two distributions differ:

It may be done as follows:

import numpy as np
import scipy


inv_r=np.array([[0.64823518, 0.83720202, 0.93563635, 0.94477343, 1.05253592, 1.28367679, 1.22086603, 1.3869073 , 1.63835474, 1.3472428 ], [0.67779819, 0.89049499, 1.87163307, 0.76616821, 0.63859816, 0.88970981, 1.24933222, 1.70244884, 1.41288773, 0.4352665 ], [0.87478114, 1.61466754, 1.76209121, 1.18354069, 1.98489809, 0.68434697, 0.89164925, 1.01179243, 0.55344252, 0.67770014], [1.57398937, 0.73402648, 1.50853046, 3.39180481, 1.12097252, 1.44690537, 2.71361645, 1.35833663, 2.8387416 , 1.63599491], [0.98352121, 0.8034909 , 0.43440047, 1.15819716, 1.72482182, 2.29668575, 1.79702295, 0.48908485, 2.3722099 , 1.46943279], [1.185387  , 2.36026176, 1.08328055, 1.18599108, 0.97528197, 0.8820969 , 2.08205908, 1.34425083, 0.4146258 , 1.95035103], [2.19711571, 1.58786973, 0.87048445, 1.24324117, 1.55456239, 1.34691756, 1.00360813, 1.77622545, 2.25403217, 1.63541101], [0.77615636, 0.58180174, 0.74028442, 1.39470309, 0.76768667, 0.54203086, 1.89078838, 0.82748047, 1.39862745, 2.39547037], [2.53200407, 0.23182676, 1.91493715, 1.69375212, 1.02297358, 1.15553262, 0.86335816, 0.93200029, 0.43445186, 1.47100631], [1.88479544, 1.21261474, 0.53477358, 1.51359341, 1.79260722, 0.64794657, 1.17978372, 0.80116473, 1.46847931, 3.24720226]]).flatten()

log_norm_dist = np.random.lognormal(0, 1, 100)

KL-Divergence:

scipy.special.kl_div(log_norm_dist, inv_r).mean()

Output: 1.69

scipy.special.kl_div(log_norm_dist, log_norm_dist).mean()

Output: 0.0

scipy.special.kl_div(inv_r, inv_r).mean()

Output: 0.0

KS-Test

from scipy.stats import kstest


kstest(log_norm_dist, inv_r)

Output: KstestResult(statistic=0.2, pvalue=0.03638428787491733)

kstest(log_norm_dist, log_norm_dist)

Output: KstestResult(statistic=0.0, pvalue=1.0)

kstest(inv_r, inv_r)

Output: KstestResult(statistic=0.0, pvalue=1.0)