small KS p-value in fitting power law using degree data from graph with expected power-law degree di

Time:01-22

I am using the fit_power_law function from the igraph R package. The function returns "KS.p" (the p-value of the Kolmogorov-Smirnov test) as a measure of goodness of fit. As I understand it, the smaller the KS p-value, the stronger the evidence against the hypothesis that the data were drawn from a power-law distribution. In other words, one would want a large KS p-value if one expects the data to follow a power law.

However, when I try this on a random graph with an expected power-law degree distribution, generated with the sample_fitness_pl function:

library(igraph)
set.seed(137)
g <- sample_fitness_pl(10000, 60000, exponent.out = 2.2)

and then run the fit_power_law function:

fit_power_law(degree(g))

I get the following results:

$continuous
[1] FALSE

$alpha
[1] 2.363965

$xmin
[1] 8

$logLik
[1] -16114.15

$KS.stat
[1] 0.02660938

$KS.p
[1] 0.002639624

The KS.p is quite small. At a significance level of 0.05, I would reject the hypothesis that the degree distribution is drawn from a power law, even though the graph was generated to have power-law distributed degrees. I also tried a bootstrap approach to estimate the p-value using poweRlaw, based on the guidance here.

library(poweRlaw)
deg <- degree(g)
# Fit a discrete power law to the non-zero degrees
data_pl <- displ$new(deg[deg > 0])
est <- estimate_xmin(data_pl)
data_pl$xmin <- est$xmin
data_pl$pars <- est$pars
bs <- bootstrap_p(data_pl)

and the returned p-value is 0:

bs$p
[1] 0

Does anybody have an idea how to explain this discrepancy? Any comments are appreciated!

Answer:

Quoting the documentation of sample_fitness_pl:

Note that significant finite size effects may be observed for exponents smaller than 3 in the original formulation of the game. This function provides an argument that lets you remove the finite size effects by assuming that the fitness of vertex i is (i + i_0 - 1)^{-alpha}, where i_0 is a constant chosen appropriately to ensure that the maximum degree is less than the square root of the number of edges times the average degree; see the paper of Chung and Lu, and Cho et al for more details.
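For reference, the two fitness definitions that the documentation describes can be written out as follows. This is a sketch based on my reading of the same help page, where gamma is the exponent passed as exponent.out; the symbol f_i for the fitness of vertex i is notation introduced here, not igraph's.

```latex
\begin{align*}
f_i &= i^{-\alpha}, \qquad \alpha = \frac{1}{\gamma - 1}
    && \text{(original formulation)} \\
f_i &= (i + i_0 - 1)^{-\alpha}
    && \text{(with finite size correction)}
\end{align*}
% For the exponent used in the question, gamma = 2.2,
% so alpha = 1 / 1.2, which is approximately 0.833.
```

The shift by i_0 lowers the fitness of the first few vertices, which is what limits the largest degrees in the sampled graph.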

Even a very small change in the exponent can make a large difference:

library(igraph)
set.seed(137)
g <- sample_fitness_pl(10000, 60000, exponent.out=2.99999)
fit_power_law(degree(g))["KS.p"]
#> $KS.p
#> [1] 0.1234042

set.seed(137)
g <- sample_fitness_pl(10000, 60000, exponent.out=3)
fit_power_law(degree(g))["KS.p"]
#> $KS.p
#> [1] 0.9999999

Finally, you can disable the finite size correction to change this behaviour:

set.seed(137)
g <- sample_fitness_pl(10000, 60000, exponent.out = 2.2, finite.size.correction = FALSE)
fit_power_law(degree(g))["KS.p"]
#> $KS.p
#> [1] 0.9951813
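
To see what the correction does to the sampled degrees, one could compare the two graphs side by side. The following is a minimal sketch rather than a definitive check: the names g_corrected and g_raw are illustrative, it assumes finite.size.correction defaults to TRUE (which the results above suggest), and the output is omitted because it may vary with the igraph version.

```r
library(igraph)

# Same parameters as in the question; the correction is left at its default.
set.seed(137)
g_corrected <- sample_fitness_pl(10000, 60000, exponent.out = 2.2)

# Same seed and parameters, with the finite size correction switched off.
set.seed(137)
g_raw <- sample_fitness_pl(10000, 60000, exponent.out = 2.2,
                           finite.size.correction = FALSE)

# The correction limits the largest degrees, so compare the degree summaries.
summary(degree(g_corrected))
summary(degree(g_raw))

# KS.stat is the distance between the data and the fitted power law
# (smaller means a closer fit).
fit_power_law(degree(g_corrected))$KS.stat
fit_power_law(degree(g_raw))$KS.stat
```

If the upper tail of g_corrected is visibly truncated relative to g_raw, that would be consistent with the fit rejecting a pure power law for the corrected graph.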