Home > Software design >  Mixture Poisson distribution: mean and variance in R
Mixture Poisson distribution: mean and variance in R

Time:05-28

So, I tried to simulate mixed Poisson distribution:

data2 <- data.frame(x = c(rpois(n = 50, lambda = 0), rpois(n = 450, lambda = 10)))

Then plot the histogram and density function upon it, for the distribution function I used function dmixpois from spatstat package. Here is the code for the plot:

ggplot(data2, aes(x = x))  
  geom_histogram(aes(y = ..density..), bins = 15)  
  geom_line(aes(x = x, y = dmixpois(data2$x, mu = 9, sd = sqrt(41))))

Here is the plot: enter image description here

Clearly, the density function is wrong. As far as I know, the mean for mixed Poisson distribution is linear combination of the means for the singled distributions and the variance is E[lambda] Var[lambda]. Here on the plot I only used the variance term, but if I add the expected value of lambda, I get the density to be even more steep. What is wrong with the computations?

CodePudding user response:

The 'mixed Poisson' that you are simulating isn't the same as the mixed Poisson model in spatstat, which just assumes that the lambda of a Poisson distribution itself is a normally-distributed random variable. From the docs:

In effect, the Poisson mean parameter lambda is randomised by setting lambda = invlink(Z) where Z has a Gaussian N(μ,σ²) distribution.

It therefore won't simulate the mixing of two independent Poisson distributions.

It looks like what you are simulating is a zero-inflated Poisson distribution, that is to say, a distribution where there is a probability of getting zero counts or a Poisson-distributed count for any given observation. There is a specific dzipois for this in the VGAM package.

Remember also that a Poisson distribution is a discrete probability distribution, so you cannot properly show it with a continuous line, but rather only with points or spikes at the integer values.

If you want to plot a distribution that matches your simulation, you can try the following:

set.seed(1) # To make the example reproducible

data2 <- data.frame(x = c(rpois(n = 50, lambda = 0), 
                          rpois(n = 450, lambda = 10)))


ggplot(data2)  
  geom_histogram(aes(x = x, y = ..density..), breaks = 0:21 - 0.5)  
  geom_point(data = data.frame(x =0:20, y = VGAM::dzipois(0:20, 10, 1/10)),
             aes(x, y))

enter image description here

CodePudding user response:

First of all the Poisson distribution with lambda = 0 is degenerate and always constant to zero:

rpois(10, lambda = 0)
[1] 0 0 0 0 0 0 0 0 0 0

This is why you have a spike at x = 0.

Second, the Poisson distribution is discrete and as such does not have a density (or pdf). Instead you can plot the relative frequencies in histogram (as you did) to get an estimation for the probability of X = x.

The mean of the mixed distribution is simply the weighted mean of the underlying distributions. In your case the underlying distribution have weights 10% and 90% and hence E[X] = 0.1 E[X_1] 0.9 E[X_2] = 0,9

The variance of the mixed distribution is given by Var(X) = Var(\Lambda) E[Lambda]

So everything is working as it should:

set.seed(1)

df <- data.frame(x = c(rpois(50, 1), rpois(450, 9))) 

mean(df$x)
sd(df$x)
var(df$x)
  • Related