Density plot of the F-distribution (df1=1). Theoretical or simulated?-CodePudding

I am plotting the density of F(1,49) in R. It seems that the simulated plot does not match the theoretical plot when values approach the zero.

set.seed(123)
val <- rf(1000, df1=1, df2=49)
plot(density(val), yaxt="n",ylab="",xlab="Observation",
     main=expression(paste("Density plot (",italic(n),"=1000, ",italic(df)[1],"=1, ",italic(df)[2],"=49)")),
     lwd=2)
curve(df(x, df1=1, df2=49), from=0, to=10, add=T, col="red",lwd=2,lty=2)
legend("topright",c("Theoretical","Simulated"),
       col=c("red","black"),lty=c(2,1),bty="n")

CodePudding user response：

Using density(val, from = 0) gets you much closer, although still not perfect. Densities near boundaries are notoriously difficult to calculate in a satisfactory way.

CodePudding user response：

By default, density uses a Gaussian kernel to estimate the probability density at a given point. Effectively, this means that at each point an observation was found, a normal density curve is placed there with its center at the observation. All these normal densities are added up, then the result is normalized so that the area under the curve is 1.

This works well if observations have a central tendency, but gives unrealistic results when there are sharp boundaries (Try plot(density(runif(1000))) for a prime example).

When you have a very high density of points close to zero, but none below zero, the left tail of all the normal kernels will "spill over" into the negative values, giving a Gaussian-type which doesn't match the theoretical density.

This means that if you have a sharp boundary at 0, you should remove values of your simulated density that are between zero and about two standard deviations of your smoothing kernel - anything below this will be misleading.

Since we can control the standard deviation of our smoothing kernel with the bw parameter of density, and easily control which x values are plotted using ggplot, we will get a more sensible result by doing something like this:

library(ggplot2)

ggplot(as.data.frame(density(val), bw = 0.1), aes(x, y))   
  geom_line(aes(col = "Simulated"), na.rm = TRUE)   
  geom_function(fun = ~ df(.x, df1 = 1, df2 = 49), 
                aes(col = "Theoretical"), lty = 2)  
  lims(x = c(0.2, 12))  
  theme_classic(base_size = 16)  
  labs(title = expression(paste("Density plot (",italic(n),"=1000, ",
                                italic(df)[1],"=1, ",italic(df)[2],"=49)")),
       x = "Observation", y = "")  
  scale_color_manual(values = c("black", "red"), name = "")