ggplot2 stat_function() not plotting the exact area under the curve

I am trying to shade the area under the lower tail of a t distribution.

Here is the code.

alpha=0.1
n=5

l.critical = qt(alpha,df=n-1)
u.critical = -l.critical


# function to shade lower tail
funcShaded <- function(x) {
  y <- dt(x,df=n-1)
  y[x>l.critical]<-NA
  return(y)
}

library(ggplot2)

ggplot(data.frame(x = c(l.critical - 3, u.critical + 3)), aes(x = x)) +
  stat_function(fun = dt,
                args = list(df = n - 1), linewidth = 1) +
  scale_x_continuous(name = "t values") +
  stat_function(fun = funcShaded, geom = "area", fill = "#84CA72", alpha = 1,
                outline.type = "full", color = "black") +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  labs(y = "") +
  geom_vline(xintercept = l.critical)

I suspect the problem is the line y[x>l.critical]<-NA, where I replace the y values above my lower critical value (i.e. everything except the lower tail) with NA. The x values generated by stat_function() probably do not include my lower critical value, so the highest x value that is not replaced lies below it, and the shaded area stops short of the vertical line.

If this is the cause, is there a way to force my lower critical value to be among the generated x values?

CodePudding user response：

stat_function has an argument, n, which sets how many points are evaluated along the curve (101 by default). Set it to a higher number (say 1000) and the inaccuracy becomes too small to see. For example, with 99 degrees of freedom, the default plot looks like this:

ggplot(data.frame(x = c(l.critical - 3, u.critical + 3)), aes(x)) +
  stat_function(fun = funcShaded, geom = "area", fill = "#84CA72") +
  stat_function(fun = dt, args = list(df = n - 1), linewidth = 1) +
  geom_vline(xintercept = l.critical) +
  scale_x_continuous(name = "t values") +
  theme(axis.text.y  = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank())

But if we add n = 1000 to stat_function, the alignment is perfect:

ggplot(data.frame(x = c(l.critical - 3, u.critical + 3)), aes(x)) +
  stat_function(fun = funcShaded, geom = "area", fill = "#84CA72", n = 1000) +
  stat_function(fun = dt, args = list(df = n - 1), linewidth = 1, n = 1000) +
  geom_vline(xintercept = l.critical) +
  scale_x_continuous(name = "t values") +
  theme(axis.text.y  = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank())
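
To see why a larger n helps, here is a rough base R sketch. It assumes stat_function evaluates the function on an evenly spaced grid over the x range, and it uses alpha = 0.1 and n = 99 as example values. The helpers grid_gap and grid_step are made up for illustration and are not part of ggplot2. The gap between the critical value and the last grid point that funcShaded keeps can never exceed the grid spacing, and the spacing shrinks as the number of points grows.

```r
alpha <- 0.1
n <- 99

l.critical <- qt(alpha, df = n - 1)
u.critical <- -l.critical
x_range <- c(l.critical - 3, u.critical + 3)

# Spacing of an evenly spaced grid with n_points points
# (assumption: this mirrors how stat_function places its x values)
grid_step <- function(n_points) diff(x_range) / (n_points - 1)

# Distance between l.critical and the last grid point at or below it,
# i.e. the last point that y[x > l.critical] <- NA leaves untouched
grid_gap <- function(n_points) {
  xs <- seq(x_range[1], x_range[2], length.out = n_points)
  l.critical - max(xs[xs <= l.critical])
}

gaps <- data.frame(n_points = c(101, 1000))
gaps$step <- sapply(gaps$n_points, grid_step)
gaps$gap  <- sapply(gaps$n_points, grid_gap)
print(gaps)
```

The gap column is the width of the unshaded sliver next to the vertical line for each grid size.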

CodePudding user response：

To fix your issue you could set the limits in stat_function via xlim, which also lets you get rid of funcShaded:

alpha <- 0.1
n <- 99

l.critical <- qt(alpha, df = n - 1)
u.critical <- -l.critical

library(ggplot2)

ggplot(data.frame(x = c(l.critical - 3, u.critical + 3)), aes(x = x)) +
  stat_function(
    fun = dt,
    args = list(df = n - 1), linewidth = 1
  ) +
  scale_x_continuous(name = "t values") +
  stat_function(
    fun = dt, geom = "area", fill = "#84CA72", alpha = 1,
    outline.type = "full", color = "black", xlim = c(l.critical - 3, l.critical),
    args = list(df = n - 1)
  ) +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  ) +
  labs(y = "") +
  geom_vline(xintercept = l.critical)
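
A different approach, not taken from the answers above, speaks directly to the question of forcing the critical value into the x values: build the shaded region's grid yourself and draw it with geom_area. This is only a sketch. The names tail_df and df_t and the choice of 200 points are arbitrary, made up for this example. Because seq() is given l.critical as its end point, the shaded area ends at the vertical line.

```r
library(ggplot2)

alpha <- 0.1
n <- 99
df_t <- n - 1  # degrees of freedom (illustrative name)

l.critical <- qt(alpha, df = df_t)
u.critical <- -l.critical

# Hand-built grid for the lower tail; the last x value is l.critical
tail_df <- data.frame(x = seq(l.critical - 3, l.critical, length.out = 200))
tail_df$y <- dt(tail_df$x, df = df_t)

p <- ggplot(data.frame(x = c(l.critical - 3, u.critical + 3)), aes(x = x)) +
  geom_area(data = tail_df, aes(x = x, y = y), fill = "#84CA72") +
  stat_function(fun = dt, args = list(df = df_t), linewidth = 1) +
  geom_vline(xintercept = l.critical) +
  scale_x_continuous(name = "t values") +
  labs(y = "")

p
```

Compared with xlim, this takes more code, but it gives full control over which x values are evaluated.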