I am trying to graph the area under the tail of a t distribution.
Here is the code:
library(ggplot2)

alpha <- 0.1
n <- 5
l.critical <- qt(alpha, df = n - 1)
u.critical <- -l.critical
# function to shade lower tail: keep the density only at or below the
# lower critical value
funcShaded <- function(x) {
  y <- dt(x, df = n - 1)
  y[x > l.critical] <- NA
  return(y)
}
ggplot(data.frame(x = c(l.critical - 3, u.critical + 3)), aes(x = x)) +
  stat_function(fun = dt,
                args = list(df = n - 1), linewidth = 1) +
  scale_x_continuous(name = "t values") +
  stat_function(fun = funcShaded, geom = "area", fill = "#84CA72", alpha = 1,
                outline.type = "full", color = "black") +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  labs(y = "") +
  geom_vline(xintercept = l.critical)
I suspect the problem is the line y[x > l.critical] <- NA, where I replace the y values above my lower critical value (i.e. everything except the lower tail) with NA. The x values generated by stat_function() probably don't include my lower critical value, so the highest x value that is not replaced lies below the critical value, and the shaded area stops short of it. If this is the cause, is there a way to force my lower critical value to be among the generated x values?
CodePudding user response:
stat_function has an argument, n, which determines how many values are calculated along the curve (the default is 101). Set this to a high number (say 1000) and the inaccuracy will all but disappear. For example, with 99 degrees of freedom, the default plot looks like this:
ggplot(data.frame(x = c(l.critical - 3, u.critical + 3)), aes(x)) +
  stat_function(fun = funcShaded, geom = "area", fill = "#84CA72") +
  stat_function(fun = dt, args = list(df = n - 1), linewidth = 1) +
  geom_vline(xintercept = l.critical) +
  scale_x_continuous(name = "t values") +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank())
But if we add n = 1000 to stat_function, the alignment is perfect:
ggplot(data.frame(x = c(l.critical - 3, u.critical + 3)), aes(x)) +
  stat_function(fun = funcShaded, geom = "area", fill = "#84CA72", n = 1000) +
  stat_function(fun = dt, args = list(df = n - 1), linewidth = 1, n = 1000) +
  geom_vline(xintercept = l.critical) +
  scale_x_continuous(name = "t values") +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank())
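To see why a larger n helps, here is a rough sketch that measures how far the last evaluated point falls below the critical value. It assumes stat_function evaluates the function at n evenly spaced points across the x range, which is imitated here with seq(); gap_below_critical, n_points and x_range are made-up names for illustration, not part of ggplot2.

```r
alpha <- 0.1
n <- 99                                  # sample size, so df = n - 1
l.critical <- qt(alpha, df = n - 1)
u.critical <- -l.critical
x_range <- c(l.critical - 3, u.critical + 3)

# Assumption: the curve is evaluated on an evenly spaced grid over x_range.
# Returns the distance from the critical value down to the last grid point
# that funcShaded would keep (i.e. the last point with x <= l.critical).
gap_below_critical <- function(n_points) {
  grid <- seq(x_range[1], x_range[2], length.out = n_points)
  l.critical - max(grid[grid <= l.critical])
}

gap_below_critical(101)    # grid as coarse as the default
gap_below_critical(1000)   # finer grid

# The gap can never exceed the grid spacing, which shrinks as n grows
diff(x_range) / (101 - 1)
diff(x_range) / (1000 - 1)
```

Note that a finer grid only shrinks the upper bound on the gap; the critical value is still not guaranteed to be one of the evaluated points.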
CodePudding user response:
To fix your issue you could set the limits in stat_function via xlim, which at the same time lets you get rid of your funcShaded:
alpha <- 0.1
n <- 99
l.critical <- qt(alpha, df = n - 1)
u.critical <- -l.critical
library(ggplot2)
ggplot(data.frame(x = c(l.critical - 3, u.critical + 3)), aes(x = x)) +
  stat_function(
    fun = dt,
    args = list(df = n - 1), linewidth = 1
  ) +
  scale_x_continuous(name = "t values") +
  # shade only the lower tail: xlim ends exactly at the critical value
  stat_function(
    fun = dt, geom = "area", fill = "#84CA72", alpha = 1,
    outline.type = "full", color = "black", xlim = c(l.critical - 3, l.critical),
    args = list(df = n - 1)
  ) +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  ) +
  labs(y = "") +
  geom_vline(xintercept = l.critical)
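As a quick check of why xlim lines the shading up with the vertical line, here is a minimal sketch. It assumes that with xlim set, the function is evaluated on an evenly spaced grid running from xlim[1] to xlim[2] (imitated with seq()); shade_xlim and grid are names made up for this example.

```r
alpha <- 0.1
n <- 99
l.critical <- qt(alpha, df = n - 1)

# Assumption: the evaluation grid spans exactly the xlim range
shade_xlim <- c(l.critical - 3, l.critical)
grid <- seq(shade_xlim[1], shade_xlim[2], length.out = 101)

# The last grid point is the critical value (up to floating-point error),
# so the shaded area ends on the vertical line whatever the number of points
isTRUE(all.equal(max(grid), l.critical))

# Sanity check: the lower-tail area up to l.critical should be close to alpha
tail_area <- integrate(dt, lower = -Inf, upper = l.critical, df = n - 1)$value
tail_area
```

Because the grid's endpoint is the critical value itself, this approach does not depend on raising n.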