I am writing a function to calculate a confidence interval, but I am not getting the same results as from t.test().
My data:
testData <- structure(list(group = c("Group1", "Group1", "Group1", "Group1",
"Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1",
"Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1",
"Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1",
"Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1"
), year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 2017,
2017), category = c("cat1", "cat1", "cat1", "cat1", "cat1", "cat1",
"cat1", "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2",
"cat2", "cat2", "cat2", "cat2", "cat2", "cat2", "cat2", "cat2",
"cat2", "cat2", "cat2", "cat3", "cat3", "cat3", "cat3", "cat3",
"cat3", "cat3"), value = c(15.1382663715558, 38.7804544564934,
46.8153764828161, 167.414619767484, 147.819242182614, 163.289605383038,
97.4909154781249, 76.4990140823147, 10.2998099118541, 106.829837472452,
47.9470225625797, 117.481510505374, 103.353651531038, 82.8258992025231,
75.8617413682001, 0.895652854035013, 158.506322595117, 153.09256856583,
223.536384788365, 75.748851191101, 46.9191391269587, 1.05445490408603,
34.2440937279552, 12.5493519758163, 81.9894639436096, 102.38603104988,
11.8608226647822, 16.0662436435422, 0.883484884196097, 58.1467542647205,
145.495946136843, 106.259860732627)), row.names = c(NA, -32L), class = c("data.table",
"data.frame"))
The function I have written to calculate the 95% confidence interval is:
calculateCI <- function(value){
  avg <- mean(value)
  s <- sqrt(var(value))              # sample standard deviation
  n <- length(value)
  error <- qnorm(0.975)*s/sqrt(n)    # margin of error from the normal quantile
  lower <- avg - error
  upper <- avg + error
  return(list(lowerCI = lower,
              upperCI = upper))
}
Now I have compared the results from my function to those from t.test():
calculateCI(testData$value)
$lowerCI
[1] 58.49955
$upperCI
[1] 99.4681
t.test(testData$value)
One Sample t-test
data: testData$value
t = 7.5573, df = 31, p-value = 1.615e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
57.66815 100.29950
sample estimates:
mean of x
78.98382
Why are the results different?
CodePudding user response:
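t.test() does not use the normal distribution to calculate the quantiles; it uses Student's t distribution with n - 1 degrees of freedom. With n = 32, qt(0.975, 31) is about 2.04 while qnorm(0.975) is about 1.96, which is why the t.test() interval is slightly wider. If you use the t distribution in your function, you get the same values: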
calculateCI <- function(value){
  avg <- mean(value)
  s <- sqrt(var(value))               # sample standard deviation
  n <- length(value)
  error <- qt(0.975, n-1)*s/sqrt(n)   # t quantile with n - 1 degrees of freedom
  lower <- avg - error
  upper <- avg + error
  return(list(lowerCI = lower,
              upperCI = upper))
}
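If you want to check this yourself, here is a minimal sketch that builds both intervals by hand and compares them with what t.test() reports. The vector x is a small made-up sample used only for illustration (it is not the data from the question), and the names ci_t, ci_z and ci_ref are just labels chosen for this example.

```r
# Hypothetical sample, for illustration only (not the question's data)
x <- c(15.1, 38.8, 46.8, 167.4, 147.8, 163.3, 97.5, 76.5, 10.3, 106.8)

n  <- length(x)
se <- sd(x) / sqrt(n)   # standard error of the mean

# Interval from the Student t quantile with n - 1 degrees of freedom
ci_t <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Interval from the normal quantile, as in the question's function
ci_z <- mean(x) + c(-1, 1) * qnorm(0.975) * se

# Reference interval reported by t.test()
ci_ref <- as.numeric(t.test(x)$conf.int)

# TRUE if the t-based interval matches t.test() up to numerical tolerance
print(isTRUE(all.equal(ci_t, ci_ref)))

# TRUE if the normal-based interval is narrower than the t-based one
print(diff(ci_z) < diff(ci_t))
```

The gap between the two intervals shrinks as n grows, because the t quantile approaches the normal quantile for large degrees of freedom. For small samples like this one, the difference is noticeable.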