calculate confidence interval for value in R

Time:08-30

I am writing a function to calculate a confidence interval, but I am not getting the same results as t.test().

My data:

testData <-structure(list(group = c("Group1", "Group1", "Group1", "Group1", 
                                    "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", 
                                    "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", 
                                    "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", 
                                    "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1"
), year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
            2015, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
            2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 2017, 
            2017), category = c("cat1", "cat1", "cat1", "cat1", "cat1", "cat1", 
                                "cat1", "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2", 
                                "cat2", "cat2", "cat2", "cat2", "cat2", "cat2", "cat2", "cat2", 
                                "cat2", "cat2", "cat2", "cat3", "cat3", "cat3", "cat3", "cat3", 
                                "cat3", "cat3"), value = c(15.1382663715558, 38.7804544564934, 
                                                           46.8153764828161, 167.414619767484, 147.819242182614, 163.289605383038, 
                                                           97.4909154781249, 76.4990140823147, 10.2998099118541, 106.829837472452, 
                                                           47.9470225625797, 117.481510505374, 103.353651531038, 82.8258992025231, 
                                                           75.8617413682001, 0.895652854035013, 158.506322595117, 153.09256856583, 
                                                           223.536384788365, 75.748851191101, 46.9191391269587, 1.05445490408603, 
                                                           34.2440937279552, 12.5493519758163, 81.9894639436096, 102.38603104988, 
                                                           11.8608226647822, 16.0662436435422, 0.883484884196097, 58.1467542647205, 
                                                           145.495946136843, 106.259860732627)), row.names = c(NA, -32L), class = c("data.table", 
                                                                                                                                    "data.frame"))

The function I have written to calculate the 95% confidence interval is:

calculateCI <- function(value){
  
  avg <- mean(value)
  s <- sqrt(var(value))
  n <- length(value)
  
  error <- qnorm(0.975)*s/sqrt(n)
  
  lower <- avg - error
  upper <- avg + error
  
  return(list(lowerCI = lower, 
              upperCI = upper))
  
}

I then compared the results from my function with those from t.test():

calculateCI(testData$value)

$lowerCI
[1] 58.49955

$upperCI
[1] 99.4681

t.test(testData$value)

One Sample t-test

data:  testData$value
t = 7.5573, df = 31, p-value = 1.615e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  57.66815 100.29950
sample estimates:
mean of x 
 78.98382 

Why are the results different?

CodePudding user response:

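t.test() does not use the normal distribution to compute the quantile; it uses Student's t distribution with n - 1 degrees of freedom. Because the t quantile is slightly larger than the normal one for a finite sample, the t.test() interval is slightly wider. If you use the t distribution in your function, you get the same values: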

calculateCI <- function(value){
    
    avg <- mean(value)
    s <- sqrt(var(value))
    n <- length(value)
    
    error <- qt(0.975, n-1)*s/sqrt(n)
    
    lower <- avg - error
    upper <- avg + error
    
    return(list(lowerCI = lower, 
                upperCI = upper))
    
}
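
As a quick sanity check, a t-based interval can be compared directly with the conf.int element that t.test() returns. The sketch below is only illustrative: the helper name tInterval, its level argument, and the small sample x are made up for this example and are not part of the original post.

```r
# Hypothetical helper (name and signature are illustrative, not from the post):
# t-based confidence interval for the mean at an arbitrary confidence level.
tInterval <- function(value, level = 0.95) {
  n <- length(value)
  # Standard error of the mean
  se <- sd(value) / sqrt(n)
  # Two-sided critical value from the t distribution with n - 1 df
  crit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(lower = mean(value) - crit * se, upper = mean(value) + crit * se)
}

# Small made-up sample, used only for illustration
x <- c(12.1, 15.3, 9.8, 20.4, 17.6, 11.2, 14.9, 18.3)

ours <- tInterval(x)
ref  <- t.test(x)$conf.int

# The two intervals should agree up to floating-point error
all.equal(unname(ours), as.numeric(ref))

# The t critical value exceeds the normal one, and the gap shrinks as n grows
qnorm(0.975)
qt(0.975, df = c(7, 31, 1000))
```

With 32 observations (31 degrees of freedom, as in the question) the t quantile is only a little larger than the normal quantile, which is why the two intervals in the question differ by less than one unit at each end.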