Home > Net >  Adding confidence level to dataframe in R
Adding confidence level to dataframe in R

Time:11-30

I am putting together some summary stats of a dataframe about biomass at different sights. Here's some dummy data:

df1 <- data.frame(ID = c(1, 2, 3, 4, 5,6,7,8,9,10,11),
Reach = c('a', 'b', 'c', 'a', 'a', 'b', 'c', 'c','b','c','a'),
Bio = c(12, 11, 10.4, 10, 12.5, 14, 12, 17, 17.5, 17.3, 16.2))

I have created a table in which to capture summary stats:

sumstats<- data.frame(matrix(data=NA, nrow=1, ncol=4))
colnames(sumstats) <- c("Total.Fish", "Mean.Fish", "St.Dev", "95%.Conf")

Completing the first three columns is easy enough

sumstats$Total.Fish<- sum(df1$Bio)
sumstats$Mean.Fish<- mean(df1$Bio)
sumstats$St.Dev <- sd(df1$Bio)

But the last one is giving me some bother. To my understanding there isn't a function in base R which computes the 95% confidence level. I have found that I can compute it using a Z Test in BSDA:

library(BSDA)
test1<- z.test(df1$Bio, sigma.x=(mean(df1$Bio)), conf.level = 0.95)

But I cannot figure out how to get the outputs of that into my dataframe. The output of the Z test is a list, one of the list items is the confidence level.

If I print the confidence level line of that list it shows several number, my summary stats dataframe needs the first one (5.74217 in this case). So my question is either:

  1. how do I get just part of the outputs from the z test into my dataframe
  2. is there an easier way to calculate the 95% condeince level?

CodePudding user response:

First, I don't think you should be using a Z-test. You would use that if you were comparing your sample to a population with known mean and standard deviation, which is not the case. Also, you use sample mean to specify the population standard deviation (sigma.x) which is certainly not correct.

  1. Just for completeness, you could get the confidence interval lower bound with test1$conf.int[1], but don't do that.

  2. You can use a t-test to find the confidence interval for your sample mean and the confidence interval can be obtained in a similar manner as for the z.test output:

test <- t.test(df1$Bio)
sumstats$Conf.Int.Lower <- test$conf.int[1]

Please note that for the t-test results to be reliable, your data should be normally distributed which it rather isn't. So treat the results with caution. Alternatively, you might use a non-parametric test, such as the Wilcoxon signed-rank test.

I am not sure why you are interested only in the lower CI bound. In any case, you can get the upper bound almost the same way:

sumstats$Conf.Int.Upper <- test$conf.int[2]

CodePudding user response:

Here is one directly from R for Data Science by Hadley Wickham. First he creates a function for mean CI, makes a randomly simulated uniform distribution, and finally runs the function on the uniform data. I have only added one thing: a random seed to replicate what you see here:

#### Set Random Seed ####
set.seed(1)

#### Mean CI Function ####
mean_ci <- function(x, conf = 0.95) { 
  se <- sd(x) / sqrt(length(x)) 
  alpha <- 1 - conf 
  mean(x)   se * qnorm(c(alpha / 2, 1 - alpha / 2)) 
} 

#### Create Uniform Distribution of 100 Values ####
x <- runif(100) 

#### Calculate Mean CI ####
mean_ci(x)

Which gets you this:

[1] 0.4654014 0.5702927

Of course other versions of confidence intervals may vary a lot, so I have only used this version here.

  •  Tags:  
  • r
  • Related