I am putting together some summary stats of a dataframe about biomass at different sights. Here's some dummy data:
df1 <- data.frame(ID = c(1, 2, 3, 4, 5,6,7,8,9,10,11),
Reach = c('a', 'b', 'c', 'a', 'a', 'b', 'c', 'c','b','c','a'),
Bio = c(12, 11, 10.4, 10, 12.5, 14, 12, 17, 17.5, 17.3, 16.2))
I have created a table in which to capture summary stats:
sumstats<- data.frame(matrix(data=NA, nrow=1, ncol=4))
colnames(sumstats) <- c("Total.Fish", "Mean.Fish", "St.Dev", "95%.Conf")
Completing the first three columns is easy enough
sumstats$Total.Fish<- sum(df1$Bio)
sumstats$Mean.Fish<- mean(df1$Bio)
sumstats$St.Dev <- sd(df1$Bio)
But the last one is giving me some bother. To my understanding there isn't a function in base R which computes the 95% confidence level. I have found that I can compute it using a Z Test in BSDA:
library(BSDA)
test1<- z.test(df1$Bio, sigma.x=(mean(df1$Bio)), conf.level = 0.95)
But I cannot figure out how to get the outputs of that into my dataframe. The output of the Z test is a list, one of the list items is the confidence level.
If I print the confidence level line of that list it shows several number, my summary stats dataframe needs the first one (5.74217 in this case). So my question is either:
- how do I get just part of the outputs from the z test into my dataframe
- is there an easier way to calculate the 95% condeince level?
CodePudding user response:
First, I don't think you should be using a Z-test. You would use that if you were comparing your sample to a population with known mean and standard deviation, which is not the case. Also, you use sample mean to specify the population standard deviation (sigma.x
) which is certainly not correct.
Just for completeness, you could get the confidence interval lower bound with
test1$conf.int[1]
, but don't do that.You can use a t-test to find the confidence interval for your sample mean and the confidence interval can be obtained in a similar manner as for the
z.test
output:
test <- t.test(df1$Bio)
sumstats$Conf.Int.Lower <- test$conf.int[1]
Please note that for the t-test results to be reliable, your data should be normally distributed which it rather isn't. So treat the results with caution. Alternatively, you might use a non-parametric test, such as the Wilcoxon signed-rank test.
I am not sure why you are interested only in the lower CI bound. In any case, you can get the upper bound almost the same way:
sumstats$Conf.Int.Upper <- test$conf.int[2]
CodePudding user response:
Here is one directly from R for Data Science by Hadley Wickham. First he creates a function for mean CI, makes a randomly simulated uniform distribution, and finally runs the function on the uniform data. I have only added one thing: a random seed to replicate what you see here:
#### Set Random Seed ####
set.seed(1)
#### Mean CI Function ####
mean_ci <- function(x, conf = 0.95) {
se <- sd(x) / sqrt(length(x))
alpha <- 1 - conf
mean(x) se * qnorm(c(alpha / 2, 1 - alpha / 2))
}
#### Create Uniform Distribution of 100 Values ####
x <- runif(100)
#### Calculate Mean CI ####
mean_ci(x)
Which gets you this:
[1] 0.4654014 0.5702927
Of course other versions of confidence intervals may vary a lot, so I have only used this version here.