I am having a hard time plotting the mean and its confidence interval for my dataset. My dataset consists of 2 columns; simplified:
df$category<- c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q")
df$count<- c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
So it has 3 categories (a, d & q), each with corresponding count data. My real dataset follows a Poisson distribution.
I want to calculate the mean of each category as well as the confidence interval and plot this in a bar graph.
As the categories have different lengths, I made subsets of each category and tried the following:
SE<- function(x) sd(x)/sqrt(length(x))
lim1<-function(x) mean(x)-1.96*SE(x)
lim2<-function(x) mean(x) + 1.96*SE(x)
confidence1a<-apply(a$count, lim1)
confidence2a<-apply(a$count, lim2)
confidence1d<-apply(d$count, lim1)
confidence2d<-apply(d$count, lim2)
The plan was to bind them into one dataset afterwards.
But this resulted in the error: Error in apply(a$count, FUN = lim1) : dim(X) must have a positive length
How can I fix this without having to write out the formulas for each subset? My real dataset has 8 categories... Also, it would be nicer not to have to subset each category in the first place.
If anyone can make this into some nice code I would be forever grateful!
CodePudding user response:
library(tidyverse)
df <- tibble(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count = c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
) %>%
  arrange_all()
df %>%
  group_by(category) %>%
  mutate(mean = mean(count),
         conf_lower = mean - 1.96*(sd(count) * length(count)),
         conf_upper = mean + 1.96*(sd(count) * length(count)))
# A tibble: 10 x 5
# Groups:   category [3]
   category count  mean conf_lower conf_upper
   <chr>    <dbl> <dbl>      <dbl>      <dbl>
 1 a            0  1.67      -7.32       10.6
 2 a            2  1.67      -7.32       10.6
 3 a            3  1.67      -7.32       10.6
 4 d            0  1.5      -13.5        16.5
 5 d            0  1.5      -13.5        16.5
 6 d            2  1.5      -13.5        16.5
 7 d            4  1.5      -13.5        16.5
 8 q            4  5.67      -6.57       17.9
 9 q            5  5.67      -6.57       17.9
10 q            8  5.67      -6.57       17.9
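Note that the code above multiplies sd(count) by length(count), whereas the standard error defined in the question is sd(x)/sqrt(length(x)), which gives much narrower limits. Below is a minimal sketch of that version, using summarise() instead of mutate() so that each category gets a single row. The column names n and se and the object name ci_normal are my own choices, and the 1.96 multiplier assumes a normal approximation, so treat it as a starting point rather than the definitive interval for count data.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count    = c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
)

ci_normal <- df %>%
  group_by(category) %>%
  summarise(
    n          = n(),                   # group size
    mean       = mean(count),           # group mean
    se         = sd(count) / sqrt(n),   # standard error of the mean
    conf_lower = mean - 1.96 * se,      # normal-approximation limits
    conf_upper = mean + 1.96 * se
  )

print(ci_normal)
```

With only 3 or 4 observations per group, the lower limit can still dip below zero under this approximation.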
CodePudding user response:
Some basic data manipulation with dplyr makes this easy to plot with ggplot. Your calculation for the confidence interval of a Poisson distribution isn't quite right: it shouldn't result in negative values, so I have changed it to the appropriate calculation:
library(tidyverse)
df %>%
  group_by(category) %>%
  summarize(mean = mean(count),
            upper = mean(count) + 1.96 * sqrt(mean(count)/n()),
            lower = mean(count) - 1.96 * sqrt(mean(count)/n())) %>%
  ggplot(aes(category, mean)) +
  geom_col(fill = 'deepskyblue4') +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.5) +
  theme_minimal(base_size = 16)
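On the original error: apply() is meant for matrices and arrays and iterates over their rows or columns, so passing a plain vector such as a$count fails with "dim(X) must have a positive length". A rough base R sketch that avoids both apply() and the manual subsetting is below. It assumes the SE, lim1 and lim2 helpers from the question (with the `+` in lim2); the names groups and ci_base are hypothetical ones picked for illustration.

```r
# Example data from the question
df <- data.frame(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count    = c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
)

# Helper functions as defined in the question
SE   <- function(x) sd(x) / sqrt(length(x))
lim1 <- function(x) mean(x) - 1.96 * SE(x)
lim2 <- function(x) mean(x) + 1.96 * SE(x)

# split() returns one vector of counts per category,
# so no manual subsetting is needed
groups <- split(df$count, df$category)

# sapply() calls each function once per category
ci_base <- data.frame(
  category  = names(groups),
  mean      = sapply(groups, mean),
  lower     = sapply(groups, lim1),
  upper     = sapply(groups, lim2),
  row.names = NULL
)

print(ci_base)
```

This scales to 8 categories without changes, because split() creates one group per distinct value of category.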