Calculate and plot mean confidence interval for multiple categories with poisson distribution in R

Time:07-22

I am having a hard time plotting the mean and its confidence interval for my dataset. Simplified, the dataset has 2 columns:

df$category<- c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q")
df$count<- c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)

So it has 3 categories (a, d & q), each with corresponding count data. My real dataset follows a Poisson distribution.

I want to calculate the mean of each category as well as its confidence interval, and plot both in a bar graph.

As the categories have different lengths, I made a subset for each category and tried the following:

        SE<- function(x) sd(x)/sqrt(length(x))
        lim1<-function(x) mean(x)-1.96*SE(x)
        lim2<-function(x) mean(x) + 1.96*SE(x)

        confidence1a<-apply(a$count, lim1) 
        confidence2a<-apply(a$count, lim2)

        confidence1d<-apply(d$count, lim1) 
        confidence2d<-apply(d$count, lim2)

I planned to bind them into one dataset afterwards.

But this resulted in the error: Error in apply(a$count, FUN = lim1) : dim(X) must have a positive length

How can I fix this without writing out the formulas for each subset? My real dataset has 8 categories... Also, it would be nicer not to have to subset each category in the first place.

If anyone can make this into some nice code I would be forever grateful!

CodePudding user response:

library(tidyverse)

df <- tibble(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count =  c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
) %>%  
  arrange_all()

df %>%
  group_by(category) %>%  
  mutate(mean = mean(count), 
         conf_lower = mean - 1.96*(sd(count) * length(count)), 
         conf_upper = mean + 1.96*(sd(count) * length(count)))

# A tibble: 10 x 5
# Groups:   category [3]
   category count  mean conf_lower conf_upper
   <chr>    <dbl> <dbl>      <dbl>      <dbl>
 1 a            0  1.67      -7.32       10.6
 2 a            2  1.67      -7.32       10.6
 3 a            3  1.67      -7.32       10.6
 4 d            0  1.5      -13.5        16.5
 5 d            0  1.5      -13.5        16.5
 6 d            2  1.5      -13.5        16.5
 7 d            4  1.5      -13.5        16.5
 8 q            4  5.67      -6.57       17.9
 9 q            5  5.67      -6.57       17.9
10 q            8  5.67      -6.57       17.9
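
A note on the two points raised in the question. The original error most likely arises because apply() works over the margins of a matrix or array, and a$count is a plain vector without a dim attribute; tapply() (or sapply()) is the usual tool for a vector. Also, the snippet above multiplies sd(count) by length(count), whereas the SE function in the question divides by sqrt(length(x)), which is why the intervals shown are so wide. Below is a minimal base R sketch that reuses the question's own SE, lim1 and lim2 helpers without subsetting; the names m and ci are only illustrative.

```r
# Toy data from the question
df <- data.frame(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count    = c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
)

# Helpers as defined in the question (standard error = sd / sqrt(n))
SE   <- function(x) sd(x) / sqrt(length(x))
lim1 <- function(x) mean(x) - 1.96 * SE(x)
lim2 <- function(x) mean(x) + 1.96 * SE(x)

# tapply() splits count by category and applies the function to each group,
# so no manual subsetting is needed
m <- tapply(df$count, df$category, mean)

ci <- data.frame(
  category   = names(m),
  mean       = as.vector(m),
  conf_lower = as.vector(tapply(df$count, df$category, lim1)),
  conf_upper = as.vector(tapply(df$count, df$category, lim2))
)

print(ci)
```

This gives one row per category, which is the shape a bar plot with error bars needs. Because this is a normal approximation, the lower limit can still dip below zero for small groups with low counts.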

CodePudding user response:

Some basic data manipulation with dplyr will allow easy plotting with ggplot here. Your calculation for the confidence interval of a Poisson distribution isn't quite right here - it shouldn't result in negative values, so I have changed it to the appropriate calculation:

library(tidyverse)

df %>%
  group_by(category) %>%
  summarize(mean = mean(count),
            upper = mean(count) + 1.96 * sqrt(mean(count)/n()),
            lower = mean(count) - 1.96 * sqrt(mean(count)/n())) %>%
  ggplot(aes(category, mean)) +
  geom_col(fill = 'deepskyblue4') +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.5) +
  theme_minimal(base_size = 16)
  theme_minimal(base_size = 16)
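
To inspect the numbers behind the error bars, the same interval can be computed as a plain table. The sketch below uses base R only and relies on the fact that a Poisson variable has variance equal to its mean, so the standard error of the group mean is sqrt(mean / n). For comparison it also adds an exact interval from stats::poisson.test(), which is an alternative not used in the answer above; the column names (approx_lower, exact_lower, and so on) are made up for this illustration.

```r
# Toy data from the question
df <- data.frame(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count    = c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
)

groups <- split(df$count, df$category)  # named list: one vector per category
total  <- sapply(groups, sum)           # total events per category
n      <- lengths(groups)               # number of observations per category
rate   <- total / n                     # estimated mean count per observation

# Normal approximation used in the answer: mean +/- 1.96 * sqrt(mean / n)
approx_lower <- rate - 1.96 * sqrt(rate / n)
approx_upper <- rate + 1.96 * sqrt(rate / n)

# Exact Poisson interval for the rate (assumption: swapped in for comparison)
exact <- t(sapply(names(groups), function(k) {
  poisson.test(total[[k]], T = n[[k]])$conf.int
}))
colnames(exact) <- c("exact_lower", "exact_upper")

result <- data.frame(category = names(groups), mean = rate,
                     approx_lower, approx_upper, exact,
                     row.names = NULL)
print(result)
```

The exact limits are never negative, while the approximate lower limit could become negative when a group's mean is very small relative to its size.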
