Error bars look huge in R, but not in Excel

I am having an issue with my error bars: they look gigantic in R but much more accurate in Excel.

Example in R: (screenshot not available)

Example in Excel: (screenshot not available)

Code

library(tidyverse)

Graph <- subset(dat, Condition != "z" & Environment != "a")
Graph %>%
  # fct_inorder() needs a factor or character vector, so convert Gate first
  group_by(Gate = fct_inorder(as.character(Gate)), Sound, Block) %>%
  summarize(Accuracy = mean(Correct),
            sd = sd(Correct)) %>%
  ggplot(aes(x = Gate, y = Accuracy, color = Sound, group = Sound)) +
  geom_line() +
  geom_errorbar(aes(ymin = Accuracy - sd, ymax = Accuracy + sd)) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~Block) +
  theme_minimal()

Reproducible sample

set.seed(42)
n <- 100
dat <- data.frame(Participant = 1:n,
                  Environment = rep(LETTERS[1:2], n/2),
                  Condition = rep(LETTERS[25:26], n/2),
                  Gate = sample(1:5, n, replace = TRUE),
                  Block = sample(1:2, n, replace = TRUE),
                  Sound = rep(LETTERS[3:4], n/2),
                  Correct = sample(0:1, n, replace = TRUE))

Answer:

As the other answer points out, you should be plotting the standard error (sd/sqrt(n)) rather than the standard deviation. Here is a slightly more compact way to run your code, using stat_summary() to compute the summary statistics. By default mean_cl_normal() draws a normal-theory 95% confidence interval; mult = 1 tells it to draw ±1 SE instead (it needs the Hmisc package installed). If you want narrower end-caps on your error bars, pass a width= argument (e.g. width = 0.2) to the error-bar layer.

(My plot still has large error bars, but I assume that's because your reproducible example is small.)

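library(tidyverse)
filter(dat, Condition != "z" & Environment != "a") %>%
  # fct_inorder() needs a factor or character vector, so convert Gate first
  mutate(Gate = fct_inorder(as.character(Gate))) %>%
  # group = Sound is needed so each line connects the gates for one Sound
  ggplot(aes(Gate, Correct, colour = Sound, group = Sound)) +
  stat_summary(geom = "line", fun = mean) +
  # mean_cl_normal() comes from Hmisc; mult = 1 gives mean ± 1 SE
  stat_summary(geom = "errorbar", fun.data = \(x) mean_cl_normal(x, mult = 1)) +
  facet_wrap(~ Block)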
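Why the bars stay large on the sample data can be checked with a little arithmetic: 100 rows spread over 5 gates × 2 sounds × 2 blocks = 20 cells leaves roughly five observations per cell. The minimal base-R sketch below compares the SD and SE of a 0/1 score; it assumes a hypothetical true accuracy of p = 0.5, and the cell sizes are illustrative values, not taken from the data.

```r
# Hypothetical true accuracy for a 0/1 Correct score (illustrative assumption)
p <- 0.5

# Illustrative numbers of trials per Gate x Sound x Block cell
cell_sizes <- c(5, 25, 100, 1000)

# SD of a Bernoulli(p) variable is sqrt(p * (1 - p)). It does not depend on n,
# so +/- 1 SD bars stay about the same width however much data you collect.
sd_correct <- sqrt(p * (1 - p))

# Standard error of the mean is SD / sqrt(n), which shrinks as n grows
se_correct <- sd_correct / sqrt(cell_sizes)

print(data.frame(n = cell_sizes, sd = sd_correct, se = round(se_correct, 3)))
```

With about five observations per cell, ±1 SE is still roughly ±0.22 (22 percentage points), so wide bars on the sample plot are expected; at 100 trials per cell they shrink to about ±0.05.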

Answer:

They look huge because I was calculating the standard deviation instead of the standard error. In the code above, I changed

sd(Correct)

to

sd(Correct)/sqrt(length(Correct))
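Folding that fix back into the question's pipeline might look like the sketch below, which computes the SE per cell and draws ±1 SE bars. Assumptions: it loads dplyr and ggplot2 directly rather than the whole tidyverse, uses factor(Gate) instead of fct_inorder(Gate) because Gate is an integer in the sample, and omits the Condition/Environment filter, which removes no rows from the upper-case sample data. The end-cap width of 0.2 is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

# Reproducible sample from the question
set.seed(42)
n <- 100
dat <- data.frame(Participant = 1:n,
                  Environment = rep(LETTERS[1:2], n/2),
                  Condition = rep(LETTERS[25:26], n/2),
                  Gate = sample(1:5, n, replace = TRUE),
                  Block = sample(1:2, n, replace = TRUE),
                  Sound = rep(LETTERS[3:4], n/2),
                  Correct = sample(0:1, n, replace = TRUE))

# One row per Gate x Sound x Block cell.
# se is the standard error of the mean; sd is kept only for comparison.
summary_df <- dat %>%
  group_by(Gate = factor(Gate), Sound, Block) %>%
  summarize(Accuracy = mean(Correct),
            count = n(),
            se = sd(Correct) / sqrt(n()),
            sd = sd(Correct),
            .groups = "drop")

# +/- 1 SE error bars; width sets the end-cap width (0.2 is arbitrary).
# Cells with a single observation have se = NA and get no error bar.
p <- ggplot(summary_df, aes(x = Gate, y = Accuracy, colour = Sound, group = Sound)) +
  geom_line() +
  geom_errorbar(aes(ymin = Accuracy - se, ymax = Accuracy + se), width = 0.2) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~ Block) +
  theme_minimal()

# Display the plot (auto-prints at the console; use print(p) in a script)
p
```

Computing the summary table first, rather than inside stat_summary(), makes it easy to inspect the per-cell count and confirm how few observations each error bar is based on.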