Why are my standard error values showing as NA?

Here is my reproducible example:

set.seed(42)
n <- 1000
dat <- data.frame(Participant=1:20, 
                  Environment=rep(LETTERS[1:2], n/2),
                  Condition=rep(LETTERS[25:26], n/2),
                  Gate= sample(1:5, n, replace=TRUE),
                  Block = sample(1:2, n, replace=TRUE),
                  Sound=rep(LETTERS[3:4], n/2),
                  Correct=sample(0:1, n, replace=TRUE)
)

From this dataset, I am trying to analyze at the participant level, not the item level. I am trying to achieve this by transforming the dataset like this:

library(dplyr)

Participant_Data <- dat %>%
  group_by(Condition, Gate, Sound, Participant) %>%
  summarize(Accuracy = mean(Correct),
            se = sd(Correct)/sqrt(length(Correct)))

Then I make a graph from this new dataset:

library(ggplot2)

Participant_Data %>%
  group_by(Condition, Gate, Sound) %>%
  summarize(Proportion_Correct = mean(Accuracy),
            standarderror = sd(Proportion_Correct)/sqrt(length(Proportion_Correct))) %>%
  ggplot(aes(x = Gate, y = Proportion_Correct, color = Sound, group = Sound)) +
  geom_line() +
  geom_errorbar(aes(ymin = Proportion_Correct - standarderror, ymax = Proportion_Correct + standarderror), color = "Black", size = .15, width = .3) +
  geom_point(size = 2) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~Condition) +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

But as you will see, my standard error values come out as NA, so the error bars do not show up on my graph. Let me know if you can see what I am missing, and thanks in advance!

Answer:

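As @MrFlick pointed out in the comments, the issue is that with sd(Proportion_Correct) you are computing the standard deviation of a vector of length 1, which returns NA. Inside summarize(), a column created earlier in the same call is already available to later expressions, so by the time standarderror is computed, Proportion_Correct holds the single group mean you just created rather than the individual participant scores.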
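A minimal illustration of both points, using a hypothetical toy data frame `toy` (not the question's data): `sd()` of a single number is `NA`, and inside `summarize()` a column created earlier in the same call is already the one-row group summary.

```r
library(dplyr)

# sd() of a single value has no spread to measure, so it returns NA
sd(0.75)
#> [1] NA

# Hypothetical toy data: one group with three scores
toy <- data.frame(g = c("a", "a", "a"), x = c(0.2, 0.4, 0.9))

res <- toy %>%
  group_by(g) %>%
  summarize(m = mean(x),
            sd_of_m = sd(m),   # NA: m is already the length-1 group mean here
            sd_of_x = sd(x))   # uses the three original scores
res
```

The fix in the question's case is the same as for `sd_of_x`: take the standard deviation of the column being averaged, not of the mean.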

Instead, I would suggest computing the standard error as sd(Accuracy, na.rm = TRUE)/sqrt(n()), which is the more natural choice given that Proportion_Correct is computed as mean(Accuracy): the standard error of a mean is the standard deviation of the values being averaged divided by the square root of how many there are.

library(dplyr)
library(ggplot2)

Participant_Data1 <- Participant_Data%>% 
  group_by(Condition, Gate, Sound) %>%
  summarize(Proportion_Correct = mean(Accuracy),
            standarderror = sd(Accuracy, na.rm = TRUE)/sqrt(n()))
#> `summarise()` has grouped output by 'Condition', 'Gate'. You can override using
#> the `.groups` argument.

ggplot(Participant_Data1, aes(x = Gate, y = Proportion_Correct, color = Sound, group = Sound)) +
  geom_line() +
  geom_errorbar(aes(ymin = Proportion_Correct - standarderror, ymax = Proportion_Correct + standarderror), color = "Black", size = .15, width = .3) +
  geom_point(size = 2) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~Condition) +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")
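One caveat on sd(Accuracy, na.rm = TRUE)/sqrt(n()): na.rm = TRUE drops missing values from the standard deviation, but n() still counts every row in the group, so if Accuracy ever contained NA the denominator would be too large. The simulated data here has no missing values, so this does not change the result above. A minimal sketch of a helper that keeps numerator and denominator consistent; the name se_mean is hypothetical and not part of dplyr or base R:

```r
# Hypothetical helper: standard error of the mean, ignoring NA values
# in both the standard deviation and the count
se_mean <- function(x) {
  x <- x[!is.na(x)]          # keep only observed values
  sd(x) / sqrt(length(x))    # SE = sd / sqrt(number of observed values)
}

se_mean(c(0.2, 0.4, 0.9, NA))
```

Inside the summarize() call it would replace the inline expression, e.g. standarderror = se_mean(Accuracy).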