When I use sum in base R on a character vector, it counts as expected:
Letters <- c("A","A","B", "B")
Pass <- c("Pass", "Fail", "Pass", "Fail")
df <- data.frame( Letters, Pass)
sum(df$Pass=="Fail")
[1] 2
When I use sum in dplyr it does not count in the same way:
Pass_summary <- df %>% group_by(Letters) %>%
  summarise(n = n(),
            Pass = sum(Pass == "Pass"),
            Fail = sum(Pass == "Fail")
  )
I understand now from MrGrumble's comment that Pass is being reassigned in the 3rd line. However, I thought it was necessary to use mutate() to reference variables that are assigned within summarise()?
CodePudding user response:
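You are overwriting Pass! summarise() evaluates its arguments in order, and each argument can see the columns created by the ones before it, so no mutate() is needed. Once Pass has been reassigned to a count, Pass=="Fail" compares that number with "Fail", which is never TRUE, so Fail comes out as 0. Try switching the order of the arguments in summarise():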
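library(dplyr)

df %>% group_by(Letters) %>%
  summarise(n = n(),
            Fail = sum(Pass == "Fail"),  # computed while Pass is still the original column
            Pass = sum(Pass == "Pass")   # Pass is overwritten last
  )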
Output:
  Letters     n  Fail  Pass
  <chr>   <int> <int> <int>
1 A           2     1     1
2 B           2     1     1
Or just don't name it "Pass"!
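
A minimal sketch of that second option, assuming the same df as in the question. The names n_pass and n_fail are only illustrative (they are not from the original post). Any name other than "Pass" works, because the original Pass column is then never overwritten and the order of the arguments stops mattering:

library(dplyr)

# Same data as in the question
df <- data.frame(
  Letters = c("A", "A", "B", "B"),
  Pass    = c("Pass", "Fail", "Pass", "Fail")
)

# n_pass / n_fail are hypothetical names chosen for this example;
# the Pass column keeps its original character values throughout.
Pass_summary <- df %>%
  group_by(Letters) %>%
  summarise(
    n      = n(),
    n_pass = sum(Pass == "Pass"),
    n_fail = sum(Pass == "Fail")
  )

Pass_summary

With distinct names, each sum() always compares against the original character column, regardless of where it appears in the call.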