Sum does not count certain character vectors with summarise in dplyr

Time:10-30

When I use sum in base R on a character vector, it counts as expected:

Letters <- c("A","A","B", "B") 

Pass <- c("Pass", "Fail", "Pass", "Fail")

df <- data.frame(Letters, Pass)

sum(df$Pass=="Fail")

[1] 2

When I use sum in dplyr it does not count in the same way:

library(dplyr)

Pass_summary <- df %>% group_by(Letters) %>% 
  summarise(n=n(), 
            Pass=sum(Pass=="Pass"), 
            Fail=sum(Pass=="Fail")
  )


I understand now from MrGrumble's comment that Pass is being reassigned by the Pass=sum(Pass=="Pass") line. But I thought it was necessary to use mutate() to reference variables that are assigned in the summarise() phase?

CodePudding user response:

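You are overwriting Pass! summarise() evaluates its arguments in order, and each column it creates is immediately visible to the arguments after it, so no mutate() is needed. By the time Fail is computed, Pass is already the per-group count rather than the original character column.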
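To see the effect concretely, here is a minimal sketch that reruns the call from the question and inspects the result. The name shadowed is just an illustrative choice, not something from the original post.

```r
library(dplyr)

df <- data.frame(
  Letters = c("A", "A", "B", "B"),
  Pass    = c("Pass", "Fail", "Pass", "Fail")
)

# Same call as in the question: when Fail is computed, Pass already
# refers to the per-group count created on the previous line
shadowed <- df %>%
  group_by(Letters) %>%
  summarise(
    n    = n(),
    Pass = sum(Pass == "Pass"),
    Fail = sum(Pass == "Fail")
  )

shadowed$Fail   # 0 0
1L == "Fail"    # FALSE: the count is compared as the text "1"
```

So Fail comes out as 0 in every group because a number is being compared with the string "Fail".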

Try switching the order of the arguments in summarise(), so that Fail is computed before Pass is overwritten:

df %>% group_by(Letters) %>% 
  summarise(n=n(),  
            Fail=sum(Pass=="Fail"),
            Pass=sum(Pass=="Pass")
  )

Output:

  Letters     n  Fail  Pass
  <chr>   <int> <int> <int>
1 A           2     1     1
2 B           2     1     1

Or just don't name it "Pass"!
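A minimal sketch of that second option, assuming the same data as in the question. The column names n_pass and n_fail are hypothetical; any names that differ from the existing Pass column would do.

```r
library(dplyr)

df <- data.frame(
  Letters = c("A", "A", "B", "B"),
  Pass    = c("Pass", "Fail", "Pass", "Fail")
)

# n_pass and n_fail are illustrative names; neither one shadows the
# original Pass column, so the argument order no longer matters
Pass_summary <- df %>%
  group_by(Letters) %>%
  summarise(
    n      = n(),
    n_pass = sum(Pass == "Pass"),
    n_fail = sum(Pass == "Fail")
  )
```

With distinct names, both counts are computed from the original character column regardless of the order in which they are listed.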
