Home > database >  Using any() and all() for string columns in grouped data
Using any() and all() for string columns in grouped data

Time:10-22

I have a grouped data and I would like to create a new variable based on the values of the each of the rows.

> df <- data.frame(Group = c("A","A","A","B","B","B", "C", "C"), Gender=c("M","M","F","F","F","F", "M", "M"))
>   df
  Group Gender
1     A      M
2     A      M
3     A      F
4     B      F
5     B      F
6     B      F
7     C      M
8     C      M 

In this example, I would like to now if groups of A, B and C are

  • Male Only: all members of the group are male
  • Female Only: all members of the group are female
  • Mixed gender: there are males and females in the group

So the desired outcome is:

  Group Gender  gender_mix
1     A      M       Mixed
2     A      M       Mixed
3     A      F       Mixed
4     B      F Female Only
5     B      F Female Only
6     B      F Female Only
7     C      M   Male Only
8     C      M   Male Only

I tried using any() and all() with no luck:

>   df%>%
      group_by(Group)%>%
      mutate(gender_mix=case_when(all(Gender)=="M"~"Male Only",
                                  all(Gender)=="F"~"FemAle Only",
                                  any(Gender)=="M"&any(Gender)=="F"~"Mixed",
                                  TRUE~NA_character_))
# A tibble: 8 × 3
# Groups:   Group [3]
  Group Gender gender_mix
  <chr> <chr>  <chr>     
1 A     M      NA        
2 A     M      NA        
3 A     F      NA        
4 B     F      NA        
5 B     F      NA        
6 B     F      NA        
7 C     M      NA        
8 C     M      NA        
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
2: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
3: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
4: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
5: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
6: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
7: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
8: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
9: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
10: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
11: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
12: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".

Another issues is that my data is rather large (10M rows) and any() and all() seems to be very slow.

Any help would be much appreciated.

CodePudding user response:

You should put the entire conditions between parenthesis:

df %>% 
  group_by(Group) %>% 
  mutate(gender_mix = case_when(all(Gender == "M") ~ "Male only",
                                all(Gender == "F") ~ "Female only",
                                any(Gender == "F") & any(Gender == "M") ~ "Mixed",
                                TRUE ~ NA_character_))

CodePudding user response:

Another option

library(dplyr)
library(stringr)
df %>% 
  distinct %>% 
  group_by(Group) %>% 
  mutate(gender_mix = if(n_distinct(Gender)== 2) 'Mixed' else 
       str_c(Gender, ' only')) %>% 
  ungroup %>% 
  left_join(df, .)
  • Related