I have a grouped data and I would like to create a new variable based on the values of the each of the rows.
> df <- data.frame(Group = c("A","A","A","B","B","B", "C", "C"), Gender=c("M","M","F","F","F","F", "M", "M"))
> df
Group Gender
1 A M
2 A M
3 A F
4 B F
5 B F
6 B F
7 C M
8 C M
In this example, I would like to now if groups of A, B and C are
- Male Only: all members of the group are male
- Female Only: all members of the group are female
- Mixed gender: there are males and females in the group
So the desired outcome is:
Group Gender gender_mix
1 A M Mixed
2 A M Mixed
3 A F Mixed
4 B F Female Only
5 B F Female Only
6 B F Female Only
7 C M Male Only
8 C M Male Only
I tried using any() and all() with no luck:
> df%>%
group_by(Group)%>%
mutate(gender_mix=case_when(all(Gender)=="M"~"Male Only",
all(Gender)=="F"~"FemAle Only",
any(Gender)=="M"&any(Gender)=="F"~"Mixed",
TRUE~NA_character_))
# A tibble: 8 × 3
# Groups: Group [3]
Group Gender gender_mix
<chr> <chr> <chr>
1 A M NA
2 A M NA
3 A F NA
4 B F NA
5 B F NA
6 B F NA
7 C M NA
8 C M NA
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
2: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
3: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
4: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
5: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
6: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
7: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
8: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
9: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
10: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
11: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
12: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
Another issues is that my data is rather large (10M rows) and any() and all() seems to be very slow.
Any help would be much appreciated.
CodePudding user response:
You should put the entire conditions between parenthesis:
df %>%
group_by(Group) %>%
mutate(gender_mix = case_when(all(Gender == "M") ~ "Male only",
all(Gender == "F") ~ "Female only",
any(Gender == "F") & any(Gender == "M") ~ "Mixed",
TRUE ~ NA_character_))
CodePudding user response:
Another option
library(dplyr)
library(stringr)
df %>%
distinct %>%
group_by(Group) %>%
mutate(gender_mix = if(n_distinct(Gender)== 2) 'Mixed' else
str_c(Gender, ' only')) %>%
ungroup %>%
left_join(df, .)