In column A, there are two factors, male and female. In column B, there are 5 assets, a to e.
df <- data.frame(ID = c(1:7),
gender = c("male","male", "male", "female", "female","female","female"),
assets = c("a,e","a,b,e,d", "b,c,e","b,c,e", "a,b,e,d", "c,d","a,d"))
How do I retrieve data where both male and female shared the same asset combinations?
I've zero clue how to write the R syntax for it and below is what I have tried
sameassets <- df %>% filter(filter(gender="male",assets) == filter(gender="female",assets))
desired output
sameassests <- data.frame(ID = c(2,5,3,4),
gender = c("male", "female", "male", "female"),
assets = c("a,b,e,d", "a,b,e,d", "b,c,e","b,c,e"))
Can someone help please?
Edit to include additional question. I also want to find out which asset combination that male does not share with female.
So the desired output for this will look like
diffassests <- data.frame(ID = c(1,6,7),
gender = c("male", "female", "female"),
assets = c("a,e", "c,d", "a,d"))
CodePudding user response:
df %>%
group_by(assets) %>%
filter(all(c('male', 'female') %in% gender))
# A tibble: 4 x 3
# Groups: assets [2]
ID gender assets
<int> <chr> <chr>
1 2 male a,b,e,d
2 3 male b,c,e
3 4 female b,c,e
4 5 female a,b,e,d
CodePudding user response:
Another option is to first group by assets
, then determine whether the number of distinct gender
for that assets
is equal to the dataframe level number of distinct gender
. The first gender
refers to gender
for each group, then .$gender
refers to the overall gender
column. See here for original idea.
library(dplyr)
df1 %>%
group_by(assets) %>%
filter(n_distinct(gender) == n_distinct(.$gender))
Or a possible base R solution:
df[df$assets %in% Reduce(intersect, split(df$assets, df$gender)), ]
Output
ID gender assets
<int> <chr> <chr>
1 2 male a,b,e,d
2 3 male b,c,e
3 4 female b,c,e
4 5 female a,b,e,d
CodePudding user response:
You can do this by grouping by assets
and then filtering to remove any asset groups that have only one row, as in:
library(dplyr)
df <- data.frame(ID = c(1:7),
gender = c("male","male", "male", "female", "female","female","female"),
assets = c("a,e","a,b,e,d", "b,c,e","b,c,e", "a,b,e,d", "c,d","a,d"))
df |>
group_by(assets) |>
filter(n() > 1) |>
arrange(assets, ID)