Home > OS >  Retrieve data that has the same factor in column B for both factors in column A and finding which as
Retrieve data that has the same factor in column B for both factors in column A and finding which as

Time:04-26

In column A, there are two factors, male and female. In column B, there are 5 assets, a to e.

df <- data.frame(ID = c(1:7),                                   
                 gender = c("male","male", "male", "female", "female","female","female"),               
                 assets = c("a,e","a,b,e,d", "b,c,e","b,c,e", "a,b,e,d", "c,d","a,d"))  

   

How do I retrieve data where both male and female shared the same asset combinations?

I've zero clue how to write the R syntax for it and below is what I have tried

sameassets <- df %>% filter(filter(gender="male",assets) == filter(gender="female",assets))

desired output

sameassests <- data.frame(ID = c(2,5,3,4),                                   
                          gender = c("male", "female", "male", "female"),               
                          assets = c("a,b,e,d", "a,b,e,d", "b,c,e","b,c,e"))

Can someone help please?

Edit to include additional question. I also want to find out which asset combination that male does not share with female.

So the desired output for this will look like

diffassests <- data.frame(ID = c(1,6,7),                                   
                              gender = c("male", "female", "female"),               
                              assets = c("a,e", "c,d", "a,d"))

CodePudding user response:

df %>%
   group_by(assets) %>%
   filter(all(c('male', 'female') %in% gender))

# A tibble: 4 x 3
# Groups:   assets [2]
     ID gender assets 
  <int> <chr>  <chr>  
1     2 male   a,b,e,d
2     3 male   b,c,e  
3     4 female b,c,e  
4     5 female a,b,e,d

CodePudding user response:

Another option is to first group by assets, then determine whether the number of distinct gender for that assets is equal to the dataframe level number of distinct gender. The first gender refers to gender for each group, then .$gender refers to the overall gender column. See here for original idea.

library(dplyr)

df1 %>% 
  group_by(assets) %>% 
  filter(n_distinct(gender) == n_distinct(.$gender))

Or a possible base R solution:

df[df$assets %in% Reduce(intersect, split(df$assets, df$gender)), ]

Output

     ID gender assets 
  <int> <chr>  <chr>  
1     2 male   a,b,e,d
2     3 male   b,c,e  
3     4 female b,c,e  
4     5 female a,b,e,d

CodePudding user response:

You can do this by grouping by assets and then filtering to remove any asset groups that have only one row, as in:

library(dplyr)

df <- data.frame(ID = c(1:7),                                   
                 gender = c("male","male", "male", "female", "female","female","female"),               
                 assets = c("a,e","a,b,e,d", "b,c,e","b,c,e", "a,b,e,d", "c,d","a,d"))  

df |> 
  group_by(assets) |> 
  filter(n() > 1) |> 
  arrange(assets, ID)
  • Related