Find differences among groups based on a condition in dplyr

Time:10-04

I have a data frame that looks like this, but it's gigantic.

df = data.frame(gene=c("A","B","F","A","D","E","B","C","D","G"),
                group=c("group1","group1","group1","group2","group2","group2","group3","group3","group3","group3"))
df

 gene    group
   A     group1
   B     group1
   F     group1
   A     group2
   D     group2
   E     group2
   B     group3
   C     group3
   D     group3
   G     group3

Based on the column gene, I want to find unique differences between groups containing the gene "A" and groups that do not include gene A.

I want my data to look like this after the "filtering":

gene group
 F    group1
 E    group2

F and E are the only genes that are present in a group containing gene A and not present in any group without it.

CodePudding user response:

We can split the rows into those from groups that contain gene 'A' and those from groups that do not, then do an anti_join on 'gene' to keep only the genes that never appear in a group without 'A'.

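library(dplyr)

# Rows from groups that contain gene 'A'
tmp1 <- df %>%
  filter(group %in% group[gene %in% 'A'])

# Rows from groups that do not contain gene 'A'
tmp2 <- df %>%
  group_by(group) %>%
  filter(!'A' %in% gene) %>%
  ungroup()

# Keep genes found only in the 'A' groups, then drop 'A' itself
anti_join(tmp1, tmp2, by = 'gene') %>%
  filter(gene != 'A')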

-output

 gene  group
1    F group1
2    E group2
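
As a possible alternative, the same filtering can be sketched in a single pipeline without the two temporary data frames. This is only a sketch under the assumptions of the example data: groups_with_a and result are illustrative names, not part of the answer above, and the idea is to keep a gene only when every group it appears in also contains 'A'.

```r
library(dplyr)

df <- data.frame(
  gene  = c("A", "B", "F", "A", "D", "E", "B", "C", "D", "G"),
  group = c("group1", "group1", "group1", "group2", "group2", "group2",
            "group3", "group3", "group3", "group3")
)

# Groups that contain gene 'A' (illustrative helper name)
groups_with_a <- unique(as.character(df$group[df$gene == "A"]))

result <- df %>%
  filter(gene != "A") %>%
  group_by(gene) %>%
  # keep a gene only if all of its groups are groups that contain 'A'
  filter(all(group %in% groups_with_a)) %>%
  ungroup()

result
```

On the example data this should leave the rows for F (group1) and E (group2), matching the anti_join approach. On a very large data frame it may be worth timing both versions, since the per-gene grouping could be slower than the join.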