I have a data frame that looks like this, but it's gigantic.
df = data.frame(gene=c("A","B","F","A","D","E","B","C","D","G"),
group=c("group1","group1","group1","group2","group2","group2","group3","group3","group3","group3"))
df
gene group
A group1
B group1
F group1
A group2
D group2
E group2
B group3
C group3
D group3
G group3
Based on the column gene, I want to find the genes that occur only in groups containing the gene "A" and never in groups that do not include gene A.
I want my data to look like this after the "filtering":
gene group
F group1
E group2
Since F and E are the only genes that are present in a group that contains the gene A and are not present in any other group.
CodePudding user response:
We can filter the rows into two sets, those from groups that contain gene 'A' and those from groups that do not, and then do an anti_join on 'gene'
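library(dplyr)

# rows from groups that contain gene 'A'
tmp1 <- df %>%
  filter(group %in% group[gene %in% 'A'])

# rows from groups that do not contain gene 'A'
tmp2 <- df %>%
  group_by(group) %>%
  filter(!'A' %in% gene) %>%
  ungroup()

# keep genes from tmp1 that never occur in tmp2, then drop 'A' itself
anti_join(tmp1, tmp2, by = 'gene') %>%
  filter(gene != 'A')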
-output
gene group
1 F group1
2 E group2
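
If you would rather avoid the dplyr dependency, the same idea can be sketched in base R with logical indexing. This is only a minimal sketch under the assumption that the columns are named gene and group as in the question; the names target, groups_with, genes_elsewhere and result are made up for illustration.

```r
# Same example data as in the question
df <- data.frame(
  gene  = c("A", "B", "F", "A", "D", "E", "B", "C", "D", "G"),
  group = c("group1", "group1", "group1", "group2", "group2", "group2",
            "group3", "group3", "group3", "group3")
)

target <- "A"  # gene of interest (illustrative variable name)

# groups that contain the target gene
groups_with <- unique(df$group[df$gene == target])

# genes seen in any group that lacks the target gene
genes_elsewhere <- unique(df$gene[!df$group %in% groups_with])

# keep rows from target-containing groups whose gene never appears
# in the other groups, and drop the target gene itself
result <- df[df$group %in% groups_with &
               !df$gene %in% genes_elsewhere &
               df$gene != target, ]
rownames(result) <- NULL
result
```

On the example data this should leave the F/group1 and E/group2 rows, matching the anti_join output above. Because it relies only on vectorised %in% lookups, it may also be a reasonable starting point for a very large data frame, though I have not benchmarked it.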