Home > Software engineering >  How to filter based on multiple column values from another data frame?
How to filter based on multiple column values from another data frame?

Time:12-16

I have a data frame that I am trying to filter based on 2 ID columns from the first data frame. Here's data frame 1:

id_1 id_2
A 1
B 2
C 3

And data frame 2:

id_1 id_2
A 1
B 2
A 2

If I do something like...

df2_filtered <- df2 %>%
filter(id_1 %in% df1$id_1 &
        id_2 %in% df1$id_2)

Then I get back the entirety of df2, which is not what I want. Even though "A" is in id_1 of df1, and "2" is in id_2 of df1, there is no row that has both. How do I fix it so that I only get back

id_1 id_2
A 1
B 2

CodePudding user response:

If you aren't bound to a dplyr solution, then data.table has a nice option:

library(data.table)
df1 = as.data.table(df1)
df2 = as.data.table(df2)
fintersect(df1, df2)
   id_1 id_2
1:    A    1
2:    B    2

CodePudding user response:

You can use dplyr::inner_join:

inner_join(data1, data2)

# Joining, by = c("id_1", "id_2")
# id_1 id_2
# 1    A    1
# 2    B    2
  • Related