How to remove rows in df1 that are present in df2?

Assume I have two data frames:

df1 <- data.frame(name = c("Mike", "Paul", "Paul", "Henry"),
                  age = c(20, 21, 22, 23))

df2 <- data.frame(name = c("Sam", "Paul", "Paul", "Bob"),
                  age = c(26, 30, 22, 23))

I would like to remove row 3 from df1, because that row (Paul, 22) is also present in df2.

What is the most elegant way to do this in R?

CodePudding user response:

Use setdiff from dplyr, which returns the rows of df1 that do not appear in df2:

library(dplyr)
setdiff(df1, df2)
   name age
1  Mike  20
2  Paul  21
3 Henry  23
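
One caveat worth checking against your data: as far as I know, dplyr's setdiff treats the data frames as sets, so it also collapses duplicate rows that remain from the first argument. A minimal sketch, where df3 is a hypothetical variant of df1 (not part of the question) with the Mike row repeated:

```r
library(dplyr)

# df3 is a hypothetical data frame: like df1, but with "Mike" duplicated
df3 <- data.frame(name = c("Mike", "Mike", "Paul"),
                  age  = c(20, 20, 22))
df2 <- data.frame(name = c("Sam", "Paul", "Paul", "Bob"),
                  age  = c(26, 30, 22, 23))

res <- setdiff(df3, df2)  # (Paul, 22) is removed because it is in df2
nrow(res)                 # the two identical Mike rows are collapsed into one
```

If duplicate rows in df1 need to be preserved, anti_join is probably the safer choice, because it only filters rows and does not deduplicate them.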

If the comparison should be based on only some of the columns that the two data frames share, use anti_join:

anti_join(df1, df2)

In this example, all columns are common to both data frames, so by default anti_join matches on every column. To match on a subset of the columns, specify it in by:

anti_join(df1, df2, by = c('name'))
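
To make the difference between the two calls concrete, here is a self-contained sketch using the data from the question. Matching on all columns drops only the (Paul, 22) row, whereas matching on name alone drops every Paul in df1, because the name Paul occurs in df2. The variable names full_match and name_match are just illustrative.

```r
library(dplyr)

df1 <- data.frame(name = c("Mike", "Paul", "Paul", "Henry"),
                  age  = c(20, 21, 22, 23))
df2 <- data.frame(name = c("Sam", "Paul", "Paul", "Bob"),
                  age  = c(26, 30, 22, 23))

# Match on every shared column; spelling out `by` also avoids the
# message about the join columns that anti_join prints when `by` is omitted
full_match <- anti_join(df1, df2, by = c("name", "age"))
full_match$name   # "Mike" "Paul" "Henry"

# Match on name only: both Paul rows are dropped, whatever their age
name_match <- anti_join(df1, df2, by = "name")
name_match$name   # "Mike" "Henry"
```

So by = "name" is only appropriate if any shared name should disqualify a row; for the original question, where the whole row has to match, the full-column version is the one that removes just row 3.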