Assume I have two data frames:
df1 <- data.frame(name = c("Mike", "Paul", "Paul", "Henry"),
                  age = c(20, 21, 22, 23))
df2 <- data.frame(name = c("Sam", "Paul", "Paul", "Bob"),
                  age = c(26, 30, 22, 23))
I would like to remove row 3 from df1, because that row is also present in df2.
What is the most elegant way to do this in R?
CodePudding user response:
Use setdiff from dplyr:
library(dplyr)
setdiff(df1, df2)
   name age
1  Mike  20
2  Paul  21
3 Henry  23
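One point worth checking before relying on setdiff: as far as I know, dplyr's set operations treat the data frames as sets, so repeated rows in the first data frame are collapsed in the result. The sketch below illustrates this with a hypothetical variant, df1_dup (not part of the question), in which the first row of df1 is repeated; anti_join on both columns is shown alongside for comparison, since it keeps every unmatched row.

```r
library(dplyr)

df1 <- data.frame(name = c("Mike", "Paul", "Paul", "Henry"),
                  age = c(20, 21, 22, 23))
df2 <- data.frame(name = c("Sam", "Paul", "Paul", "Bob"),
                  age = c(26, 30, 22, 23))

# Rows of df1 with no identical row (all columns equal) in df2
only_in_df1 <- setdiff(df1, df2)

# Hypothetical variant for illustration: df1 with its first row repeated
df1_dup <- rbind(df1, df1[1, ])

# Set semantics: the repeated "Mike" row appears only once in the result
deduped <- setdiff(df1_dup, df2)

# A filtering join keeps both copies of the repeated row
kept <- anti_join(df1_dup, df2, by = c("name", "age"))

print(nrow(deduped))
print(nrow(kept))
```

If duplicates in df1 carry meaning, anti_join is probably the safer choice of the two.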
If the comparison should be based on a subset of the common columns, use anti_join:
anti_join(df1, df2)
In this example all columns are common to both data frames, so by default anti_join uses every shared column name as by. To match on a subset, specify it in by:
anti_join(df1, df2, by = c('name'))
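To make the difference concrete, here is a small self-contained sketch that runs both variants on the data from the question. The result names by_all and by_name are only illustrative.

```r
library(dplyr)

df1 <- data.frame(name = c("Mike", "Paul", "Paul", "Henry"),
                  age = c(20, 21, 22, 23))
df2 <- data.frame(name = c("Sam", "Paul", "Paul", "Bob"),
                  age = c(26, 30, 22, 23))

# Match on both columns: only the row that is identical in df2 (Paul, 22) is dropped
by_all <- anti_join(df1, df2, by = c("name", "age"))

# Match on name alone: every "Paul" row is dropped, whatever the age
by_name <- anti_join(df1, df2, by = "name")

print(by_all)
print(by_name)
```

Matching on name alone is stricter here: it also removes the (Paul, 21) row, which the question wanted to keep, so the full-column match is the one that fits the original request.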