How to find common rows between two data frames in R and remove them

Time:09-29

I have two data frames with different numbers of rows.

df1 is longer than df2, and they share several common rows.

My example

df1 <- data.frame(name1 = c("a", "b", "c"),
                  name2 = c("a1", "b1", "c1"),
                  name3 = c("a2", "b2", "c2"))
df1
  name1 name2 name3
1     a    a1    a2
2     b    b1    b2
3     c    c1    c2

df2 <- data.frame(name1 = c("a", "b", "m"),
                  name2 = c("a3","b3", "m1"),
                  name3 = c("a4", "b4", "m2"))
df2
  name1 name2 name3
1     a    a3    a4
2     b    b3    b4
3     m    m1    m2

I would like to exclude the rows that the two data frames have in common (matching on name1) and keep only the remaining row of df2 in this case, using tidyverse. Any suggestions?

Desired output

name1 name2 name3
    m    m1    m2

CodePudding user response:

library(dplyr)

# Rows of df1 whose name1 has no match in df2
anti_join(df1, df2, by = "name1")

  name1 name2 name3
1     c    c1    c2


# Rows of df2 whose name1 has no match in df1 (the desired output)
anti_join(df2, df1, by = "name1")

  name1 name2 name3
1     m    m1    m2
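
As a sketch of how the two calls relate, the example below uses the question's data (with df1 built via c()) to show the shared keys with semi_join and then stacks the rows unique to each side. The names common, only_df1, only_df2, and unique_rows are just illustrative. Note that by = "name1" matters here: without it, anti_join matches on all shared columns, and since no rows in this data are identical across name1, name2, and name3, every row of df2 would be returned.

```r
library(dplyr)

df1 <- data.frame(name1 = c("a", "b", "c"),
                  name2 = c("a1", "b1", "c1"),
                  name3 = c("a2", "b2", "c2"))
df2 <- data.frame(name1 = c("a", "b", "m"),
                  name2 = c("a3", "b3", "m1"),
                  name3 = c("a4", "b4", "m2"))

# Rows of df2 whose name1 also appears in df1 (the "common" rows by key)
common <- semi_join(df2, df1, by = "name1")

# Rows unique to each side, matched on name1 only
only_df1 <- anti_join(df1, df2, by = "name1")
only_df2 <- anti_join(df2, df1, by = "name1")

# Stack both sets if the non-shared rows from both data frames are wanted
unique_rows <- bind_rows(only_df1, only_df2)
```

Here only_df2 holds the single m row asked for in the question, while unique_rows holds both the c row and the m row.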

CodePudding user response:

We may use anti_join (originally posted as comments way before the other answer was posted)

library(dplyr)
anti_join(df1, df2, by = c("name1"))

data

df1 <- structure(list(name1 = c("a", "b", "c"), name2 = c("a1", "b1", 
"c1"), name3 = c("a2", "b2", "c2")), class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(name1 = c("a", "b"), name2 = c("a3", "b3")), class = "data.frame", row.names = c(NA, 
-2L))
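
For comparison, the same result can probably be had without a join by filtering on the key column. This is a minimal sketch using the data from this answer, assuming name1 is the only column that should decide whether a row counts as common; the name result is illustrative.

```r
library(dplyr)

df1 <- data.frame(name1 = c("a", "b", "c"),
                  name2 = c("a1", "b1", "c1"),
                  name3 = c("a2", "b2", "c2"))
df2 <- data.frame(name1 = c("a", "b"),
                  name2 = c("a3", "b3"))

# Keep the rows of df1 whose name1 does not occur in df2
result <- filter(df1, !name1 %in% df2$name1)
```

With this data, result contains only the row with name1 equal to "c", the same row that anti_join(df1, df2, by = "name1") keeps.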