Remove rows of one data frame from another but keep duplicates in R

Time:04-14

I'm working in R and I have two data frames: one is the base data frame, and the other contains the rows that I need to remove from the base one. I can't use the setdiff function, because it removes duplicate rows. Each row in the second data frame should remove only one matching occurrence from the base. Here is an example:

a <- data.frame(var1 = c(1, NA, 2, 2, 3, 4, 5),
                var2 = c(1, 7, 2, 2, 3, 4, 5))

b <- data.frame(id = c(2, 4),
                numero = c(2, 4))

And the result must be:

var1 var2
   1    1
  NA    7
   2    2
   3    3
   5    5

The solution must also be efficient, because the base data frame has 3 million rows and 26 columns.

Answer:

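Create a sequence column in both tables before joining: it numbers each occurrence of a repeated row (1st, 2nd, ...). Then anti-join on the value columns plus that counter, so each row of b removes exactly one matching row of a and any extra duplicates survive.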
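To see what that sequence column holds, here is a small sketch using data.table's rowid() on the example data. The two identical (2, 2) rows get the counters 1 and 2, while every other row gets 1:

```r
library(data.table)

a <- data.frame(var1 = c(1, NA, 2, 2, 3, 4, 5),
                var2 = c(1, 7, 2, 2, 3, 4, 5))

# rowid() counts occurrences within each group of identical (var1, var2)
# values; NA is treated as an ordinary group value here.
occurrence <- rowid(a$var1, a$var2)
print(occurrence)
# [1] 1 1 1 2 1 1 1
```

Because b's single (2, 2) row also gets counter 1, the join removes only the first (2, 2) row of a and keeps the second.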

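library(data.table)
setDT(a); setDT(b)
# number each repeated row: 1st, 2nd, ... occurrence of the same values
a[, rn := rowid(var1, var2)]
b[, rn := rowid(id, numero)]
# anti-join: drop rows of a whose values and occurrence number both match b
a[!b, on = .(var1 = id, var2 = numero, rn)][, rn := NULL][]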

Output:

   var1  var2
   <num> <num>
1:     1     1
2:    NA     7
3:     2     2
4:     3     3
5:     5     5
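The code above hard-codes the two column names. For the real table with 26 columns, a minimal sketch that builds the join from the column names could look like the following. It assumes that b has the same columns as a in the same order (only the names differ) and that neither table already has a column called rn. remove_matched_rows is a hypothetical helper name, not part of data.table.

```r
library(data.table)

# Hypothetical helper: remove one row of `base` for each matching row of
# `remove`, keeping surplus duplicates. Assumes both tables have the same
# columns in the same order and no existing column named "rn".
remove_matched_rows <- function(base, remove) {
  base <- copy(base)      # work on copies so the caller's data is untouched
  remove <- copy(remove)
  setDT(base)
  setDT(remove)
  stopifnot(ncol(base) == ncol(remove))

  key_cols <- names(base)
  setnames(remove, key_cols)  # match columns by position, not by name

  # occurrence counter within each group of identical rows
  set(base, j = "rn", value = rowidv(base, cols = key_cols))
  set(remove, j = "rn", value = rowidv(remove, cols = key_cols))

  # anti-join on every value column plus the counter
  res <- base[!remove, on = c(key_cols, "rn")]
  res[, rn := NULL][]
}

a <- data.frame(var1 = c(1, NA, 2, 2, 3, 4, 5),
                var2 = c(1, 7, 2, 2, 3, 4, 5))
b <- data.frame(id = c(2, 4),
                numero = c(2, 4))

res <- remove_matched_rows(a, b)
print(res)
```

The two copy() calls cost extra memory on a 3-million-row table. If modifying a and b in place is acceptable, dropping them and calling setDT() directly on the inputs avoids that overhead.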