Home > OS >  Find rows that occur only once, in two datasets
Find rows that occur only once, in two datasets

Time:11-09

I have data as follows:

library(data.table)
datA <- fread("A B C
               1 1 1
               2 2 2")

datB <- fread("A B C
               1 1 1
               2 2 2
               3 3 3")

I want to figure out which rows are unique (which is the one with 3 3 3, because all others occur more often).

I tried:

dat <- rbind(datA, datB)
unique(dat)
!duplicated(dat)

I also tried

setDT(dat)[,if(.N ==1) .SD,]

But that is NULL.

How should I do this?

CodePudding user response:

You can use fsetdiff:

rbind.data.frame(fsetdiff(datA, datB, all = TRUE),
                 fsetdiff(datB, datA, all = TRUE))

In general, this is called an anti_join:

library(dplyr)
bind_rows(anti_join(datA, datB),
          anti_join(datB, datA))

   A B C
1: 4 4 4
2: 3 3 3

Data: I added a row in datA to show how to keep rows from both data sets (a simple anti-join does not work otherwise):

library(data.table)
datA <- fread("A B C
               1 1 1
               2 2 2
               4 4 4")

datB <- fread("A B C
               1 1 1
               2 2 2
               3 3 3")

CodePudding user response:

One possible solution

library(data.table)

datB[!datA, on=c("A", "B", "C")]

       A     B     C
   <int> <int> <int>
1:     3     3     3

Or (if you are interested in the symmetric difference)

funion(fsetdiff(datB, datA), fsetdiff(datA, datB))

       A     B     C
   <int> <int> <int>
1:     3     3     3
  • Related