I have data as follows:
library(data.table)
datA <- fread("A B C
1 1 1
2 2 2")
datB <- fread("A B C
1 1 1
2 2 2
3 3 3")
I want to figure out which rows are unique (which is the one with 3 3 3
, because all others occur more often).
I tried:
dat <- rbind(datA, datB)
unique(dat)
!duplicated(dat)
I also tried
setDT(dat)[,if(.N ==1) .SD,]
But that is NULL
.
How should I do this?
CodePudding user response:
You can use fsetdiff
:
rbind.data.frame(fsetdiff(datA, datB, all = TRUE),
fsetdiff(datB, datA, all = TRUE))
In general, this is called an anti_join
:
library(dplyr)
bind_rows(anti_join(datA, datB),
anti_join(datB, datA))
A B C
1: 4 4 4
2: 3 3 3
Data: I added a row in datA
to show how to keep rows from both data sets (a simple anti-join does not work otherwise):
library(data.table)
datA <- fread("A B C
1 1 1
2 2 2
4 4 4")
datB <- fread("A B C
1 1 1
2 2 2
3 3 3")
CodePudding user response:
One possible solution
library(data.table)
datB[!datA, on=c("A", "B", "C")]
A B C
<int> <int> <int>
1: 3 3 3
Or (if you are interested in the symmetric difference)
funion(fsetdiff(datB, datA), fsetdiff(datA, datB))
A B C
<int> <int> <int>
1: 3 3 3