How to add column named "ref" with value "1" in table DT1, for existing rows in table DT2, using join on 2 columns (id1=id3 and id2=id4)?
DT1 <- data.table(
id1 = c('A','B','C', 'D'),
id2 = c(1,2,3,4)
)
DT2 <- data.table(
id3 = c('B','D','F'),
id4 = c(2,4,5)
)
DT1:
id1 id2
A 1,00000
B 2,00000
C 3,00000
D 4,00000
DT2:
id3 id4
B 2,00000
D 4,00000
Expected result:
id1 id2 ref
A 1,00000 0
B 2,00000 1
C 3,00000 0
D 4,00000 1
I started with the following code, which does not filter as I want:
result <- DT1[DT2, on = c(id1='id3', id2='id4')]
CodePudding user response:
Add a new column with join, then convert NAs to zero:
DT1[ DT2, on = c(id1 = 'id3', id2 = 'id4'), ref := 1 ][ is.na(ref), ref := 0 ]
DT1
# id1 id2 ref
# 1: A 1 0
# 2: B 2 1
# 3: C 3 0
# 4: D 4 1
CodePudding user response:
With %in%
:
DT1[, ref := (id1 %in% DT2$id3)]
id1 id2 ref
1: A 1 0
2: B 2 1
3: C 3 0
4: D 4 1
Use paste
for multiple columns:
DT1[, ref := (paste(id1, id2) %in% paste(DT2$id3, DT2$id4))]
CodePudding user response:
You may join the two data.table
's with merge
-
library(data.table)
#Add a new column `ref` in `DT2`.
DT2[, ref := 1]
#left join the two data.tables
result <- merge(DT1, DT2, by.x = c('id1', 'id2'),
by.y = c('id3', 'id4'), all.x = TRUE)
#Replace NAs with 0
result[is.na(ref), ref := 0]
result
# id1 id2 ref
#1: A 1 0
#2: B 2 1
#3: C 3 0
#4: D 4 1