Add column for existing rows in other tables with datatable-CodePudding

How to add column named "ref" with value "1" in table DT1, for existing rows in table DT2, using join on 2 columns (id1=id3 and id2=id4)?

DT1 <- data.table(
  id1 = c('A','B','C', 'D'),
  id2 = c(1,2,3,4)
)

DT2 <- data.table(
  id3 = c('B','D','F'),
  id4 = c(2,4,5)
)

DT1:

id1  id2     
A   1,00000
B   2,00000
C   3,00000
D   4,00000

DT2:

id3  id4     
B   2,00000
D   4,00000

Expected result:

id1  id2      ref
A   1,00000   0
B   2,00000   1
C   3,00000   0
D   4,00000   1

I started with the following code, which does not filter as I want:

result <- DT1[DT2, on = c(id1='id3', id2='id4')]

CodePudding user response：

Add a new column with join, then convert NAs to zero:

DT1[ DT2, on = c(id1 = 'id3', id2 = 'id4'), ref := 1 ][ is.na(ref), ref := 0 ]

DT1
#    id1 id2 ref
# 1:   A   1   0
# 2:   B   2   1
# 3:   C   3   0
# 4:   D   4   1

CodePudding user response：

With %in%:

DT1[, ref :=  (id1 %in% DT2$id3)]

   id1 id2 ref
1:   A   1   0
2:   B   2   1
3:   C   3   0
4:   D   4   1

Use paste for multiple columns:

DT1[, ref :=  (paste(id1, id2) %in% paste(DT2$id3, DT2$id4))]

CodePudding user response：

You may join the two data.table's with merge -

library(data.table)

#Add a new column `ref` in `DT2`. 
DT2[, ref := 1]
#left join the two data.tables
result <- merge(DT1, DT2, by.x = c('id1', 'id2'), 
                          by.y = c('id3', 'id4'), all.x = TRUE)

#Replace NAs with 0
result[is.na(ref), ref := 0]

result

#   id1 id2 ref
#1:   A   1   0
#2:   B   2   1
#3:   C   3   0
#4:   D   4   1