I'm trying to compare two columns from data frame `d1` with two columns from data frame `d2`, row by row. To illustrate the issue, I created dummy datasets:
```r
d1 <- data.frame(
  a = c(1, 2, 3),
  b = c(4, 5, 6)
)
d2 <- data.frame(
  a = c(2, 0, 2),
  b = c(5, 5, 6)
)
```
Ideally, I would like to flag every row in `d1` for which I can find a match (on both `a` and `b`) in at least one row of `d2`, so the desired result would be:
```r
data.frame(
  a = c(1, 2, 3),
  b = c(4, 5, 6),
  flag = c(0, 1, 0)
)
```
This is what I tried:
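```r
# test[i, j] is 1 when row i of d1 matches row j of d2 on both a and b
test <- matrix(0, nrow = nrow(d1), ncol = nrow(d2))
for (i in seq_len(nrow(d1))) {
  for (j in seq_len(nrow(d2))) {
    test[i, j] <- ifelse(d1$a[i] == d2$a[j] & d1$b[i] == d2$b[j], 1, 0)
  }
}
```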
A for loop would be the best solution.
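For reference, a minimal sketch of how a pairwise match matrix like the loop's `test` turns into the wanted flag, and how the same matrix can be built without a loop using base R's `outer()`. It assumes the `d1`/`d2` example above; the name `m` is illustrative. A row of `d1` gets flag 1 when at least one entry in its row of the matrix is `TRUE`.

```r
d1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
d2 <- data.frame(a = c(2, 0, 2), b = c(5, 5, 6))

# m[i, j] is TRUE when row i of d1 equals row j of d2 in both a and b
# (the same information the nested loop stores in test[i, j])
m <- outer(d1$a, d2$a, "==") & outer(d1$b, d2$b, "==")

# flag = 1 if row i of d1 matches at least one row of d2
d1$flag <- as.numeric(rowSums(m) > 0)
d1
```

Like the loop, `m` holds `nrow(d1) * nrow(d2)` entries, so this only suits fairly small tables; the join-based answers generally scale better.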
Answer:
You are essentially looking for a join. For this particular task, where you only need a flag, `data.table` is very neat because it can join and update in place:
```r
library(data.table)

d1 <- data.table(
  a = c(1, 2, 3),
  b = c(4, 5, 6)
)
d2 <- data.table(
  a = c(2, 0, 2),
  b = c(5, 5, 6)
)

# assign 1 in place to every row of d1 that has a match in d2
d1[d2,
   on = .(a, b),
   flag := 1]
d1
#>    a b flag
#> 1: 1 4   NA
#> 2: 2 5    1
#> 3: 3 6   NA

# convert NAs to zeros
d1[is.na(flag), flag := 0]
d1
#>    a b flag
#> 1: 1 4    0
#> 2: 2 5    1
#> 3: 3 6    0
```
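If you would rather not add a dependency, the same left-join idea can be sketched in base R with `merge()`. This is a swapped-in alternative, not part of the `data.table` answer, and the names `keys`, `x`, `row_id` and `out` are illustrative. Because `merge()` reorders rows, a temporary row id restores the original order, and `unique()` keeps a key that appears several times in `d2` from duplicating rows of `d1`.

```r
d1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
d2 <- data.frame(a = c(2, 0, 2), b = c(5, 5, 6))

# distinct key pairs from d2, each tagged with flag = 1
keys <- unique(d2[c("a", "b")])
keys$flag <- 1

# temporary row id, because merge() does not keep d1's row order
x <- d1
x$row_id <- seq_len(nrow(x))

# left join on both columns: rows of d1 without a partner get NA in flag
out <- merge(x, keys, by = c("a", "b"), all.x = TRUE)
out <- out[order(out$row_id), c("a", "b", "flag")]
out$flag[is.na(out$flag)] <- 0
rownames(out) <- NULL
out
```

The `data.table` version avoids both the reordering and the extra copy, since `:=` updates `d1` by reference.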
Answer:
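You can use `match` for this:
```r
# a separator keeps pairs such as (1, 23) and (12, 3) apart; "> 0" turns match positions into a 0/1 flag
d1$flag <- as.numeric(match(paste(d1$a, d1$b, sep = "_"), paste(d2$a, d2$b, sep = "_"), nomatch = 0) > 0)
```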
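Two details of `match()` on pasted keys are easy to trip over, shown here with made-up values: `match()` returns the position of the first hit in its second argument (a match against row 3 of `d2` gives 3, not 1), and `paste0()` glues values together with no separator, so different pairs can produce the same key. Comparing the result with 0 and pasting with a separator avoids both.

```r
# match() returns positions in the table; nomatch = 0 replaces NA for misses
pos <- match(c("x", "y", "z"), c("z", "y"), nomatch = 0)
print(pos)                  # 0 2 1
print(as.numeric(pos > 0))  # 0 1 1

# without a separator, the pairs (1, 23) and (12, 3) both become "123"
print(paste0(1, 23) == paste0(12, 3))                      # TRUE
print(paste(1, 23, sep = "_") == paste(12, 3, sep = "_"))  # FALSE
```

Base R's `%in%` is defined as `match(x, table, nomatch = 0) > 0`, so `paste(...) %in% paste(...)` is an equivalent shorthand for the flag.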
Edit: @mnist raises a fair point. Here is an alternative that handles rows containing `NA` more safely:
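```r
# compare only the key columns; the first nrow(d2) results belong to d2 and are dropped
d1$flag <- as.numeric(duplicated(rbind(d2[c("a", "b")], d1[c("a", "b")])))[-seq_len(nrow(d2))]
```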
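One caveat with the `duplicated()` approach: it cannot tell whether the earlier copy of a row came from `d2` or from `d1` itself, so a row repeated within `d1` gets flagged even when `d2` has no such row. The sketch below demonstrates this and a key-based alternative that works for any set of key columns. `flag_matches()` is a hypothetical helper, and it assumes the separator `"\r"` never occurs in the key columns. Because `paste()` turns `NA` into the string `"NA"`, an `NA` key matches another `NA`, as with `duplicated()`.

```r
# Hypothetical helper: 1 if a row of x matches some row of y on `cols`, else 0
flag_matches <- function(x, y, cols) {
  # one string key per row; assumes "\r" never occurs in the key columns
  key_x <- do.call(paste, c(unname(as.list(x[cols])), sep = "\r"))
  key_y <- do.call(paste, c(unname(as.list(y[cols])), sep = "\r"))
  as.numeric(key_x %in% key_y)
}

d1 <- data.frame(a = c(1, 1, 2), b = c(4, 4, 5))  # (1, 4) appears twice
d2 <- data.frame(a = c(2, 0), b = c(5, 5))        # d2 contains no (1, 4)

# duplicated() flags the second (1, 4) because of the first copy in d1
dup_flag <- as.numeric(duplicated(rbind(d2, d1)))[-seq_len(nrow(d2))]
print(dup_flag)  # 0 1 1

# the key-based helper flags only the (2, 5) row
key_flag <- flag_matches(d1, d2, c("a", "b"))
print(key_flag)  # 0 0 1
```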