r - for loop to compare 2 dataframes all by all rows

Time:09-17

I'm trying to compare 2 columns from dataframe d1 with 2 columns from dataframe d2, row by row. To illustrate the issue I created dummy datasets:

d1 <- data.frame(
  a = c(1,2,3),
  b = c(4,5,6)
)

d2 <- data.frame(
  a = c(2,0,2),
  b = c(5,5,6)
)

Ideally, I would like to flag every row in d1 that matches at least one row of d2 on both columns, so the desired result would be:

data.frame(
  a = c(1,2,3),
  b = c(4,5,6),
  flag = c(0,1,0)
)

This is what I tried:

for (i in 1:nrow(d1)) {
  for (j in 1:nrow(d2)) {
    test[i,j] = ifelse(d1$a[i] == d2$a[j] & d1$b[i] == d2$b[j], 1, 0)
  }
}

A for loop would be the best solution.

CodePudding user response:

You are basically looking for a kind of join. Since the task is only to flag matches, data.table handles it neatly with an update join that modifies d1 in place:

library(data.table)
d1 <- data.table(
  a = c(1,2,3),
  b = c(4,5,6)
)

d2 <- data.table(
  a = c(2,0,2),
  b = c(5,5,6)
)


# assign 1 to each match in place
d1[d2,
   on = .(a, b),
   flag := 1]
d1
#>    a b flag
#> 1: 1 4   NA
#> 2: 2 5    1
#> 3: 3 6   NA

# convert NAs to zeros
d1[is.na(flag), flag := 0]
d1
#>    a b flag
#> 1: 1 4    0
#> 2: 2 5    1
#> 3: 3 6    0
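
The two steps above can probably be collapsed into one chained expression. The following is a minimal sketch, assuming the same dummy data and that d1 does not yet have a flag column: every row is first set to 0, then the update join overwrites the matches with 1.

```r
library(data.table)

# dummy data from the question
d1 <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))
d2 <- data.table(a = c(2, 0, 2), b = c(5, 5, 6))

# default every row to 0, then overwrite matching rows with 1 in place
d1[, flag := 0][d2, on = .(a, b), flag := 1]

# the trailing [] forces printing after an in-place update
d1[]
```

This avoids the intermediate NA values, at the cost of being slightly less explicit about what happens to unmatched rows.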

CodePudding user response:

You can use match for this:

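# match() returns a position (0 if none), so "> 0" turns it into a 0/1 flag; the space separator in paste() keeps keys such as 1,23 and 12,3 distinct
d1$flag <- as.numeric(match(paste(d1$a, d1$b), paste(d2$a, d2$b), nomatch = 0) > 0)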

Edit: @mnist raises a fair point. Here is an alternative that handles rows containing NA more reliably:

# d2 is stacked first, so its rows are the ones to drop from the result
d1$flag <- as.numeric(duplicated(rbind(d2, d1)))[-seq_len(nrow(d2))]
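
Since the question asks for a for loop, the same flagging can also be sketched with explicit loops. This is only a minimal illustration on the dummy data from the question. Wrapping the comparison in isTRUE() so that anything involving NA counts as "no match" is an assumption about the desired behaviour, not something the question specifies.

```r
# dummy data from the question
d1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
d2 <- data.frame(a = c(2, 0, 2), b = c(5, 5, 6))

# start with no rows flagged
d1$flag <- 0

for (i in seq_len(nrow(d1))) {
  for (j in seq_len(nrow(d2))) {
    # isTRUE() turns an NA comparison result into FALSE
    if (isTRUE(d1$a[i] == d2$a[j] && d1$b[i] == d2$b[j])) {
      d1$flag[i] <- 1
      break  # one match is enough, stop scanning d2 for this row
    }
  }
}

d1
#>   a b flag
#> 1 1 4    0
#> 2 2 5    1
#> 3 3 6    0
```

The loop compares every row of d1 with every row of d2 in the worst case, so for large data frames the join or match approaches above are likely to be much faster.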