Home > OS >  R data.table join to table containing "from" & "to" row_id's
R data.table join to table containing "from" & "to" row_id's

Time:10-09

I'm sure this is a dumb question...but I've looked around and can't seem to find anything.

Say I have two tables:

a <- data.table(row_id = 1:12)
b <- data.table(start_row = c(1,5,10),end_row = c(3,7,11),value = c("a","b","c"))
> a
    row_id
 1:      1
 2:      2
 3:      3
 4:      4
 5:      5
 6:      6
 7:      7
 8:      8
 9:      9
10:     10
11:     11
12:     12
> b
   start_row end_row value
1:         1       3     a
2:         5       7     b
3:        10      11     c

I'd like to join using the row indices from b to insert the value into a, to produce something like this:

> c
    row_id value
 1:      1     a
 2:      2     a
 3:      3     a
 4:      4  <NA>
 5:      5     b
 6:      6     b
 7:      7     b
 8:      8  <NA>
 9:      9  <NA>
10:     10     c
11:     11     c
12:     12  <NA>

What I'm doing right now is using a for loop, which is slow:

for (i in 1:nrow(s)) {
    a[row_id >= b[i,start_row] & row_id <= b[i,end_row],value := b[i,value]]
}

I've thought about dcasting b into a long form of sort and then joining, but what I really need is someway to generate the sequences between start and end rows before joining. Any suggestions welcome!

CodePudding user response:

We just need a non-equi join

library(data.table)
a[b, value := i.value, on = .(row_id >= start_row, row_id <= end_row)]

-output

> a
    row_id value
 1:      1     a
 2:      2     a
 3:      3     a
 4:      4  <NA>
 5:      5     b
 6:      6     b
 7:      7     b
 8:      8  <NA>
 9:      9  <NA>
10:     10     c
11:     11     c
12:     12  <NA>
  • Related