Home > database >  R data.table add column from a lookup using a date non-equi join with a lookup data.table
R data.table add column from a lookup using a date non-equi join with a lookup data.table

Time:01-19

I want to add a column to a data.table using a date lookup, matching to the lookup row which is greater than or equal to the date in the data.table.

So for example, given this short table of four dates:

dt1<-data.table(Dt=as.Date(rbind("2023-01-01", "2023-01-02", "2023-01-14", "2023-01-15")))
dt1
           Dt
1: 2023-01-01
2: 2023-01-02
3: 2023-01-14
4: 2023-01-15

And given this second lookup table of dates:

dtLookup<-data.table(Dt=as.Date(rbind("2023-01-07", "2023-01-14", "2023-01-21")))
dtLookup
           Dt
1: 2023-01-07
2: 2023-01-14
3: 2023-01-21

I would want to join somehow and end up with the original dt1 table having an additional column corresponding to the lookup date. So in this example dt1 would look like this:

dt1
           Dt   LookupDt
1: 2023-01-01 2023-01-07
2: 2023-01-02 2023-01-07
3: 2023-01-14 2023-01-14
4: 2023-01-15 2023-01-21

Is there a way to non-equi join dt1 to dtLookup and set a new dt1 column based on the row found in dtLookup?

CodePudding user response:

Here, we may do this quickly with findInterval

library(data.table)
dt1[, LookupDt := dtLookup$Dt[findInterval(Dt, dtLookup$Dt, 
               left.open = TRUE) 1]]

-output

> dt1
           Dt   LookupDt
1: 2023-01-01 2023-01-07
2: 2023-01-02 2023-01-07
3: 2023-01-14 2023-01-14
4: 2023-01-15 2023-01-21

CodePudding user response:

I think findInterval (as in akrun's answer) is likely the best given the sample data, but if you need a join (e.g., other columns), try this:

dtLookup[, Dt2 := shift(Dt, type = "lag", fill = min(dt1$Dt)-1)]
dtLookup
#            Dt        Dt2
#        <Date>     <Date>
# 1: 2023-01-07 2022-12-31
# 2: 2023-01-14 2023-01-07
# 3: 2023-01-21 2023-01-14
dt1[dtLookup, Lookup := i.Dt, on = .(Dt > Dt2, Dt <= Dt)]
dt1
#            Dt     Lookup
#        <Date>     <Date>
# 1: 2023-01-01 2023-01-07
# 2: 2023-01-02 2023-01-07
# 3: 2023-01-14 2023-01-14
# 4: 2023-01-15 2023-01-21

This produces a range in dtLookup that we can use internally.

  • Related