I want to add a column to a data.table using a date lookup, matching to the lookup row which is greater than or equal to the date in the data.table.
So for example, given this short table of four dates:
dt1<-data.table(Dt=as.Date(rbind("2023-01-01", "2023-01-02", "2023-01-14", "2023-01-15")))
dt1
Dt
1: 2023-01-01
2: 2023-01-02
3: 2023-01-14
4: 2023-01-15
And given this second lookup table of dates:
dtLookup<-data.table(Dt=as.Date(rbind("2023-01-07", "2023-01-14", "2023-01-21")))
dtLookup
Dt
1: 2023-01-07
2: 2023-01-14
3: 2023-01-21
I would want to join somehow and end up with the original dt1 table having an additional column corresponding to the lookup date. So in this example dt1 would look like this:
dt1
Dt LookupDt
1: 2023-01-01 2023-01-07
2: 2023-01-02 2023-01-07
3: 2023-01-14 2023-01-14
4: 2023-01-15 2023-01-21
Is there a way to non-equi join dt1 to dtLookup and set a new dt1 column based on the row found in dtLookup?
CodePudding user response:
Here, we may do this quickly with findInterval
library(data.table)
dt1[, LookupDt := dtLookup$Dt[findInterval(Dt, dtLookup$Dt,
left.open = TRUE) 1]]
-output
> dt1
Dt LookupDt
1: 2023-01-01 2023-01-07
2: 2023-01-02 2023-01-07
3: 2023-01-14 2023-01-14
4: 2023-01-15 2023-01-21
CodePudding user response:
I think findInterval
(as in akrun's answer) is likely the best given the sample data, but if you need a join (e.g., other columns), try this:
dtLookup[, Dt2 := shift(Dt, type = "lag", fill = min(dt1$Dt)-1)]
dtLookup
# Dt Dt2
# <Date> <Date>
# 1: 2023-01-07 2022-12-31
# 2: 2023-01-14 2023-01-07
# 3: 2023-01-21 2023-01-14
dt1[dtLookup, Lookup := i.Dt, on = .(Dt > Dt2, Dt <= Dt)]
dt1
# Dt Lookup
# <Date> <Date>
# 1: 2023-01-01 2023-01-07
# 2: 2023-01-02 2023-01-07
# 3: 2023-01-14 2023-01-14
# 4: 2023-01-15 2023-01-21
This produces a range in dtLookup
that we can use internally.