Home > Net >  Rolling join two data.tables with date in R
Rolling join two data.tables with date in R

Time:06-01

I have two data tables called dt1 (which has date column with all weekdays) and dt2 (which has date column with some weekdays and some weekends). I want to join dt2 table into dt1 table so that -

  1. The dates in dt2 get aligned with dates in dt1, we will call the new column dates_2.
  2. In case dt2 has a date (Saturday or Sunday), which is not there in dt1, then move that date to the next date in dt1.
  3. The dates where dt2 doesn't have a corresponding date in dt1, then fill NA in dates_2.
library(data.table)
date = seq(from = as.Date("2019-12-31"), to=as.Date("2020-12-31"), by=paste(1,"days"))
date <- date[!weekdays(date) %in% c('Saturday','Sunday')]
  
dt1 = data.table(group = "A", date1 = as.Date(date))
dt2 = structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A"
), date2 = structure(c(18286, 18391, 18468, 18566, 18657, 18741, 
18832, 18923), class = "Date")), row.names = c(NA, -8L), class = c("data.table", "data.frame"))

I have tried the following code -

dt3 = dt1[dt2, on = c("group" = "group"), date_2 := date2, roll = T]
dt3[]

But the dt3 does not roll properly.

Please guide me on what I am doing wrong.

Here is the expected result. For brevity, I am showing the only subset of the expected result

   group      date1     date_2
...
...
...
1:     A 2020-05-06     NA
2:     A 2020-05-07     NA
3:     A 2020-05-08     NA
4:     A 2020-05-11     2020-05-09
5:     A 2020-05-12     NA
6:     A 2020-05-13     NA
7:     A 2020-05-14     NA
...
...
...
and the series goes on

Since the date "2020-05-09" is not there in dt1, the date in dt2 got rolled over to the next date. Rest all the dates are NA.

CodePudding user response:

We may use non-equi join

dt1[dt2, date_2 := date2, on = .(group, date1 > date2), mult = "first"]
  • Related