Home > Software engineering >  Add new column to data.table using conditional lookup from another data.table using R
Add new column to data.table using conditional lookup from another data.table using R

Time:10-08

I have the data.table dt as follows -

dt = structure(list(date = structure(c(18904L, 18904L, 18904L), class = c("IDate", 
"Date")), exp_date = structure(c(18915L, 19013L, 19377L), class = c("IDate", 
"Date")), days_remaining = c(11, 109, 473), year_remaining = c(0.0301369863013699, 
0.298630136986301, 1.2958904109589)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))

    date   exp_date days_remaining year_remaining
    1: 2021-10-04 2021-10-15             11     0.03013699
    2: 2021-10-04 2022-01-21            109     0.29863014
    3: 2021-10-04 2023-01-20            473     1.29589041

To table dt, I want to add a rates column from the interest rates table called rates_dt Here is the interest rate data.table rates_dt

rates_dt = structure(list(index = structure(c(18900, 18901, 18904, 18905, 
18906), tzone = "UTC", tclass = "Date", class = "Date"), `1_MO` = c(0.07, 
0.08, 0.09, 0.1, 0.06), `2_MO` = c(0.05, 0.04, 0.04, 0.04, 0.04
), `3_MO` = c(0.04, 0.04, 0.04, 0.04, 0.04), `6_MO` = c(0.05, 
0.05, 0.06, 0.06, 0.06), `1_YR` = c(0.09, 0.09, 0.09, 0.09, 0.1
), `2_YR` = c(0.28, 0.27, 0.27, 0.28, 0.3), `3_YR` = c(0.53, 
0.49, 0.52, 0.54, 0.55)), class = c("data.table", "data.frame"
), row.names = c(NA, -5L))

        index 1_MO 2_MO 3_MO 6_MO 1_YR 2_YR 3_YR
1: 2021-09-30 0.07 0.05 0.04 0.05 0.09 0.28 0.53
2: 2021-10-01 0.08 0.04 0.04 0.05 0.09 0.27 0.49
3: 2021-10-04 0.09 0.04 0.04 0.06 0.09 0.27 0.52
4: 2021-10-05 0.10 0.04 0.04 0.06 0.09 0.28 0.54
5: 2021-10-06 0.06 0.04 0.04 0.06 0.10 0.30 0.55

What I am trying to do is, if days_remaining is less than 30 days in dt table, then add a rates column looking up at column 1_MO in rates_dt table. And similarly, if days_remaining is 473 (since more than 365 days), then take the rates from column 1_YR in rates_dt table. The output should look something like this -

             date   exp_date days_remaining year_remaining rates
    1: 2021-10-04 2021-10-15             11     0.03013699 0.09
    2: 2021-10-04 2022-01-21            109     0.29863014 0.04
    3: 2021-10-04 2023-01-20            473     1.29589041 0.09

In actual data, the column date will have all the dates for the past year. So the solution I am looking for should use group by date.

I have tried to melt up the rates_dt and then do something as follows, but could not reach the desired solution -

dt[melted_rates_dt, on = c("date" = "index"), rates := value]

I will appreciate it if someone can provide any pointer on how to solve this problem.

Thanks!

CodePudding user response:

I think this is a good case for findInterval and joining on the melted rates.

library(lubridate)
periodtxt <- c("1_MO", "2_MO", "3_MO", "6_MO", "1_YR", "2_YR", "3_YR")
periodn <- c(1, 2, 3, 6, 12, 24, 36)
dt[, period := mapply(function(dt, exp) periodtxt[findInterval(exp, dt %m % months(c(0, periodn)))],
                      date, exp_date) ]
#          date   exp_date days_remaining year_remaining period
#        <IDat>     <IDat>          <num>          <num> <char>
# 1: 2021-10-04 2021-10-15             11     0.03013699   1_MO
# 2: 2021-10-04 2022-01-21            109     0.29863014   6_MO
# 3: 2021-10-04 2023-01-20            473     1.29589041   2_YR

From here,

melted_rates_dt <- melt(rates_dt, id.vars = "index", variable.name = "period", value.name = "rate")
dt[melted_rates_dt, rate := rate, on = .(date == index, period)]
#          date   exp_date days_remaining year_remaining period  rate
#        <IDat>     <IDat>          <num>          <num> <char> <num>
# 1: 2021-10-04 2021-10-15             11     0.03013699   1_MO  0.09
# 2: 2021-10-04 2022-01-21            109     0.29863014   6_MO  0.06
# 3: 2021-10-04 2023-01-20            473     1.29589041   2_YR  0.27

(I am still not sure how you got 0.09 out of the 2_YR data, so I'm offering it as-is until I understand better your third row of expected output.)

CodePudding user response:

Here is what I suggest:

REPREX

library(data.table)

# Join between dt and rates_dt
z <- rates_dt[dt, on = .(index = date)][, c(1,9,10,11,2:8)]


# fill 'rates' column with multiple nested 'fifelse' (i.e. dedicated ifelse for data.table) and delete unnecessary columns
z[, `:=` (rates = fifelse (days_remaining <= 30, `1_MO`, 
                           fifelse(days_remaining > 30 & days_remaining <= 60, `2_MO`,
                                   fifelse(days_remaining > 60 & days_remaining <= 365, `3_MO`,
                                           fifelse(days_remaining > 365 & days_remaining <= 730, `1_YR`, `3_YR`)))),
          `1_MO` = NULL, `2_MO` = NULL, `3_MO` = NULL, `6_MO` = NULL, `1_YR` = NULL, `2_YR` = NULL, `3_YR` = NULL)]

Created on 2021-10-07 by the reprex package (v2.0.1)

The output:

z
#>         index   exp_date days_remaining year_remaining rates
#> 1: 2021-10-04 2021-10-15             11     0.03013699  0.09
#> 2: 2021-10-04 2022-01-21            109     0.29863014  0.04
#> 3: 2021-10-04 2023-01-20            473     1.29589041  0.09

Created on 2021-10-07 by the reprex package (v2.0.1)

CodePudding user response:

I solved it as follows -

melted_rates_dt = melt(rates_dt, id.vars = "index")
dt[, duration := fcase(days_remaining < 30, "1_MO",
                            days_remaining < 61, "2_MO",
                            days_remaining < 91, "3_MO",
                            days_remaining < 181, "6_MO",
                            days_remaining < 365, "1_YR",
                            days_remaining < 730, "2_YR",
                            days_remaining < 1095, "3_YR")]
    
dt[melted_rates_dt, on = c("duration" = "variable", 
                           "date" = "index"), rates := value]
  • Related