I have the data.table dt
as follows -
dt = structure(list(date = structure(c(18904L, 18904L, 18904L), class = c("IDate",
"Date")), exp_date = structure(c(18915L, 19013L, 19377L), class = c("IDate",
"Date")), days_remaining = c(11, 109, 473), year_remaining = c(0.0301369863013699,
0.298630136986301, 1.2958904109589)), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
date exp_date days_remaining year_remaining
1: 2021-10-04 2021-10-15 11 0.03013699
2: 2021-10-04 2022-01-21 109 0.29863014
3: 2021-10-04 2023-01-20 473 1.29589041
To table dt
, I want to add a rates column from the interest rates table called rates_dt
Here is the interest rate data.table rates_dt
rates_dt = structure(list(index = structure(c(18900, 18901, 18904, 18905,
18906), tzone = "UTC", tclass = "Date", class = "Date"), `1_MO` = c(0.07,
0.08, 0.09, 0.1, 0.06), `2_MO` = c(0.05, 0.04, 0.04, 0.04, 0.04
), `3_MO` = c(0.04, 0.04, 0.04, 0.04, 0.04), `6_MO` = c(0.05,
0.05, 0.06, 0.06, 0.06), `1_YR` = c(0.09, 0.09, 0.09, 0.09, 0.1
), `2_YR` = c(0.28, 0.27, 0.27, 0.28, 0.3), `3_YR` = c(0.53,
0.49, 0.52, 0.54, 0.55)), class = c("data.table", "data.frame"
), row.names = c(NA, -5L))
index 1_MO 2_MO 3_MO 6_MO 1_YR 2_YR 3_YR
1: 2021-09-30 0.07 0.05 0.04 0.05 0.09 0.28 0.53
2: 2021-10-01 0.08 0.04 0.04 0.05 0.09 0.27 0.49
3: 2021-10-04 0.09 0.04 0.04 0.06 0.09 0.27 0.52
4: 2021-10-05 0.10 0.04 0.04 0.06 0.09 0.28 0.54
5: 2021-10-06 0.06 0.04 0.04 0.06 0.10 0.30 0.55
What I am trying to do is, if days_remaining
is less than 30 days in dt
table, then add a rates
column looking up at column 1_MO
in rates_dt
table. And similarly, if days_remaining is 473 (since more than 365 days), then take the rates from column 1_YR
in rates_dt
table.
The output should look something like this -
date exp_date days_remaining year_remaining rates
1: 2021-10-04 2021-10-15 11 0.03013699 0.09
2: 2021-10-04 2022-01-21 109 0.29863014 0.04
3: 2021-10-04 2023-01-20 473 1.29589041 0.09
In actual data, the column date
will have all the dates for the past year. So the solution I am looking for should use group by date
.
I have tried to melt up the rates_dt
and then do something as follows, but could not reach the desired solution -
dt[melted_rates_dt, on = c("date" = "index"), rates := value]
I will appreciate it if someone can provide any pointer on how to solve this problem.
Thanks!
CodePudding user response:
I think this is a good case for findInterval
and joining on the melted rates.
library(lubridate)
periodtxt <- c("1_MO", "2_MO", "3_MO", "6_MO", "1_YR", "2_YR", "3_YR")
periodn <- c(1, 2, 3, 6, 12, 24, 36)
dt[, period := mapply(function(dt, exp) periodtxt[findInterval(exp, dt %m % months(c(0, periodn)))],
date, exp_date) ]
# date exp_date days_remaining year_remaining period
# <IDat> <IDat> <num> <num> <char>
# 1: 2021-10-04 2021-10-15 11 0.03013699 1_MO
# 2: 2021-10-04 2022-01-21 109 0.29863014 6_MO
# 3: 2021-10-04 2023-01-20 473 1.29589041 2_YR
From here,
melted_rates_dt <- melt(rates_dt, id.vars = "index", variable.name = "period", value.name = "rate")
dt[melted_rates_dt, rate := rate, on = .(date == index, period)]
# date exp_date days_remaining year_remaining period rate
# <IDat> <IDat> <num> <num> <char> <num>
# 1: 2021-10-04 2021-10-15 11 0.03013699 1_MO 0.09
# 2: 2021-10-04 2022-01-21 109 0.29863014 6_MO 0.06
# 3: 2021-10-04 2023-01-20 473 1.29589041 2_YR 0.27
(I am still not sure how you got 0.09
out of the 2_YR
data, so I'm offering it as-is until I understand better your third row of expected output.)
CodePudding user response:
Here is what I suggest:
REPREX
library(data.table)
# Join between dt and rates_dt
z <- rates_dt[dt, on = .(index = date)][, c(1,9,10,11,2:8)]
# fill 'rates' column with multiple nested 'fifelse' (i.e. dedicated ifelse for data.table) and delete unnecessary columns
z[, `:=` (rates = fifelse (days_remaining <= 30, `1_MO`,
fifelse(days_remaining > 30 & days_remaining <= 60, `2_MO`,
fifelse(days_remaining > 60 & days_remaining <= 365, `3_MO`,
fifelse(days_remaining > 365 & days_remaining <= 730, `1_YR`, `3_YR`)))),
`1_MO` = NULL, `2_MO` = NULL, `3_MO` = NULL, `6_MO` = NULL, `1_YR` = NULL, `2_YR` = NULL, `3_YR` = NULL)]
Created on 2021-10-07 by the reprex package (v2.0.1)
The output:
z
#> index exp_date days_remaining year_remaining rates
#> 1: 2021-10-04 2021-10-15 11 0.03013699 0.09
#> 2: 2021-10-04 2022-01-21 109 0.29863014 0.04
#> 3: 2021-10-04 2023-01-20 473 1.29589041 0.09
Created on 2021-10-07 by the reprex package (v2.0.1)
CodePudding user response:
I solved it as follows -
melted_rates_dt = melt(rates_dt, id.vars = "index")
dt[, duration := fcase(days_remaining < 30, "1_MO",
days_remaining < 61, "2_MO",
days_remaining < 91, "3_MO",
days_remaining < 181, "6_MO",
days_remaining < 365, "1_YR",
days_remaining < 730, "2_YR",
days_remaining < 1095, "3_YR")]
dt[melted_rates_dt, on = c("duration" = "variable",
"date" = "index"), rates := value]