Linearly interpolate values based on another dataframe in R

Time:07-21

I have two dataframes:

cash_flows

    coupon_date
1   2026-07-31   
2   2026-01-31    
3   2025-07-31     
4   2025-01-31    
5   2024-07-31    
6   2024-01-31    

discount_rates

  date        df 
1 2023-07-25  0.9806698 
2 2024-07-25  0.9737091 
3 2025-07-25  0.9432057 
4 2026-07-27  0.9109546 
5 2027-07-26  0.8780984 

I would like to create a new column in cash_flows with values linearly interpolated from the df column in discount_rates.

The desired output is therefore:

cash_flows

    coupon_date   new_column
1   2026-07-31    0.910594
2   2026-01-31    0.926509
3   2025-07-31    0.942678
4   2025-01-31    0.957831
5   2024-07-31    0.973208
6   2024-01-31    0.977056

I have installed the forecast package but am still not entirely sure how to achieve this. Any help is appreciated.

Code to replicate dataframes:

cash_flows <- data.frame(coupon_date = as.Date(c("2026-07-31","2026-01-31","2025-07-31", "2024-07-31","2024-01-31")))
drdr <- data.frame(date = as.Date(c("2023-07-25","2024-07-25","2025-07-25", "2026-07-27","2027-07-26")), df = c(0.9806698, 0.9737091, 0.9432057, 0.9109546, 0.8780984))

CodePudding user response:

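Here is one option: fit a linear model df ~ date to drdr, then use predict to estimate df for the dates in cash_flows. Note that these numbers don't exactly match your expected output, because a single regression line fitted through all five points is not the same as interpolating between adjacent points.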

library(dplyr)  # provides %>% and mutate()

# Fit a single straight line through all points in drdr
fit <- lm(df ~ date, data = drdr)
cash_flows %>%
    mutate(new_column = predict(fit, newdata = data.frame(date = coupon_date)))
#  coupon_date new_column
#1  2026-07-31  0.9101729
#2  2026-01-31  0.9234352
#3  2025-07-31  0.9369172
#4  2024-07-31  0.9636614
#5  2024-01-31  0.9769969
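
The gap from the expected output comes from what lm does: it fits one least-squares line through all five points, so the line need not pass through any of them. As a rough sketch using the drdr data from the question (the variable names below are only illustrative), the following compares the fitted line with piecewise-linear interpolation from base R's approx at the known dates:

```r
drdr <- data.frame(
  date = as.Date(c("2023-07-25", "2024-07-25", "2025-07-25",
                   "2026-07-27", "2027-07-26")),
  df   = c(0.9806698, 0.9737091, 0.9432057, 0.9109546, 0.8780984)
)

# Least-squares line through all five points
fit <- lm(df ~ date, data = drdr)
regression_at_knots <- unname(predict(fit, newdata = drdr))

# Piecewise-linear interpolation evaluated at the same dates
interp_at_knots <- approx(x = drdr$date, y = drdr$df, xout = drdr$date)$y

# Largest absolute deviation from the known discount factors
max_regression_error <- max(abs(regression_at_knots - drdr$df))
max_interp_error     <- max(abs(interp_at_knots - drdr$df))
```

Here max_interp_error should be zero up to floating-point noise, while max_regression_error is clearly non-zero, which is why the regression approach cannot reproduce the interpolated values the question asks for.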

CodePudding user response:

approx() (see ?approx) performs linear interpolation.
Pass the source x and y variables from the drdr data, and use xout= to request the interpolated y values at the cash_flows$coupon_date x values:

cash_flows$new_column <- approx(x=drdr$date, y=drdr$df, xout=cash_flows$coupon_date)$y
cash_flows
#  coupon_date new_column
#1  2026-07-31  0.9105935
#2  2026-01-31  0.9265089
#3  2025-07-31  0.9426784
#4  2024-07-31  0.9732077
#5  2024-01-31  0.9770563

This matches your expected output exactly, except for the 2025-01-31 row, which appears in the question but is missing from the code that replicates the data frames.
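
To see what approx is doing, the interpolation can also be worked out by hand. The sketch below is only illustrative: interp_by_hand is a hypothetical helper, not part of any package, and it assumes every requested date lies strictly inside the range of drdr$date. It uses all six coupon dates from the question, including 2025-01-31.

```r
drdr <- data.frame(
  date = as.Date(c("2023-07-25", "2024-07-25", "2025-07-25",
                   "2026-07-27", "2027-07-26")),
  df   = c(0.9806698, 0.9737091, 0.9432057, 0.9109546, 0.8780984)
)

# Hypothetical helper: interpolate between the two knots that bracket each date.
# Assumes x is sorted ascending and every d falls strictly inside range(x).
interp_by_hand <- function(d, x, y) {
  dn <- as.numeric(d)                      # dates as days since 1970-01-01
  xn <- as.numeric(x)
  i  <- findInterval(dn, xn)               # index of the knot at or before d
  w  <- (dn - xn[i]) / (xn[i + 1] - xn[i]) # fraction of the interval elapsed
  y[i] + w * (y[i + 1] - y[i])
}

coupon_date <- as.Date(c("2026-07-31", "2026-01-31", "2025-07-31",
                         "2025-01-31", "2024-07-31", "2024-01-31"))

by_hand   <- interp_by_hand(coupon_date, drdr$date, drdr$df)
by_approx <- approx(x = drdr$date, y = drdr$df, xout = coupon_date)$y
```

Both vectors should agree. For example, 2024-01-31 is 190 of the 366 days between 2023-07-25 and 2024-07-25, giving 0.9806698 + (190/366) * (0.9737091 - 0.9806698), which is about 0.977056. The 2025-01-31 row works out to about 0.957831, as in the question's desired output.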
