I have a time series where I need to calculate a counterfactual value based based on a column of historical growth rates. The counterfactual
column is the exact same as the the observed_value
column up until March of 2020, where the values need to be recomputed using historical growth rates from the growth_rate
column. The idea is to fill in all the missing values after Feb 2020, by multiplying the value of the previous month with the historical growth rate. Here is the data
df <- data.frame("month" = c("Jan", "Feb", "March", "April"),
"year" = c(2020, 2020, 2020, 2020),
"observed_value" = c(15,18,22,19),
"growth_rate" = c(0.001, -0.549, 0.4788, -0.11),
"counterfactual" = c(15,18,NA,NA))
So in this example, we fill in the counterfactual value for March 2020 by multiplying the Feb 2020 value * (1 growth rate) in the row we are filling in. In this example, the calculation for the missing value for March 2020 = 18*(1 0.4788) = 26.62. And now we iterate and using the counterfactual March 2020 value we computed to calculate the counterfactual for April 2020. For example April 2020 = 26.62 * 0.89....and so on until we have a full counterfactual column. Note, all values after Feb 2020 in the counterfactual column are NA.
desired_output <- data.frame("month" = c("Jan", "Feb", "March", "April"),
"year" = c(2020, 2020, 2020, 2020),
"observed_value" = c(15,18,22,19),
"growth_rate" = c(0.001, -0.549, 0.4788, -0.11),
"counterfactual" = c(15,18,26.62,23.69))
CodePudding user response:
If, as you observe, all values after a date are NA, you can do this:
miss <- which(is.na(df$counterfactual))
df$counterfactual[miss] <-
tail(df$observed_value[-miss], 1) *
cumprod(1 df$growth_rate[miss])
df
#> month year observed_value growth_rate counterfactual
#> 1 Jan 2020 15 0.0010 15.00000
#> 2 Feb 2020 18 -0.5490 18.00000
#> 3 March 2020 22 0.4788 26.61840
#> 4 April 2020 19 -0.1100 23.69038
If not all NA's are contiguous, this solution does not apply and you must iterate or group by NA's groups.