Home > other >  R - Fill in missing values, with the previous value in the column, times another column, and iterate
R - Fill in missing values, with the previous value in the column, times another column, and iterate

Time:01-23

I have a time series where I need to calculate a counterfactual value based based on a column of historical growth rates. The counterfactual column is the exact same as the the observed_value column up until March of 2020, where the values need to be recomputed using historical growth rates from the growth_rate column. The idea is to fill in all the missing values after Feb 2020, by multiplying the value of the previous month with the historical growth rate. Here is the data

df <- data.frame("month" = c("Jan", "Feb", "March", "April"),
                 "year" = c(2020, 2020, 2020, 2020),
                 "observed_value" = c(15,18,22,19),
                 "growth_rate" = c(0.001, -0.549, 0.4788, -0.11),
                 "counterfactual" = c(15,18,NA,NA))

So in this example, we fill in the counterfactual value for March 2020 by multiplying the Feb 2020 value * (1 growth rate) in the row we are filling in. In this example, the calculation for the missing value for March 2020 = 18*(1 0.4788) = 26.62. And now we iterate and using the counterfactual March 2020 value we computed to calculate the counterfactual for April 2020. For example April 2020 = 26.62 * 0.89....and so on until we have a full counterfactual column. Note, all values after Feb 2020 in the counterfactual column are NA.

desired_output <- data.frame("month" = c("Jan", "Feb", "March", "April"),
                          "year" = c(2020, 2020, 2020, 2020),
                          "observed_value" = c(15,18,22,19),
                          "growth_rate" = c(0.001, -0.549, 0.4788, -0.11),
                          "counterfactual" = c(15,18,26.62,23.69))

CodePudding user response:

If, as you observe, all values after a date are NA, you can do this:


miss <- which(is.na(df$counterfactual))

df$counterfactual[miss] <- 
  tail(df$observed_value[-miss], 1) * 
  cumprod(1   df$growth_rate[miss])

df
#>   month year observed_value growth_rate counterfactual
#> 1   Jan 2020             15      0.0010       15.00000
#> 2   Feb 2020             18     -0.5490       18.00000
#> 3 March 2020             22      0.4788       26.61840
#> 4 April 2020             19     -0.1100       23.69038

If not all NA's are contiguous, this solution does not apply and you must iterate or group by NA's groups.

  • Related