How to extrapolate values over dates using R

Time:12-20

I have data for the first 7 months of a year, and would like to linearly extrapolate the data for the last 5 months. I have tried using the approxExtrap function from the Hmisc package but am not sure how to employ it, particularly what to specify for xout. I'm open to any solution (dplyr-based would be ideal if possible). Thank you for your time.

Here is a sample of the data:


library(tibble)

df <- tibble(pop = c(18968, 18956, 18946, 18934, 18923, 18912, 18901, NA, NA, NA, NA, NA),
             date = c("2020-01-01", "2020-02-01", "2020-03-01", 
                      "2020-04-01", "2020-05-01", "2020-06-01", 
                      "2020-07-01", "2020-08-01", "2020-09-01", 
                      "2020-10-01", "2020-11-01", "2020-12-01"))
df$date <- lubridate::as_date(df$date)

CodePudding user response:

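If you want a linear extrapolation, lm() fits a straight line to the observed months (rows with a missing pop are dropped by default) and predict() evaluates that line at the remaining dates.

na holds the dates whose pop is missing; it is passed to predict() as the newdata argument.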

df <- data.frame(pop = c(18968, 18956, 18946, 18934, 18923, 18912, 18901, NA, NA, NA, NA, NA),
                 date = c("2020-01-01", "2020-02-01", "2020-03-01", 
                          "2020-04-01", "2020-05-01", "2020-06-01", 
                          "2020-07-01", "2020-08-01", "2020-09-01", 
                          "2020-10-01", "2020-11-01", "2020-12-01"))
df$date <- lubridate::as_date(df$date)

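# Least-squares line through the seven observed months
fit <- lm(pop ~ date, df)
# Dates with missing pop, kept as a one-column data frame for predict()
na <- df[is.na(df$pop), "date", drop = FALSE]
newpop <- predict(fit, newdata = na)
na <- cbind(na, pop = newpop)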
na
#>          date      pop
#> 8  2020-08-01 18889.45
#> 9  2020-09-01 18878.06
#> 10 2020-10-01 18867.03
#> 11 2020-11-01 18855.64
#> 12 2020-12-01 18844.61

plot(pop ~ date, df, ylim = c(18800, 19000), pch = 19)
points(pop ~ date, na, col = "red", pch = 19)
abline(fit)

Created on 2022-12-20 with reprex v2.0.2
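
Since the question asked for something dplyr-based, the same lm() fit can be folded into a mutate() call. This is only a sketch: it assumes dplyr is installed, and the names df_filled and extrapolated are made up here for illustration.

```r
library(dplyr)

# Data from the question (dates built with seq() instead of typed out)
df <- tibble(
  pop  = c(18968, 18956, 18946, 18934, 18923, 18912, 18901, rep(NA, 5)),
  date = seq(as.Date("2020-01-01"), by = "month", length.out = 12)
)

# lm() drops rows with missing pop by default, so only Jan-Jul are used
fit <- lm(pop ~ date, data = df)

df_filled <- df %>%
  mutate(
    extrapolated = is.na(pop),   # flag the rows that get filled in
    pop = if_else(is.na(pop),
                  as.numeric(predict(fit, newdata = df)),  # fitted line at every date
                  pop)           # observed values are kept as they are
  )
```

Keeping the extrapolated flag makes it easy to tell observed from predicted values later, for example when colouring points in a plot.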

CodePudding user response:

In case you want to use the approxExtrap() function from Hmisc, xout is a numeric vector of the x values at which you want extrapolated y values. In your case, those are the 8th to 12th values of the date column.

library(Hmisc)

# Use only the observed rows as x and y. If the NA rows are included, the
# target dates fall inside the range of x and every one of them comes back
# as the last observed value (18901) instead of being extrapolated.
obs <- !is.na(df$pop)
extrap_8to12 <- approxExtrap(as.numeric(df$date[obs]), df$pop[obs],
                             xout = as.numeric(df$date[8:12]))
extrap_8to12$y
#> [1] 18889.63 18878.27 18867.27 18855.90 18844.90

df$pop[8:12] <- extrap_8to12$y
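
For comparison with the least-squares fit, here is a hand-rolled base R sketch that continues the straight line through the last two observed points. As far as I can tell this is what approxExtrap() does beyond the range of its x values, but treat that as an assumption rather than a statement about the package. The variable names are made up for illustration.

```r
# Data from the question, as plain vectors
pop  <- c(18968, 18956, 18946, 18934, 18923, 18912, 18901, rep(NA, 5))
date <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)

obs  <- which(!is.na(pop))   # indices of the observed values
miss <- which(is.na(pop))    # indices to fill in
n    <- length(obs)
x    <- as.numeric(date)     # days since 1970-01-01

# Slope (per day) of the line through the last two observed points
slope <- (pop[obs[n]] - pop[obs[n - 1]]) / (x[obs[n]] - x[obs[n - 1]])

# Continue that line over the dates with missing pop
pop[miss] <- pop[obs[n]] + slope * (x[miss] - x[obs[n]])

round(pop[miss], 2)
#> [1] 18889.63 18878.27 18867.27 18855.90 18844.90
```

These values differ slightly from the lm() predictions because only June and July determine the slope here, whereas lm() fits all seven observed months.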