I have data for the first 7 months of a year, and would like to linearly extrapolate the data for the last 5 months. I have tried using the approxExtrap function from the Hmisc package but am not sure how to employ it, particularly what to specify for xout. I'm open to any solution (dplyr-based would be ideal if possible). Thank you for your time.
Here is a sample of the data:
df <- tibble(pop = c(18968, 18956, 18946, 18934, 18923, 18912, 18901, NA, NA, NA, NA, NA),
date = c("2020-01-01", "2020-02-01", "2020-03-01",
"2020-04-01", "2020-05-01", "2020-06-01",
"2020-07-01", "2020-08-01", "2020-09-01",
"2020-10-01", "2020-11-01", "2020-12-01"))
df$date <- lubridate::as_date(df$date)
CodePudding user response:
If you want linear interpolation, lm
will create a fitted object and you can predict from it.
na
is the new data passed on to predict.lm
as argument newdata
.
df <- data.frame(pop = c(18968, 18956, 18946, 18934, 18923, 18912, 18901, NA, NA, NA, NA, NA),
date = c("2020-01-01", "2020-02-01", "2020-03-01",
"2020-04-01", "2020-05-01", "2020-06-01",
"2020-07-01", "2020-08-01", "2020-09-01",
"2020-10-01", "2020-11-01", "2020-12-01"))
df$date <- lubridate::as_date(df$date)
fit <- lm(pop ~ date, df)
na <- df[is.na(df$pop), "date", drop = FALSE]
newpop <- predict(fit, newdata = na)
na <- cbind(na, pop = newpop)
na
#> date pop
#> 8 2020-08-01 18889.45
#> 9 2020-09-01 18878.06
#> 10 2020-10-01 18867.03
#> 11 2020-11-01 18855.64
#> 12 2020-12-01 18844.61
plot(pop ~ date, df, ylim = c(18800, 19000), pch = 19)
points(pop ~ date, na, col = "red", pch = 19)
abline(fit)
Created on 2022-12-20 with reprex v2.0.2
CodePudding user response:
In case you want to use approxExtrap()
function, xout
is a numeric vector of x values where you want to put the extrapolated y values. In your case, it is the 8th to 12th values of date
column.
extrap_8to12 <- approxExtrap(df$date, df$pop, xout = df$date[8:12])
df$pop[8:12] <- extrap_8to12$y
df
# A tibble: 12 × 2
pop date
<dbl> <date>
1 18968 2020-01-01
2 18956 2020-02-01
3 18946 2020-03-01
4 18934 2020-04-01
5 18923 2020-05-01
6 18912 2020-06-01
7 18901 2020-07-01
8 18901 2020-08-01
9 18901 2020-09-01
10 18901 2020-10-01
11 18901 2020-11-01
12 18901 2020-12-01