I want to write a sliding window function in order to use the model trained from t, t 1, and t 2 year to make prediction on the outcome of the t 3 year. This means that for a 10-year's data, the desired sliding window function should create 7 train-test splits and make 7 predictions (for the t 3, t 4, t 5, t 6, t 7, t 8, t 9 year).
I came up with the following code but the result doesn't ring the bell. Not only does the resulting object length differs, but even if I try to manually work through the prediction task, the predict
function actually generates 3 predicted values for a single year's outcome, which doesn't make sense. It would be grateful if someone could point out the sources of the error.
# generate the data
set.seed(123)
df <- data.frame(year = 2000:2009, # T = 10
y = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 0),
var1 = runif(10, min=0, max=1),
var2 = runif(10, min=1, max=2))
# store predicted values in a list
pred <- list()
# loop from the 1st year to the T-3 year
for(i in 2000:2007){
df_sub1 <- subset(df, year == c(i, i 1, i 2))
mod <- glm(y~var1 var2, data=df_sub1, family=binomial())
df_sub2 <- subset(df, year == i 3)
pred[[i]] <- predict(mod, data=df_sub2, type = "response")
}
# error message
Error in family$linkfun(mustart) :
Argument mu must be a nonempty numeric vector
In addition: Warning messages:
1: In year == c(i, i 1, i 2) :
longer object length is not a multiple of shorter object length
2: In year == c(i, i 1, i 2) :
longer object length is not a multiple of shorter object length
CodePudding user response:
The error/warning is from using ==
when the rhs is of length > 1. Use %in%
pred <- vector('list', 8)
names(pred) <- 2000:2007
for(i in 2000:2007){
df_sub1 <- subset(df, year %in% c(i, i 1, i 2))
mod <- glm(y~var1 var2, data=df_sub1, family=binomial())
df_sub2 <- subset(df, year == (i 3))
pred[[as.character(i)]] <- tryCatch(predict(mod,
newdata=df_sub2, type = "response"), error = function(e) NA_real_)
}
-output
> pred
$`2000`
4
1
$`2001`
5
1
$`2002`
6
1
$`2003`
7
2.220446e-16
$`2004`
8
0.1467543
$`2005`
9
0.001408577
$`2006`
10
2.220446e-16
$`2007`
[1] NA
CodePudding user response:
Here is another way with one of package zoo
's functions to apply a function to a rolling window. The function to be applied, roll_pred
is almost a copy&paste of akrun's, only the creation of the subsets is different.
# generate the data
set.seed(123)
df <- data.frame(year = 2000:2009, # T = 10
y = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 0),
var1 = runif(10, min=0, max=1),
var2 = runif(10, min=1, max=2))
library(zoo, quietly = TRUE)
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
roll_pred <- function(year, X) {
i <- match(year, X$year)
df_sub1 <- X[i, ]
mod <- glm(y ~ var1 var2, data = df_sub1, family = binomial())
df_sub2 <- X[ i[length(year)] 1, ]
tryCatch(predict(mod, newdata = df_sub2, type = "response"),
error = function(e) NA_real_)
}
rollapplyr(df$year, 3, roll_pred, X = df)
#> 4 5 6 7 8 9
#> 1.000000e 00 1.000000e 00 1.000000e 00 2.220446e-16 1.467543e-01 1.408577e-03
#> 10 NA
#> 2.220446e-16 NA
Created on 2022-06-05 by the reprex package (v2.0.1)