How to use predict.lm in a for loop?

I need to fit a series of linear regressions in which the predictors are added to the model one at a time, so I use a for loop to fit the models.

set.seed(98274)                          # Creating example data
y <- rnorm(1000)
x1 <- rnorm(1000) + 0.2 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.1 * y
x3 <- rnorm(1000) - 0.1 * x1 + 0.3 * x2 - 0.3 * y
data <- data.frame(y, x1, x2, x3)
head(data)                               # Head of data

mod_summaries <- list()                  # Create empty list

for(i in 2:ncol(data)) {                 # Head of for-loop
  
  predictors_i <- colnames(data)[2:i]    # Create vector of predictor names
  mod_summaries[[i - 1]] <- summary(     # Store regression model summary in list
    lm(y ~ ., data[ , c("y", predictors_i)]))
  
}

Then I tried to predict the test data with those models in another for loop. My code is as follows.

## Test
set.seed(44)                          # Creating test data
y <- rnorm(1000)
x1 <- rnorm(1000) + 0.19 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.11 * y
x3 <- rnorm(1000) - 0.12 * x1 + 0.28 * x2 - 0.33 * y
test <- data.frame(y, x1, x2, x3)


predict_models <- matrix(nrow = nrow(test), ncol = 3)

for(i in 2:ncol(data)) {                 # Head of for-loop
  
  predictors_i <- colnames(data)[2:i]    # Create vector of predictor names
  predict_models[,i-1] <- predict.lm(mod_summaries[[i-1]], test[,2:i])
  
}
predict_models

But it throws the following error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  'data' must be a data.frame, environment, or list
In addition: Warning message:
In predict.lm(mod_summaries[[i - 1]], test[, 2:i]) :
  calling predict.lm(<fake-lm-object>) ...

CodePudding user response：

First, store the fitted models themselves, not their summaries; predict.lm needs the lm object.

mod_summaries <- vector('list', ncol(data) - 1L)  ## preallocate a list of known length; more efficient than growing it

for (i in seq_len(ncol(data))[-1]) {
  predictors_i <- colnames(data)[2:i]
  mod_summaries[[i - 1]] <- lm(y ~ ., data[, c("y", predictors_i)])
}
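
The reason for storing the models can be checked directly: `summary()` on an `lm` fit returns an object of class `summary.lm`, which is not an `lm` object, hence the `<fake-lm-object>` warning in the question. A minimal sketch on made-up data (`toy`, `fit`, and `p` are illustrative names, not from the question):

```r
# Toy data, assumed for illustration only
set.seed(1)
toy <- data.frame(x1 = rnorm(20))
toy$y <- 0.5 * toy$x1 + rnorm(20)

fit <- lm(y ~ x1, data = toy)

class(fit)                     # "lm"
class(summary(fit))            # "summary.lm"

# predict() dispatches on class "lm", so the fitted model works
p <- predict(fit, newdata = toy)
length(p)                      # 20

# The summary object does not inherit from "lm"
inherits(summary(fit), "lm")   # FALSE
```

If the summaries are still needed, they can be recomputed later from the stored models, for example with `lapply(mod_summaries, summary)`.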

Then, the data passed to predict does not need to change between models: only the columns that appear in a model are used, so passing the whole test data frame is sufficient.

predict_models <- matrix(nrow=nrow(test), ncol=ncol(test) - 1L)
for (i in seq_len(ncol(data))[-1]) {
  predict_models[, i - 1] <- predict.lm(mod_summaries[[i - 1]], test)
}

That's all there is to it.

head(predict_models)
#              [,1]        [,2]       [,3]
# [1,] -0.115690784 -0.19149611 -0.4815419
# [2,] -0.004721430  0.03814865  0.1894562
# [3,] -0.110812904  0.02312155  0.2579051
# [4,]  0.004264032 -0.06147035 -0.2328833
# [5,]  0.320110168 -0.04145044 -0.3229186
# [6,] -0.040603638  0.01977484 -0.1090088
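
Two details behind this step can be checked on a small made-up example (`toy_train`, `toy_test`, and `fit` are illustrative names, not from the question). First, `predict()` looks up only the variables named in the model formula, so extra columns in `newdata` are ignored. Second, selecting a single column with `test[, 2]` returns a plain numeric vector rather than a data frame, which is probably what triggered the `'data' must be a data.frame` error in the question for `i = 2`; `drop = FALSE` keeps the data frame structure.

```r
# Toy data, assumed for illustration only
set.seed(1)
toy_train <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy_train$y <- 0.5 * toy_train$x1 + rnorm(30)
toy_test <- data.frame(x1 = rnorm(10), x2 = rnorm(10))

fit <- lm(y ~ x1, data = toy_train)

# Extra columns in newdata (here x2) are ignored by predict()
p_all <- predict(fit, newdata = toy_test)
p_x1  <- predict(fit, newdata = toy_test[, "x1", drop = FALSE])
all.equal(p_all, p_x1)                       # TRUE

# Single-column selection drops to a vector unless drop = FALSE
is.data.frame(toy_test[, 1])                 # FALSE
is.data.frame(toy_test[, 1, drop = FALSE])   # TRUE
```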

Alternatively, and more R-ish, you could do the same in just two lines of code, without any for loops.

ms <- lapply(seq_along(data)[-1], \(i) lm(reformulate(names(data)[2:i], 'y'), data))
pm <- sapply(ms, predict, test)
head(pm)
#           [,1]        [,2]       [,3]
# 1 -0.115690784 -0.19149611 -0.4815419
# 2 -0.004721430  0.03814865  0.1894562
# 3 -0.110812904  0.02312155  0.2579051
# 4  0.004264032 -0.06147035 -0.2328833
# 5  0.320110168 -0.04145044 -0.3229186
# 6 -0.040603638  0.01977484 -0.1090088
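
For readers unfamiliar with `reformulate()`, here is a small sketch of what it builds: a formula from a character vector of term labels plus a response name. The vector `vars` and the list `formulas` are illustrative names, not from the answer. Note also that the `\(i)` shorthand requires R 4.1.0 or later; `function(i)` is the equivalent on older versions.

```r
# Build one formula from predictor names and a response name
f <- reformulate(c("x1", "x2"), response = "y")
class(f)       # "formula"
all.vars(f)    # "y" "x1" "x2"

# Build the nested sequence of formulas used in the answer above
vars <- c("x1", "x2", "x3")
formulas <- lapply(seq_along(vars), function(i) {
  reformulate(vars[seq_len(i)], response = "y")
})
sapply(formulas, deparse)
# "y ~ x1" "y ~ x1 + x2" "y ~ x1 + x2 + x3"
```

Because each model is fitted on the full `data` with an explicit formula, there is no need to subset columns before calling `lm()`.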