Running Multiple Linear Regression Models in for-Loop

Time:11-15

The logic is similar to a content-based recommender. My data looks like this:

content  undesirable  desirable  user_1  ...  user_10
1        3.00         2.77       0.11    ...  NA
...
5000     2.50         2.11       NA      ...  0.12

I need to fit a model with undesirable and desirable as the independent variables and each user as the dependent variable, so I need to fit the model 10 times and predict each user's NA values.

This is the code I hard-coded, but I wonder how to do it with a for loop. I searched for and tried several methods, but they do not work for me...

The data frame is called 'test'.

Hard-coded version:

#fit model
fit_1 = lm(user_1 ~ undesirable + desirable, data = test)
...
fit_10 = lm(user_10 ~ undesirable + desirable, data = test)

#prediction
u_1_na = test[is.na(test$user_1), c('user_1', 'undesirable', 'desirable')]
result1 = predict(fit_1, newdata = u_1_na)
which(result1 == max(result1))
max(result1)
...
u_10_na = test[is.na(test$user_10), c('user_10', 'undesirable', 'desirable')]
result10 = predict(fit_10, newdata = u_10_na)
which(result10 == max(result10))
max(result10)

#write to csv file
Then I write each user's maximum predicted value to a CSV file.

This is what I have tried so far (for loop):

mod_summaries <- list() 

for(i in 1:10) {                 
  
  predictors_i <- colnames(data)[1:10]   
  mod_summaries[[i - 1]] <- summary(     
    lm(predictors_i ~ ., test[ , c("undesirable", 'desirable')]))
  
}

CodePudding user response:

You could use the function as.formula together with the paste function to create your formula. The following is an example:

formula_lm <- as.formula(
    paste(response_var, 
          paste(expl_var, collapse = " + "), 
          sep = " ~ "))

This assumes that you have more than one explanatory variable (joined in the inner paste with +). If you only have one, omit the second paste.

With the created formula, you can use the lm function like this:

lm(formula_lm, data)

Edit: the vector expl_var would in your case include the undesirable and desirable variable.
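
As a rough illustration of how this could look inside a loop, here is a minimal sketch. The simulated `test` data frame, the three user columns, and the `fits` list are assumptions made up for the example, not the asker's real data.

```r
# Simulated stand-in for the asker's data (assumption: 3 users instead of 10)
set.seed(42)
n <- 100
test <- data.frame(
  content     = seq_len(n),
  undesirable = runif(n, 1, 5),
  desirable   = runif(n, 1, 5)
)
user_cols <- paste0("user_", 1:3)
for (u in user_cols) {
  rating <- 0.4 * test$desirable - 0.2 * test$undesirable + rnorm(n, sd = 0.05)
  rating[sample(n, 20)] <- NA          # each user has some unrated content
  test[[u]] <- rating
}

expl_var <- c("undesirable", "desirable")
fits <- list()                          # hypothetical container, one model per user

for (response_var in user_cols) {
  # Build e.g. "user_1 ~ undesirable + desirable" and convert it to a formula
  formula_lm <- as.formula(
    paste(response_var, paste(expl_var, collapse = " + "), sep = " ~ ")
  )
  # lm() drops rows with NA in the response by default (na.action = na.omit)
  fits[[response_var]] <- lm(formula_lm, data = test)
}
```

Each fitted model can then be retrieved by name, for example `fits[["user_2"]]`.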

CodePudding user response:

Avoid the loop. Make your data tidy. Something like:

library(tidyverse)

test %>%
  select(-content) %>%
  pivot_longer(
    starts_with("user"),
    names_to="user",
    values_to="value"
  ) %>%
  group_by(user) %>%
  group_map(
    function(.x, .y) {
      # .x holds undesirable, desirable and value; the grouping column user is in .y
      summary(lm(value ~ ., data=.x))
    }
  )

Untested code since your example is not reproducible.
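
For reference, a self-contained variant of the tidy approach might look like the sketch below. It runs on simulated data (an assumption, with only three user columns) and uses `group_modify()` rather than `group_map()` so that the coefficients come back as one data frame. The name `coefs` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Simulated stand-in for the asker's data (assumption)
set.seed(1)
test <- data.frame(
  content     = 1:50,
  undesirable = runif(50, 1, 5),
  desirable   = runif(50, 1, 5)
)
for (u in paste0("user_", 1:3)) {
  ratings <- 0.5 * test$desirable - 0.3 * test$undesirable + rnorm(50, sd = 0.1)
  ratings[sample(50, 10)] <- NA
  test[[u]] <- ratings
}

coefs <- test %>%
  select(-content) %>%
  pivot_longer(starts_with("user"), names_to = "user", values_to = "value") %>%
  group_by(user) %>%
  # One model per user; return its coefficients as a one-row data frame
  group_modify(
    ~ as.data.frame(t(coef(lm(value ~ undesirable + desirable, data = .x))))
  ) %>%
  ungroup()
```

The result has one row per user, with the intercept and the two slopes as columns.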

CodePudding user response:

An apply method:

mod_summaries_lapply <-
  lapply(
    colnames(mtcars),
    FUN = function(x)
      summary(lm(reformulate(".", response = x), data = mtcars))
  )

A for loop method that fits a linear model for each column. The key is the reformulate() function, which creates the formula from strings. In the question, the response is passed to lm() as a character string inside the formula, which results in an error (invalid term in model formula). The string has to be turned into a formula first, for example with reformulate() or as.formula(). This example uses the mtcars dataset.

mod_summaries <- list() 
for(i in 1:11) {                 
  predictors_i <- colnames(mtcars)[i]   
  mod_summaries[[i]] <- summary(lm(reformulate(".", response = predictors_i), data=mtcars))
  #summary(lm(reformulate(". -1", response = predictors_i), data=mtcars))  # -1 to exclude intercept
  #summary(lm(as.formula(paste(predictors_i, "~ .")), data=mtcars)) # a "paste as formula" method
}
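
Applied to the question's setup, the same idea could be combined with the prediction step. This is only a sketch on simulated data: the `test` data frame, the three user columns, the `best` summary table, and the CSV file name are all assumptions for illustration.

```r
# Simulated stand-in for the asker's data (assumption: 3 users instead of 10)
set.seed(7)
n <- 60
test <- data.frame(
  content     = seq_len(n),
  undesirable = runif(n, 1, 5),
  desirable   = runif(n, 1, 5)
)
user_cols <- paste0("user_", 1:3)
for (u in user_cols) {
  rating <- 0.4 * test$desirable - 0.2 * test$undesirable + rnorm(n, sd = 0.05)
  rating[sample(n, 15)] <- NA
  test[[u]] <- rating
}

# Hypothetical summary table: best unrated content per user
best <- data.frame(user = user_cols, content = NA_integer_, prediction = NA_real_)

for (i in seq_along(user_cols)) {
  u   <- user_cols[i]
  fit <- lm(reformulate(c("undesirable", "desirable"), response = u), data = test)

  unrated <- test[is.na(test[[u]]), ]        # rows this user has not rated
  pred    <- predict(fit, newdata = unrated)
  top     <- which.max(pred)                 # position of the highest prediction

  best$content[i]    <- unrated$content[top]
  best$prediction[i] <- pred[top]
}

# write.csv(best, "best_predictions.csv", row.names = FALSE)  # hypothetical file name
```

Note that `which.max()` returns only the first maximum, whereas `which(result == max(result))` in the question would return every tied position.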