Loop inside a function, how to store function output to an existing dataframe

Time:02-05

My goal is to run linear regressions from a formula I define, and then store each model's residuals in my original dataset.

library(tidyverse)
library(stringr)

set.seed(5)
df <- data.frame(
  id = c(1:100),
  age = sample(20:80, 100, replace = TRUE),
  sex = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.7, 0.3)),
  type = sample(letters[1:4], 100, replace = TRUE),
  bmi = sample(15:35, 100, replace = TRUE),
  sbp = sample(75:160, 100, replace = TRUE),
  cat_outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  cat_outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  cat_outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  cat_outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55)),
  dog_outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  dog_outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  dog_outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  dog_outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55))
  )

outcome = colnames(df)[str_detect(colnames(df), "outcome")]

test_function = function(vars_dep, vars_indep, input_data){
  for (z in vars_dep) {
    formula = as.formula(paste0(z, " ~ ", vars_indep))
    
    model = lm(formula, data = input_data, na.action = na.exclude)
    
    # Take the residual from each model, create a new col with the suffix '.res'
    input_data[, paste0(z, ".res")] = residuals(model)
  }
}

As shown above, I would like to save the residuals from each model in a column named after the outcome (y) variable plus the suffix '.res', and add these columns to my original data frame df. I expected to see cat_outcome1.res, cat_outcome2.res, and so on as new columns, but they were not saved in df. Any suggestions are greatly appreciated!

Answer:

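R functions work on a copy of their arguments, so the columns you add inside the loop only exist in the function's local input_data and are discarded when the function ends. Return the modified data frame and assign the result back to df. This function gives you what you want: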

test_function <- function(vars_dep, vars_indep, input_data){
  for (z in vars_dep) {
    formula = as.formula(paste0(z, " ~ ", vars_indep))
    
    model = lm(formula, data = input_data, na.action = na.exclude)
    
    # Take the residual from each model, create a new col with the suffix '.res'
    input_data[[paste0(z, ".res")]] <- residuals(model)
  }
  return(input_data)
}
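
For completeness, here is a minimal sketch of how the function might be called. The small data frame and the predictor string "age + bmi" are assumptions made for illustration, not part of the question; the point is that the return value has to be assigned back to df, and that vars_indep is passed as a single string so that paste0() builds one formula per outcome.

```r
# Hypothetical stand-in data; the question's df behaves the same way
set.seed(5)
df <- data.frame(
  age = sample(20:80, 100, replace = TRUE),
  bmi = sample(15:35, 100, replace = TRUE),
  cat_outcome1 = sample(c(0L, 1L), 100, replace = TRUE),
  cat_outcome2 = sample(c(0L, 1L), 100, replace = TRUE)
)

test_function <- function(vars_dep, vars_indep, input_data) {
  for (z in vars_dep) {
    formula <- as.formula(paste0(z, " ~ ", vars_indep))
    model <- lm(formula, data = input_data, na.action = na.exclude)
    # Adds the column to the function's local copy only
    input_data[[paste0(z, ".res")]] <- residuals(model)
  }
  return(input_data)
}

# Base R equivalent of the str_detect() selection in the question
outcome <- grep("outcome", colnames(df), value = TRUE)

# Overwrite df with the returned copy; without this assignment df is unchanged
df <- test_function(outcome, "age + bmi", df)

colnames(df)
```

After the assignment, df should contain one extra '.res' column per outcome alongside the original columns.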
Tags: r