Loop inside a function, how to store function output to an existing dataframe


My goal is to run linear regressions from a formula I define, and then store each model's residuals in my original dataset.


df <- data.frame(
  id = c(1:100),
  age = sample(20:80, 100, replace = TRUE),
  sex = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.7, 0.3)),
  type = sample(letters[1:4], 100, replace = TRUE),
  bmi = sample(15:35, 100, replace = TRUE),
  sbp = sample(75:160, 100, replace = TRUE),
  cat_outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  cat_outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  cat_outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  cat_outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55)),
  dog_outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  dog_outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  dog_outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  dog_outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55))
)

library(stringr)  # provides str_detect()
outcome = colnames(df)[str_detect(colnames(df), "outcome")]

test_function = function(vars_dep, vars_indep, input_data){
  for (z in vars_dep) {
    formula = as.formula(paste0(z, " ~ ", vars_indep))
    model = lm(formula, data = input_data, na.action = na.exclude)
    # Take the residual from each model, create a new col with the suffix '.res'
    input_data[, paste0(z, ".res")] = residuals(model)
  }
}

As shown above, I would like to save the residuals from each model in a column named after the dependent variable used in that model, with the suffix .res, and store these columns in my original data frame df. I expected to see cat_outcome1.res, cat_outcome2.res, and so on as new columns, but they were not saved in df. Any suggestions are greatly appreciated!

CodePudding user response:

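This function gives you what you want, provided you assign its result back to df. R modifies a local copy of input_data inside the function, so the new columns only reach df if the function returns the modified data frame:

test_function <- function(vars_dep, vars_indep, input_data){
  for (z in vars_dep) {
    formula = as.formula(paste0(z, " ~ ", vars_indep))
    model = lm(formula, data = input_data, na.action = na.exclude)
    # Take the residual from each model, create a new col with the suffix '.res'
    input_data[[paste0(z, ".res")]] <- residuals(model)
  }
  # Return the modified copy so the caller can assign it back to df
  input_data
}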
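To see the effect end to end, here is a minimal self-contained sketch. It is not the original poster's exact setup: the smaller data frame, the function name add_residuals, and the predictor string "age + bmi" are assumptions chosen for illustration, and base R's grep() stands in for str_detect() so no extra package is needed.

```r
# Small reproducible data set; column names follow the question,
# but the size and the seed are assumptions for this sketch.
set.seed(1)
df <- data.frame(
  age = sample(20:80, 50, replace = TRUE),
  bmi = sample(15:35, 50, replace = TRUE),
  cat_outcome1 = sample(c(0L, 1L), 50, replace = TRUE),
  dog_outcome1 = sample(c(0L, 1L), 50, replace = TRUE)
)

# Base-R stand-in for str_detect(): names of columns containing "outcome".
outcome <- grep("outcome", colnames(df), value = TRUE)

# Hypothetical name; same logic as the function in the answer.
add_residuals <- function(vars_dep, vars_indep, input_data) {
  for (z in vars_dep) {
    formula <- as.formula(paste0(z, " ~ ", vars_indep))
    model <- lm(formula, data = input_data, na.action = na.exclude)
    # One new column per dependent variable, suffixed with '.res'
    input_data[[paste0(z, ".res")]] <- residuals(model)
  }
  input_data  # the function only changed its local copy, so return it
}

# Calling the function does not touch the caller's df ...
result <- add_residuals(outcome, "age + bmi", df)
changed_in_place <- "cat_outcome1.res" %in% colnames(df)  # FALSE

# ... the new columns appear only after assigning the result back.
df <- result
colnames(df)
```

The key line is the final assignment: without `df <- ...`, the residual columns exist only in the value the function returns.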