My goal is to run linear regressions with an equation I define, and then store the model residuals in my original dataset.
library(tidyverse)
library(stringr)
set.seed(5)
df <- data.frame(
  id = c(1:100),
  age = sample(20:80, 100, replace = TRUE),
  sex = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.7, 0.3)),
  type = sample(letters[1:4], 100, replace = TRUE),
  bmi = sample(15:35, 100, replace = TRUE),
  sbp = sample(75:160, 100, replace = TRUE),
  cat_outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  cat_outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  cat_outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  cat_outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55)),
  dog_outcome1 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.68, 0.32)),
  dog_outcome2 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.65, 0.35)),
  dog_outcome3 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.60, 0.40)),
  dog_outcome4 = sample(c(0L, 1L), 100, replace = TRUE, prob = c(0.45, 0.55))
)
outcome = colnames(df)[str_detect(colnames(df), "outcome")]
test_function = function(vars_dep, vars_indep, input_data){
  for (z in vars_dep) {
    formula = as.formula(paste0(z, " ~ ", vars_indep))
    model = lm(formula, data = input_data, na.action = na.exclude)
    # Take the residual from each model, create a new col with the suffix '.res'
    input_data[, paste0(z, ".res")] = residuals(model)
  }
}
As shown above, I would like to save the residuals from each model under a name with the suffix .res, based on which y I use in the model, and finally store these residuals as columns in my original data frame df. So I expected to see cat_outcome1.res, cat_outcome2.res, and so on as new columns, but they were not saved in df. Any suggestions are greatly appreciated!
CodePudding user response:
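Inside a function, R modifies a local copy of input_data, so the new columns are lost unless the function returns the modified data frame and you assign the result back (for example, df <- test_function(outcome, "age + sex", df)). This function gives you what you want: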
test_function <- function(vars_dep, vars_indep, input_data){
  for (z in vars_dep) {
    formula = as.formula(paste0(z, " ~ ", vars_indep))
    model = lm(formula, data = input_data, na.action = na.exclude)
    # Take the residual from each model, create a new col with the suffix '.res'
    input_data[[paste0(z, ".res")]] <- residuals(model)
  }
  return(input_data)
}
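To see the difference in practice, here is a minimal, self-contained sketch of how the function might be called. The names demo_df, demo_outcomes and demo_out, the smaller data set, and the predictors "age + bmi" are assumptions made for illustration; they are not from the question. The key point is that the returned data frame has to be captured, because the data frame passed in is left unchanged.

```r
# Small illustrative data set (hypothetical; stands in for the question's df)
set.seed(5)
demo_df <- data.frame(
  age = sample(20:80, 50, replace = TRUE),
  bmi = sample(15:35, 50, replace = TRUE),
  cat_outcome1 = sample(c(0L, 1L), 50, replace = TRUE),
  cat_outcome2 = sample(c(0L, 1L), 50, replace = TRUE)
)

# Same logic as the function above: fit one model per outcome,
# add a '<outcome>.res' column, and return the modified copy
test_function <- function(vars_dep, vars_indep, input_data) {
  for (z in vars_dep) {
    formula <- as.formula(paste0(z, " ~ ", vars_indep))
    model <- lm(formula, data = input_data, na.action = na.exclude)
    input_data[[paste0(z, ".res")]] <- residuals(model)
  }
  input_data
}

# Pick the outcome columns by name (base R equivalent of str_detect)
demo_outcomes <- grep("outcome", colnames(demo_df), value = TRUE)

# Capture the return value; demo_df itself is not modified
demo_out <- test_function(demo_outcomes, "age + bmi", demo_df)
colnames(demo_out)
```

In the original setting the same pattern would be to assign the result back to df, which overwrites the old data frame with the version that carries the residual columns. Because of na.action = na.exclude, residuals() pads rows with missing values with NA, so the residual vector should stay the same length as the data frame.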