I am using the mtcars dataset as my example. The goal is to:
- Step 1: Use a loop to run regressions with a changing outcome, while the independent variables stay the same in every model.
- Step 2: Transform the residuals from each model in Step 1.
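For reference, the Step 2 transformation in the code is a rank-based inverse normal transform. Reading it directly off the qnorm()/rank() expression, with r_j the j-th residual, n the number of non-missing residuals, and \Phi^{-1} the standard normal quantile function (qnorm in R):

$$\tilde r_j = \Phi^{-1}\!\left(\frac{\operatorname{rank}(r_j) - 0.5}{n}\right)$$

Missing residuals stay NA because of na.last = "keep".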
My code:
library(tidyverse)
data("mtcars")
# Plan:
# Step 1: outcome ~ cyl + disp + hp + drat
# Step 2: Transform the residuals from step 1
# The outcomes are all the other columns
outcome = colnames(mtcars[, -c(2:5)])
for (i in outcome){
  # Step 1: Run the model using a loop with changing outcome
  formula = as.formula(paste0(i, " ~ cyl + disp + hp + drat"))
  model = lm(formula, data = mtcars, na.action = na.exclude)
  # Save the residuals from each model as new columns with the suffix '.res'
  mtcars[, paste0(i, ".res")] = residuals(model)
  # Step 2: Transform the residuals and save them as new columns with the suffix '.invn'
  mtcars[, paste0(i, ".invn")] =
    qnorm((rank(mtcars[,get(paste0(i,".res"))],na.last="keep")-0.5)/sum(!is.na(mtcars[,get(paste0(i,".res"))])))
}
However, I get the following error, which comes from Step 2:
Error in get(paste0(i, ".res")) : object 'mpg.res' not found
- I think the reason is that when indexing a column of a data set with [], the column name has to be in quotes. If I wrote mtcars[, 'mpg.res'], I would not get this error.
- However, the column names change with i, so I can't put paste0(i, ".res") in quotes.
- In summary, my question is: how do I index a newly created column when the column name is built inside the loop? I tried eval(parse()), but it didn't work.
PS: I know I can use purrr::map or apply to make my life easier, but I would really like to learn how to solve this problem with a loop.
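A minimal reproduction of the error, run on a single outcome (mpg) for illustration. It contrasts get(), which searches the environment for an R object with the given name, with indexing the data frame by the same character string. The variable names col_name, res_try, x1 and x2 are introduced only for this sketch.

```r
data("mtcars")

i <- "mpg"
# Step 1 for a single outcome: add the "mpg.res" column
mtcars[, paste0(i, ".res")] <- residuals(lm(mpg ~ cyl + disp + hp + drat, data = mtcars))

col_name <- paste0(i, ".res")   # already a character string: "mpg.res"

# get() looks for an R *object* named "mpg.res" in the environment.
# None exists (it is only a column of mtcars), so this errors.
res_try <- tryCatch(get(col_name), error = function(e) conditionMessage(e))
print(res_try)

# Indexing with the string itself works; no extra quoting is needed
x1 <- mtcars[, col_name]
x2 <- mtcars[[col_name]]
identical(x1, x2)
```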
Answer:
Another simple approach is to always use the last column, since each new column is appended to the end of the data.frame. When the right-hand side of Step 2 is evaluated, that last column is the .res column created just before it:
mtcars[, colnames(mtcars)[length(colnames(mtcars))]]
Applied to your code, Step 2 becomes:
mtcars[, paste0(i, ".invn")] =
qnorm((rank(mtcars[, colnames(mtcars)[length(colnames(mtcars))]],na.last="keep")-0.5)/sum(!is.na(mtcars[, colnames(mtcars)[length(colnames(mtcars))]])))
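The same last-column lookup can be written more compactly by position with ncol(). A sketch on a small made-up data frame (df and its values are hypothetical, for illustration only); like the answer above, it relies on the column of interest being the most recently appended one.

```r
# Toy data frame (hypothetical values)
df <- data.frame(a = c(1, 2, 3))

# Append a new column, as the loop does with "<outcome>.res"
df[, "a.res"] <- c(0.5, -1, 0.5)

# Long form from the answer: look up the name of the last column
long_form <- df[, colnames(df)[length(colnames(df))]]

# Shorter equivalent: index the last column by its position
short_form <- df[[ncol(df)]]

identical(long_form, short_form)
```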
Answer:
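Simply remove the get(). paste0(i, ".res") already evaluates to a character string such as "mpg.res", and [ accepts that string as a column name, so no extra quoting is needed. get() instead looks for an R object called mpg.res in the environment; that name exists only as a column of mtcars, hence the error: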
# Step 2: Transform the residuals and save them as new columns with the suffix '.invn'
mtcars[, paste0(i, ".invn")] =
qnorm((rank(mtcars[,paste0(i,".res")],na.last="keep")-0.5)/sum(!is.na(mtcars[,paste0(i,".res")])))
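Putting it together, a sketch of the full loop with get() removed. A few choices here go beyond the original: inv_norm is a hypothetical helper name wrapping the same qnorm()/rank() expression, base R's reformulate() builds the formula from the predictor names (so the "+" separators cannot be forgotten), and setdiff() selects the outcomes by name rather than by position -c(2:5). Columns are read and written with [[ ]], which behaves like [, ] for a single column.

```r
data("mtcars")

# Hypothetical helper: rank-based inverse normal transform, matching the
# expression in the question (ranks offset by 0.5, NAs kept as NA)
inv_norm <- function(x) {
  qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))
}

predictors <- c("cyl", "disp", "hp", "drat")
outcome <- setdiff(colnames(mtcars), predictors)

for (i in outcome) {
  # Step 1: same predictors, changing outcome, e.g. mpg ~ cyl + disp + hp + drat
  f <- reformulate(predictors, response = i)
  model <- lm(f, data = mtcars, na.action = na.exclude)

  res_col <- paste0(i, ".res")   # a character string, e.g. "mpg.res"
  mtcars[[res_col]] <- residuals(model)

  # Step 2: index the new column directly by its name; no get() needed
  mtcars[[paste0(i, ".invn")]] <- inv_norm(mtcars[[res_col]])
}
```

Because the residuals are already in a local variable at this point, Step 2 could equally transform residuals(model) directly instead of reading the column back.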