How to index a newly created column name inside a loop

Time:07-04

I am using the data set mtcars as my example. The goal is to:

  • Step 1: Use a loop to run regressions with changing outcome, while the independent variables stay the same for each model.
  • Step 2: Transform the residuals from each model in step 1

My code:

library(tidyverse)

data("mtcars")

# Plan:
# Step 1: Outcome ~ cyl + disp + hp + drat
# Step 2: Transform residual from step 1


# The outcomes are all the other columns
outcome = colnames(mtcars[, -c(2:5)])

for (i in outcome){
  # Step 1: Run the model using a loop with changing outcome
  formula = as.formula(paste0(i, " ~ cyl + disp + hp + drat"))
  model = lm(formula, data = mtcars, na.action = na.exclude)
  
  # Save the residuals from each model as new columns with the suffix '.res'
  mtcars[, paste0(i, ".res")] = residuals(model)
  
  # Step 2: Transform the residuals and save them as new columns with the suffix '.invn'
  mtcars[, paste0(i, ".invn")] =
    qnorm((rank(mtcars[,get(paste0(i,".res"))],na.last="keep")-0.5)/sum(!is.na(mtcars[,get(paste0(i,".res"))])))
}

However, step 2 fails with this error:

Error in get(paste0(i, ".res")) : object 'mpg.res' not found

  • I think the reason is that when indexing a column of a data set with [], the column name has to be in quotes. If I had written mtcars[, 'mpg.res'], I would not have received this error.
  • The problem is that the column name changes with i, so I can't simply put paste0(i, ".res") in quotes.
  • In summary, my question is: how do I index a newly created column when its name is built inside the loop? I tried eval(parse()), but it didn't work.

PS: I know I can use purrr::map or apply to make my life easier, but I would really like to learn how to solve this problem when using a loop.

CodePudding user response:

Another simple approach might be to always use the last column name, since you always append the column to the end of the data.frame.

mtcars[, colnames(mtcars)[length(colnames(mtcars))]]

This turns your code into:

mtcars[, paste0(i, ".invn")] =
    qnorm((rank(mtcars[, colnames(mtcars)[length(colnames(mtcars))]],na.last="keep")-0.5)/sum(!is.na(mtcars[, colnames(mtcars)[length(colnames(mtcars))]])))
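The same last-column lookup can be written more compactly with ncol(). The sketch below runs a single iteration by hand on a copy of the data (df and the fixed outcome "mpg" are illustrative choices, not part of the original code) and also shows one caveat: the trick relies on the column really being appended. If the column already exists, for instance because the loop is run a second time, assigning to it overwrites it in place and the last column is a different one.

```r
# Work on a copy so the built-in mtcars is not modified (df is an illustrative name).
df <- mtcars
i <- "mpg"  # one outcome, standing in for a single loop iteration

model <- lm(mpg ~ cyl + disp + hp + drat, data = df, na.action = na.exclude)
df[, paste0(i, ".res")] <- residuals(model)

# The column was just appended, so the last column is the residual column.
same_col <- identical(df[, ncol(df)], df[, paste0(i, ".res")])

# Assigning to an existing name overwrites it in place; no column is appended.
n_before <- ncol(df)
df[, paste0(i, ".res")] <- residuals(model)
no_growth <- ncol(df) == n_before
```

Both same_col and no_growth are TRUE here, so indexing by position is only safe as long as each residual column is created fresh.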

CodePudding user response:

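Simply remove the get(). paste0(i, ".res") already returns the column name as a character string, which is exactly what [ expects for a data.frame. get() instead looks for an object named mpg.res in the environment, and no such object exists.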

# Step 2: Transform the residuals and save them as new columns with the suffix '.invn'
  mtcars[, paste0(i, ".invn")] =
    qnorm((rank(mtcars[,paste0(i,".res")],na.last="keep")-0.5)/sum(!is.na(mtcars[,paste0(i,".res")])))
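Put together, a full loop without get() might look like the sketch below. It assumes the predictors are meant to be joined with + in the formula, works on a copy named df (an illustrative name) so the built-in data set stays untouched, and stores the generated names in variables. It uses [[ ]] with those names, which selects the same column as [, name] does here.

```r
# Work on a copy so the built-in mtcars is not modified (df is an illustrative name).
df <- mtcars

# Outcomes: every column except the predictors cyl, disp, hp, drat (columns 2 to 5).
outcomes <- colnames(df)[-(2:5)]

for (i in outcomes) {
  res_col  <- paste0(i, ".res")   # e.g. "mpg.res"
  invn_col <- paste0(i, ".invn")  # e.g. "mpg.invn"

  # Step 1: regress the current outcome on the fixed predictors.
  model <- lm(as.formula(paste(i, "~ cyl + disp + hp + drat")),
              data = df, na.action = na.exclude)
  df[[res_col]] <- residuals(model)

  # Step 2: rank-based inverse normal transform of the residuals.
  # The column is indexed by the character string held in res_col; no get() needed.
  r <- df[[res_col]]
  df[[invn_col]] <- qnorm((rank(r, na.last = "keep") - 0.5) / sum(!is.na(r)))
}
```

Storing the name once in res_col also avoids repeating the paste0() call inside the transform.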