R Loop Residual Regression

Time:10-10

data(mtcars)
head(mtcars)
Y = c("mpg")
X = c("cyl", "disp", "hp")

for(V in 1:length(X)){
  MODEL=lm(Y~X[V],data=mtcars)
  mtcars$paste(X[V], "residual")=MODEL$resid
}

I have the mtcars data and want to fit several regression models, storing the residuals of each model as a new variable in mtcars. My attempt is shown above. Specifically, I want to regress mpg on each of the three predictors separately, which should add three new columns to mtcars containing the residuals of the three regressions. However, I cannot get the loop to work.
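A likely reason the loop above fails is that the formula and the column assignment are both built from strings in ways R does not evaluate. The sketch below illustrates this under some assumptions: `f_literal`, `f_built` and the copy `dat` are hypothetical names introduced only for this illustration.

```r
Y <- "mpg"
X <- c("cyl", "disp", "hp")

# Written literally, the formula refers to the objects Y and X,
# not to the mtcars columns whose names they hold.
f_literal <- Y ~ X[1]
all.vars(f_literal)   # "Y" "X"

# reformulate() builds the formula from the strings instead.
f_built <- reformulate(X[1], response = Y)
all.vars(f_built)     # "mpg" "cyl"

# `$` expects a literal column name; `[[` accepts a computed one.
dat <- mtcars         # work on a copy so mtcars itself is untouched
dat[[paste0(X[1], "_residual")]] <- resid(lm(f_built, data = dat))
```

The same two changes (a formula built from strings, and `[[` or `[` instead of `$`) are what the answers below rely on.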

CodePudding user response:

Here is another option, also using sapply.

e <- sapply(X, function(V){
  fmla <- as.formula(paste(Y, V, sep = "~"))
  model <- lm(fmla, data = mtcars)
  resid(model)
})
colnames(e) <- paste(colnames(e), "residual", sep = "_")
mtcars <- cbind(mtcars, e)

head(mtcars)
#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb cyl_residual disp_residual hp_residual
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    0.3701643     -2.005436  -1.5937500
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    0.3701643     -2.005436  -1.5937500
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   -3.5814159     -2.348622  -0.9536307
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    0.7701643      2.433646  -1.1937500
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2    3.8217446      3.937588   0.5410881
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1   -2.5298357     -2.226453  -4.8348913
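One caveat worth keeping in mind: mtcars has no missing values, so every residual vector has 32 entries and cbind lines up cleanly. If a predictor contained NA, the default na.action would drop that row and the residual vector would be shorter than the data. A minimal sketch of one way to handle this, assuming a hypothetical copy `cars_na` with an artificially inserted NA:

```r
cars_na <- mtcars
cars_na$hp[3] <- NA   # artificial missing value, for illustration only

# Default behaviour: the incomplete row is dropped before fitting.
m_omit <- lm(mpg ~ hp, data = cars_na)
length(resid(m_omit))   # 31

# na.exclude also drops the row for fitting, but resid() pads the
# result with NA so it matches the number of rows in the data.
m_excl <- lm(mpg ~ hp, data = cars_na, na.action = na.exclude)
length(resid(m_excl))   # 32
```

Note that the padding is applied by resid() (or residuals()); accessing `m_excl$residuals` directly returns the unpadded vector.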

CodePudding user response:

By following your approach,

for (V in X) {
  mtcars[, paste0(V, "_residual")] <- lm(paste("mpg", "~", V), mtcars)$resid
}

or with sapply,

sapply(X, function(a) {
  mtcars[, paste0(a, "_residual")] <<- lm(paste("mpg", "~", a), mtcars)$resid
})
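If you would rather avoid the global assignment with `<<-`, the same columns can be added in one step by assigning a list of residual vectors. This is only a sketch of an alternative, not a correction of the code above; `dat` and `fit_resid` are hypothetical names.

```r
X <- c("cyl", "disp", "hp")
dat <- mtcars   # copy, so the original data set stays unchanged

# Fit mpg ~ <predictor> and return the residuals of that model.
fit_resid <- function(v) {
  resid(lm(reformulate(v, response = "mpg"), data = dat))
}

# lapply() returns one residual vector per predictor; assigning the list
# to new column names adds all three columns at once.
dat[paste0(X, "_residual")] <- lapply(X, fit_resid)

head(dat[paste0(X, "_residual")], 3)
```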

CodePudding user response:

I propose a slightly different approach

library(tidyverse)
data(mtcars)

flm = function(data) lm(mpg~value, data)
fres = function(model) model$resid

mtcars %>% as_tibble() %>% 
  pivot_longer(cyl:carb) %>% #1
  group_by(name) %>% 
  nest() %>% #2
  mutate(model = map(data, ~flm(.x))) %>% #3
  mutate(residuals = map(model, ~fres(.x))) %>% #4
  unnest(residuals) 
  

output

# A tibble: 320 x 4
# Groups:   name [10]
   name  data              model  residuals
   <chr> <list>            <list>     <dbl>
 1 cyl   <tibble [32 x 2]> <lm>       0.370
 2 cyl   <tibble [32 x 2]> <lm>       0.370
 3 cyl   <tibble [32 x 2]> <lm>      -3.58 
 4 cyl   <tibble [32 x 2]> <lm>       0.770
 5 cyl   <tibble [32 x 2]> <lm>       3.82 
 6 cyl   <tibble [32 x 2]> <lm>      -2.53 
 7 cyl   <tibble [32 x 2]> <lm>      -0.578
 8 cyl   <tibble [32 x 2]> <lm>      -1.98 
 9 cyl   <tibble [32 x 2]> <lm>      -3.58 
10 cyl   <tibble [32 x 2]> <lm>      -1.43 
# ... with 310 more rows

Because the pipeline can be hard to follow, here is what each numbered step produces (see the #1 to #4 comments in the code). After step one we have:

# A tibble: 320 x 3
     mpg name   value
   <dbl> <chr>  <dbl>
 1    21 cyl     6   
 2    21 disp  160   
 3    21 hp    110   
 4    21 drat    3.9 
 5    21 wt      2.62
 6    21 qsec   16.5 
 7    21 vs      0   
 8    21 am      1   
 9    21 gear    4   
10    21 carb    4   
# ... with 310 more rows

This step simply reshapes mtcars into long format: every predictor value gets its own row, next to the corresponding mpg.

After step two, we have something like this

# A tibble: 10 x 2
# Groups:   name [10]
   name  data             
   <chr> <list>           
 1 cyl   <tibble [32 x 2]>
 2 disp  <tibble [32 x 2]>
 3 hp    <tibble [32 x 2]>
 4 drat  <tibble [32 x 2]>
 5 wt    <tibble [32 x 2]>
 6 qsec  <tibble [32 x 2]>
 7 vs    <tibble [32 x 2]>
 8 am    <tibble [32 x 2]>
 9 gear  <tibble [32 x 2]>
10 carb  <tibble [32 x 2]>

As you can see, each predictor now has its own tibble with two columns, mpg and value. In step three, we fit an lm model to each of these tibbles.

# A tibble: 10 x 3
# Groups:   name [10]
   name  data              model 
   <chr> <list>            <list>
 1 cyl   <tibble [32 x 2]> <lm>  
 2 disp  <tibble [32 x 2]> <lm>  
 3 hp    <tibble [32 x 2]> <lm>  
 4 drat  <tibble [32 x 2]> <lm>  
 5 wt    <tibble [32 x 2]> <lm>  
 6 qsec  <tibble [32 x 2]> <lm>  
 7 vs    <tibble [32 x 2]> <lm>  
 8 am    <tibble [32 x 2]> <lm>  
 9 gear  <tibble [32 x 2]> <lm>  
10 carb  <tibble [32 x 2]> <lm>  

In step four, we extract the residuals from each of these models.

# A tibble: 10 x 4
# Groups:   name [10]
   name  data              model  residuals 
   <chr> <list>            <list> <list>    
 1 cyl   <tibble [32 x 2]> <lm>   <dbl [32]>
 2 disp  <tibble [32 x 2]> <lm>   <dbl [32]>
 3 hp    <tibble [32 x 2]> <lm>   <dbl [32]>
 4 drat  <tibble [32 x 2]> <lm>   <dbl [32]>
 5 wt    <tibble [32 x 2]> <lm>   <dbl [32]>
 6 qsec  <tibble [32 x 2]> <lm>   <dbl [32]>
 7 vs    <tibble [32 x 2]> <lm>   <dbl [32]>
 8 am    <tibble [32 x 2]> <lm>   <dbl [32]>
 9 gear  <tibble [32 x 2]> <lm>   <dbl [32]>
10 carb  <tibble [32 x 2]> <lm>   <dbl [32]>

This solution is concise, and because every fitted model is kept in the model column, you can reuse the models for further calculations. To run it on selected variables only, change the columns passed to pivot_longer.
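To illustrate reusing the stored models on a subset of predictors, here is a minimal sketch that restricts the pipeline to the three variables from the question and then pulls R-squared out of each model. The names `models` and `r_squared` are assumptions made for this example.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(purrr)

X <- c("cyl", "disp", "hp")   # predictors of interest, as in the question

models <- mtcars %>%
  as_tibble() %>%
  select(mpg, all_of(X)) %>%          # keep only the response and predictors
  pivot_longer(all_of(X)) %>%         # long format: mpg, name, value
  group_by(name) %>%
  nest() %>%                          # one nested tibble per predictor
  mutate(model = map(data, ~ lm(mpg ~ value, data = .x))) %>%
  mutate(r_squared = map_dbl(model, ~ summary(.x)$r.squared)) %>%
  ungroup()

models %>% select(name, r_squared)
```

Any other per-model quantity (coefficients, fitted values, predictions) can be extracted from the model column in the same way.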

CodePudding user response:

library(dplyr, warn.conflicts = FALSE)

Y = c("mpg")
X = c("cyl", "disp", "hp")

out <-
  mtcars %>%
    mutate(across(
      all_of(X),
      list(resid = ~ lm(reformulate(cur_column(), Y), data = mtcars)$resid)
    ))

head(out)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb  cyl_resid
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  0.3701643
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  0.3701643
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 -3.5814159
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  0.7701643
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2  3.8217446
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 -2.5298357
#>                   disp_resid   hp_resid
#> Mazda RX4          -2.005436 -1.5937500
#> Mazda RX4 Wag      -2.005436 -1.5937500
#> Datsun 710         -2.348622 -0.9536307
#> Hornet 4 Drive      2.433646 -1.1937500
#> Hornet Sportabout   3.937588  0.5410881
#> Valiant            -2.226453 -4.8348913

Created on 2021-10-09 by the reprex package (v2.0.1)
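If you want the new columns to carry the `_residual` suffix from the question rather than `_resid`, the `.names` argument of across() can set it. A short sketch, assuming dplyr 1.0.0 or later:

```r
library(dplyr, warn.conflicts = FALSE)

Y <- "mpg"
X <- c("cyl", "disp", "hp")

out <- mtcars %>%
  mutate(across(
    all_of(X),
    # cur_column() gives the name of the predictor currently being processed
    ~ resid(lm(reformulate(cur_column(), Y), data = mtcars)),
    .names = "{.col}_residual"   # e.g. cyl -> cyl_residual
  ))

names(out)[12:14]
```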
