data(mtcars)
head(mtcars)
Y = c("mpg")
X = c("cyl", "disp", "hp")
for(V in 1:length(X)){
MODEL=lm(Y~X[V],data=mtcars)
mtcars$paste(X[V], "residual")=MODEL$resid
}
I have the mtcars data and want to estimate several regression models, storing each model's residuals as a new variable in mtcars. My attempt is shown above. I want to regress the 'mpg' variable on each of the three predictors separately, which should add three new variables to mtcars (the residuals from the three looped regressions), but I cannot get it to work.
CodePudding user response:
Here is another option, also using sapply:
e <- sapply(X, function(V){
fmla <- as.formula(paste(Y, V, sep = "~"))
model <- lm(fmla, data = mtcars)
resid(model)
})
colnames(e) <- paste(colnames(e), "residual", sep = "_")
mtcars <- cbind(mtcars, e)
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb cyl_residual disp_residual hp_residual
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.3701643 -2.005436 -1.5937500
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.3701643 -2.005436 -1.5937500
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 -3.5814159 -2.348622 -0.9536307
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0.7701643 2.433646 -1.1937500
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 3.8217446 3.937588 0.5410881
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 -2.5298357 -2.226453 -4.8348913
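If the fitted models are needed later as well (for coefficients or diagnostics), one possible variation is to keep them in a named list and derive the residual columns from that list. This is only a sketch under the same setup as the question; the name `models` is an illustrative choice, not something from the answer above.

```r
data(mtcars)
Y <- "mpg"
X <- c("cyl", "disp", "hp")

# Fit one model per predictor and keep them in a list named by predictor
# (`models` is a hypothetical name chosen for this sketch)
models <- lapply(setNames(X, X), function(v) {
  lm(reformulate(v, response = Y), data = mtcars)
})

# Residuals as a matrix with one column per predictor
e <- sapply(models, resid)
colnames(e) <- paste(colnames(e), "residual", sep = "_")
mtcars <- cbind(mtcars, e)

# The models stay available for later use, e.g. their coefficients
coef(models$hp)
```

The residual columns should match those produced by the sapply version above; the only difference is that the lm objects are not discarded.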
CodePudding user response:
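Following your approach, two things need to change: lm() needs a formula built from the column name (your Y~X[V] passes character vectors instead of columns), and a new column is assigned with mtcars[, name] <- rather than mtcars$paste(...):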
for(V in X){
mtcars[,paste0(V,"_residual")] <- lm(paste("mpg", "~", V),mtcars)$resid
}
or with sapply:
sapply(X, function(a) {
mtcars[,paste0(a,"_residual")] <<- lm(paste("mpg", "~", a),mtcars)$resid
})
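The sapply version above relies on `<<-` to modify mtcars from inside the function. A minimal sketch of an alternative that avoids the global assignment, assuming the same predictors as in the question, is to build the residual vectors with lapply and assign all new columns in one step. The name `new_cols` is introduced here only for illustration.

```r
data(mtcars)
X <- c("cyl", "disp", "hp")

# Names of the columns to create (hypothetical helper variable)
new_cols <- paste0(X, "_residual")

# lapply returns one residual vector per predictor;
# assigning the list to mtcars[new_cols] adds them as columns
mtcars[new_cols] <- lapply(X, function(a) {
  resid(lm(as.formula(paste("mpg ~", a)), data = mtcars))
})

head(mtcars)
```

Because the function only returns values and has no side effects, it should also behave the same when wrapped inside another function, where `<<-` could end up modifying a different object.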
CodePudding user response:
I propose a slightly different approach
library(tidyverse)
data(mtcars)
flm = function(data) lm(mpg~value, data)
fres = function(model) model$resid
mtcars %>% as_tibble() %>%
pivot_longer(cyl:carb) %>% #1
group_by(name) %>%
nest() %>% #2
mutate(model = map(data, ~flm(.x))) %>% #3
mutate(residuals = map(model, ~fres(.x))) %>% #4
unnest(residuals)
Output:
# A tibble: 320 x 4
# Groups: name [10]
name data model residuals
<chr> <list> <list> <dbl>
1 cyl <tibble [32 x 2]> <lm> 0.370
2 cyl <tibble [32 x 2]> <lm> 0.370
3 cyl <tibble [32 x 2]> <lm> -3.58
4 cyl <tibble [32 x 2]> <lm> 0.770
5 cyl <tibble [32 x 2]> <lm> 3.82
6 cyl <tibble [32 x 2]> <lm> -2.53
7 cyl <tibble [32 x 2]> <lm> -0.578
8 cyl <tibble [32 x 2]> <lm> -1.98
9 cyl <tibble [32 x 2]> <lm> -3.58
10 cyl <tibble [32 x 2]> <lm> -1.43
# ... with 310 more rows
As this can be a bit confusing, I will walk through it step by step (see the numbered comments in the code). Here is what we have after step one:
# A tibble: 320 x 3
mpg name value
<dbl> <chr> <dbl>
1 21 cyl 6
2 21 disp 160
3 21 hp 110
4 21 drat 3.9
5 21 wt 2.62
6 21 qsec 16.5
7 21 vs 0
8 21 am 1
9 21 gear 4
10 21 carb 4
# ... with 310 more rows
It is rather simple: we just reshaped mtcars into long format.
After step two, we have something like this:
# A tibble: 10 x 2
# Groups: name [10]
name data
<chr> <list>
1 cyl <tibble [32 x 2]>
2 disp <tibble [32 x 2]>
3 hp <tibble [32 x 2]>
4 drat <tibble [32 x 2]>
5 wt <tibble [32 x 2]>
6 qsec <tibble [32 x 2]>
7 vs <tibble [32 x 2]>
8 am <tibble [32 x 2]>
9 gear <tibble [32 x 2]>
10 carb <tibble [32 x 2]>
As you can see, each variable has its own tibble containing two columns, mpg and value.
In the third step, we add the lm models:
# A tibble: 10 x 3
# Groups: name [10]
name data model
<chr> <list> <list>
1 cyl <tibble [32 x 2]> <lm>
2 disp <tibble [32 x 2]> <lm>
3 hp <tibble [32 x 2]> <lm>
4 drat <tibble [32 x 2]> <lm>
5 wt <tibble [32 x 2]> <lm>
6 qsec <tibble [32 x 2]> <lm>
7 vs <tibble [32 x 2]> <lm>
8 am <tibble [32 x 2]> <lm>
9 gear <tibble [32 x 2]> <lm>
10 carb <tibble [32 x 2]> <lm>
In step four, we extract the residuals from each of these models:
# A tibble: 10 x 4
# Groups: name [10]
name data model residuals
<chr> <list> <list> <list>
1 cyl <tibble [32 x 2]> <lm> <dbl [32]>
2 disp <tibble [32 x 2]> <lm> <dbl [32]>
3 hp <tibble [32 x 2]> <lm> <dbl [32]>
4 drat <tibble [32 x 2]> <lm> <dbl [32]>
5 wt <tibble [32 x 2]> <lm> <dbl [32]>
6 qsec <tibble [32 x 2]> <lm> <dbl [32]>
7 vs <tibble [32 x 2]> <lm> <dbl [32]>
8 am <tibble [32 x 2]> <lm> <dbl [32]>
9 gear <tibble [32 x 2]> <lm> <dbl [32]>
10 carb <tibble [32 x 2]> <lm> <dbl [32]>
This solution is short and transparent, and it keeps all the fitted models so you can reuse them in further calculations. You can adapt it to your needs, for example by running it on selected variables only.
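The pipeline above returns the residuals in long format, whereas the question asks for new columns in mtcars. A possible way to get there is sketched below, restricted to the three predictors from the question. It assumes the row names of mtcars are unique (they are used as a join key), and the names `car`, `resid_wide` and `out` are illustrative choices rather than part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

data(mtcars)
X <- c("cyl", "disp", "hp")  # predictors from the question

resid_wide <- mtcars %>%
  rownames_to_column("car") %>%                # keep car names as a key column
  select(car, mpg, all_of(X)) %>%
  pivot_longer(all_of(X)) %>%                  # columns: car, mpg, name, value
  nest(data = c(car, mpg, value)) %>%          # one nested tibble per predictor
  mutate(residual = map(data, ~ unname(resid(lm(mpg ~ value, data = .x))))) %>%
  unnest(c(data, residual)) %>%                # back to one row per car and predictor
  pivot_wider(id_cols = car, names_from = name, values_from = residual,
              names_glue = "{name}_residual")  # one residual column per predictor

# Attach the residual columns to the original data
out <- mtcars %>%
  rownames_to_column("car") %>%
  left_join(resid_wide, by = "car")

head(out)
```

Joining on an explicit key rather than binding columns by position is a deliberate choice here: it does not depend on the row order surviving the reshaping steps.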
CodePudding user response:
library(dplyr, warn.conflicts = FALSE)
Y = c("mpg")
X = c("cyl", "disp", "hp")
out <-
mtcars %>%
mutate(across(
all_of(X),
list(resid = ~ lm(reformulate(cur_column(), Y), data = mtcars)$resid)
))
head(out)
#> mpg cyl disp hp drat wt qsec vs am gear carb cyl_resid
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.3701643
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.3701643
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 -3.5814159
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0.7701643
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 3.8217446
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 -2.5298357
#> disp_resid hp_resid
#> Mazda RX4 -2.005436 -1.5937500
#> Mazda RX4 Wag -2.005436 -1.5937500
#> Datsun 710 -2.348622 -0.9536307
#> Hornet 4 Drive 2.433646 -1.1937500
#> Hornet Sportabout 3.937588 0.5410881
#> Valiant -2.226453 -4.8348913
Created on 2021-10-09 by the reprex package (v2.0.1)