I am trying to do several scaled linear regressions in an automated way, not manually typing every possible combination of variables.
I have 20 variables. I want to run a regression on each pair and then each set of 3, 4, etc.
So, I'd want regressions such as Y ~ X1, X2; Y ~ X2, X3; ...; Y ~ X1, X3; Y ~ X1, X2, X3; Y ~ X1, X2, X3, X4; Y ~ X1, X2, X3, X5, X7; Y ~ X1, X3, X7, X8; etc. There would be a lot of models. Does anyone know how to do this?
I tried this (How do you repeat linear regressions where only the IV changes without having to write code repeatedly?):
lmfun <- function(x) do.call("lm", list(reformulate(x, "retention_rate"), quote(data)))
L <- Map(lmfun, names(data)[-1])
It works wonderfully, except that I need the other combinations (3, 4, 5, 6 variables, etc.), not just iterating through a single X variable. Does anyone know how to change the above code to get combinations of different sizes? I'm also open to new suggestions.
CodePudding user response:
You can use combn in an lapply loop to create the formulas. Note that you'll have about 1M regression formulas.
xvars <- paste0("X", 1:20)
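fmla_list <- lapply(2:20, \(k) {
  # one formula per combination of k predictors
  combn(xvars, k, \(x) {
    # join predictors with " + " so the string parses as a formula
    regr <- paste(x, collapse = " + ")
    fmla <- paste("Y ~", regr)
    as.formula(fmla)
  }, simplify = FALSE)
})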
fmla_list <- unlist(fmla_list)
length(fmla_list)
#> [1] 1048555
sum(choose(20, 2:20))
#> [1] 1048555
Created on 2022-02-17 by the reprex package (v2.0.1)
You can also run the regressions inside combn. In the last line of the inner function, instead of returning as.formula(fmla), run lm(as.formula(fmla), ...), passing your data and any other arguments.
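As a rough sketch of that idea, the code below fits the models directly inside combn. It uses simulated data with only 4 predictors so that it runs quickly; the data frame name dat, the column names, and the comparison by adjusted R-squared are assumptions for illustration, not part of the original answer. It also uses reformulate() in place of paste() plus as.formula().

```r
# Simulated data: the names dat, Y and X1..X4 are placeholders for illustration.
set.seed(1)
n <- 50
dat <- as.data.frame(matrix(rnorm(n * 4), ncol = 4,
                            dimnames = list(NULL, paste0("X", 1:4))))
dat$Y <- 1 + 2 * dat$X1 - dat$X3 + rnorm(n)

xvars <- paste0("X", 1:4)

# For each subset size k, fit one lm() per combination of k predictors.
fits <- lapply(2:length(xvars), function(k) {
  combn(xvars, k, function(x) {
    lm(reformulate(x, response = "Y"), data = dat)
  }, simplify = FALSE)
})

# Flatten one level only, so each element stays a complete lm object.
fits <- unlist(fits, recursive = FALSE)
length(fits)

# One possible way to compare the fits: adjusted R-squared per model.
adj_r2 <- vapply(fits, function(m) summary(m)$adj.r.squared, numeric(1))
names(adj_r2) <- vapply(fits, function(m) deparse(formula(m)), character(1))
head(sort(adj_r2, decreasing = TRUE), 3)
```

With 20 predictors the same loop would fit over a million models, so it may be worth storing only a summary statistic per model rather than the full lm objects.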
CodePudding user response:
The stepwise function from the Rcmdr package (a front end to stepAIC in the MASS package) uses a method that could be useful for your purposes: by comparing a model-selection criterion between candidate models, it systematically eliminates or incorporates variables (backward/forward selection). Here is its documentation: https://www.rdocumentation.org/packages/Rcmdr/versions/2.0-4/topics/stepwise
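A minimal sketch of stepwise selection follows. It calls stepAIC from MASS directly rather than the stepwise wrapper, and it runs on simulated data; the names dat, Y and X1..X5 are assumptions for illustration only.

```r
library(MASS)  # recommended package, included with standard R installations

# Simulated data: only X1 and X3 actually influence Y.
set.seed(1)
n <- 100
dat <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                            dimnames = list(NULL, paste0("X", 1:5))))
dat$Y <- 1 + 2 * dat$X1 - 1.5 * dat$X3 + rnorm(n)

# Start from the full model and let stepAIC drop or re-add terms by AIC.
full <- lm(Y ~ ., data = dat)
sel <- stepAIC(full, direction = "both", trace = FALSE)

formula(sel)  # the formula of the selected model
```

Unlike the exhaustive combn approach, this searches only a path through the model space, so it is much cheaper but is not guaranteed to find the best subset.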