Regression with many variables, but not enough to justify using . and subtracting unnecessary variables


I'm trying to run a regression with roughly 20 variables, in a dataset that has 50 variables. So it looks something like:

lm(data = data, formula = y ~ explanatory_1 + ... + explanatory_20)

Obviously this works fine, but we want the code to look a little cleaner. A lot of answers suggest using . instead; however, I don't want to do that, because the dataset has about 20 variables that we don't use in the regression, i.e. we'd be subtracting as many variables as we include in the normal regression.

Is there a way to group the explanatory vars into a list, so it can instead look like

lm(data = data, formula = y ~ list)?

Furthermore, in some specifications we include a new covariate that also interacts with all the original covariates, so ideally we would have

lm(data = data, formula = y ~ list + new_var + new_var:list).

Can this be done? Thanks!

CodePudding user response:

You can put the explanatory variable names in a character vector and use reformulate to build the formula:

x_vars <- c('cyl', 'disp', 'hp')
lm(data = mtcars, formula = reformulate(x_vars, response = 'mpg'))
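
For the interaction part of the question, you can build the interaction terms with paste0 and pass them to reformulate along with the main effects. A sketch using mtcars, with vs standing in for your hypothetical new_var:

```r
# Original explanatory variables
x_vars <- c('cyl', 'disp', 'hp')
# The extra covariate that should interact with each of them
new_var <- 'vs'

# Right-hand side: main effects plus new_var:x interactions
rhs <- c(x_vars, new_var, paste0(new_var, ':', x_vars))
f <- reformulate(rhs, response = 'mpg')
# f is mpg ~ cyl + disp + hp + vs + vs:cyl + vs:disp + vs:hp

lm(f, data = mtcars)
```

Equivalently, since (a + b) * c expands to a + b + c + a:c + b:c in R's formula syntax, you could wrap the collapsed vector in parentheses and use *, e.g. reformulate(paste0('(', paste(x_vars, collapse = ' + '), ') * ', new_var), response = 'mpg').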