Linear Model: Calculate all possible combinations of variables with interaction: interaction terms m

Time:04-14

I know that what I'm about to do is akin to stepwise regression, and I know that is bad practice. But this is also an exercise that I would like to complete.

Let's say I have a linear model with four predictor variables: x1, x2, x3, and x4. How would I find all the possible model combinations, with interactions, while making sure that every variable appearing in an interaction term is also included as a main effect?

In R the "global model" would look like this:

lm(y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x3 + x1:x4 + x2:x3 + x2:x4 + x3:x4 + x1:x2:x3 + x1:x2:x4 + x1:x3:x4 + x2:x3:x4 + x1:x2:x3:x4)

So there are a total of 15 terms in the model. Now I could do something like this:

regMat <- expand.grid(c(TRUE,FALSE), c(TRUE,FALSE),
                      c(TRUE,FALSE), c(TRUE,FALSE),
                      c(TRUE,FALSE), c(TRUE,FALSE),
                      c(TRUE,FALSE), c(TRUE,FALSE),
                      c(TRUE,FALSE), c(TRUE,FALSE),
                      c(TRUE,FALSE), c(TRUE,FALSE),
                      c(TRUE,FALSE), c(TRUE,FALSE), c(TRUE,FALSE))

But that would give me ALL possible subsets of the 15 terms (a whopping 2^15 = 32768 candidate models). What I need instead is only those combinations in which every variable appearing in an interaction term is also present as a main effect (i.e., as one of the first four single-variable terms in the model).

Any idea on how to accomplish/calculate this?

CodePudding user response:

You can just write y ~ x1 * x2 * x3 * x4 to include all main effects and all possible interaction terms. Keep in mind that the more coefficients you want to fit, the more samples you need (roughly 10 samples per coefficient, as a rule of thumb).
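As a quick check of what the `*` operator expands to, here is a minimal sketch using terms() from base R. The object names f and labs are only illustrative.

```r
# Expand the crossing operator and list the resulting term labels.
f <- y ~ x1 * x2 * x3 * x4
labs <- attr(terms(f), "term.labels")

labs          # main effects first, then interactions of increasing order
length(labs)  # 15 terms, the same ones written out by hand in the question
```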

CodePudding user response:

If you're looking for every subset of your variables, from size one up to the full set, and for each subset a formula containing all of its possible interactions, here is a way.

First loop over combn(x, i) for i from 1 to the number of variables. Then, within each subset z, loop over combn(z, m) for m from 1 to the size of the subset to build the main effects and interaction terms.

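# Note: the \(x) lambda shorthand and the |> pipe require R >= 4.1
x <- c('x1', 'x2', 'x3', 'x4')

# Outer combn: every subset z of size i; inner combn: every term of order m within z
lst <- lapply(seq_along(x), \(i) 
              combn(x, i, \(z) lapply(seq_along(z),
                                      \(m) combn(z, m, paste, collapse=':')), 
                    simplify=FALSE)) |> unlist(recursive=FALSE)

# Turn each set of term labels into a formula with response y
fos <- lapply(lst, \(x) reformulate(unlist(x), 'y'))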

Result

fos
# [[1]]
# y ~ x1
# <environment: 0x55e3b49fcf30>
#
# [[2]]
# y ~ x2
# <environment: 0x55e3b4a05dd8>
#
# [[3]]
# y ~ x3
# <environment: 0x55e3b4aa5880>
#
# [[4]]
# y ~ x4
# <environment: 0x55e3b4aa8a80>
#
# [[5]]
# y ~ x1 + x2 + x1:x2
# <environment: 0x55e3b4aabc80>
#
# [[6]]
# y ~ x1 + x3 + x1:x3
# <environment: 0x55e3b4ab28c0>
#
# [[7]]
# y ~ x1 + x4 + x1:x4
# <environment: 0x55e3b4abb420>
#
# [[8]]
# y ~ x2 + x3 + x2:x3
# <environment: 0x55e3b4abe230>
#
# [[9]]
# y ~ x2 + x4 + x2:x4
# <environment: 0x55e3b4ac4e70>
#
# [[10]]
# y ~ x3 + x4 + x3:x4
# <environment: 0x55e3b4ac9ba0>
#
# [[11]]
# y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + x1:x2:x3
# <environment: 0x55e3b4ad07e0>
#
# [[12]]
# y ~ x1 + x2 + x4 + x1:x2 + x1:x4 + x2:x4 + x1:x2:x4
# <environment: 0x55e3b4ad2b70>
#
# [[13]]
# y ~ x1 + x3 + x4 + x1:x3 + x1:x4 + x3:x4 + x1:x3:x4
# <environment: 0x55e3b4ad8d30>
#
# [[14]]
# y ~ x2 + x3 + x4 + x2:x3 + x2:x4 + x3:x4 + x2:x3:x4
# <environment: 0x55e3b4ae0e10>
#
# [[15]]
# y ~ x1 + x2 + x3 + x4 + x1:x2 + x1:x3 + x1:x4 + x2:x3 + x2:x4 +
#     x3:x4 + x1:x2:x3 + x1:x2:x4 + x1:x3:x4 + x2:x3:x4 + x1:x2:x3:x4
# <environment: 0x55e3b4ae6fd0>
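
Note that this list contains only the 15 "full factorial" models, one per subset of variables. If the requirement is read literally as stated in the question (an interaction term is allowed whenever all of its variables are present as main effects, regardless of which lower-order interactions are included), more models qualify. Below is a rough sketch of that reading, filtering the 2^15 grid. The names term_vars, term_labels, m, ok, valid and fos_all are illustrative, and the interpretation itself is an assumption.

```r
x <- c("x1", "x2", "x3", "x4")

# Variables making up each of the 15 terms: main effects first, then
# 2-, 3- and 4-way interactions.
term_vars <- unlist(lapply(seq_along(x), function(k) combn(x, k, simplify = FALSE)),
                    recursive = FALSE)
term_labels <- vapply(term_vars, paste, character(1), collapse = ":")

# One row per candidate model, one logical column per term (2^15 rows).
m <- as.matrix(expand.grid(rep(list(c(TRUE, FALSE)), length(term_labels))))
colnames(m) <- term_labels

# Keep a row only if every included interaction has all of its variables
# present as main effects (assumed reading of the question).
ok <- rep(TRUE, nrow(m))
for (j in which(lengths(term_vars) > 1)) {
  vars <- term_vars[[j]]
  mains_present <- rowSums(m[, vars, drop = FALSE]) == length(vars)
  ok <- ok & (!m[, j] | mains_present)
}
sum(ok)  # 2129, counting the intercept-only row

# Drop the intercept-only row and build one formula per remaining model.
valid <- m[ok, , drop = FALSE]
valid <- valid[rowSums(valid) > 0, , drop = FALSE]
fos_all <- lapply(seq_len(nrow(valid)),
                  function(i) reformulate(term_labels[valid[i, ]], response = "y"))
```

Each element of fos_all can be passed to lm() in the same way as the formulas in fos. A stricter marginality rule, under which a three-way interaction also requires its two-way interactions, would need an additional check on the lower-order interaction columns.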