asterisk in formula with many variables: how to limit order of interactions?

Time:02-16

Suppose I have a vector of variable names x = c('a','b','c','d','e') for a statistical model. When building the formula, it's nice to use something like paste('y ~',paste(x,collapse=' + ')) to get y ~ a + b + c + d + e, especially when x may change.

Now I'd like to do the same thing with interaction terms, but paste(x,collapse=' : ') produces a : b : c : d : e, which is only one term, and paste(x,collapse=' * ') produces a * b * c * d * e, which includes all possible interactions across all orders, i.e. a + b + c + ... + a:b + a:c + ... + a:b:c + a:b:d + ... + a:b:c:d:e. How can I limit the interaction terms to a maximum order, say 2nd order (e.g. a:b)?

CodePudding user response:

reformulate handles this problem quite naturally, though how you would apply it is context-dependent.

If you want to drop interactions of order greater than order_max from an existing formula, then you can do:

f1 <- function(formula, order_max) {
    a <- attributes(terms(formula))
    reformulate(termlabels = a$term.labels[a$order <= order_max], 
                response = if (r <- a$response) a$variables[[1L + r]],
                intercept = a$intercept,
                env = environment(formula))
}

f1(y ~ a * b * c * d * e, 2L)
## y ~ a + b + c + d + e + a:b + a:c + b:c + a:d + b:d + c:d + a:e + 
##     b:e + c:e + d:e

If you have a character vector x listing names of variables, and you want to construct a formula containing their interactions up to order order_max, then you can do:

f2 <- function(x, order_max, response = NULL, intercept = TRUE, env = parent.frame()) {
    paste1 <- function(x) paste0(x, collapse = ":")
    combn1 <- function(n) if (n > 1L) combn(x, n, paste1) else x
    termlabels <- unlist(lapply(seq_len(order_max), combn1), FALSE, FALSE)
    reformulate(termlabels = termlabels, response = response,
                intercept = intercept, env = env)
}

f2(letters[1:5], 2L, response = quote(y))
## y ~ a + b + c + d + e + a:b + a:c + a:d + a:e + b:c + b:d + b:e + 
##     c:d + c:e + d:e

To be parsed correctly, nonsyntactic variable names must be protected with backquotes:

f2(c("`!`", "`?`"), 1L, response = quote(`#`))
## `#` ~ `!` + `?`
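
If the names come from data and you can't backquote them by hand, a small helper can do it for you. This is only a sketch: bq is a hypothetical name (not part of the answer above), and it assumes that a name is syntactic exactly when make.names() leaves it unchanged. Names that themselves contain a backquote are not handled.

```r
# Hypothetical helper: wrap nonsyntactic names in backquotes,
# leave syntactic names untouched.
bq <- function(nm) {
    ifelse(make.names(nm) == nm, nm, paste0("`", nm, "`"))
}

vars <- c("a", "my var", "2nd")
quoted <- bq(vars)

# reformulate() parses the labels, so the protected names survive
fm <- reformulate(quoted, response = "y")
all.vars(fm)
```

all.vars() reports the variable names without the backquotes, which makes it a convenient check that the formula refers to the columns you intended.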

CodePudding user response:

The most straightforward way to achieve this, assuming you want to cross all terms to a specified degree, is to use the ^ operator in the formula.

x = c('a','b','c','d','e')

# Build formula using reformulate
(fm <- reformulate(x, "y"))
y ~ a + b + c + d + e

# Cross to second degree  
(fm2 <- update(fm, ~ .^2))
y ~ a + b + c + d + e + a:b + a:c + a:d + a:e + b:c + b:d + b:e + 
    c:d + c:e + d:e

# Terms of fm2 as character:
attr(terms.formula(fm2), "term.labels")
[1] "a"   "b"   "c"   "d"   "e"   "a:b" "a:c" "a:d" "a:e" "b:c" "b:d" "b:e" "c:d" "c:e" "d:e"

# Cross to third degree
(fm3 <- update(fm, ~ .^3))
y ~ a + b + c + d + e + a:b + a:c + a:d + a:e + b:c + b:d + b:e + 
    c:d + c:e + d:e + a:b:c + a:b:d + a:b:e + a:c:d + a:c:e + 
    a:d:e + b:c:d + b:c:e + b:d:e + c:d:e
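
The same ^ operator can also be written straight into the pasted string, which stays close to the paste() approach in the question. A minimal sketch, assuming the response is called y and the maximum order is held in a variable named order_max (both names are illustrative):

```r
x <- c("a", "b", "c", "d", "e")
order_max <- 2L

# Wrap the main effects in parentheses and raise them to the wanted order
rhs <- sprintf("(%s)^%d", paste(x, collapse = " + "), order_max)
fm <- as.formula(paste("y ~", rhs))

# Inspect the expanded terms and their interaction orders
tt <- terms(fm)
labels <- attr(tt, "term.labels")
orders <- attr(tt, "order")
length(labels)  # 5 main effects plus choose(5, 2) pairwise interactions
```

The formula itself is stored unexpanded; terms() does the expansion, so the term labels are what a model-fitting function would actually see.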

CodePudding user response:

Here is another solution to create the : terms:

iterms = function(x,n,lower=TRUE){
  return(paste(lapply(ifelse(lower,1,n):n,function(ni){
    paste(apply(combn(x,ni),2,paste,collapse=':'),collapse=' + ')
  }),collapse=' + '))
}

Testing with:

x = c('a','b','c','d')
print(iterms(x,1))
print(iterms(x,2))
print(iterms(x,3))
print(iterms(x,3,lower=FALSE))

yields:

[1] "a + b + c + d"
[1] "a + b + c + d + a:b + a:c + a:d + b:c + b:d + c:d"
[1] "a + b + c + d + a:b + a:c + a:d + b:c + b:d + c:d + a:b:c + a:b:d + a:c:d + b:c:d"
[1] "a:b:c + a:b:d + a:c:d + b:c:d"
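
Since iterms returns a string, it still has to be turned into a formula before it can go into a model. A sketch of that last step, assuming a response named y; the iterms below is a compact restatement written for this example so that it runs on its own, not the exact code above:

```r
# Compact variant of iterms: build the ':' terms for each order, join with '+'
iterms <- function(x, n, lower = TRUE) {
  orders <- (if (lower) 1L else n):n
  parts <- lapply(orders, function(ni) combn(x, ni, paste, collapse = ":"))
  paste(unlist(parts), collapse = " + ")
}

x <- c("a", "b", "c", "d")
rhs <- iterms(x, 2)

# Paste on the response and convert the string to a formula
fm <- as.formula(paste("y ~", rhs))
labels <- attr(terms(fm), "term.labels")
```

Passing paste directly as the FUN argument of combn() avoids the separate apply() call; otherwise the logic is the same.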