Suppose I have a vector of variable names, x = c('a','b','c','d','e'), for a statistical model. When building the formula, it's convenient to use something like paste('y ~', paste(x, collapse=' + ')) to get y ~ a + b + c + d + e, especially when x may change.
Now I'd like to do the same thing with interaction terms, but paste(x, collapse=' : ') produces a : b : c : d : e, which is only one term, and paste(x, collapse=' * ') produces a * b * c * d * e, which includes all possible interactions across all orders, i.e. a + b + c + ... + a:b + a:c + ... + a:b:c + a:b:d + ... + a:b:c:d:e. How can I limit the interaction terms to a given order, say 2nd order (e.g. a:b)?
CodePudding user response:
reformulate handles this problem quite naturally, though how you would apply it is context-dependent.
If you want to drop interactions of order greater than order_max from an existing formula, then you can do:
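f1 <- function(formula, order_max) {
    a <- attributes(terms(formula))
    # keep only terms whose order does not exceed order_max
    reformulate(termlabels = a$term.labels[a$order <= order_max],
                # a$variables is the call list(<response>, <vars>...), so the
                # response sits at position 1 + r (position 1 is `list` itself)
                response = if (r <- a$response) a$variables[[1L + r]],
                intercept = a$intercept,
                env = environment(formula))
}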
f1(y ~ a * b * c * d * e, 2L)
## y ~ a + b + c + d + e + a:b + a:c + b:c + a:d + b:d + c:d + a:e +
##     b:e + c:e + d:e
If you have a character vector x listing names of variables, and you want to construct a formula containing their interactions up to order order_max, then you can do:
f2 <- function(x, order_max, response = NULL, intercept = TRUE, env = parent.frame()) {
    # join the members of one combination into a single interaction label
    paste1 <- function(x) paste0(x, collapse = ":")
    # all interaction labels of order n (order 1 is just the names themselves)
    combn1 <- function(n) if (n > 1L) combn(x, n, paste1) else x
    termlabels <- unlist(lapply(seq_len(order_max), combn1), FALSE, FALSE)
    reformulate(termlabels = termlabels, response = response,
                intercept = intercept, env = env)
}
f2(letters[1:5], 2L, response = quote(y))
## y ~ a + b + c + d + e + a:b + a:c + a:d + a:e + b:c + b:d + b:e +
##     c:d + c:e + d:e
To be parsed correctly, nonsyntactic variable names must be protected with backquotes:
f2(c("`!`", "`?`"), 1L, response = quote(`#`))
## `#` ~ `!` + `?`
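When the names come from data rather than being typed by hand, the backquoting can be automated. The following is only a sketch: quote_names is a hypothetical helper (not part of base R), and it assumes that none of the names themselves contain a backquote.

```r
# Hypothetical helper: backquote only those names that are not syntactic.
# make.names() leaves syntactic names unchanged, so any name it alters
# (spaces, leading digits, reserved words, ...) needs protecting.
quote_names <- function(x) {
    nonsyntactic <- make.names(x) != x
    x[nonsyntactic] <- paste0("`", x[nonsyntactic], "`")
    x
}

vars <- c("age", "body mass", "2nd_visit")   # example names, made up here
quoted <- quote_names(vars)
fm <- reformulate(quoted, response = "y")

# The variables recovered from the formula are the original names
all.vars(fm)
```

The quoted vector can be passed wherever a vector of term labels is expected.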
CodePudding user response:
The most straightforward way to achieve this, assuming you want to cross all terms to a specified degree, is to use the ^ operator in the formula.
x = c('a','b','c','d','e')
# Build formula using reformulate
(fm <- reformulate(x, "y"))
y ~ a + b + c + d + e
# Cross to second degree
(fm2 <- update(fm, ~ .^2))
y ~ a + b + c + d + e + a:b + a:c + a:d + a:e + b:c + b:d + b:e +
    c:d + c:e + d:e
# Terms of fm2 as character:
attr(terms.formula(fm2), "term.labels")
[1] "a" "b" "c" "d" "e" "a:b" "a:c" "a:d" "a:e" "b:c" "b:d" "b:e" "c:d" "c:e" "d:e"
# Cross to third degree
(fm3 <- update(fm, ~ .^3))
y ~ a + b + c + d + e + a:b + a:c + a:d + a:e + b:c + b:d + b:e +
    c:d + c:e + d:e + a:b:c + a:b:d + a:b:e + a:c:d + a:c:e +
    a:d:e + b:c:d + b:c:e + b:d:e + c:d:e
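If the variable names start out as a character vector, the ^ operator can also be written into the formula text directly instead of going through update. A minimal sketch, assuming the response is called y; order_max is a name introduced here for illustration:

```r
x <- c("a", "b", "c", "d", "e")
order_max <- 2L  # highest interaction order to keep (illustrative name)

# Parenthesise the main effects and raise them to the desired power
rhs <- sprintf("(%s)^%d", paste(x, collapse = " + "), order_max)
fm <- as.formula(paste("y ~", rhs))

# The formula keeps the compact ^ form; terms() expands it
term_labels <- attr(terms(fm), "term.labels")
term_orders <- attr(terms(fm), "order")
length(term_labels)   # 5 main effects + choose(5, 2) pairwise interactions
max(term_orders)
```

This keeps the formula short when printed, at the cost of the individual terms only being visible through terms().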
CodePudding user response:
Here is another solution, which builds the : terms as a string:
iterms = function(x,n,lower=TRUE){
  # orders 1..n when lower=TRUE, otherwise order n only
  return(paste(lapply(ifelse(lower,1,n):n,function(ni){
    # every combination of ni names, joined by ':' and then by ' + '
    paste(apply(combn(x,ni),2,paste,collapse=':'),collapse=' + ')
  }),collapse=' + '))
}
Testing with:
x = c('a','b','c','d')
print(iterms(x,1))
print(iterms(x,2))
print(iterms(x,3))
print(iterms(x,3,lower=FALSE))
yields:
[1] "a + b + c + d"
[1] "a + b + c + d + a:b + a:c + a:d + b:c + b:d + c:d"
[1] "a + b + c + d + a:b + a:c + a:d + b:c + b:d + c:d + a:b:c + a:b:d + a:c:d + b:c:d"
[1] "a:b:c + a:b:d + a:c:d + b:c:d"
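A term string of this kind still has to be turned into a formula before it can be passed to a model function. The sketch below assumes the terms are joined with " + " and that the response is called y; the string is written out by hand so the example stands alone, and it is compared against the ^ form as a sanity check.

```r
x <- c("a", "b", "c", "d")

# Second-order term string, written out by hand for this example
rhs <- "a + b + c + d + a:b + a:c + a:d + b:c + b:d + c:d"
fm <- as.formula(paste("y ~", rhs))

# Reference formula built with the ^ operator
ref <- reformulate(sprintf("(%s)^2", paste(x, collapse = " + ")), response = "y")

# Both formulas expand to the same set of terms
same_terms <- setequal(attr(terms(fm), "term.labels"),
                       attr(terms(ref), "term.labels"))
same_terms
```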