R - Automatically adjust the model formula

Time:05-25

I am trying to find a way to automatically adjust the model formula that R uses to fit a model. Here is a simple example. In the code below, I want to choose whether "a" and "b" are included in the model by setting include.a and include.b. If a flag is TRUE, the corresponding variable should be added to the model formula; otherwise it should be left out.

x=1:10
y=2:11
y[9] = y[9] + 1

a = rep(3, times = 10)
a[7] = 7
b = c(3:10, 10, 10)

include.a = FALSE
include.b = TRUE

# to get the model y ~ x + b (pseudocode: this is not valid R)
model = lm(y ~ x
           if(include.b == TRUE){+ b}
           )

I've been searching this website everywhere but cannot find any hints.

CodePudding user response:

One option is to define a character vector of the desired covariate names, build a formula from it with as.formula(), and pass that formula to lm():

# specify what you want to include
# both a and b
includes <- c("a","b")

# define formula
frmla <- as.formula(paste0("x ~ y", 
                           ifelse(!is.null(includes), 
                                  paste0(" + ", paste(includes, collapse = " + ")),"")))
# > frmla
# x ~ y + a + b

# Run model
lm(frmla)

#Call:
#lm(formula = frmla)

#Coefficients:
#(Intercept)            y            a            b  
# -1.250e+00    7.500e-01    8.885e-17    2.500e-01  

Add as many as you like:

includes <- c("a", "b", "c", "d", "f")

frmla <- as.formula(paste0("x ~ y", ifelse(!is.null(includes), paste0(" + ",paste(includes, collapse = " + ")),"")))
#> frmla
#x ~ y + a + b + c + d + f

Or none at all:

includes <- c()
frmla <- as.formula(paste0("x ~ y", ifelse(!is.null(includes), paste0(" + ",paste(includes, collapse = " + ")),"")))

# > frmla
# x ~ y
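To tie this back to the include.a / include.b flags from the question, the character vector can be derived from the flags by logical indexing. This is only a sketch: it uses the question's y ~ x orientation rather than the x ~ y above, and `candidates`, `flags`, and `rhs` are names made up for illustration.

```r
# Data from the question
x <- 1:10
y <- 2:11
y[9] <- y[9] + 1
a <- rep(3, times = 10)
a[7] <- 7
b <- c(3:10, 10, 10)

include.a <- FALSE
include.b <- TRUE

# Hypothetical names: the optional covariates and their on/off flags
candidates <- c("a", "b")
flags <- c(include.a, include.b)

# Logical indexing keeps only the names whose flag is TRUE
includes <- candidates[flags]

# Pasting the always-present term together with the optional ones
# also covers the case where `includes` is empty
rhs <- paste(c("x", includes), collapse = " + ")
frmla <- as.formula(paste("y ~", rhs))

fit <- lm(frmla)
```

Because "x" is always part of the vector being collapsed, no separate check for an empty `includes` is needed.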

CodePudding user response:

1) Use reformulate as shown:

fo <- reformulate(c("x", if (include.a) "a", if (include.b) "b"), "y")
lm(fo)

giving:

Call:
lm(formula = fo)

Coefficients:
(Intercept)            x            b  
    1.06154      1.10769     -0.07692  
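The reason the `if` expressions can sit directly inside c() is worth spelling out. The sketch below assumes only base R behaviour; `skipped` and `term_labels` are illustrative names.

```r
include.a <- FALSE
include.b <- TRUE

# An `if` whose condition is FALSE and that has no `else` evaluates to NULL
skipped <- if (include.a) "a"
is.null(skipped)

# c() drops NULL elements, so only the selected names remain
term_labels <- c("x", if (include.a) "a", if (include.b) "b")

# reformulate() joins the term labels with "+" and attaches the response
fo <- reformulate(term_labels, response = "y")
fo
```

With the flags above, `term_labels` holds "x" and "b", so the resulting formula is y ~ x + b.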

2) Alternatively, call lm like this:

do.call("lm", list(fo))

giving a nicer Call: line:

Call:
lm(formula = y ~ x + b)

Coefficients:
(Intercept)            x            b  
    1.06154      1.10769     -0.07692  
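The two calls fit the same model and differ only in the call that lm() records. A small sketch using the question's data makes that visible; `fit1` and `fit2` are illustrative names.

```r
# Data from the question
x <- 1:10
y <- 2:11
y[9] <- y[9] + 1
b <- c(3:10, 10, 10)

fo <- reformulate(c("x", "b"), "y")

# lm(fo) records the unevaluated argument, i.e. the symbol `fo`
fit1 <- lm(fo)

# do.call() evaluates the list elements first, so the call stores the formula itself
fit2 <- do.call("lm", list(fo))

fit1$call
fit2$call

# The estimated coefficients are the same either way
all.equal(coef(fit1), coef(fit2))
```

This matters mainly for readability of printed output and for functions that re-evaluate the stored call, such as update().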

3) Also consider a design where a single character vector v of variable names is provided.

v <- "b"
fo <- reformulate(c("x", v), "y")
lm(fo)

v <- c("a", "b")
fo <- reformulate(c("x", v), "y")
lm(fo)

v <- c()
fo <- reformulate(c("x", v), "y")
lm(fo)

In a function it would be written like this:

my_lm <- function(v = c(), resp = "y", indep = "x", env = parent.frame()) {
  fo <- reformulate(c(indep, v), resp, env = env)
  do.call("lm", list(fo))
}

my_lm("b")
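As a quick check of how the function behaves across the three cases, the sketch below repeats the definition with the question's data and inspects which coefficients get fitted. It assumes the variables live in the calling environment, as in the question.

```r
# Data from the question
x <- 1:10
y <- 2:11
y[9] <- y[9] + 1
a <- rep(3, times = 10)
a[7] <- 7
b <- c(3:10, 10, 10)

# Function from the answer above
my_lm <- function(v = c(), resp = "y", indep = "x", env = parent.frame()) {
  fo <- reformulate(c(indep, v), resp, env = env)
  do.call("lm", list(fo))
}

# No optional covariates, one, and both
names(coef(my_lm()))
names(coef(my_lm("b")))
names(coef(my_lm(c("a", "b"))))
```

Each call returns an ordinary lm object, so summary(), predict(), and similar functions work on it as usual.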