How to dynamically name variables in formula in lm() function?

Time:10-19

I am wondering if there is a way to "predefine" parameters to functions like lm, glmer(lme4), glm, or home made functions.

I'll try to illustrate my question with a small data frame:

y1 <- rnorm(n = 100, mean = 0, sd = 1)
y2 <- rnorm(n = 100, mean = 4, sd = 1)
x1 <- rep(letters[1:2], times = 50)
x2 <- rep(letters[2:3], times = 50)
x3 <- rep(letters[4:5], times = 50)
df <- data.frame(y1, y2, x1, x2, x3)

Then I can easily fit a model with lm() like this:

model <- lm(y1 ~x1, data=df)

However, what I am interested in being able to do is something like this

#first define list of predictors 
predictor_vector<- c("x1","x2","x3")

And then use the names (strings) as a parameter in the lm() function.

In this example, I am using lm() and attempting to dynamically construct the regression like so:

model <- lm(y1 ~predictor_vector[1], data=df)
model <- lm(y1 ~predictor_vector[2], data=df)
model <- lm(y1 ~predictor_vector[3], data=df)

The example above doesn't work.

I am very grateful for any input on this topic and hope my example and explanation are clear enough.

CodePudding user response:

We may use a loop. Construct each formula with reformulate (or paste plus as.formula), pass it to lm, and collect the fitted models in a list:

out <- lapply(predictor_vector, function(x)
     lm(reformulate(x, response = "y1"), data = df))
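A small variation on the loop above, as a sketch: naming the input vector with setNames means the returned list is indexed by predictor name, which makes individual models easier to pull out. The seed and the name models are assumptions made only for this illustration.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
df <- data.frame(
  y1 = rnorm(100),
  x1 = rep(letters[1:2], times = 50),
  x2 = rep(letters[2:3], times = 50),
  x3 = rep(letters[4:5], times = 50)
)
predictor_vector <- c("x1", "x2", "x3")

# One model per predictor, stored under the predictor's name
models <- lapply(
  setNames(predictor_vector, predictor_vector),
  function(x) lm(reformulate(x, response = "y1"), data = df)
)

names(models)          # the predictor names
formula(models$x2)     # the formula used for the second model
coef(models[["x3"]])   # intercept and the x3 level effect
```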

CodePudding user response:

The core issue to understand here is that lm() takes an object of class formula as its first argument, and that formula specifies the regression.

You've created a vector of strings (characters), but R won't dynamically generate the formula for you inside the function call. Typing variable names directly into a formula is a convenience, but it does not work when you want the names to be dynamic.
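To make this concrete, here is a minimal sketch that inspects what each formula actually refers to. A formula keeps its right-hand side as an unevaluated expression, so predictor_vector[1] is never replaced by the column name x1. The names f_literal and f_built are made up for this illustration.

```r
predictor_vector <- c("x1", "x2", "x3")

# The right-hand side is stored as an unevaluated expression
f_literal <- y1 ~ predictor_vector[1]
all.vars(f_literal)   # refers to y1 and predictor_vector, not to x1

# Building the formula from the string substitutes the actual name
f_built <- reformulate(predictor_vector[1], response = "y1")
all.vars(f_built)     # refers to y1 and x1
```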

To simplify your example, start with:

y1 <- (rnorm(n = 10, mean = 0, sd = 1))
x1 <- (rnorm(n = 10, mean = 0, sd = 1))
x2 <- (rnorm(n = 10, mean = 0, sd = 1))
x3 <- (rnorm(n = 10, mean = 0, sd = 1))

df <- as.data.frame(cbind(y1,x1,x2,x3))

predictors = c("x1", "x2", "x3")

Now you can dynamically build the formula as a concatenated string (paste0), convert it with as.formula, and pass the result to your lm() call:

form1 = as.formula(paste0("y1~", predictors[1]))

lm(form1, data = df)

As akrun pointed out, you can then start doing things like creating loops to generate these formulas dynamically.
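Since the question also mentions glm and home-made functions, here is one possible way to wrap the idea in a helper. This is only a sketch: fit_single() and its fit_fun argument are hypothetical names, not part of any package, and the seed is arbitrary.

```r
# fit_single() is a hypothetical helper, not part of any package.
# It builds the formula from strings and hands it to the fitting function.
fit_single <- function(response, predictor, data, fit_fun = lm, ...) {
  form <- reformulate(predictor, response = response)
  fit_fun(form, data = data, ...)
}

set.seed(42)  # assumed seed, only for reproducibility
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

# Explicit loop: one fitted model per predictor, keyed by name
fits <- list()
for (p in predictors) {
  fits[[p]] <- fit_single("y1", p, df)
}

# The same helper with glm(); family is passed through ...
g <- fit_single("y1", "x2", df, fit_fun = glm, family = gaussian())
```

One trade-off of this design: the call stored in each model shows form rather than the spelled-out formula, so formula(fits$x1) is the more reliable way to see what was fitted.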

You can also do things like:

my_formula = as.formula(paste0("y1~", paste0(predictors, collapse="+")))

## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)

See also: Formula with dynamic number of variables

One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:

reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
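A short sketch of both points from that help text: several term labels joined into one formula, and a backtick-protected name. The column name "my var" and the data frame df2 are invented purely for illustration.

```r
predictors <- c("x1", "x2", "x3")

# Several term labels are joined with "+"
f_all <- reformulate(predictors, response = "y1")
f_all

# A column name containing a space needs backticks in the term label
df2 <- data.frame(y1 = rnorm(10))
df2[["my var"]] <- rnorm(10)
f_bt <- reformulate("`my var`", response = "y1")
fit <- lm(f_bt, data = df2)
```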
