I am wondering if there is a way to "predefine" parameters to functions like lm, glmer(lme4), glm, or home made functions.
I'll try to show my question with a small dataframe
y1<-(rnorm(n = 100, mean = 0, sd = 1))
y2<-(rnorm(n = 100, mean = 4, sd = 1))
x1 <- rep(letters[1:2], times = 50)
x2 <- rep(letters[2:3], times = 50)
x3 <- rep(letters[4:5], times = 50)
df <- data.frame(y1, y2, x1, x2, x3)  # data.frame() keeps y1 and y2 numeric
then I can easily fit lm like this
model <- lm(y1 ~x1, data=df)
However, what I am interested in being able to do is something like this
#first define list of predictors
predictor_vector<- c("x1","x2","x3")
And then use the names (strings) as a parameter in the lm() function.
In this example, I am using lm() and attempting to dynamically construct the regression like so:
model <- lm(y1 ~predictor_vector[1], data=df)
model <- lm(y1 ~predictor_vector[2], data=df)
model <- lm(y1 ~predictor_vector[3], data=df)
The example above doesn't work.
I am very grateful for any input on this topic and hope my example and explanation are clear enough.
CodePudding user response:
We may use a loop. Construct the formula with reformulate or paste, and apply lm to return the models in a list:
out <- lapply(predictor_vector, function(x)
lm(reformulate(x, response = "y1"), data = df))
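A minimal sketch of that loop, assuming the df and predictor_vector from the question (rebuilt here so the block runs by itself, with an arbitrary seed). Naming the list via setNames() is only one convenient choice and is not part of the answer above; the paste-based variant is shown next to it for comparison.

```r
set.seed(1)  # arbitrary seed, assumed only for reproducibility
df <- data.frame(
  y1 = rnorm(100),
  x1 = rep(letters[1:2], times = 50),
  x2 = rep(letters[2:3], times = 50),
  x3 = rep(letters[4:5], times = 50)
)
predictor_vector <- c("x1", "x2", "x3")

# One model per predictor; lapply() keeps the names of its input,
# so each model can later be looked up by predictor name
models <- lapply(
  setNames(predictor_vector, predictor_vector),
  function(x) lm(reformulate(x, response = "y1"), data = df)
)

# Same idea using paste() and as.formula() instead of reformulate()
models_paste <- lapply(
  setNames(predictor_vector, predictor_vector),
  function(x) lm(as.formula(paste("y1 ~", x)), data = df)
)

coef(models[["x2"]])  # coefficients of the model y1 ~ x2
```

With a named list, models[["x2"]] retrieves a fit directly instead of relying on its position.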
CodePudding user response:
The core issue here is that lm() takes a formula object as its first argument, which specifies the regression.
You've created a vector of strings (characters), but R won't dynamically generate the formula for you in the function call. Typing bare variable names as a formula is a convenience, but it is not practical when you are trying to build the model dynamically.
To simplify your example, start with:
y1 <- (rnorm(n = 10, mean = 0, sd = 1))
x1 <- (rnorm(n = 10, mean = 0, sd = 1))
x2 <- (rnorm(n = 10, mean = 0, sd = 1))
x3 <- (rnorm(n = 10, mean = 0, sd = 1))
df <- as.data.frame(cbind(y1,x1,x2,x3))
predictors = c("x1", "x2", "x3")
Now you can dynamically build the formula as a concatenated string (with paste0) and convert it to a formula with as.formula. Then pass this formula to your lm() call:
form1 = as.formula(paste0("y1~", predictors[1]))
lm(form1, data = df)
As akrun pointed out, you can then start doing things like create loops to dynamically generate these.
You can also do things like:
my_formula = as.formula(paste0("y1~", paste0(predictors, collapse=" + ")))
## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)
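As a quick check of the multi-predictor case, the sketch below builds the formula both ways and fits it with lm() and glm(). It assumes the numeric df from this answer (recreated here with an arbitrary seed). fit_formula is a hypothetical helper name introduced only for illustration, to show how the same string-based approach could be wrapped for any formula-based fitting function.

```r
set.seed(42)  # arbitrary seed, assumed only for reproducibility
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

# Predictors have to be joined with " + " to give a valid formula
f_paste  <- as.formula(paste("y1 ~", paste(predictors, collapse = " + ")))
f_reform <- reformulate(predictors, response = "y1")

# Hypothetical helper: builds the formula from strings and hands it
# to whichever formula-based fitting function is supplied
fit_formula <- function(fit_fun, response, terms, data, ...) {
  fit_fun(reformulate(terms, response = response), data = data, ...)
}

fit_lm  <- fit_formula(lm,  "y1", predictors, df)
fit_glm <- fit_formula(glm, "y1", predictors, df, family = gaussian())
```

Both f_paste and f_reform describe the same model, so either construction can be used; reformulate() just avoids the manual string pasting.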
See also: Formula with dynamic number of variables
One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:
reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
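To illustrate the point about non-syntactic names, here is a small sketch. The data frame dat and its column "body mass" are made up for this example and do not come from the question.

```r
# Hypothetical data with a column name that contains a space
dat <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2))
dat[["body mass"]] <- c(1, 2, 3, 4, 5)

# Wrap the non-syntactic name in backticks before passing it on
term <- sprintf("`%s`", "body mass")
f <- reformulate(term, response = "y")

fit <- lm(f, data = dat)
all.vars(f)  # variable names used by the formula, without backticks
```

Without the backticks, the string "body mass" would not parse as a single variable name.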