Get a vector of input variable names from an R regression model (lm(), glm(), etc.)

I am trying to get the input variable names out of the model object returned by the lm() function. I tried to access the 'variables' attribute under lm_obj$terms. However, the returned object is a 'language' type object rather than a regular vector of names. For example:

lm_obj = lm(y ~ x + z + z:x, data=df)
attr(lm_obj$terms, 'variables')
> list(y, x, z)

What is a 'language' type? How can I convert this 'language' type object to a regular vector like c('x', 'z')?

CodePudding user response：

You can extract them from the call:

fit <- lm(mpg ~ hp, mtcars)

head(all.vars(fit$call), -1)
# [1] "mpg" "hp"

or take the names of the model frame, which is probably better:

names(model.frame(fit))
# [1] "mpg" "hp"

"language" is the storage mode (the value returned by typeof) of the object, just as "double", "integer" or "list" are for other objects. See ?mode for more explanation and some nice examples. The R Language Definition has a detailed explanation of language objects and is worth reading in any case.
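To make the "language" type more concrete, here is a minimal sketch that inspects the 'variables' attribute of the terms object and converts it to a character vector. The names vars and var_names are only illustrative, and the conversion assumes every element deparses to a single string, which holds for plain variable names.

```r
# Fit the same small model as above (mtcars ships with base R).
fit <- lm(mpg ~ hp, mtcars)

# The "variables" attribute of the terms object is an unevaluated call.
vars <- attr(terms(fit), "variables")
typeof(vars)   # "language"
class(vars)    # "call"

# A call can be treated like a list whose first element is the function
# being called (here `list`), so drop it and deparse the remaining symbols.
var_names <- vapply(as.list(vars)[-1], deparse, character(1))
var_names      # "mpg" "hp"
```

Note that the response (mpg) is included here, so this is not yet a vector of input variables only.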

CodePudding user response：

In your object, lm_obj$terms is a formula, and you can access each part of it using the [[ extraction operator, like

lm_obj$terms[[1]]

#> `~`  # formula symbol

If you want to get your input variables, you can use

strsplit(as.character(lm_obj$terms[[3]])[2] , " \\+ ")[[1]]

#> [1] "x" "z"

CodePudding user response：

You are on the right track. The "terms" object is where you should look. If you want to omit the response variable, you can use delete.response.

all.vars(delete.response(terms(lm_obj)))
#[1] "x" "z"

I would also like to point you to

labels(terms(lm_obj))
#[1] "x"   "z"   "x:z"

which is sometimes more useful.
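Since the question also mentions glm(), here is a small sketch showing that the same terms-based extraction can be wrapped in a helper and applied to either kind of model. The helper name predictor_names and the choice of mtcars columns are assumptions made for illustration only.

```r
# Hypothetical helper: bare predictor names from any model that has a
# terms object (lm and glm fits both do).
predictor_names <- function(model) {
  all.vars(delete.response(terms(model)))
}

# Linear model and logistic regression on the built-in mtcars data.
lm_fit  <- lm(mpg ~ hp + wt, data = mtcars)
glm_fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

lm_vars  <- predictor_names(lm_fit)
glm_vars <- predictor_names(glm_fit)
print(lm_vars)
print(glm_vars)
```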

A reproducible example to complement your question:

df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lm_obj <- lm(y ~ x + z + z:x, data = df)

Let me also explain why we should look at the "terms" object. Try the different answers here on the following model:

lmfit <- lm(y ~ poly(x) + z + I(z ^ 2) + z:x, data = df, na.action = na.exclude)
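One way to run that comparison is sketched below. The seed and the names from_call, from_frame and from_terms are assumptions added for illustration; the three extraction calls themselves are the ones suggested in the answers above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lmfit <- lm(y ~ poly(x) + z + I(z ^ 2) + z:x, data = df,
            na.action = na.exclude)

# Call-based approach: collects every symbol in the call, so dropping only
# the last element leaves the data frame name `df` in the result.
from_call <- head(all.vars(lmfit$call), -1)

# Model-frame approach: returns column names of the model frame, so
# transformed terms show up as "poly(x)" and "I(z^2)" instead of bare names.
from_frame <- names(model.frame(lmfit))

# Terms-based approach: bare predictor names, response removed.
from_terms <- all.vars(delete.response(terms(lmfit)))

print(from_call)
print(from_frame)
print(from_terms)
```

With extra arguments in the call or transformed terms in the formula, only the terms-based result stays a clean vector of input variable names.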