Get a vector of input variables' names out of "lm" and "glm" objects

I am trying to get the input variable names out of the model object returned by the lm() function. I tried to access the 'variables' attribute under lm_obj$terms. However, the returned object is of type 'language' rather than a regular vector of names. For example:

lm_obj = lm(y ~ x + z + z:x, data=df)
attr(lm_obj$terms, 'variables')
# list(y, x, z)

What is a 'language' type? How can I convert this 'language' type object to a regular vector like c('x', 'z')?

CodePudding user response：

You may get them out of the call,

fit <- lm(mpg ~ hp, mtcars)

head(all.vars(fit$call), -1)
# [1] "mpg" "hp"

or from the names of the model frame, which is probably better.

names(model.frame(fit))
# [1] "mpg" "hp"

"language" is the storage mode (the typeof) of the object, just as "double", "integer" or "list" are. See ?mode for more explanation and some nice examples. The R Language Definition has a detailed explanation and is worth reading anyway.
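To make the "language" type concrete, here is a small sketch using the fit from this answer. It only relies on base R; the names vars and var_names are just for illustration.

```r
fit <- lm(mpg ~ hp, mtcars)

# The "variables" attribute is an unevaluated call to list(),
# not a character vector
vars <- attr(terms(fit), "variables")
typeof(vars)
# [1] "language"

# A call can be treated like a list: the first element is the function
# being called (list), the remaining elements are the variables as symbols
var_names <- vapply(as.list(vars)[-1], deparse, character(1))
var_names
# [1] "mpg" "hp"

# all.vars() extracts the same names directly
all.vars(vars)
# [1] "mpg" "hp"
```

Note that the response (mpg) is included here, so drop the first element if you only want the inputs.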

CodePudding user response：

You are on the right track. The "terms" object is where you should look. If you want to omit the response variable, you can use delete.response.

all.vars(delete.response(terms(lm_obj)))
#[1] "x" "z"

I would also like to point you to

labels(terms(lm_obj))
#[1] "x"   "z"   "x:z"

which is sometimes more useful.
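Since the question also mentions "glm" objects: the same idea should carry over, because glm fits have a terms() method too. A minimal sketch, where input_vars is a hypothetical helper name and the mtcars models are only for illustration:

```r
# Hypothetical helper: input variable names of a fitted model,
# assuming the model class provides a terms() method (lm and glm do)
input_vars <- function(model) {
  all.vars(delete.response(terms(model)))
}

lm_fit  <- lm(mpg ~ hp + wt + hp:wt, data = mtcars)
glm_fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

input_vars(lm_fit)
# [1] "hp" "wt"
input_vars(glm_fit)
# [1] "hp" "wt"
```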

A reproducible example to complement your question:

df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lm_obj <- lm(y ~ x + z + z:x, data = df)

Let me also explain why we should look at the "terms" object. Try the different answers here on the following model:

lmfit <- lm(y ~ poly(x) + z + I(z ^ 2) + z:x, data = df, na.action = na.exclude)
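A rough side-by-side of the approaches from this thread on that model might look like the sketch below. The seed and the from_* names are my own choices for illustration, not part of the original answers.

```r
set.seed(1)  # arbitrary seed, only to make the example repeatable
df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lmfit <- lm(y ~ poly(x) + z + I(z ^ 2) + z:x, data = df, na.action = na.exclude)

# From the call: with the extra na.action argument, head(..., -1) drops
# "na.exclude" instead of the data frame name, so "df" leaks into the result
from_call <- head(all.vars(lmfit$call), -1)

# From the model frame: column names are the transformed terms
# (such as "poly(x)"), not the underlying variable "x"
from_frame <- names(model.frame(lmfit))

# From the terms object: just the raw input variables
from_terms <- all.vars(delete.response(terms(lmfit)))
from_terms
# [1] "x" "z"
```

Only the "terms" route is unaffected by transformations in the formula and by extra arguments in the call.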

CodePudding user response：

In your object, lm_obj$terms is a formula, and you can access each part of it using the [[ extractor operator, like

lm_obj$terms[[1]]

#> `~`  # formula symbol

If you want to get your input variables, you can use

strsplit(as.character(lm_obj$terms[[3]])[2] , " \\+ ")[[1]]

#> [1] "x" "z"
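For reference, the three parts of the formula can be inspected one at a time, and all.vars() on the right-hand side avoids splitting the deparsed text by hand. This is only a sketch; the data frame is simulated and tt is an illustrative name.

```r
set.seed(1)  # arbitrary seed, only to make the example repeatable
df <- data.frame(y = rnorm(20), x = rnorm(20), z = rnorm(20))
lm_obj <- lm(y ~ x + z + z:x, data = df)

tt <- lm_obj$terms
tt[[1]]  # the `~` operator
tt[[2]]  # left-hand side: y
tt[[3]]  # right-hand side: x + z + z:x

# Variable names on the right-hand side, without string manipulation
all.vars(tt[[3]])
# [1] "x" "z"
```

The strsplit approach depends on how the formula happens to be nested and deparsed, so it may break for formulas with a different structure.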