I want to create a plot like the one below for a linear model fitted with lm in R.
Is there a simple way to do this?
The plot above comes from this page.
Answer:
The {caret} package offers a convenient function, varImp:
Example:
library(caret)
# Linear model with two predictors
my_model <- lm(mpg ~ disp + cyl, data = mtcars)
varImp(my_model)
##      Overall
## disp 2.006696
## cyl  2.229809
For other measures of variable importance, see ?varImp. Then feed the values into your plotting library of choice.
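For an lm fit, the Overall values that varImp reports appear to be the absolute t statistics of the slope coefficients (the numbers above match |t| for disp and cyl). Assuming that, here is a minimal base-R sketch that computes the same values from summary() and draws them with barplot(), so neither caret nor ggplot2 is needed:

```r
# Fit the same model as above
my_model <- lm(mpg ~ disp + cyl, data = mtcars)

# Absolute t statistics of the slopes (the intercept row is dropped);
# assumed here to be what caret::varImp() reports for an lm object
coefs <- summary(my_model)$coefficients
imp <- abs(coefs[-1, "t value"])

# Sort ascending so that, in a horizontal barplot, the largest bar is on top
imp <- sort(imp)
barplot(imp, horiz = TRUE, las = 1, col = "deepskyblue4",
        xlab = "|t statistic|", main = "Variable importance")
```

Any other plotting library works the same way once the named vector imp exists.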
Answer:
Following the method in the linked article (the relative marginal increase in R-squared), you could write your own function that takes a formula and a data frame and plots the relative importance:
library(ggplot2)
plot_importance <- function(formula, data) {
  lhs <- deparse(formula[[2]])
  # all.vars() returns every predictor name, however many terms the formula has
  vars <- all.vars(formula[[3]])
  # R-squared of the additive model with all predictors
  f_full <- as.formula(paste(lhs, paste(vars, collapse = " + "), sep = " ~ "))
  r_full <- summary(lm(f_full, data = data))$r.squared
  df <- do.call(rbind, lapply(seq_along(vars), function(i) {
    # Model without the i-th predictor (intercept only if nothing is left)
    rhs <- if (length(vars) > 1) paste(vars[-i], collapse = " + ") else "1"
    f_without <- as.formula(paste(lhs, rhs, sep = " ~ "))
    r_without <- summary(lm(f_without, data = data))$r.squared
    # Marginal increase in R-squared when the i-th predictor is added last
    data.frame(variable = vars[i], importance = r_full - r_without)
  }))
  # Rescale so the importances sum to 1
  df$importance <- df$importance / sum(df$importance)
  df$variable <- reorder(factor(df$variable), -df$importance)
  ggplot(df, aes(x = variable, y = importance)) +
    geom_col(fill = "deepskyblue4") +
    scale_y_continuous(labels = scales::percent) +
    coord_flip() +
    labs(title = "Relative importance of variables",
         subtitle = deparse(formula)) +
    theme_classic(base_size = 16)
}
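Written out, the quantity plotted for each predictor x_j is the drop in R-squared when x_j is removed from the full model, normalized over all p predictors (R^2_{-j} denotes the R-squared of the model fitted without x_j):

```latex
I_j = \frac{R^2_{\text{full}} - R^2_{-j}}{\sum_{k=1}^{p} \left( R^2_{\text{full}} - R^2_{-k} \right)}
```

Because adding a predictor to a least-squares fit on the same data never lowers R-squared, every numerator is non-negative and the I_j sum to 1, which is why the y axis can be labelled as a percentage.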
We can test this out with the sample data provided in the linked article:
IV <- read.csv(paste0("https://statisticsbyjim.com/wp-content/uploads/",
                      "2017/07/ImportantVariables.csv"))
plot_importance(Strength ~ Time + Pressure + Temperature, data = IV)
The resulting plot matches the one in the article.
We can also try it on some built-in datasets to show that it works with other data:
plot_importance(mpg ~ disp + wt + gear, data = mtcars)
plot_importance(Petal.Length ~ Species + Petal.Width, data = iris)
Created on 2022-05-01 by the reprex package (v2.0.1)
Answer:
I ended up using the relaimpo package and plotting the result with ggplot, following the answer by @Allan Cameron:
library(relaimpo)
library(ggplot2)
# mymodel is your fitted lm object; this mtcars model is only an example
mymodel <- lm(mpg ~ disp + wt + gear, data = mtcars)
# LMG share of R-squared for each predictor
relative_importance <- calc.relimp(mymodel, type = "lmg")$lmg
df <- data.frame(
  variable = names(relative_importance),
  importance = round(c(relative_importance) * 100, 2)
)
ggplot(df, aes(x = reorder(variable, -importance), y = importance)) +
  geom_col(fill = "deepskyblue4") +
  geom_text(aes(label = importance), vjust = .3, hjust = 1.2, size = 3, color = "white") +
  coord_flip() +
  labs(title = "Relative importance of variables") +
  theme_classic(base_size = 16)
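The lmg metric in relaimpo averages, over every ordering of the predictors, the increase in R-squared each variable contributes when it enters the model; with calc.relimp's default rela = FALSE the values sum to the model's R-squared. To make that definition concrete, here is a brute-force base-R sketch using a hypothetical helper, lmg_importance. It is not relaimpo's implementation, it refits on the order of p! × p models, and so it is only practical for a handful of predictors; its output is expected, not verified here, to agree with calc.relimp(..., type = "lmg") for the same model.

```r
# Hypothetical helper: LMG relative importance by enumerating all orderings
lmg_importance <- function(formula, data) {
  response <- deparse(formula[[2]])
  vars <- all.vars(formula[[3]])

  # R-squared of the additive model using only the predictors in `subset`
  r2 <- function(subset) {
    if (length(subset) == 0) return(0)
    f <- as.formula(paste(response, "~", paste(subset, collapse = " + ")))
    summary(lm(f, data = data))$r.squared
  }

  # All permutations of a character vector, as a list of vectors
  perms <- function(v) {
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
      for (rest in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
    }
    out
  }

  orderings <- perms(vars)
  shares <- setNames(numeric(length(vars)), vars)
  for (ord in orderings) {
    for (k in seq_along(ord)) {
      # Increase in R-squared when ord[k] enters after ord[1:(k - 1)]
      gain <- r2(ord[seq_len(k)]) - r2(ord[seq_len(k - 1)])
      shares[ord[k]] <- shares[ord[k]] + gain
    }
  }
  # Average over orderings; the shares sum to the full model's R-squared
  shares / length(orderings)
}

lmg <- lmg_importance(mpg ~ disp + wt + gear, data = mtcars)
round(lmg * 100, 2)
```

Within each ordering the gains telescope to the full model's R-squared, which is why the averaged shares add up to it.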