Plot the impact for each variable in linear regression?

Time:05-02

I want to create a plot like the one below for an lm model fitted in R.

[Image: example plot of variable importance]

Is there a simple way of doing it?

The plot above was taken from this page.

Answer:

The {caret} package offers a convenient function, varImp(). Example:

library(caret)
my_model <- lm(mpg ~ disp + cyl, data = mtcars)

## > varImp(my_model)
##
##       Overall
## disp 2.006696
## cyl  2.229809

For lm models, the Overall score is the absolute value of each coefficient's t-statistic; see ?varImp for other measures of variable importance. Feed the values into your plotting library of choice.
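Because caret's documentation describes the lm score as the absolute t-statistic of each coefficient, the same numbers can be reproduced without caret. This is a minimal sketch, assuming only numeric predictors (a factor would contribute one row per dummy level rather than one per variable), and it uses base R's barplot() in place of a plotting library:

```r
# Base-R sketch: lm "importance" as the absolute t statistic of each coefficient
my_model <- lm(mpg ~ disp + cyl, data = mtcars)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_tab <- coef(summary(my_model))

# Drop the intercept row and keep |t| for each predictor
imp <- abs(coef_tab[-1, "t value"])
imp <- sort(imp, decreasing = TRUE)

# Horizontal bar chart; rev() puts the largest value on top
barplot(rev(imp), horiz = TRUE, las = 1,
        xlab = "|t statistic|", main = "Variable importance")
```

The values should match the varImp() output shown above (disp about 2.007, cyl about 2.230).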

Answer:

Following the method in the linked article (the relative marginal increase in R-squared), you could write your own function that takes a formula and a data frame and plots the relative importance of each predictor:

library(ggplot2)

plot_importance <- function(formula, data) {

  lhs  <- deparse(formula[[2]])    # response variable
  vars <- all.vars(formula[[3]])   # predictor names on the right-hand side

  df <- do.call(rbind, lapply(seq_along(vars), function(i) {
    # Model without predictor i (intercept-only if nothing else is left)
    rhs1 <- if (length(vars) > 1) paste(vars[-i], collapse = " + ") else "1"
    f1 <- as.formula(paste(lhs, rhs1, sep = " ~ "))
    # Full model with every predictor
    f2 <- as.formula(paste(lhs, paste(vars, collapse = " + "), sep = " ~ "))
    r1 <- summary(lm(f1, data = data))$r.squared
    r2 <- summary(lm(f2, data = data))$r.squared

    # Marginal increase in R-squared when predictor i is added last
    data.frame(variable = vars[i], importance = r2 - r1)
  }))

  # Rescale so the importances sum to 1, then order the bars
  df$importance <- df$importance / sum(df$importance)
  df$variable <- reorder(factor(df$variable), -df$importance)

  ggplot(df, aes(x = variable, y = importance)) +
    geom_col(fill = "deepskyblue4") +
    scale_y_continuous(labels = scales::percent) +
    coord_flip() +
    labs(title = "Relative importance of variables",
         subtitle = deparse(formula)) +
    theme_classic(base_size = 16)
}
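The number behind each bar is the drop in R-squared when that predictor is removed from the full model (its squared semipartial correlation), rescaled so the drops sum to 1. This sketch computes those numbers directly for the mtcars formula used in the examples, without plotting; the variable names (rhs_vars, marginal, relative) are illustrative only:

```r
# Sketch of the measure: R-squared lost when each predictor is dropped
rhs_vars <- c("disp", "wt", "gear")
r2_full  <- summary(lm(mpg ~ disp + wt + gear, data = mtcars))$r.squared

marginal <- sapply(rhs_vars, function(v) {
  # Reduced model: every predictor except v
  reduced <- reformulate(setdiff(rhs_vars, v), response = "mpg")
  r2_full - summary(lm(reduced, data = mtcars))$r.squared
})

# Express each drop as a share of the total drop
relative <- marginal / sum(marginal)
round(relative, 3)
```

When predictors are correlated, the drops generally do not add up to the full R-squared, so the rescaling only expresses them relative to each other rather than as a decomposition of R-squared.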

We can test this out with the sample data provided in the linked article:

IV <- read.csv(paste0("https://statisticsbyjim.com/wp-content/uploads/",
                      "2017/07/ImportantVariables.csv"))

plot_importance(Strength ~ Time + Pressure + Temperature, data = IV)

The resulting plot matches the one in the article.

We can also test it on some built-in datasets to show that it works more generally:

plot_importance(mpg ~ disp + wt + gear, data = mtcars)

plot_importance(Petal.Length ~ Species + Petal.Width, data = iris)

Created on 2022-05-01 by the reprex package (v2.0.1)

Answer:

I ended up using the relaimpo package and plotting the result with the ggplot2 code from @Allan Cameron's answer:

library(relaimpo)
library(ggplot2)

# Example model (replace with your own fitted lm object)
mymodel <- lm(mpg ~ disp + wt + gear, data = mtcars)

# LMG relative importance: each predictor's share of R-squared
relative_importance <- calc.relimp(mymodel, type = "lmg")@lmg

df <- data.frame(
  variable = names(relative_importance),
  importance = round(c(relative_importance) * 100, 2)
)

ggplot(df, aes(x = reorder(variable, -importance), y = importance)) +
  geom_col(fill = "deepskyblue4") +
  geom_text(aes(label = importance), vjust = .3, hjust = 1.2, size = 3, color = "white") +
  coord_flip() +
  labs(title = "Relative importance of variables") +
  theme_classic(base_size = 16)
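As far as I understand it, relaimpo's "lmg" metric (after Lindeman, Merenda and Gold) averages each predictor's increase in R-squared over every order in which the predictors could enter the model, so the values add up to the full model's R-squared. A brute-force base-R sketch for a handful of numeric predictors follows; lmg_sketch() is a hypothetical helper, not part of relaimpo:

```r
# Hypothetical helper: LMG as an order-averaged increase in R-squared
lmg_sketch <- function(response, predictors, data) {
  p <- length(predictors)

  # R-squared for a subset of predictors (0 for the intercept-only model)
  r2 <- function(vars) {
    if (length(vars) == 0) return(0)
    summary(lm(reformulate(vars, response = response), data = data))$r.squared
  }

  sapply(predictors, function(j) {
    others <- setdiff(predictors, j)
    total <- 0
    for (k in 0:(p - 1)) {
      # Every subset of the other predictors with k elements
      subsets <- if (k == 0) list(character(0)) else combn(others, k, simplify = FALSE)
      # Fraction of orderings in which exactly these k predictors precede j
      weight <- factorial(k) * factorial(p - k - 1) / factorial(p)
      for (s in subsets) {
        total <- total + weight * (r2(c(s, j)) - r2(s))
      }
    }
    total
  })
}

lmg <- lmg_sketch("mpg", c("disp", "wt", "gear"), mtcars)
round(lmg, 4)
```

The number of model fits grows exponentially with the number of predictors, so this is only practical for small models. If my reading of relaimpo's defaults is right, the result should agree with calc.relimp(mymodel, type = "lmg")@lmg.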