How to save a model summary as data frame in R

Time:11-15

I'm trying to save the summary of a model as a data frame in R. The model is a stepwise regression model fitted with the MASS package. I'm primarily interested in saving the coefficients, their t values and the R-squared of the model.

Model summary output

I tried

ModelSummary <- data.frame(unclass(summary(step.model)), check.names = FALSE, stringsAsFactors = FALSE)

But had the error

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
  cannot coerce class ‘"lm"’ to a data.frame

Answer:

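Once you have called summary() on the fitted model, you can access the coefficient table (estimates, standard errors, t values and p-values) with $coefficients and the R-squared with $r.squared. Then you can combine them with cbind() and convert the result, which is a matrix, with as.data.frame().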

So, for sample data:

model.cars <- lm(mpg ~ ., data = mtcars)

summary.cars <- summary(model.cars)

want <- as.data.frame(cbind(summary.cars$coefficients, rsq = summary.cars$r.squared))
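Because MASS::stepAIC() returns the selected model as an ordinary lm object, the same extraction should apply to a stepwise model. A minimal sketch, assuming stepAIC() with direction = "both" on mtcars as a stand-in for the asker's real data and call; the column names term, estimate, t.value and r.squared are my own choice:

```r
library(MASS)  # provides stepAIC()

# Stand-in for the asker's model: stepwise selection on the built-in mtcars data
full.model <- lm(mpg ~ ., data = mtcars)
step.model <- stepAIC(full.model, direction = "both", trace = FALSE)

step.summary <- summary(step.model)
# Coefficient matrix with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefs <- step.summary$coefficients

ModelSummary <- data.frame(
  term      = rownames(coefs),
  estimate  = coefs[, "Estimate"],
  t.value   = coefs[, "t value"],
  r.squared = step.summary$r.squared,  # length 1, recycled to every row
  row.names = NULL                     # number the rows instead of using term names
)
ModelSummary
```

The R-squared is a model-level value, so it is simply repeated on each coefficient row.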

Answer:

Quick Intro

The broom package has several functions that turn regression output into data frames. Since you did not include a reproducible dataset, I will use a simple regression on the built-in iris dataset as an example. First, load broom (for the tidy data frames) and the tidyverse (for wrangling and plotting the data).

#### Load Libraries ####
library(tidyverse)
library(broom)

Then fit a regression like so:

#### Fit Regression Model ####
fit <- lm(Petal.Length ~ Petal.Width,
          iris)

Broom Functions

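The tidy() function turns the coefficient table (estimates, standard errors, test statistics and p-values) into a data frame in one step. For an lm model, the statistic column is the t value.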

#### Tidy Dataframe ####
fit.tidy <- tidy(fit)
fit.tidy

Like so:

# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)     1.08    0.0730      14.8 4.04e-31
2 Petal.Width     2.23    0.0514      43.4 4.68e-86
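Since the statistic column holds the t values for a linear model, the columns the question asks about can be kept and renamed with dplyr's select(). A minimal sketch on the same iris model (the name t.value is my own choice):

```r
library(broom)  # tidy()
library(dplyr)  # select(), %>%

fit <- lm(Petal.Length ~ Petal.Width, data = iris)

# Keep term and estimate, and rename "statistic" (the t value) to "t.value"
coef.df <- tidy(fit) %>%
  select(term, estimate, t.value = statistic)
coef.df
```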

The augment() function returns the data used to fit the model together with per-observation values such as fitted values (.fitted), residuals (.resid), leverage (.hat), Cook's distance (.cooksd) and standardized residuals (.std.resid).

#### Augmented Dataframe ####
fit.aug <- augment(fit)
fit.aug

Like so:

# A tibble: 150 × 8
   Petal.Length Petal.Width .fitted  .resid   .hat .sigma   .cooksd .std.resid
          <dbl>       <dbl>   <dbl>   <dbl>  <dbl>  <dbl>     <dbl>      <dbl>
 1          1.4         0.2    1.53 -0.130  0.0182  0.480 0.000693     -0.273 
 2          1.4         0.2    1.53 -0.130  0.0182  0.480 0.000693     -0.273 
 3          1.3         0.2    1.53 -0.230  0.0182  0.479 0.00218      -0.484 
 4          1.5         0.2    1.53 -0.0295 0.0182  0.480 0.0000360    -0.0624
 5          1.4         0.2    1.53 -0.130  0.0182  0.480 0.000693     -0.273 
 6          1.7         0.4    1.98 -0.276  0.0140  0.479 0.00240      -0.580 
 7          1.4         0.3    1.75 -0.353  0.0160  0.479 0.00449      -0.743 
 8          1.5         0.2    1.53 -0.0295 0.0182  0.480 0.0000360    -0.0624
 9          1.4         0.2    1.53 -0.130  0.0182  0.480 0.000693     -0.273 
10          1.5         0.1    1.31  0.193  0.0206  0.480 0.00176       0.409
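For reference, most of these per-observation columns can also be built with base-R accessors. A sketch without broom, assuming (my understanding, not verified against broom's source here) that broom computes .hat, .cooksd and .std.resid with hatvalues(), cooks.distance() and rstandard():

```r
# Base-R sketch of a subset of the columns shown by augment()
fit <- lm(Petal.Length ~ Petal.Width, data = iris)

base.aug <- data.frame(
  Petal.Length = iris$Petal.Length,
  Petal.Width  = iris$Petal.Width,
  .fitted      = unname(fitted(fit)),          # predicted values
  .resid       = unname(resid(fit)),           # observed minus fitted
  .hat         = unname(hatvalues(fit)),       # leverage
  .cooksd      = unname(cooks.distance(fit)),  # influence of each observation
  .std.resid   = unname(rstandard(fit))        # standardized residuals
)
head(base.aug)
```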

The glance() function collects model-level fit statistics, such as R-squared, adjusted R-squared, the F statistic, AIC and BIC, into a one-row data frame.

#### Glance Dataframe ####
fit.glance <- glance(fit)
fit.glance

Like so:

# A tibble: 1 × 12
  r.squared adj.r.squ…¹ sigma stati…²  p.value    df logLik   AIC   BIC devia…³
      <dbl>       <dbl> <dbl>   <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
1     0.927       0.927 0.478   1882. 4.68e-86     1  -101.  208.  217.    33.8
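If you would rather not depend on broom, several of these model-level values are also available from summary() and base-R accessors. A sketch on the same model; the column names mirror glance()'s, but the selection of columns is my own:

```r
fit <- lm(Petal.Length ~ Petal.Width, data = iris)
s <- summary(fit)

# One-row data frame of model-level fit statistics
base.glance <- data.frame(
  r.squared     = s$r.squared,
  adj.r.squared = s$adj.r.squared,
  sigma         = s$sigma,                        # residual standard error
  statistic     = unname(s$fstatistic["value"]),  # F statistic
  AIC           = AIC(fit),
  BIC           = BIC(fit),
  nobs          = nobs(fit)
)
base.glance
```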

Individual columns can be extracted as plain vectors with pull() from dplyr (loaded as part of the tidyverse).

fit.glance %>% 
  pull(adj.r.squared)

This returns the adjusted R-squared as a single number:

[1] 0.9266173
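pull() can also return a named vector through its name argument (available in dplyr 1.0.0 and later, as far as I know), which gives the t values keyed by term. A minimal sketch on the same model:

```r
library(broom)  # tidy()
library(dplyr)  # pull(), %>%

fit <- lm(Petal.Length ~ Petal.Width, data = iris)

# Named numeric vector: values from "statistic", names from "term"
t.values <- tidy(fit) %>%
  pull(statistic, name = term)
t.values
```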

Usage Example

As one example of using the augment() output, the augmented data frame can be plotted directly. Here is a scatterplot of the residuals against the fitted values:

fit.aug %>% 
  ggplot(aes(x = .fitted,
             y = .resid)) +
  geom_point()

[Plot: residuals (.resid) against fitted values (.fitted)]

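Since you want the coefficients, their t values and the R-squared in a single data frame, you can add the model-level values from glance() as extra columns on the tidy() output with add_column() (from the tibble package, which the tidyverse loads). The single R-squared value is repeated on every coefficient row.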

fit.tidy %>% 
  add_column(r.square = fit.glance$r.squared,
             adj.r.square = fit.glance$adj.r.squared)

Like so:

# A tibble: 2 × 7
  term        estimate std.error statistic  p.value r.square adj.r.square
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>        <dbl>
1 (Intercept)     1.08    0.0730      14.8 4.04e-31    0.927        0.927
2 Petal.Width     2.23    0.0514      43.4 4.68e-86    0.927        0.927
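The same steps can be wrapped in a small function and applied to a stepwise model like the one in the question. model_summary_df() below is a hypothetical helper, not part of broom, and the stepAIC() call on mtcars is only a stand-in for the asker's model. Because MASS also exports a select() function, the sketch calls dplyr::select() explicitly:

```r
library(MASS)   # stepAIC(); also exports its own select()
library(broom)  # tidy(), glance()
library(dplyr)  # %>%, mutate()

# Hypothetical helper: one row per coefficient, plus model-level fit columns
model_summary_df <- function(model) {
  fit.glance <- glance(model)
  tidy(model) %>%
    dplyr::select(term, estimate, t.value = statistic, p.value) %>%
    mutate(r.squared     = fit.glance$r.squared,
           adj.r.squared = fit.glance$adj.r.squared)
}

# Stand-in for the asker's stepwise model
step.model <- stepAIC(lm(mpg ~ ., data = mtcars), trace = FALSE)
ModelSummary <- model_summary_df(step.model)
ModelSummary
```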