Save linear model summary output in single row/col cell

I want to save the summary output from a linear model into a single cell of a data frame row. I know there are other ways to save the output of a linear model into a data frame, but they spread the output over a number of rows and columns that depends on the number of predictors in the model.

Essentially, I have the following loop, which stores the formula and is supposed to store the linear model summary coefficients alongside it.

library(faraway)   # provides the teengamb data
library(magrittr)  # provides %>%
col_names <- names(teengamb)[-5]
n_iter =length(col_names)
form <-""
models<-""
DF <- data.frame(formulas = rep(0,n_iter), linear_models = rep("",n_iter))
for(i in 1:length(col_names)){
  form[i]=reformulate(col_names[1:i], response='gamble') %>% deparse()  %>% list()
  models[i] <- lm(form[[i]], data=teengamb)%>% summary() %>% coefficients %>% list()
  DF[i, 'formulas'] <- form[[i]]
  DF[i,'linear_models']<-models[[i]]
}

Error in [<-.data.frame(*tmp*, i, "linear_models", value = c(29.775, : replacement has 2 rows, data has 1

I am aware this issue is covered in "Error - replacement has [x] rows, data has [y]", which is why I want to know how to store all the values in a single cell, so that the data.frame does not treat the value as having more than one row or column.

Expected output:

                                 formulas         linear_models
1                            gamble ~ sex      first linear model
2                   gamble ~ sex + status      second linear model
3          gamble ~ sex + status + income      third linear model
4 gamble ~ sex + status + income + verbal      fourth linear model

I should be able to index the linear models like:

DF[1,2]
            Estimate Std. Error   t value     Pr(>|t|)
(Intercept)  29.77500   5.498275  5.415335 2.281844e-06
sex         -25.90921   8.647659 -2.996095 4.436553e-03

DF[2, 2]
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept)  60.2232938 15.1346581  3.979164 0.0002548844
sex         -35.7093699  9.4898594 -3.762898 0.0004933992
status       -0.5855441  0.2726923 -2.147270 0.0373207721

..
..

CodePudding user response：

You can do this with a list column. Populate an empty list with your output, then add that list to the data frame as a column:

library(faraway)

col_names <- names(teengamb)[-5]
n_iter <- length(col_names)
DF <- data.frame(formulas = rep(0, n_iter))
linear_models <-list()

for(i in seq(n_iter)) {
  DF[i, 'formulas'] <- deparse(reformulate(col_names[1:i], response = 'gamble'))
  linear_models[[i]] <- coef(summary(lm(DF$formulas[i], data = teengamb)))
}

DF$linear_models <- linear_models

DF[1, 2]
#> [[1]]
#>              Estimate Std. Error   t value     Pr(>|t|)
#> (Intercept)  29.77500   5.498275  5.415335 2.281844e-06
#> sex         -25.90921   8.647659 -2.996095 4.436553e-03

DF[2, 2]
#> [[1]]
#>                Estimate Std. Error   t value     Pr(>|t|)
#> (Intercept)  60.2232938 15.1346581  3.979164 0.0002548844
#> sex         -35.7093699  9.4898594 -3.762898 0.0004933992
#> status       -0.5855441  0.2726923 -2.147270 0.0373207721

Created on 2022-06-04 by the reprex package (v2.0.1)
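
One detail worth spelling out: with a list column, `DF[1, 2]` returns a one-element list that wraps the matrix (hence the `[[1]]` in the output above), not the matrix itself. To get the bare coefficient matrix, index with `[[`. Here is a minimal sketch of that; it uses the built-in `mtcars` data with `mpg` as the response, which is my own stand-in so that it runs without faraway.

```r
# Assumption: mtcars replaces teengamb, and the predictors are arbitrary picks.
preds <- c("wt", "hp", "qsec")

formulas <- character(length(preds))
models <- vector("list", length(preds))

for (i in seq_along(preds)) {
  f <- reformulate(preds[1:i], response = "mpg")
  formulas[i] <- deparse(f)
  models[[i]] <- coef(summary(lm(f, data = mtcars)))
}

DF <- data.frame(formulas = formulas)
DF$linear_models <- models   # a list column: one matrix per row

cell <- DF[1, 2]             # list of length 1 wrapping the matrix
mat  <- DF[[1, 2]]           # the coefficient matrix itself
same <- DF$linear_models[[1]]  # equivalent, and arguably easier to read
```

So `DF[[i, 2]]` or `DF$linear_models[[i]]` is probably what you want whenever the matrix is going to be used in further calculations.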

CodePudding user response：

@AllanCameron has given you a perfectly acceptable answer, but I was puzzled by your response to my comment about nest(). So here's an alternative using nest(). I don't have the faraway package, so I've used the diamonds dataset that ships with ggplot2 as my test data.

library(tidyverse)
library(broom)

# Take price as the dependent variable...
col_names <- names(diamonds)[-7]

DF <- lapply(
        1:length(col_names),
        function(i) {
          formula <- paste0("price ~ ", paste(col_names[1:i], collapse=" + "))
          tidy(lm(as.formula(formula), data=diamonds)) %>% 
          add_column(formula=formula, .before=1) %>% 
          nest(model=c(term, estimate, std.error, statistic, p.value))
        }
      ) %>% 
      bind_rows()

DF
# A tibble: 9 × 2
  formula                                                           model            
  <chr>                                                             <list>           
1 price ~ carat                                                     <tibble [2 × 5]> 
2 price ~ carat + cut                                               <tibble [6 × 5]> 
3 price ~ carat + cut + color                                       <tibble [12 × 5]>
4 price ~ carat + cut + color + clarity                             <tibble [19 × 5]>
5 price ~ carat + cut + color + clarity + depth                     <tibble [20 × 5]>
6 price ~ carat + cut + color + clarity + depth + table             <tibble [21 × 5]>
7 price ~ carat + cut + color + clarity + depth + table + x         <tibble [22 × 5]>
8 price ~ carat + cut + color + clarity + depth + table + x + y     <tibble [23 × 5]>
9 price ~ carat + cut + color + clarity + depth + table + x + y + z <tibble [24 × 5]>

DF[1,2] %>% unnest(cols=c(model))
# A tibble: 2 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   -2256.      13.1     -173.       0
2 carat          7756.      14.1      551.       0
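
If the tidyverse is not an option, roughly the same shape can be built in base R. This is only a sketch under my own assumptions: `tidy_coefs()` is a hypothetical helper (it is not part of broom and only mimics the column names that `tidy()` produced above), and the built-in `mtcars` data stands in for `diamonds`.

```r
# Hypothetical helper, not a broom function: one row per model term.
tidy_coefs <- function(fit) {
  cf <- coef(summary(fit))
  out <- data.frame(term = rownames(cf), cf,
                    row.names = NULL, stringsAsFactors = FALSE)
  names(out) <- c("term", "estimate", "std.error", "statistic", "p.value")
  out
}

preds <- c("wt", "hp")  # mtcars predictors, chosen only for illustration

# Build the nested formulas: mpg ~ wt, then mpg ~ wt + hp
forms <- lapply(seq_along(preds), function(i) {
  reformulate(preds[1:i], response = "mpg")
})

DF <- data.frame(formula = vapply(forms, deparse, character(1)),
                 stringsAsFactors = FALSE)

# List column holding one data frame of coefficients per formula
DF$model <- lapply(forms, function(f) tidy_coefs(lm(f, data = mtcars)))

DF$model[[2]]  # coefficients for mpg ~ wt + hp, one row per term
```

Each element of `DF$model` is then an ordinary data frame, so it can be filtered or combined with `rbind()` without any unnesting step.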