I want to save the summary output from a linear model into a single cell of a data frame. I know there are other ways to save the output of a linear model into a data frame, but these spread the output over a varying number of rows and columns, depending on the number of predictors in the model.
Essentially, I have the following code, which stores the formula and is supposed to store the linear model's summary coefficients alongside it.
library(faraway)
library(magrittr)  # provides %>%
col_names <- names(teengamb)[-5]
n_iter <- length(col_names)
form <- ""
models <- ""
DF <- data.frame(formulas = rep(0, n_iter), linear_models = rep("", n_iter))
for (i in 1:length(col_names)) {
  form[i] <- reformulate(col_names[1:i], response = 'gamble') %>% deparse() %>% list()
  models[i] <- lm(form[[i]], data = teengamb) %>% summary() %>% coefficients %>% list()
  DF[i, 'formulas'] <- form[[i]]
  DF[i, 'linear_models'] <- models[[i]]  # this assignment raises the error below
}
Error in `[<-.data.frame`(`*tmp*`, i, "linear_models", value = c(29.775, : replacement has 2 rows, data has 1
I am aware that this is the "replacement has [x] rows, data has [y]" error, which is why I want to know how to store all the values in a single cell, so that the data.frame does not treat the value as having more than one row or column.
Expected output:
  formulas                                 linear_models
1 gamble ~ sex                             first linear model
2 gamble ~ sex + status                    second linear model
3 gamble ~ sex + status + income           third linear model
4 gamble ~ sex + status + income + verbal  fourth linear model
I should be able to index the linear models like:
DF[1,2]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.77500 5.498275 5.415335 2.281844e-06
sex -25.90921 8.647659 -2.996095 4.436553e-03
DF[2, 2]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.2232938 15.1346581 3.979164 0.0002548844
sex -35.7093699 9.4898594 -3.762898 0.0004933992
status -0.5855441 0.2726923 -2.147270 0.0373207721
..
..
CodePudding user response:
You can do this with a list column. Populate an empty list with your output, then add that list as a column.
library(faraway)
col_names <- names(teengamb)[-5]
n_iter <- length(col_names)
DF <- data.frame(formulas = rep(0, n_iter))
linear_models <-list()
for(i in seq(n_iter)) {
DF[i, 'formulas'] <- deparse(reformulate(col_names[1:i], response = 'gamble'))
linear_models[[i]] <- coef(summary(lm(DF$formulas[i], data = teengamb)))
}
DF$linear_models <- linear_models
DF[1, 2]
#> [[1]]
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 29.77500 5.498275 5.415335 2.281844e-06
#> sex -25.90921 8.647659 -2.996095 4.436553e-03
DF[2, 2]
#> [[1]]
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 60.2232938 15.1346581 3.979164 0.0002548844
#> sex -35.7093699 9.4898594 -3.762898 0.0004933992
#> status -0.5855441 0.2726923 -2.147270 0.0373207721
Created on 2022-06-04 by the reprex package (v2.0.1)
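Note that `DF[1, 2]` returns a one-element list (hence the `[[1]]` in the printed output). To get the coefficient matrix itself, index the column with `[[`. Here is a minimal sketch of that. It uses the built-in `mtcars` data instead of `teengamb` so it runs without `faraway`, and the response `mpg` and the three predictors are arbitrary choices for illustration. It also builds the list column in one step with `I()`, which is an alternative to assigning it afterwards.

```r
# Assumption: mtcars stands in for teengamb; mpg and the predictors are illustrative.
col_names <- c("wt", "hp", "qsec")
n_iter <- length(col_names)

formulas <- character(n_iter)
linear_models <- vector("list", n_iter)

for (i in seq_len(n_iter)) {
  formulas[i] <- deparse(reformulate(col_names[1:i], response = "mpg"))
  linear_models[[i]] <- coef(summary(lm(as.formula(formulas[i]), data = mtcars)))
}

# I() keeps the list as a single list column when the data frame is built
DF <- data.frame(formulas = formulas, linear_models = I(linear_models))

# `[[` extracts the matrix stored in the cell, not a list wrapping it
m <- DF$linear_models[[2]]
is.matrix(m)         # TRUE
rownames(m)          # "(Intercept)" "wt" "hp"
m["wt", "Estimate"]  # a single number
```

With the matrix in hand, the usual matrix indexing by row and column name works, which is not possible on the one-element list returned by `DF[2, 2]`.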
CodePudding user response:
@AllanCameron has given you a perfectly acceptable answer, but I was puzzled by your response to my comment about nest(). So here's an alternative using nest(). I don't have the faraway package, so I've used the diamonds dataset that is part of ggplot2 as my test data.
library(tidyverse)
library(broom)
# Take price as the dependent variable...
col_names <- names(diamonds)[-7]
DF <- lapply(
  1:length(col_names),
  function(i) {
    # Join the predictors with " + " so the string is a valid formula
    formula <- paste0("price ~ ", paste(col_names[1:i], collapse=" + "))
    tidy(lm(as.formula(formula), data=diamonds)) %>%
      add_column(formula=formula, .before=1) %>%
      nest(model=c(term, estimate, std.error, statistic, p.value))
  }
) %>%
  bind_rows()
DF
# A tibble: 9 × 2
  formula                                                            model
  <chr>                                                              <list>
1 price ~ carat                                                      <tibble [2 × 5]>
2 price ~ carat + cut                                                <tibble [6 × 5]>
3 price ~ carat + cut + color                                        <tibble [12 × 5]>
4 price ~ carat + cut + color + clarity                              <tibble [19 × 5]>
5 price ~ carat + cut + color + clarity + depth                      <tibble [20 × 5]>
6 price ~ carat + cut + color + clarity + depth + table              <tibble [21 × 5]>
7 price ~ carat + cut + color + clarity + depth + table + x          <tibble [22 × 5]>
8 price ~ carat + cut + color + clarity + depth + table + x + y      <tibble [23 × 5]>
9 price ~ carat + cut + color + clarity + depth + table + x + y + z  <tibble [24 × 5]>
DF[1,2] %>% unnest(cols=c(model))
# A tibble: 2 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   -2256.      13.1     -173.       0
2 carat          7756.      14.1      551.       0
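If you only need one model's table, you can also pull it straight out of the list column with `[[` instead of unnesting. Unnesting the whole column is handy when you want to compare one term across all models. A small sketch of both, following the same nest() approach but on the built-in `mtcars` data with arbitrarily chosen predictors (my assumption, picked so that it runs quickly):

```r
library(dplyr)
library(tidyr)
library(broom)

# Assumption: mtcars and these predictors are illustrative stand-ins.
col_names <- c("wt", "hp", "qsec")

DF <- lapply(seq_along(col_names), function(i) {
  f <- paste("mpg ~", paste(col_names[1:i], collapse = " + "))
  tidy(lm(as.formula(f), data = mtcars)) %>%
    mutate(formula = f, .before = 1) %>%
    nest(model = c(term, estimate, std.error, statistic, p.value))
}) %>%
  bind_rows()

# One model's coefficient table, without unnest()
DF$model[[2]]

# The estimate for wt in every model
wt_estimates <- DF %>%
  unnest(cols = c(model)) %>%
  filter(term == "wt") %>%
  select(formula, estimate)
wt_estimates
```

`DF$model[[2]]` is an ordinary tibble with one row per term, so it can be filtered or joined like any other data frame.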