Tidy multiple univariate regressions

Time:03-28

db = tibble(a = rnorm(100), b = rnorm(100), c = rnorm(100))

If I want a tidy multivariate linear regression, I can just run

lm(data = db, a ~ 0 + b + c) %>% tidy()

But if I want multiple univariate regressions, I would write

lm(data = db, a ~ 0 + b) %>% tidy() %>%
  add_row(lm(data = db, a ~ 0 + c) %>% tidy())

Now, given many regressor columns, I would like to avoid coding every single regressor as a new add_row. How can I make the code more concise?

This has a partial solution here:

Tidy output from many single-variable models using purrr, broom

Can the code be made even leaner than in that example?

CodePudding user response:

We could use {} to group the multiple expressions into a single pipeline step:

library(magrittr)
library(tibble)   # for add_row()
library(broom)
lm(data = db, a ~ 0 + b) %>%
    tidy() %>%
    {add_row(., lm(data = db, a ~ 0 + c) %>%
         tidy())}

-output

# A tibble: 2 × 5
  term  estimate std.error statistic p.value
  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 b       0.0601    0.0907     0.663   0.509
2 c       0.0411    0.0899     0.457   0.649

Or we could do this within summarise and unnest:

library(dplyr)   # for summarise() and bind_rows()
library(tidyr)
db %>% 
   summarise(out1 = list(bind_rows(lm(a ~ 0 + b) %>% tidy, 
                    lm(a ~ 0 + c) %>% tidy))) %>%
   unnest(out1)

-output

# A tibble: 2 × 5
  term  estimate std.error statistic p.value
  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 b       0.0601    0.0907     0.663   0.509
2 c       0.0411    0.0899     0.457   0.649

CodePudding user response:

My answer

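library(dplyr)
library(purrr)
library(broom)

# Build one formula string per regressor ("a~0+b", "a~0+c", ...),
# fit each model, and row-bind the tidied results.
db %>%
  select(-a) %>%
  names() %>%
  paste0('a~0+', .) %>%
  map_df(~tidy(lm(as.formula(.x), data = db)))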
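
The same idea can be written without pasting formula strings by hand. This is only a minimal sketch, assuming the db layout from the question: the seed and the names regressors and results are illustrative, and reformulate() from base R is swapped in for paste() plus as.formula().

```r
library(tibble)
library(purrr)
library(broom)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
db <- tibble(a = rnorm(100), b = rnorm(100), c = rnorm(100))

# Every column except the response is treated as a regressor (assumption).
regressors <- setdiff(names(db), "a")

# Fit one no-intercept model per regressor and row-bind the tidied output.
# reformulate("b", response = "a", intercept = FALSE) yields a ~ b - 1,
# which is equivalent to a ~ 0 + b.
results <- map_dfr(
  regressors,
  function(v) {
    tidy(lm(reformulate(v, response = "a", intercept = FALSE), data = db))
  }
)

results
```

Because each model has a single term, the term column already identifies which regressor every row belongs to.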

CodePudding user response:

You could do something like this, depending on your columns:

library(broom)
library(magrittr)  # for %>%

vars <- names(db)[-1]

models <- list()

# Build a formula for every combination of 1 or 2 regressors
for (i in 1:2){
  vc <- combn(vars, i)
  for (j in seq_len(ncol(vc))){
    model <- as.formula(paste0("a ~ ", paste0(vc[, j], collapse = " + ")))
    models <- c(models, model)
  }
}

lapply(models, function(x) lm(x, data = db) %>% tidy()) 
[[1]]
# A tibble: 2 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   0.0155    0.0856     0.181   0.857
2 b            -0.0502    0.0797    -0.630   0.530

[[2]]
# A tibble: 2 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   0.0113    0.0856     0.132   0.896
2 c             0.0553    0.0865     0.640   0.524

[[3]]
# A tibble: 3 x 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   0.0132    0.0860     0.153   0.878
2 b            -0.0439    0.0807    -0.544   0.588
3 c             0.0486    0.0877     0.555   0.580
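
If a single data frame is easier to work with than the list above, the per-model tibbles can be stacked with a column that records which formula each row came from. A minimal sketch, assuming the same db layout: the seed and the names rhs, fits and all_models are illustrative and not part of the answer.

```r
library(tibble)
library(dplyr)
library(broom)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
db <- tibble(a = rnorm(100), b = rnorm(100), c = rnorm(100))
vars <- names(db)[-1]

# Right-hand sides for every combination of 1 or 2 regressors:
# "b", "c", "b + c"
rhs <- unlist(lapply(1:2, function(i) {
  combn(vars, i, FUN = paste, collapse = " + ")
}))

# Fit each model (intercept included, as in the loop above) and tidy it.
fits <- lapply(rhs, function(r) {
  tidy(lm(as.formula(paste("a ~", r)), data = db))
})
names(fits) <- paste("a ~", rhs)

# Stack the tibbles; the list names become the `model` column.
all_models <- bind_rows(fits, .id = "model")
all_models
```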