Grouped regression with dplyr using different formulas

Time:03-16

I am trying to adapt the problem from this post to a setting where different formulas are passed to the lm() function in R.

Here is a basic setup to reproduce the problem:

library(dplyr)
library(broom)
library(purrr)
library(tidyr)

# Generate data
set.seed(324)
dt <- data.frame(
  t = sort(rep(c(1,2), 50)),
  w1 = rnorm(100),
  w2 = rnorm(100),
  x1 = rnorm(100),
  x2 = rnorm(100)
)

# Generate formulas
fm <- map(1:2, ~as.formula(paste0("w", .x,  "~ x", .x)))

Now I want to run a different regression for each group t, with the models specified in the list of formulas fm:

# Approach 1:
dt %>% group_by(t) %>% 
  do(fit = tidy(map(fm, ~lm(.x, data = .)))) %>% 
  unnest(fit) 

# Approach 2
dt %>% nest(-t) %>% 
  mutate(
    fit = map(fm, ~lm(.x, data = .)),
    tfit = tidy(fit)
  )

Both approaches produce an error indicating that the formula cannot be converted to a data.frame. What am I doing wrong?

Answer:

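This needs map2 instead of map. Inside a purrr lambda, . is just another name for .x, so in map(fm, ~lm(.x, data = .)) the formula itself is passed as data, which is why lm complains that it cannot convert a formula to a data.frame. The data column created by nest is also a list of data frames, so we need to loop over the corresponding elements of the 'fm' list and the data column in parallel, which is what map2 does: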

library(tidyr)
library(purrr)
library(dplyr)
library(broom)
out <- dt %>%
  nest(data = -t)  %>%
  mutate(
    fit = map2(fm, data, ~lm(.x, data = .y)), 
    tfit = map(fit, tidy))

-output

> out
# A tibble: 2 × 4
      t data              fit    tfit            
  <dbl> <list>            <list> <list>          
1     1 <tibble [50 × 4]> <lm>   <tibble [2 × 5]>
2     2 <tibble [50 × 4]> <lm>   <tibble [2 × 5]>

> bind_rows(out$tfit)
# A tibble: 4 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)  0.0860      0.128    0.670   0.506 
2 x1           0.262       0.119    2.19    0.0331
3 (Intercept) -0.00285     0.152   -0.0187  0.985 
4 x2          -0.115       0.154   -0.746   0.459 
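To make the two points above concrete, here is a small sketch (not part of the original answer). It first checks what `.` refers to inside a purrr lambda, then confirms that map2 pairs the first formula with the first group. Using base `split()` to build the per-group data frames is my own shortcut for illustration, and the names `dot_classes`, `fits` and `direct` are made up for this example.

```r
library(purrr)

# Same data and formulas as in the question
set.seed(324)
dt <- data.frame(
  t = sort(rep(c(1, 2), 50)),
  w1 = rnorm(100),
  w2 = rnorm(100),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
fm <- map(1:2, ~ as.formula(paste0("w", .x, " ~ x", .x)))

# Inside a purrr lambda, `.` is an alias for `.x`, so here it is the formula,
# not a data frame coming from the surrounding pipe.
dot_classes <- map_chr(fm, ~ class(.))
print(dot_classes)

# map2 walks both lists in parallel: formula i is fitted on group i.
# split() is used here only as a quick way to get one data frame per level of t.
fits <- map2(fm, split(dt, dt$t), ~ lm(.x, data = .y))

# Fitting the first model by hand on t == 1 should give the same coefficients.
direct <- lm(w1 ~ x1, data = dt[dt$t == 1, ])
print(all.equal(coef(fits[[1]]), coef(direct)))
```

The second check only works because the order of 'fm' matches the order of the groups; map2 pairs by position, not by name.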

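Alternatively, use imap_dfr, where .y is the position of each formula in 'fm' (here 1 and 2, which happen to match the values of 't'):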

> imap_dfr(fm, ~ lm(.x, data = dt %>% 
                          filter(t == .y)) %>% 
               tidy)
# A tibble: 4 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)  0.0860      0.128    0.670   0.506 
2 x1           0.262       0.119    2.19    0.0331
3 (Intercept) -0.00285     0.152   -0.0187  0.985 
4 x2          -0.115       0.154   -0.746   0.459 
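One drawback of the combined table above is that it no longer says which rows belong to which group. As far as I know, the `*_dfr` helpers are also superseded since purrr 1.0.0 in favour of `list_rbind()`. A possible variant, assuming purrr >= 1.0.0 and assuming that formula i is meant for `t == i` (the names given to 'fm' below are my own addition, as is the name `res`):

```r
library(dplyr)
library(purrr)
library(broom)

# Same data and formulas as in the question
set.seed(324)
dt <- data.frame(
  t = sort(rep(c(1, 2), 50)),
  w1 = rnorm(100),
  w2 = rnorm(100),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
fm <- map(1:2, ~ as.formula(paste0("w", .x, " ~ x", .x)))

# Assumption: name each formula after the group it should be fitted on.
names(fm) <- c("1", "2")

# With a named list, imap passes the name as .y (a character string),
# and list_rbind(names_to = ...) stores that name in a new column.
res <- imap(fm, ~ tidy(lm(.x, data = filter(dt, t == as.numeric(.y))))) %>%
  list_rbind(names_to = "t")

print(res)
```

Note that the `t` column in `res` is character, because it comes from the list names.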

If we want all combinations of the formulas in 'fm' for each level of 't', use crossing:

dt %>% 
   nest(data = -t) %>% 
   crossing(fm) %>% 
   mutate(fit = map2(fm, data, ~ lm(.x, data = .y)),
        tfit = map(fit, tidy))

-output

# A tibble: 4 × 5
      t data              fm        fit    tfit            
  <dbl> <list>            <list>    <list> <list>          
1     1 <tibble [50 × 4]> <formula> <lm>   <tibble [2 × 5]>
2     1 <tibble [50 × 4]> <formula> <lm>   <tibble [2 × 5]>
3     2 <tibble [50 × 4]> <formula> <lm>   <tibble [2 × 5]>
4     2 <tibble [50 × 4]> <formula> <lm>   <tibble [2 × 5]>
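In the crossed result the formulas and the tidy tables are still hidden in list columns. A sketch of one way to flatten it into a single table, with a readable label per model, could look like this. The `model` column and the name `res` are my own additions, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Same data and formulas as in the question
set.seed(324)
dt <- data.frame(
  t = sort(rep(c(1, 2), 50)),
  w1 = rnorm(100),
  w2 = rnorm(100),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
fm <- map(1:2, ~ as.formula(paste0("w", .x, " ~ x", .x)))

res <- dt %>%
  nest(data = -t) %>%
  crossing(fm) %>%
  mutate(
    # Turn each formula into text so it survives unnesting as a plain column
    model = map_chr(fm, ~ paste(deparse(.x), collapse = " ")),
    # Fit and tidy in one step; one small coefficient table per row
    tfit = map2(fm, data, ~ tidy(lm(.x, data = .y)))
  ) %>%
  select(t, model, tfit) %>%
  unnest(tfit)

print(res)
```

Each of the four group/formula combinations contributes two rows (intercept and slope), so the groups and models can be compared side by side with the usual dplyr verbs.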