I'm trying to write a function that can flexibly group by a variable number of arguments and fit a linear model to each subset. The output should be a table with each row showing the grouping variable(s) and corresponding lm call results that broom::glance provides. But I can't figure out how to structure the output. Code that produces the same error is as follows:
library(dplyr)
library(broom)
test_fcn <- function(var1, ...) {
  x <- unlist(list(...))
  mtcars %>%
    group_by(across(all_of(c('gear', x)))) %>%
    mutate(mod = list(lm(hp ~ !!sym(var1), data = .))) %>%
    summarize(broom::glance(mod))
}
test_fcn('qsec', 'cyl', 'carb')
I'm pushing my R/dplyr comfort zone by mixing static and dynamic variable arguments, so I've left them here in case that's a contributing factor. Thanks for any input!
Answer:
You were nearly there.
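library(purrr)  # map()
library(tidyr)  # unnest()
test_fcn <- function(var1, ...) {
  x <- unlist(list(...))
  mtcars %>%
    group_by(across(all_of(c('gear', x)))) %>%
    summarise(
      # caution: `.` is the full piped data frame, so each group gets the same fit
      mod = list(lm(hp ~ !!sym(var1), data = .)),
      # glance() each stored fit to get a one-row summary tibble
      mod = map(mod, broom::glance),
      .groups = "drop")
}
test_fcn('qsec', 'cyl', 'carb') %>% unnest(mod)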
## A tibble: 12 × 15
# gear cyl carb r.squared adj.r.sq…¹ sigma stati…² p.value df logLik AIC BIC devia…³ df.re…⁴
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 3 4 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 2 3 6 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 3 3 8 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 4 3 8 3 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 5 3 8 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 6 4 4 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 7 4 4 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 8 4 6 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 9 5 4 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#10 5 6 6 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#11 5 8 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#12 5 8 8 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
## … with 1 more variable: nobs <int>, and abbreviated variable names ¹adj.r.squared, ²statistic,
## ³deviance, ⁴df.residual
## ℹ Use `colnames()` to see all variable names
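Because you are storing the lm fit objects in a list, you need to loop over the entries using purrr::map. Note that `data = .` passes the whole piped data frame to lm rather than the current group, which is why every row of the output above is identical.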
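As a minimal sketch of that idea outside the pipeline, the snippet below stores two fits in a list and applies glance to each one. The names `fits` and `summaries` and the second formula (hp ~ wt) are made up for illustration.

```r
library(purrr)
library(broom)

# Two lm fits stored in a plain list (hypothetical example models)
fits <- list(
  lm(hp ~ qsec, data = mtcars),
  lm(hp ~ wt, data = mtcars)
)

# glance() expects a single model, so map() applies it to each list element
summaries <- map(fits, broom::glance)

# Each element is a one-row tibble of model-level statistics
summaries[[1]]
```

Calling broom::glance(fits) directly would not work here, because `fits` is a list rather than a model object.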
You might want to move the unnest call into test_fcn itself; a slightly more compact version would be:
test_fcn <- function(var1, ...) {
  x <- unlist(list(...))
  mtcars %>%
    group_by(across(all_of(c('gear', x)))) %>%
    summarise(
      mod = map(list(lm(hp ~ !!sym(var1), data = .)), broom::glance),
      .groups = "drop") %>%
    unnest(mod)
}
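If the intent is one model per group, a possible variant is sketched below. It assumes dplyr's group_modify, which passes each group's rows to the function as `.x`, and base R's reformulate to build the formula from the string. The name test_fcn_by_group is hypothetical and not part of the original answer.

```r
library(dplyr)
library(broom)

# Hypothetical variant of test_fcn that fits a separate model per group
test_fcn_by_group <- function(var1, ...) {
  x <- unlist(list(...))
  # Build the formula hp ~ <var1> from the column name given as a string
  f <- reformulate(var1, response = "hp")
  mtcars %>%
    group_by(across(all_of(c("gear", x)))) %>%
    # .x holds only the rows of the current group
    group_modify(~ broom::glance(lm(f, data = .x))) %>%
    ungroup()
}

# Grouping by gear only: one row of glance() output per gear value
test_fcn_by_group("qsec")
```

With extra grouping variables such as 'cyl' and 'carb', some groups in mtcars contain a single car, so those fits are degenerate and their statistics should be read with care.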