Write a function to run multiple regression models with changing independent variables and changing

Time:01-17

Using the data set mtcars as an example: The goal is to write a function to run multiple regression models with changing independent variables and changing dependent variables.

In the code that I wrote (below), var holds the independent variables and mpg is the dependent variable. I used map to run the regression repeatedly, with vs and am in turn as the changing independent variable.

var = c("vs", "am")

mtcars %>% select(all_of(var)) %>% 
  map(~ glm(mpg ~ .x + cyl + disp + splines::ns(wt, 2) + hp, 
            family = gaussian(link = "identity"), 
            data = mtcars)) %>% 
  map_dfr(tidy, conf.int = T, .id = 'source') %>% 
  select(source, term, estimate, std.error, conf.low, conf.high, p.value) 

I would like to run the same regression with a different set of independent variables, and also with a y that I can specify (e.g., I ran with mpg above, and I would like to change it to qsec or some other variables). So I envision a function like this:

function_name <- function(x, y, dataset){
  dataset %>% select(all_of(x)) %>%
    map(~ glm(y ~ .x + cyl + disp + splines::ns(wt, 2) + hp, 
              family = gaussian(link = "identity"), 
              data = dataset)) %>%
    map_dfr(tidy, conf.int = T, .id = 'source') %>% 
    select(source, term, estimate, std.error, conf.low, conf.high, p.value) 
}

But this function didn't work. Any suggestions?

Answer:

You could achieve your desired result like so:

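  1. The issue with your code is that y ~ ... will not work: inside a formula, y is taken literally as a variable named y and is not replaced by the string you passed in. Instead you could use reformulate (or as.formula) to dynamically create the formula for your regression model.
  2. To make this work, loop directly over the character vector x, or more precisely over setNames(x, x), instead of looping over dataset %>% select(all_of(x)). The names are what map_dfr later writes into the source column.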
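As a quick check of points 1 and 2, here is a minimal base-R sketch of what reformulate and setNames return on their own. The object names f and named_x are only illustrative.

```r
# Build a model formula from character strings (base R, no packages needed).
x <- c("vs", "am")
y <- "mpg"

f <- reformulate(
  termlabels = c(x[1], "cyl", "disp", "splines::ns(wt, 2)", "hp"),
  response = y
)
print(f)      # a formula object with mpg on the left-hand side
class(f)

# setNames(x, x) returns the same character vector, named by its own values,
# so each fitted model can later be labelled with the variable it used.
named_x <- setNames(x, x)
names(named_x)
```

Because the result is an ordinary formula object, it can be passed to glm like a formula typed by hand.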
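library(dplyr)
library(purrr)
library(broom)

# x: character vector of predictors to swap in, one model per element
# y: name of the response variable, as a string
function_name <- function(x, y, dataset) {
  # Loop over the names so that .x is a string and .id can label each model
  map(setNames(x, x), ~ glm(reformulate(
    termlabels = c(.x, "cyl", "disp", "splines::ns(wt, 2)", "hp"),
    response = y
  ),
  family = gaussian(link = "identity"),
  data = dataset
  )) %>%
    # Stack the tidied coefficient tables, one block per model
    map_dfr(tidy, conf.int = TRUE, .id = "source") %>%
    select(source, term, estimate, std.error, conf.low, conf.high, p.value)
}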

var <- c("vs", "am")

function_name(x = var, y = "mpg", mtcars)
#> # A tibble: 14 × 7
#>    source term                  estimate std.error conf.low conf.high  p.value
#>    <chr>  <chr>                    <dbl>     <dbl>    <dbl>     <dbl>    <dbl>
#>  1 vs     (Intercept)          32.7         3.49    25.8     39.5     1.24e- 9
#>  2 vs     vs                    1.03        1.52    -1.95     4.01    5.05e- 1
#>  3 vs     cyl                  -0.187       0.821   -1.80     1.42    8.21e- 1
#>  4 vs     disp                  0.000545    0.0119  -0.0228   0.0239  9.64e- 1
#>  5 vs     splines::ns(wt, 2)1 -22.4         4.82   -31.9    -13.0     9.02e- 5
#>  6 vs     splines::ns(wt, 2)2  -9.48        3.16   -15.7     -3.28    6.09e- 3
#>  7 vs     hp                   -0.0202      0.0115  -0.0427   0.00226 9.02e- 2
#>  8 am     (Intercept)          34.6         2.65    29.4     39.8     1.15e-12
#>  9 am     am                    0.0113      1.57    -3.06     3.08    9.94e- 1
#> 10 am     cyl                  -0.470       0.714   -1.87     0.931   5.17e- 1
#> 11 am     disp                  0.000796    0.0125  -0.0236   0.0252  9.50e- 1
#> 12 am     splines::ns(wt, 2)1 -21.5         5.86   -33.0    -10.0     1.14e- 3
#> 13 am     splines::ns(wt, 2)2  -9.21        3.34   -15.8     -2.66    1.07e- 2
#> 14 am     hp                   -0.0214      0.0136  -0.0480   0.00527 1.28e- 1
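
To vary the dependent variable as well, one option is to loop over every combination of x and y. The following is a minimal base-R sketch under that assumption, using as.formula (the alternative mentioned in point 1) rather than reformulate. The helper fit_grid and its argument names are hypothetical and not part of the answer above.

```r
# Hypothetical helper: fit one model per (outcome, focal predictor) pair.
fit_grid <- function(xs, ys, dataset) {
  # All combinations of predictors and outcomes, kept as plain strings
  combos <- expand.grid(x = xs, y = ys, stringsAsFactors = FALSE)

  fits <- lapply(seq_len(nrow(combos)), function(i) {
    # Paste the formula together as text, then convert it to a formula
    f <- as.formula(paste(
      combos$y[i], "~", combos$x[i],
      "+ cyl + disp + splines::ns(wt, 2) + hp"
    ))
    glm(f, family = gaussian(link = "identity"), data = dataset)
  })

  # Label each model as "<outcome>_<predictor>"
  names(fits) <- paste(combos$y, combos$x, sep = "_")
  fits
}

fits <- fit_grid(xs = c("vs", "am"), ys = c("mpg", "qsec"), dataset = mtcars)
names(fits)

# Coefficient table for one of the fitted models
coef(summary(fits[["qsec_am"]]))
```

The returned list could then be passed to map_dfr(tidy, conf.int = TRUE, .id = "source") in the same way as in the function above, if a single tidy table is preferred.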