How to call multiple quoted variables in lm regressions

Time:06-04

I would like to write a function, call it fun_regression, that takes data and ... as input, where ... holds one or more variables from data. For example,

library(tidyverse)
example_data <- tibble(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

I would like a function that regresses y ~ x1 when I call fun_regression(example_data, x1), regresses y ~ x1 + x2 when I call fun_regression(example_data, x1, x2), and so on. Here is what I have tried:

fun_regression <- function(data, ...){
  rhs <- enquos(...)
  reg <- lm(as.formula(paste("y ~ ", paste(!!!rhs, collapse = " + "))), data = data)
  summary(reg)
}

However, fun_regression(example_data, x1, x2) does not work.

CodePudding user response:

This feels like a workaround, but one option is to use str_remove_all():

fun_regression <- function(data, ...){
  rhs <- enquos(...)
  reg <- lm(as.formula(paste("y ~ ", paste(rhs, collapse = " + ") %>% str_remove_all('~'))), data = data)
  summary(reg)
}

fun_regression(example_data, x1, x2)

CodePudding user response:

Your variable rhs is a list of quosures. You need a character representation of the symbols for use in paste().
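To see that conversion step in isolation, here is a minimal sketch. The helper name capture_names is hypothetical and used only for illustration; reformulate() is a base R alternative to pasting the formula string by hand, and it is not part of the answer's code.

```r
library(rlang)

# Hypothetical helper: capture the bare arguments and return their names as strings.
capture_names <- function(...) {
  rhs <- enquos(...)                          # a list of quosures, not strings
  unname(vapply(rhs, as_name, character(1)))  # one string per captured symbol
}

rhs_name <- capture_names(x1, x2)

# Build the formula string by hand ...
f_string <- paste("y ~", paste(rhs_name, collapse = " + "))

# ... or let base R's reformulate() assemble the formula object directly.
f <- reformulate(rhs_name, response = "y")
print(f)
```

Either form can then be passed to lm() together with data; the full function follows.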

library(dplyr)
library(rlang)

example_data <- tibble(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

fun_regression <- function(data, ...){
  rhs <- enquos(...)
  rhs_name <- sapply(rhs, as_name)
  
  reg <- lm(as.formula(paste("y ~ ", paste(rhs_name, collapse = " + "))), data = data)
  summary(reg)
  
}

Note that with this approach, the Call line in the summary shows the code that built the formula rather than the formula itself, which is less than ideal.

> fun_regression(example_data, x1, x2)

Call:
lm(formula = as.formula(paste("y ~ ", paste(rhs_name, collapse = " + "))), 
    data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.33083 -0.43435 -0.02022  0.45940  1.10920 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -0.7644     0.3027  -2.525   0.0395 *
x1            0.4717     0.3226   1.462   0.1871  
x2            0.4749     0.4281   1.109   0.3039  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8367 on 7 degrees of freedom
Multiple R-squared:  0.3063,    Adjusted R-squared:  0.1081 
F-statistic: 1.546 on 2 and 7 DF,  p-value: 0.278

You can fix this with the following modification.

library(dplyr)
library(rlang)

fun_regression <- function(data, ..., env = caller_env()){
  rhs <- enquos(...)
  rhs_name <- sapply(rhs, rlang::as_name)
  
  f <- parse_expr(paste("y ~ ", paste(rhs_name, collapse = " + ")))
  
  lm_expr <- expr(lm(!!f, data = !!enexpr(data)))
  
  reg <- eval(lm_expr, env)
  summary(reg)
  
}

Now you see the Call actually shows the formula that was used.

> fun_regression(example_data, x1, x2)

Call:
lm(formula = y ~ x1 + x2, data = example_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.42076 -0.58613 -0.07209  0.59450  1.27787 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.26506    0.37341  -0.710    0.501
x1          -0.05333    0.28669  -0.186    0.858
x2           0.09249    0.38617   0.239    0.818

Residual standard error: 1.028 on 7 degrees of freedom
Multiple R-squared:  0.01514,   Adjusted R-squared:  -0.2662 
F-statistic: 0.05381 on 2 and 7 DF,  p-value: 0.948

This is very similar to the example shown in Section 20.6 of Hadley Wickham's Advanced R.
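One practical reason a clean Call matters is that functions such as update() re-evaluate the stored call. The sketch below assumes a hypothetical variant, fit_regression, that returns the lm object instead of its summary; the seed and the data frame are made up for illustration.

```r
library(rlang)

# Hypothetical variant of fun_regression that returns the fitted model itself.
fit_regression <- function(data, ..., env = caller_env()) {
  rhs_name <- vapply(enquos(...), as_name, character(1))
  f <- parse_expr(paste("y ~", paste(rhs_name, collapse = " + ")))
  # Splice the formula and the caller's data expression into the lm() call,
  # then evaluate that call where the function was invoked.
  eval(expr(lm(!!f, data = !!enexpr(data))), env)
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
example_data <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))

fit <- fit_regression(example_data, x1, x2)
print(fit$call)

# update() re-runs the recorded call with a modified formula; this works
# because the call refers to example_data and a literal formula.
smaller <- update(fit, . ~ . - x2)
print(formula(smaller))
```

With the first version of the function, the stored call refers to rhs_name and data, which exist only inside the function, so re-evaluating it from outside would not be expected to work.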
