I would like to write a function, let's call it fun_regression, that takes data and ... as input, where ... contains one or more variables from data.
For example,
library(tidyverse)
example_data <- tibble(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
I would like a function that regresses y ~ x1 when I call fun_regression(example_data, x1), regresses y ~ x1 + x2 when I call fun_regression(example_data, x1, x2), and so on.
Here is what I have tried:
fun_regression <- function(data, ...){
  rhs <- enquos(...)
  reg <- lm(as.formula(paste("y ~ ", paste(!!!rhs, collapse = " + "))), data = data)
  summary(reg)
}
However, fun_regression(example_data, x1, x2) doesn't work.
Answer:
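This feels like a workaround, but one option is to strip the ~ with str_remove_all(): paste() turns each quosure into a string such as "~x1", so removing every ~ leaves only the variable names.
fun_regression <- function(data, ...){
  rhs <- enquos(...)
  # str_remove_all() comes from stringr, which library(tidyverse) attaches
  reg <- lm(as.formula(paste("y ~ ", paste(rhs, collapse = " + ") %>% str_remove_all('~'))), data = data)
  summary(reg)
}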
fun_regression(example_data, x1, x2)
Answer:
Your variable rhs is a list of quosures. You need a character representation of the symbols for use in paste().
library(dplyr)
library(rlang)
example_data <- tibble(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
fun_regression <- function(data, ...){
  rhs <- enquos(...)
  # Convert each quosure to its plain column name, e.g. "x1"
  rhs_name <- sapply(rhs, as_name)
  reg <- lm(as.formula(paste("y ~ ", paste(rhs_name, collapse = " + "))), data = data)
  summary(reg)
}
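If you would rather not paste the + signs together yourself, base R's reformulate() builds the formula from a character vector of term labels. The sketch below is one possible variant, not part of the answer above: fun_regression_reformulate is a hypothetical name, it assumes the arguments in ... are bare column names (rlang's ensyms() errors on anything else, such as log(x1)), and its Call line will read lm(formula = f, data = data) rather than the expanded formula.

```r
library(rlang)
library(tibble)

# Hypothetical variant of fun_regression(): ensyms() captures the bare
# column names passed through ... as symbols, and base R's reformulate()
# joins their names with "+" to build the formula.
fun_regression_reformulate <- function(data, ...) {
  rhs <- ensyms(...)                                 # symbols, e.g. x1, x2
  rhs_name <- vapply(rhs, as_string, character(1))   # c("x1", "x2")
  f <- reformulate(rhs_name, response = "y")         # y ~ x1 + x2
  reg <- lm(f, data = data)
  summary(reg)
}

set.seed(1)  # only to make the example reproducible
example_data <- tibble(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
fun_regression_reformulate(example_data, x1, x2)
```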
Note that with this approach the Call line of the summary is less than ideal: it shows the paste() expression that built the formula instead of the formula itself.
> fun_regression(example_data, x1, x2)
Call:
lm(formula = as.formula(paste("y ~ ", paste(rhs_name, collapse = " + "))), 
    data = data)
Residuals:
Min 1Q Median 3Q Max
-1.33083 -0.43435 -0.02022 0.45940 1.10920
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.7644 0.3027 -2.525 0.0395 *
x1 0.4717 0.3226 1.462 0.1871
x2 0.4749 0.4281 1.109 0.3039
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8367 on 7 degrees of freedom
Multiple R-squared: 0.3063, Adjusted R-squared: 0.1081
F-statistic: 1.546 on 2 and 7 DF, p-value: 0.278
You can fix this with the following modification.
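library(dplyr)
library(rlang)
fun_regression <- function(data, ..., env = caller_env()){
  rhs <- enquos(...)
  rhs_name <- sapply(rhs, rlang::as_name)
  # Parse the formula text into an expression: y ~ x1 + x2
  f <- parse_expr(paste("y ~ ", paste(rhs_name, collapse = " + ")))
  # Build the full lm() call with the formula and the caller's data expression inlined
  lm_expr <- expr(lm(!!f, data = !!enexpr(data)))
  # Evaluate the call in the caller's environment, where the data lives
  reg <- eval(lm_expr, env)
  summary(reg)
}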
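For comparison, here is a sketch of the same idea using base R only, which may make it easier to see what expr(), enexpr() and !! are doing. The mapping to base functions is an assumption about the closest equivalents: match.call(expand.dots = FALSE)$... stands in for enquos() plus as_name(), str2lang() (available since R 3.6.0) for parse_expr(), substitute() for enexpr(), and bquote() with .() for expr() with !!. fun_regression_base is a hypothetical name.

```r
library(tibble)

# Hypothetical base-R counterpart of the rlang version above.
fun_regression_base <- function(data, ..., env = parent.frame()) {
  data_expr <- substitute(data)                      # the caller's expression, e.g. example_data
  dots <- match.call(expand.dots = FALSE)$...        # unevaluated arguments, e.g. x1, x2
  rhs_name <- vapply(dots, deparse, character(1))    # c("x1", "x2")
  # Build the formula as a plain call, much like parse_expr() does
  f <- str2lang(paste("y ~", paste(rhs_name, collapse = " + ")))
  # Splice the formula and the data expression into the lm() call
  lm_call <- bquote(lm(.(f), data = .(data_expr)))
  # Evaluate in the caller's environment so example_data can be found
  reg <- eval(lm_call, env)
  summary(reg)
}

set.seed(1)  # only to make the example reproducible
example_data <- tibble(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
fun_regression_base(example_data, x1, x2)
```

Because the formula and the data expression are spliced in before evaluation, the Call line should read lm(formula = y ~ x1 + x2, data = example_data), as with the rlang version.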
Now the Call shows the formula that was actually used.
> fun_regression(example_data, x1, x2)
Call:
lm(formula = y ~ x1 + x2, data = example_data)
Residuals:
Min 1Q Median 3Q Max
-1.42076 -0.58613 -0.07209 0.59450 1.27787
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.26506 0.37341 -0.710 0.501
x1 -0.05333 0.28669 -0.186 0.858
x2 0.09249 0.38617 0.239 0.818
Residual standard error: 1.028 on 7 degrees of freedom
Multiple R-squared: 0.01514, Adjusted R-squared: -0.2662
F-statistic: 0.05381 on 2 and 7 DF, p-value: 0.948
This is very similar to the example in Section 20.6 of Hadley Wickham's Advanced R.