I am trying to combine flexible modelling functions (using tidyeval) and then mapping over data in a nested dataframe (and attempting to learn tidy evaluation along the way). I am running into the problems of inlining expressions with the captured call (I think). Any suggestion, examples, tips, or best practices for writing wrappers to simplify repetitive modelling tasks and then using them with purrr::map etc?
The example below is based on the section wrapping modelling functions from 20 Evaluation | Advanced R using the mtcars data.
library(rlang)
library(tidyverse)
lm_wrap <- function(data, traits, resp, env = caller_env(), ...) {
traits <- enexpr(traits)
resp <- enexpr(resp)
data <- enexpr(data)
dots <- enexprs(...)
lm_call <- inject(lm(!!resp ~ !!traits, data = !!data, !!!dots), env)
return(lm_call)
}
The wrapper function works for single cases
lm_wrap(traits = hp, data = mtcars, resp = mpg)
#Call:
#lm(formula = mpg ~ hp, data = mtcars)
#Coefficients:
#(Intercept) hp
# 30.09886 -0.06823
But looks like it runs into the problems of inlining expressions, at least as per this somewhat related example 20 Evaluation | Advanced R
mt_nested <- mtcars %>% group_by(cyl) %>% nest() %>%
mutate(model = map(data, lm_wrap, resp = mpg, traits = hp))
mt_nested$model[[1]]$call
#lm(formula = mpg ~ hp, data = list(mpg = c(21, 21, 21.4, 18.1,
#19.2, 17.8, 19.7), disp = c(160, 160, 258, 225, 167.6, 167.6,
#145), hp = c(110, 110, 110, 105, 123, 123, 175), drat = c(3.9,
#3.9, 3.08, 2.76, 3.92, 3.92, 3.62), wt = c(2.62, 2.875, 3.215,
#3.46, 3.44, 3.44, 2.77), qsec = c(16.46, 17.02, 19.44, 20.22,
#18.3, 18.9, 15.5), vs = c(0, 0, 1, 1, 1, 1, 0), am = c(1, 1,
#0, 0, 0, 0, 1), gear = c(4, 4, 3, 3, 4, 4, 5), carb = c(4, 4,
#1, 1, 4, 4, 6)))
Thanks in advance,
M.
CodePudding user response:
The problem is that you are trying to mix different environments. The caller's, where data in the formula might be defined, and your function's, where data
has been passed to.
One solution is to create the formula separately in env
with expressions injected, then call lm()
in the local environment. Also note that enexprs(...)
is going to be broken in various unobvious ways. Instead I just passed the dots to lm()
.
lm_wrap <- function(data, traits, resp, ..., env = caller_env()) {
traits <- enexpr(traits)
resp <- enexpr(resp)
# First create the formula in the right environment.
# Formulas keep track of the env they've been created in.
f <- inject(!!resp ~ !!traits, env)
# Now inject the formula inside a local call
inject(lm(!!f, data = data, ...))
}
The second round of injection makes sure that the formula itself is recorded in the call rather than the symbol f
.
CodePudding user response:
You can build the call with data
being quoted:
library(rlang)
library(tidyverse)
lm_wrap <- function(data, traits, resp, env = caller_env(), ...) {
traits <- enexpr(traits)
resp <- enexpr(resp)
dots <- enexprs(...)
formula <- inject(formula(!!resp ~ !!traits, env = env), env)
do.call("lm", c(formula = formula, data = quote(data), inject(!!dots, env)))
}
mt_nested <- mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(model = map(data, lm_wrap, resp = mpg, traits = hp))
mt_nested$model[[1]]$call
#> lm(formula = mpg ~ hp, data = data)
If you want the call to contain the substituted value of data
rather than always saying data
, you can do substitute(data)
and evaluate the call in the parent frame. You could do this in base R as follows:
lm_wrap <- function(data, traits, resp, ...) {
f <- paste(deparse(substitute(resp)), deparse(substitute(traits)), sep = "~")
f <- as.formula(f)
do.call("lm", c(f, substitute(data), ...), envir = parent.frame())
}
Testing this, we get the value of data
in the call
object being .x[[i]]
, which is how the data chunk is referred to inside the body of map
:
mt_nested <- mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(model = map(data, lm_wrap, resp = mpg, traits = hp))
mt_nested$model[[1]]$call
#> lm(formula = mpg ~ hp, data = .x[[i]])
And if we call the function directly we get the expected mtcars
in the call
lm_wrap(mtcars, mpg, hp)$call
#> lm(formula = hp ~ mpg, data = mtcars)
Created on 2022-11-09 with reprex v2.0.2
CodePudding user response:
I guess you want something like that:
library(rlang)
library(tidyverse)
lm_wrap <- function(data, traits, resp, env = caller_env(), ...) {
traits <- enexpr(traits)
resp <- enexpr(resp)
data <- enexpr(data)
dots <- enexprs(...)
lm_call <- inject(lm(!!resp ~ !!traits, data = !!data, !!!dots), env)
return(lm_call)
}
mt_nested <- mtcars %>% group_by(cyl) %>%
group_modify( ~ tibble(
data = list(.x),
model = list(lm_wrap(mtcars %>% filter(cyl==!!.y$cyl), resp = mpg, traits = hp))))
mt_nested$model[[1]]$call
#> lm(formula = mpg ~ hp, data = mtcars %>% filter(cyl == 4))