How to avoid inlining expressions when combining map with tidyeval modelling wrappers-CodePudding

I am trying to combine flexible modelling functions (using tidyeval) and then mapping over data in a nested dataframe (and attempting to learn tidy evaluation along the way). I am running into the problems of inlining expressions with the captured call (I think). Any suggestion, examples, tips, or best practices for writing wrappers to simplify repetitive modelling tasks and then using them with purrr::map etc?

The example below is based on the section wrapping modelling functions from 20 Evaluation | Advanced R using the mtcars data.

library(rlang)
library(tidyverse)

lm_wrap <- function(data, traits, resp, env = caller_env(), ...) {
  
  traits <- enexpr(traits)
  resp <- enexpr(resp)
  data <- enexpr(data)
  dots <- enexprs(...)

  lm_call <- inject(lm(!!resp ~ !!traits, data = !!data, !!!dots),  env)
  
  return(lm_call)
}

The wrapper function works for single cases

lm_wrap(traits = hp, data = mtcars, resp = mpg)

#Call:
#lm(formula = mpg ~ hp, data = mtcars)

#Coefficients:
#(Intercept)           hp  
# 30.09886     -0.06823

But looks like it runs into the problems of inlining expressions, at least as per this somewhat related example 20 Evaluation | Advanced R

mt_nested <- mtcars %>% group_by(cyl) %>% nest() %>%
  mutate(model = map(data, lm_wrap, resp = mpg, traits = hp))

mt_nested$model[[1]]$call

#lm(formula = mpg ~ hp, data = list(mpg = c(21, 21, 21.4, 18.1, 
#19.2, 17.8, 19.7), disp = c(160, 160, 258, 225, 167.6, 167.6, 
#145), hp = c(110, 110, 110, 105, 123, 123, 175), drat = c(3.9, 
#3.9, 3.08, 2.76, 3.92, 3.92, 3.62), wt = c(2.62, 2.875, 3.215, 
#3.46, 3.44, 3.44, 2.77), qsec = c(16.46, 17.02, 19.44, 20.22, 
#18.3, 18.9, 15.5), vs = c(0, 0, 1, 1, 1, 1, 0), am = c(1, 1, 
#0, 0, 0, 0, 1), gear = c(4, 4, 3, 3, 4, 4, 5), carb = c(4, 4, 
#1, 1, 4, 4, 6)))

Thanks in advance,

CodePudding user response：

The problem is that you are trying to mix different environments. The caller's, where data in the formula might be defined, and your function's, where data has been passed to.

One solution is to create the formula separately in env with expressions injected, then call lm() in the local environment. Also note that enexprs(...) is going to be broken in various unobvious ways. Instead I just passed the dots to lm().

lm_wrap <- function(data, traits, resp, ..., env = caller_env()) {
  traits <- enexpr(traits)
  resp <- enexpr(resp)

  # First create the formula in the right environment.
  # Formulas keep track of the env they've been created in.
  f <- inject(!!resp ~ !!traits,  env)
  
  # Now inject the formula inside a local call
  inject(lm(!!f, data = data, ...))
}

The second round of injection makes sure that the formula itself is recorded in the call rather than the symbol f.

CodePudding user response：

You can build the call with data being quoted:

library(rlang)
library(tidyverse)

lm_wrap <- function(data, traits, resp, env = caller_env(), ...) {
  
  traits <- enexpr(traits)
  resp <- enexpr(resp)
  dots <- enexprs(...)
  
  formula <- inject(formula(!!resp ~ !!traits,  env = env), env)
  
  do.call("lm", c(formula = formula, data = quote(data), inject(!!dots, env)))
}


mt_nested <- mtcars %>% 
  group_by(cyl) %>% 
  nest() %>%
  mutate(model = map(data, lm_wrap, resp = mpg, traits = hp))

mt_nested$model[[1]]$call
#> lm(formula = mpg ~ hp, data = data)

If you want the call to contain the substituted value of data rather than always saying data, you can do substitute(data) and evaluate the call in the parent frame. You could do this in base R as follows:

lm_wrap <- function(data, traits, resp, ...) {
  
  f <- paste(deparse(substitute(resp)), deparse(substitute(traits)), sep = "~")
  f <- as.formula(f)
  do.call("lm", c(f, substitute(data), ...), envir = parent.frame())
}

Testing this, we get the value of data in the call object being .x[[i]], which is how the data chunk is referred to inside the body of map:

mt_nested <- mtcars %>% 
  group_by(cyl) %>% 
  nest() %>%
  mutate(model = map(data, lm_wrap, resp = mpg, traits = hp))

mt_nested$model[[1]]$call
#> lm(formula = mpg ~ hp, data = .x[[i]])

And if we call the function directly we get the expected mtcars in the call

lm_wrap(mtcars, mpg, hp)$call
#> lm(formula = hp ~ mpg, data = mtcars)

^{Created on 2022-11-09 with reprex v2.0.2}

CodePudding user response：

I guess you want something like that:

library(rlang)
library(tidyverse)


lm_wrap <- function(data, traits, resp, env = caller_env(), ...) {
  traits <- enexpr(traits)
  resp <- enexpr(resp)
  data <- enexpr(data)
  dots <- enexprs(...)
  lm_call <- inject(lm(!!resp ~ !!traits, data = !!data, !!!dots),  env)
  return(lm_call)
}

mt_nested <- mtcars %>% group_by(cyl) %>% 
  group_modify( ~ tibble(
    data = list(.x), 
    model = list(lm_wrap(mtcars %>% filter(cyl==!!.y$cyl), resp = mpg, traits = hp))))

mt_nested$model[[1]]$call

#> lm(formula = mpg ~ hp, data = mtcars %>% filter(cyl == 4))