Call function from the global environment with implicit dataframe variables (from the calling env?)

Time:12-31

I want to create a list of functions in the global environment and call them as needed inside a call to mutate or summarise, to make the dplyr code a bit less verbose. The problem is that the functions must use variables defined inside the dataframe, not in the global env. It may all be related to object scoping, which is a bit tricky for me.

For all code below, please load the required libraries:

library(dplyr)
library(purrr)
library(rlang)

An example: with the mtcars dataset, I want to group_by a variable and summarise with these three functions: any_vs_four_gears, any_am_high_hp and all_combined.

I can define them inside the call to summarise as follows, which works fine:

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = any(vs == 1 & gear == 4),
                  any_am_high_hp = any(am == 1 & hp >170),
                  all_combined = all(any_vs_four_gears, any_am_high_hp))

# # A tibble: 6 × 4
#    carb any_vs_four_gears any_am_high_hp all_combined
#   <dbl> <lgl>             <lgl>          <lgl>
# 1     1 TRUE              FALSE          FALSE
# 2     2 TRUE              FALSE          FALSE
# 3     3 FALSE             FALSE          FALSE
# 4     4 TRUE              TRUE           TRUE
# 5     6 FALSE             TRUE           FALSE
# 6     8 FALSE             TRUE           FALSE

I can also define the functions as expressions then evaluate the expressions inside the call to summarise, like this:

expressions_as_strings <- list(any_vs_four_gears = 'any(vs == 1 & gear == 4)',
                               any_am_high_hp = 'any(am == 1 & hp >170)',
                               all_combined = 'all(any_vs_four_gears, any_am_high_hp)')
expressions <- map(expressions_as_strings, parse_expr)

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = !!expressions$any_vs_four_gears,
                  any_am_high_hp = !!expressions$any_am_high_hp,
                  all_combined = !!expressions$all_combined)
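If the goal is mainly to shorten the call, the whole named list can also be spliced in at once with rlang's `!!!` operator. This is a minimal sketch that rebuilds the same `expressions` list; the list names become the column names, and the order matters because all_combined refers to the two columns created before it. The name `spliced` is only for illustration.

```r
library(dplyr)
library(purrr)
library(rlang)

# Same strings as above, parsed into unevaluated expressions
expressions_as_strings <- list(any_vs_four_gears = 'any(vs == 1 & gear == 4)',
                               any_am_high_hp = 'any(am == 1 & hp > 170)',
                               all_combined = 'all(any_vs_four_gears, any_am_high_hp)')
expressions <- map(expressions_as_strings, parse_expr)

# `!!!` splices every element of the named list as its own summarise argument
spliced <- mtcars %>%
  group_by(carb) %>%
  summarise(!!!expressions)
```

This keeps the expression-based approach but removes the need to repeat each name in the summarise call.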

However, I feel I could get more flexibility if I could define functions instead of expressions.

I tried several methods without success:

Method 1

method_1 <- list(any_vs_four_gears = function() any(vs == 1 & gear == 4),
                  any_am_high_hp = function() any(am == 1 & hp >170),
                  all_combined = function() all(any_vs_four_gears, any_am_high_hp))
# example

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = method_1$any_vs_four_gears())

method_1 fails. I think it is because the function is getting the values for vs and gear from the global env instead of the data.
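That diagnosis is consistent with how R resolves names: a function looks up its free variables in the environment where it was defined, not in the environment it is called from. A small base-R sketch of the same effect (the names `x`, `f`, `g` and `result` are made up for illustration):

```r
x <- 1                      # lives in the global environment

f <- function() x + 1       # `x` is a free variable; f is defined globally

g <- function() {
  x <- 100                  # local to g, not visible from inside f
  f()                       # f still finds the global x
}

result <- g()               # 2, not 101
```

Inside summarise, the data frame columns play roughly the role of g's local `x`: they are visible to the expression being evaluated, but not to a zero-argument function that was defined in the global environment.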

Method 2

method_2 <- list(any_vs_four_gears = function(var1, var2) {any({{var1}} == 1 & {{var2}} == 4)},
                any_am_high_hp = function(var1, var2) {any({{var1}} == 1 & {{var2}} > 170)},
                all_combined = function(var1, var2) {all({{var1}}, {{var2}})})

# example

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = method_2$any_vs_four_gears(vs, gear))

Method 2 does work, but I must include the variables as arguments to the function, which I hoped to be able to bypass.
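One side remark on Method 2: the `{{ }}` is not doing any tidy-eval work there. Inside an ordinary function, `{{ var1 }}` is just two nested braces around `var1`, and the arguments are evaluated where the function is called, which is inside summarise's data mask. Plain arguments should therefore behave the same way. A sketch with a hypothetical `method_2_plain` list:

```r
library(dplyr)

# Hypothetical variant of method_2 without the double braces
method_2_plain <- list(
  any_vs_four_gears = function(var1, var2) any(var1 == 1 & var2 == 4),
  any_am_high_hp    = function(var1, var2) any(var1 == 1 & var2 > 170),
  all_combined      = function(var1, var2) all(var1, var2)
)

# Column names are passed as ordinary arguments and resolved in the data mask
plain_result <- mtcars %>%
  group_by(carb) %>%
  summarise(
    any_vs_four_gears = method_2_plain$any_vs_four_gears(vs, gear),
    any_am_high_hp    = method_2_plain$any_am_high_hp(am, hp),
    all_combined      = method_2_plain$all_combined(any_vs_four_gears, any_am_high_hp)
  )
```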

The main question

Is there a way to create a function that uses variables from the dataframe, but obviates the need to include the variable names as arguments? What I want is something similar to method_1, with pseudocode:

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = method_x$any_vs_four_gears(),
                  any_am_high_hp = method_x$any_am_high_hp(),
                  all_combined = method_x$all_combined())

CodePudding user response:

Up front, I'm generally against writing functions that defeat functional reproducibility, having spent too much time troubleshooting functions that change behavior based on something not passed to them.

However, try this:

method_1 <- list(
  any_vs_four_gears = function(data = cur_data()) with(data, any(vs == 1 & gear == 4)),
  any_am_high_hp = function(data = cur_data()) with(data, any(am == 1 & hp > 170)),
  all_combined = function(data = cur_data()) with(data, all(any_vs_four_gears, any_am_high_hp))
)

mtcars %>%
  group_by(carb) %>%
  summarise(
    any_vs_four_gears = method_1$any_vs_four_gears(),
    any_am_high_hp = method_1$any_am_high_hp(),
    all_combined = method_1$all_combined()
  )
# # A tibble: 6 x 4
#    carb any_vs_four_gears any_am_high_hp all_combined
#   <dbl> <lgl>             <lgl>          <lgl>       
# 1     1 TRUE              FALSE          FALSE       
# 2     2 TRUE              FALSE          FALSE       
# 3     3 FALSE             FALSE          FALSE       
# 4     4 TRUE              TRUE           TRUE        
# 5     6 FALSE             TRUE           FALSE       
# 6     8 FALSE             TRUE           FALSE       

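This uses the cur_data() function, which is available inside dplyr data-masking verbs such as summarise and mutate and returns the current group's data (without the grouping columns). It adds only a little surrounding code, with(data, ...), which also accepts a multi-line { ... } expression, so the original function bodies can be used as is.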
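One caveat: in dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick(). Assuming such a version is installed, the same idea might be written as in the sketch below, which should give the same result as the cur_data() version; `method_pick` and `pick_result` are hypothetical names.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, where pick() exists

# Hypothetical variant of method_1: the default argument fetches the
# current group's non-grouping columns via pick(everything())
method_pick <- list(
  any_vs_four_gears = function(data = pick(everything())) with(data, any(vs == 1 & gear == 4)),
  any_am_high_hp    = function(data = pick(everything())) with(data, any(am == 1 & hp > 170)),
  all_combined      = function(data = pick(everything())) with(data, all(any_vs_four_gears, any_am_high_hp))
)

pick_result <- mtcars %>%
  group_by(carb) %>%
  summarise(
    any_vs_four_gears = method_pick$any_vs_four_gears(),
    any_am_high_hp    = method_pick$any_am_high_hp(),
    all_combined      = method_pick$all_combined()
  )
```

Because the data is a default argument, each function can also be called outside a dplyr verb by passing a data frame explicitly, for example `method_pick$any_vs_four_gears(mtcars)`.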

The errors are not difficult to interpret:

mtcars %>%
  select(-vs) %>%     # intentionally setting up an error
  group_by(carb) %>%
  summarise(
    any_vs_four_gears = method_1$any_vs_four_gears(),
    any_am_high_hp = method_1$any_am_high_hp(),
    all_combined = method_1$all_combined()
  )
# Error: Problem with `summarise()` column `any_vs_four_gears`.
# i `any_vs_four_gears = method_1$any_vs_four_gears()`.
# x object 'vs' not found
# i The error occurred in group 1: carb = 1.
# Run `rlang::last_error()` to see where the error occurred.