I want to create a list of functions in the global environment and call them as needed inside a call to mutate or summarise, to make the dplyr code a bit less verbose. The problem is that the functions must use variables defined inside the data frame, not in the global environment. It may all be related to object scoping, which is a bit tricky for me.
For all code below, please load the required libraries:
library(dplyr)
library(purrr)
library(rlang)
An example:
With the mtcars dataset, I want to group_by a variable and summarise with these three functions: any_vs_four_gears, any_am_high_hp, and all_combined.
I can define them inside the call to summarise as follows, which works fine:
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = any(vs == 1 & gear == 4),
any_am_high_hp = any(am == 1 & hp >170),
all_combined = all(any_vs_four_gears, any_am_high_hp))
# # A tibble: 6 × 4
#    carb any_vs_four_gears any_am_high_hp all_combined
#   <dbl> <lgl>             <lgl>          <lgl>
# 1     1 TRUE              FALSE          FALSE
# 2     2 TRUE              FALSE          FALSE
# 3     3 FALSE             FALSE          FALSE
# 4     4 TRUE              TRUE           TRUE
# 5     6 FALSE             TRUE           FALSE
# 6     8 FALSE             TRUE           FALSE
I can also define the functions as expressions then evaluate the expressions inside the call to summarise, like this:
expressions_as_strings <- list(any_vs_four_gears = 'any(vs == 1 & gear == 4)',
any_am_high_hp = 'any(am == 1 & hp >170)',
all_combined = 'all(any_vs_four_gears, any_am_high_hp)')
expressions <- map(expressions_as_strings, parse_expr)
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = !!expressions$any_vs_four_gears,
any_am_high_hp = !!expressions$any_am_high_hp,
all_combined = !!expressions$all_combined)
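As a possible shortcut for the expression approach (a sketch, not the only way to do it): rlang's exprs() can capture the expressions directly instead of parsing strings, and !!! splices the named list into summarise() in one go. The names summary_exprs and result are made up for this illustration.

```r
library(dplyr)
library(rlang)

# Capture the expressions unevaluated; the list names become the column names.
summary_exprs <- exprs(
  any_vs_four_gears = any(vs == 1 & gear == 4),
  any_am_high_hp    = any(am == 1 & hp > 170),
  all_combined      = all(any_vs_four_gears, any_am_high_hp)
)

# `!!!` splices the list as if each element had been typed as a named argument.
# The expressions are evaluated in order, so all_combined can refer to the
# two columns created before it.
result <- mtcars %>%
  group_by(carb) %>%
  summarise(!!!summary_exprs)

print(result)
```

This keeps the call to summarise short, but the pieces are still expressions rather than functions.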
However, I feel I could get more flexibility if I could define functions instead of expressions.
I tried several methods without success:
method_1
method_1 <- list(any_vs_four_gears = function() any(vs == 1 & gear == 4),
any_am_high_hp = function() any(am == 1 & hp >170),
all_combined = function() all(any_vs_four_gears, any_am_high_hp))
#example
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = method_1$any_vs_four_gears())
method_1 fails. I think it is because the function is getting the values for vs and gear from the global env instead of the data.
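The scoping issue can be reproduced without dplyr. The sketch below assumes there are no objects named vs or gear in the global environment; f, outcome and direct are illustrative names only.

```r
# A function defined at the top level resolves its free variables starting
# from the environment where it was defined, not from where it is called.
f <- function() any(vs == 1 & gear == 4)

# Calling f() from inside with(mtcars, ...) still fails: the call happens
# "inside" the data, but the lookup for `vs` never visits the data.
outcome <- tryCatch(
  with(mtcars, f()),
  error = function(e) conditionMessage(e)
)
print(outcome)

# Evaluating the expression itself in the data works.
direct <- with(mtcars, any(vs == 1 & gear == 4))
print(direct)
```

The same thing happens inside summarise: the data mask is visible to expressions written in the call, but not to the body of a function defined elsewhere.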
method 2
method_2 <- list(any_vs_four_gears = function(var1, var2) {any({{var1}} == 1 & {{var2}} == 4)},
any_am_high_hp = function(var1, var2) {any({{var1}} == 1 & {{var2}} > 170)},
all_combined = function(var1, var2) {all({{var1}}, {{var2}})})
# example
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = method_2$any_vs_four_gears(vs, gear))
Method 2 does work, but I must include the variables as arguments to the function, which I hoped to be able to bypass.
The main question
Is there a way to create a function that uses variables from the dataframe, but obviates the need to include the variable names as arguments? What I want is something similar to method_1, with pseudocode:
mtcars %>%
group_by(carb) %>%
summarise(any_vs_four_gears = method_x$any_vs_four_gears(),
any_am_high_hp = method_x$any_am_high_hp(),
all_combined = method_x$all_combined())
Answer:
Up front, I'm generally against writing functions that defeat functional reproducibility, having spent too much time troubleshooting functions that change behavior based on something not passed to them.
However, try this:
method_1 <- list(
any_vs_four_gears = function(data = cur_data()) with(data, any(vs == 1 & gear == 4)),
any_am_high_hp = function(data = cur_data()) with(data, any(am == 1 & hp > 170)),
all_combined = function(data = cur_data()) with(data, all(any_vs_four_gears, any_am_high_hp))
)
mtcars %>%
group_by(carb) %>%
summarise(
any_vs_four_gears = method_1$any_vs_four_gears(),
any_am_high_hp = method_1$any_am_high_hp(),
all_combined = method_1$all_combined()
)
# # A tibble: 6 x 4
# carb any_vs_four_gears any_am_high_hp all_combined
# <dbl> <lgl> <lgl> <lgl>
# 1 1 TRUE FALSE FALSE
# 2 2 TRUE FALSE FALSE
# 3 3 FALSE FALSE FALSE
# 4 4 TRUE TRUE TRUE
# 5 6 FALSE TRUE FALSE
# 6 8 FALSE TRUE FALSE
This uses the cur_data() pronoun/function found in dplyr-pipe environments, adds just a little surrounding code (with(data, { ... }), so it is {-expression-friendly), and works "as is".
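One side effect of making data an argument with a default, sketched here under the assumption that the functions are defined as above: they can also be called outside dplyr by passing a data frame explicitly, which helps when checking them in isolation. The names whole_vs, whole_am and carb1_am are illustrative.

```r
library(dplyr)

# Same shape as the functions above; only two are repeated to keep this short.
method_1 <- list(
  any_vs_four_gears = function(data = cur_data()) with(data, any(vs == 1 & gear == 4)),
  any_am_high_hp    = function(data = cur_data()) with(data, any(am == 1 & hp > 170))
)

# When `data` is supplied, the default cur_data() is never evaluated,
# so these calls work outside summarise()/mutate().
whole_vs <- method_1$any_vs_four_gears(data = mtcars)
whole_am <- method_1$any_am_high_hp(data = mtcars)

# Checking a single group by hand, here the rows with carb == 1.
carb1_am <- method_1$any_am_high_hp(data = subset(mtcars, carb == 1))

print(c(whole_vs, whole_am, carb1_am))
```

Note that dplyr 1.1.0 and later deprecate cur_data() in favour of pick(); the sketch keeps cur_data() to match the code above.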
The errors are not difficult to interpret:
mtcars %>%
select(-vs) %>% # intentionally setting up an error
group_by(carb) %>%
summarise(
any_vs_four_gears = method_1$any_vs_four_gears(),
any_am_high_hp = method_1$any_am_high_hp(),
all_combined = method_1$all_combined()
)
# Error: Problem with `summarise()` column `any_vs_four_gears`.
# i `any_vs_four_gears = method_1$any_vs_four_gears()`.
# x object 'vs' not found
# i The error occurred in group 1: carb = 1.
# Run `rlang::last_error()` to see where the error occurred.