I am trying to write a function that takes two env-variables (here, two data frames). The "Programming with dplyr" vignette (https://dplyr.tidyverse.org/articles/programming.html) has multiple examples with one env-variable and multiple data-variables, but none with two env-variables. I could not find a solution in Advanced R (https://adv-r.hadley.nz/) either.
As an example, I start with two data frames. First, I want to join them. Then I want to compute some summary statistics. I want to create a function that can do the work. Note that the number of grouping variables (such as state and people) may change depending on the example. Additionally, the variables that are being summed (such as sales and profit) may also change.
# I need a function
Compute = function(df1, df2, grp_vars, compute_vars) {code}
# An interactive solution:
library(dplyr)
sales_data = data.frame(staffID = rep(1:5, each = 5),
                        state = c(rep('Cal', 13), rep('Wash', 12)),
                        sales = 101:125,
                        profit = 11:35
)
sales_data
staff = data.frame(staffID = 1:5,
                   people = c('Al', 'Barb', 'Carol', 'Dave', 'Ellen'))
staff
res1 = sales_data %>% inner_join(staff, by = 'staffID')
res1
res2 = res1 %>%
  group_by(state, people) %>%
  summarize(total_sales = sum(sales), total_profit = sum(profit))
res2
If I only needed to summarize the data, this would work:
# From Programming with dplyr
my_summarise <- function(data, group_var, summarise_var) {
  data %>%
    group_by(across({{ group_var }})) %>%
    summarise(across({{ summarise_var }}, sum, .names = "sum_{.col}"))
}
my_summarise(res1, c(state, people), c(sales, profit))
Summary: I need a function, Compute = function(df1, df2, grp_vars, compute_vars) {code}, that first joins two data frames and then computes totals and returns the results. Both the grouping variables and the variables being summed must be selectable by the user.
Answer:
You could add a third argument, by, to your function definition and perform the join inside the function:
library(dplyr)
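compute <- function(df1, df2, by, grp_vars, compute_vars) {
  # Join the two data frames on the user-supplied key(s)
  res1 <- df1 %>%
    inner_join(df2, by = by)
  # Group by the selected columns and sum the selected columns
  res1 %>%
    group_by(across({{ grp_vars }})) %>%
    summarise(across({{ compute_vars }}, sum, .names = "sum_{.col}"), .groups = "drop")
}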
compute(sales_data, staff, 'staffID', c(state, people), c(sales, profit))
#> # A tibble: 6 × 4
#>   state people sum_sales sum_profit
#>   <chr> <chr>      <int>      <int>
#> 1 Cal   Al           515         65
#> 2 Cal   Barb         540         90
#> 3 Cal   Carol        336         66
#> 4 Wash  Carol        229         49
#> 5 Wash  Dave         590        140
#> 6 Wash  Ellen        615        165
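If the column names arrive as strings (for example, from a config file or a Shiny input) rather than as bare names, the same idea can probably be adapted with all_of(). This is only a sketch: compute_chr() is a hypothetical name, not part of the answer above, and it assumes every name in grp_vars and compute_vars exists in the joined data. The example data is repeated so the block runs on its own.

```r
library(dplyr)

# Example data, copied from the question
sales_data <- data.frame(staffID = rep(1:5, each = 5),
                         state = c(rep('Cal', 13), rep('Wash', 12)),
                         sales = 101:125,
                         profit = 11:35)
staff <- data.frame(staffID = 1:5,
                    people = c('Al', 'Barb', 'Carol', 'Dave', 'Ellen'))

# Hypothetical variant of compute(): grp_vars and compute_vars are
# character vectors, so all_of() replaces the {{ }} embrace operator.
compute_chr <- function(df1, df2, by, grp_vars, compute_vars) {
  df1 %>%
    inner_join(df2, by = by) %>%
    group_by(across(all_of(grp_vars))) %>%
    summarise(
      across(all_of(compute_vars), sum, .names = "sum_{.col}"),
      .groups = "drop"
    )
}

res_chr <- compute_chr(
  sales_data, staff,
  by = "staffID",
  grp_vars = c("state", "people"),
  compute_vars = c("sales", "profit")
)
res_chr
```

all_of() raises an error if a requested column is missing, which is usually preferable to silently dropping it. The result should match the output of compute() shown above.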