Group by multiple columns in a function in dplyr

Time:10-20

I want to create a function that takes an externally defined variable and uses it in a group_by() call with dplyr. Here is what I have so far:


data(mtcars)

my_grp_col <- 'gear'

calculate_mean <- function(data, grouping_column, target){
  data %>% 
    group_by(cyl, am, {{my_grp_col}}, target) %>% 
    summarize(mean(target, na.rm = T))
}

calculate_mean(data = mtcars, grouping_column = my_grp_col, target = mpg)

Essentially, I want to group by cyl, am, gear (which I have defined externally) and then calculate the mean of target (mpg).

CodePudding user response:

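The following works. Note that target also needs {{...}} around it where it is used in summarize(), and it should be left out of group_by() so that the mean is taken over mpg within each cyl/am/gear group:

library(dplyr)
data(mtcars)

my_grp_col <- 'gear'

calculate_mean <- function(data, grouping_column, target){
  data %>% 
    # grouping_column holds a string, so convert it to a symbol and unquote it
    group_by(cyl, am, !!sym(grouping_column)) %>% 
    # target is passed as a bare column name, so it must be embraced
    summarize(mean({{target}}, na.rm = TRUE))
}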

calculate_mean(data = mtcars, grouping_column = my_grp_col, target = mpg)
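
When the column name arrives as a string, another option is the .data pronoun, which looks the column up by name without building a symbol first. The sketch below assumes dplyr 1.0 or later; the function name calculate_mean_str and the output column name mean_target are made up for this illustration.

```r
library(dplyr)

# Hypothetical variant: grouping_column is a string, target is a bare column name
calculate_mean_str <- function(data, grouping_column, target) {
  data %>%
    # .data[[...]] selects the column whose name is stored in the string
    group_by(cyl, am, .data[[grouping_column]]) %>%
    # name the result and drop the grouping so a plain tibble is returned
    summarize(mean_target = mean({{ target }}, na.rm = TRUE), .groups = "drop")
}

res <- calculate_mean_str(mtcars, "gear", mpg)
print(res)
```

Setting .groups = "drop" is optional; it only silences the regrouping message and returns an ungrouped result.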

However, it is cleaner to pass grouping_column as a bare column name as well, instead of defining it as a string first:

calculate_mean <- function(data, grouping_column, target){
  data %>% 
    group_by(cyl, am, {{grouping_column}}) %>% 
    summarize(mean({{target}}, na.rm = TRUE))
}

calculate_mean(data = mtcars, grouping_column = gear, target = mpg)
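
If all of the grouping columns should come from outside the function, as the title suggests, one possible approach is to pass them as a character vector and select them with across(all_of(...)). This is a sketch that assumes dplyr 1.0 or later; calculate_mean_multi, group_cols and mean_target are hypothetical names, not part of the answer above.

```r
library(dplyr)

# Hypothetical variant: group_cols is a character vector of column names
calculate_mean_multi <- function(data, group_cols, target) {
  data %>%
    # all_of() fails with an error if any requested column is missing
    group_by(across(all_of(group_cols))) %>%
    summarize(mean_target = mean({{ target }}, na.rm = TRUE), .groups = "drop")
}

my_grp_cols <- c("cyl", "am", "gear")
res_multi <- calculate_mean_multi(mtcars, my_grp_cols, mpg)
print(res_multi)
```

With this design nothing is hard-coded inside the function, so the same function can be reused with a different set of grouping columns.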