How to combine function argument with group_by in R

Time:09-19

I would like to use group_by() inside a custom function, with the grouping column names passed in through a function argument.

See a hypothetical example of what my data would look like:

data <- data.frame(ind = rep(c("A", "B", "C"), 4),
                   gender = rep(c("F", "M"), each = 6), 
                   value = sample(1:100, 12))

And this is the result I would like to have:

library(dplyr)  # provides %>%, group_by(), mutate() and distinct()

result <- data %>%
   group_by(ind, gender) %>%
   mutate(value = mean(value)) %>%
   distinct()

This is how I tried to write my function:

myFunction <- function(data, set_group, variable){
   result <- data %>%
      group_by(get(set_group)) %>%
      mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
      distinct()
}

result3 <- myFunction(data, set_group = c("ind", "gender"), variable = c("value"))
result3

I want the user to be able to pass as many grouping columns in set_group and as many columns in variable as needed. I tried get(), all_of() and mget() inside group_by(), but none of them worked. Does anyone know how I can code this?

Thank you!

CodePudding user response:

We could use across() with all_of() inside group_by(), which selects the grouping columns from a character vector:

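myFunction <- function(data, set_group, variable){
    data %>%
      # group by every column named in the character vector set_group
      group_by(across(all_of(set_group))) %>%
      # replace each column named in variable with its group mean
      mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
      ungroup() %>%
      distinct()
}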

Testing:

> myFunction(data, set_group = c("ind", "gender"), variable = c("value"))
# A tibble: 6 × 3
  ind   gender value
  <chr> <chr>  <dbl>
1 A     F       43.5
2 B     F       87.5
3 C     F       67.5
4 A     M       13  
5 B     M       43.5
6 C     M       37.5
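
Since the question asks for any number of grouping columns and any number of variables, here is a minimal sketch of the same function called with two value columns. The column value2, the object names data2 and res, and the fixed seed are assumptions added for illustration; they are not part of the original question.

```r
library(dplyr)

# Same function as in the answer above
myFunction <- function(data, set_group, variable) {
  data %>%
    group_by(across(all_of(set_group))) %>%
    mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
    ungroup() %>%
    distinct()
}

set.seed(1)  # assumption: fixed seed so the example is reproducible
data2 <- data.frame(ind = rep(c("A", "B", "C"), 4),
                    gender = rep(c("F", "M"), each = 6),
                    value = sample(1:100, 12),
                    value2 = sample(1:100, 12))  # hypothetical second column

# Both value columns are replaced by their mean within each ind/gender group
res <- myFunction(data2,
                  set_group = c("ind", "gender"),
                  variable = c("value", "value2"))
res
```

Note that distinct() only collapses rows that are identical in every column, so any column that is neither in set_group nor in variable can keep the result from reducing to one row per group.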

Another option is to convert the column names to symbols with rlang::syms() and splice them into group_by() with the !!! operator:

myFunction <- function(data, set_group, variable){
    data %>%
      # turn each name in set_group into a symbol and splice them in as arguments
      group_by(!!! rlang::syms(set_group)) %>%
      mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
      ungroup() %>%
      distinct()
}

Testing:

> myFunction(data, set_group = c("ind", "gender"), variable = c("value"))
# A tibble: 6 × 3
  ind   gender value
  <chr> <chr>  <dbl>
1 A     F       43.5
2 B     F       87.5
3 C     F       67.5
4 A     M       13  
5 B     M       43.5
6 C     M       37.5
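
As a quick sanity check, the two approaches can be compared against the hand-written pipeline from the question. This is only a sketch: the names by_across, by_syms and expected are hypothetical, and the seed is an assumption added so the comparison is reproducible.

```r
library(dplyr)

# Variant 1: across(all_of()) inside group_by()
by_across <- function(data, set_group, variable) {
  data %>%
    group_by(across(all_of(set_group))) %>%
    mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
    ungroup() %>%
    distinct()
}

# Variant 2: symbols spliced in with !!!
by_syms <- function(data, set_group, variable) {
  data %>%
    group_by(!!!rlang::syms(set_group)) %>%
    mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
    ungroup() %>%
    distinct()
}

set.seed(42)  # assumption: fixed seed for reproducibility
data <- data.frame(ind = rep(c("A", "B", "C"), 4),
                   gender = rep(c("F", "M"), each = 6),
                   value = sample(1:100, 12))

# Hand-written pipeline from the question, ungrouped for comparison
expected <- data %>%
  group_by(ind, gender) %>%
  mutate(value = mean(value)) %>%
  ungroup() %>%
  distinct()

groups <- c("ind", "gender")
same_across <- isTRUE(all.equal(as.data.frame(by_across(data, groups, "value")),
                                as.data.frame(expected)))
same_syms <- isTRUE(all.equal(as.data.frame(by_syms(data, groups, "value")),
                              as.data.frame(expected)))
c(same_across, same_syms)
```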

NOTE: get() looks up a single object by name; for multiple objects, mget() can be used. Inside dplyr verbs, however, it is better to use the tidyverse tools shown above.
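
To illustrate the difference between get() and mget() outside of dplyr, here is a small base R sketch. The objects x and y are hypothetical and exist only for this example.

```r
x <- 1:3
y <- c("a", "b")

one <- get("x")            # looks up the single object named "x"
many <- mget(c("x", "y"))  # looks up several objects, returned as a named list

one
names(many)
```

mget() returns a list rather than a set of columns, which is part of why across(all_of()) or !!! syms() is the better fit when the names refer to columns of a data frame.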
