Perform operations on column names within a user-defined function

Time:10-13

I recently learned how to access column names inside a user-defined function: How to access a column name in a user defined function with dplyr?

However, I now also need to use those column names within the operations being carried out. For example, I would like to do this:

library(dplyr)

samp_df <- tibble(var1 = c('a', 'b', 'c'),
                  var_in_df = c(3,7,9))
calculateSummaries <- function(df, variable){
  df <- df %>% 
    mutate("mean_of_{{variable}}" := mean({{variable}}),
           "sd_of_{{variable}}" := sd({{variable}}),
           # Desired, but fails: on the RHS these glue strings are plain character strings
           "sd_plus_mean_of_{{variable}}" := ("mean_of_{{variable}}" + "sd_of_{{variable}}")
           )
}
df_result <- calculateSummaries(samp_df, var_in_df)

Of course I could do:

"sd_plus_mean_of_{{variable}}" := mean({{variable}}) + sd({{variable}})

But with my real data, this won't be practical.

Does anyone know how to do this?

Answer:

This case is indeed a little tricky. I think we have to construct the names first and then use !! sym() to turn the strings into column references.

library(dplyr)

samp_df <- tibble(var1 = c('a', 'b', 'c'),
                  var_in_df = c(3,7,9))

calculateSummaries <- function(df, variable){
  
  var_nm <- deparse(substitute(variable))
  
  mean_var_nm <- paste0("mean_of_", var_nm)
  sd_var_nm <- paste0("sd_of_", var_nm)

  df %>%
    mutate("mean_of_{{variable}}" := mean({{variable}}),
           "sd_of_{{variable}}" := sd({{variable}}),
           # !! sym() turns the constructed strings into column references
           "sd_plus_mean_of_{{variable}}" := !! sym(mean_var_nm) + !! sym(sd_var_nm)
    )
}

calculateSummaries(samp_df, var_in_df)
#> # A tibble: 3 x 5
#>   var1  var_in_df mean_of_var_in_df sd_of_var_in_df sd_plus_mean_of_var_in_df
#>   <chr>     <dbl>             <dbl>           <dbl>                     <dbl>
#> 1 a             3              6.33            3.06                      9.39
#> 2 b             7              6.33            3.06                      9.39
#> 3 c             9              6.33            3.06                      9.39
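The same idea can also be expressed with the .data pronoun instead of !! sym(). This is a swapped-in technique, not the answer's own code: a minimal sketch assuming dplyr >= 1.0 with rlang installed. The function name calculateSummaries_data is hypothetical, and rlang::as_name(rlang::enquo()) stands in for deparse(substitute()).

```r
library(dplyr)

samp_df <- tibble(var1 = c('a', 'b', 'c'),
                  var_in_df = c(3, 7, 9))

# Hypothetical variant: build the names as strings, then refer to the
# freshly created columns through the .data pronoun instead of !! sym().
calculateSummaries_data <- function(df, variable) {
  var_nm <- rlang::as_name(rlang::enquo(variable))  # e.g. "var_in_df"
  mean_nm <- paste0("mean_of_", var_nm)
  sd_nm <- paste0("sd_of_", var_nm)

  df %>%
    mutate(
      !!mean_nm := mean({{ variable }}),
      !!sd_nm := sd({{ variable }}),
      # columns created earlier in the same mutate() are visible via .data
      "sd_plus_mean_of_{{variable}}" := .data[[mean_nm]] + .data[[sd_nm]]
    )
}

res <- calculateSummaries_data(samp_df, var_in_df)
res$sd_plus_mean_of_var_in_df
```

Indexing .data with a string avoids the !! operator entirely, which some find easier to read when the names are computed.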

An alternative is to use across(), but we still have to construct the variable names.

calculateSummaries <- function(df, variable){
  
  df %>%
    mutate("mean_of_{{variable}}" := mean({{variable}}),
           "sd_of_{{variable}}" := sd({{variable}}),
           across(c({{ variable }}),
                  list(sd_plus_mean_of = ~ get(paste0("mean_of_", cur_column())) + get(paste0("sd_of_", cur_column())))
                  )
    )
}

calculateSummaries(samp_df, var_in_df)

#> # A tibble: 3 x 5
#>   var1  var_in_df mean_of_var_in_df sd_of_var_in_df var_in_df_sd_plus_mean_of
#>   <chr>     <dbl>             <dbl>           <dbl>                     <dbl>
#> 1 a             3              6.33            3.06                      9.39
#> 2 b             7              6.33            3.06                      9.39
#> 3 c             9              6.33            3.06                      9.39

Created on 2022-10-12 by the reprex package (v2.0.1)

Answer:

According to this tidyverse blog post, glue strings are only supported for result names, which IMHO means only on the LHS of :=.
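A small hypothetical check illustrates the point (glue_demo is an illustrative name, not from the post): the LHS string is interpolated into a column name, while the identical-looking string on the RHS stays an ordinary character constant. This sketch assumes dplyr >= 1.0, which supports glue syntax in result names.

```r
library(dplyr)

samp_df <- tibble(var1 = c('a', 'b', 'c'),
                  var_in_df = c(3, 7, 9))

# Hypothetical demo: glue syntax is interpolated in the result name (LHS)
# but not in the expression (RHS), where it is just a string literal.
glue_demo <- function(df, variable) {
  df %>%
    mutate("copy_of_{{variable}}" := "value_of_{{variable}}")
}

out <- glue_demo(samp_df, var_in_df)
names(out)              # includes "copy_of_var_in_df"
out$copy_of_var_in_df   # each row holds the literal text "value_of_{{variable}}"
```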

Besides the options offered by @TimTeaFan, another option would be to use across() to compute all desired values and to name the columns via the .names argument:

library(dplyr)

calculateSummaries1 <- function(df, variable) {
  df <- df %>%
    mutate(across({{ variable }},
      .fns = list(
        mean = mean,
        sd = sd,
        sd_plus_mean = ~ mean(.x) + sd(.x)
      ),
      .names = "{.fn}_of_{.col}"
    ))
  df
}

calculateSummaries1(samp_df, var_in_df)
#> # A tibble: 3 × 5
#>   var1  var_in_df mean_of_var_in_df sd_of_var_in_df sd_plus_mean_of_var_in_df
#>   <chr>     <dbl>             <dbl>           <dbl>                     <dbl>
#> 1 a             3              6.33            3.06                      9.39
#> 2 b             7              6.33            3.06                      9.39
#> 3 c             9              6.33            3.06                      9.39
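Because across() accepts a tidyselect specification, the same function should also handle several columns in one call. A minimal sketch, assuming dplyr >= 1.0; the two-column tibble two_df is made-up illustration data, and the function body is repeated so the snippet runs on its own.

```r
library(dplyr)

# Hypothetical two-column data to show selection of several columns
two_df <- tibble(a = c(3, 7, 9), b = c(1, 2, 6))

calculateSummaries1 <- function(df, variable) {
  df %>%
    mutate(across({{ variable }},
      .fns = list(
        mean = mean,
        sd = sd,
        sd_plus_mean = ~ mean(.x) + sd(.x)
      ),
      # {.fn} is the list name, {.col} the column being processed
      .names = "{.fn}_of_{.col}"
    ))
}

res2 <- calculateSummaries1(two_df, c(a, b))
names(res2)
```

The selection c(a, b) is forwarded unchanged by {{ }}, so any tidyselect helper (for example where(is.numeric)) could be passed the same way.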

A second option would be to use helper variable names for the mean and the sd. This avoids using glue syntax on the RHS but requires an additional rename step:

calculateSummaries2 <- function(df, variable) {
  df <- df %>%
    mutate(
      mean = mean({{ variable }}),
      sd = sd({{ variable }}),
      "sd_plus_mean_of_{{variable}}" := mean + sd
    ) |>
    rename("mean_of_{{variable}}" := mean, "sd_of_{{variable}}" := sd)

  df
}
calculateSummaries2(samp_df, var_in_df)
#> # A tibble: 3 × 5
#>   var1  var_in_df mean_of_var_in_df sd_of_var_in_df sd_plus_mean_of_var_in_df
#>   <chr>     <dbl>             <dbl>           <dbl>                     <dbl>
#> 1 a             3              6.33            3.06                      9.39
#> 2 b             7              6.33            3.06                      9.39
#> 3 c             9              6.33            3.06                      9.39