I recently learned how to access a column name inside a user-defined function: How to access a column name in a user defined function with dplyr?
However, I now also need to access the newly created column names within the operations being carried out. For example, I would like to do this:
samp_df <- tibble(var1 = c('a', 'b', 'c'),
                  var_in_df = c(3, 7, 9))
calculateSummaries <- function(df, variable){
  df <- df %>%
    mutate("mean_of_{{variable}}" := mean({{variable}}),
           "sd_of_{{variable}}" := sd({{variable}}),
           "sd_plus_mean_of_{{variable}}" := ("mean_of_{{variable}}" + "sd_of_{{variable}}")
    )
}
df_result <- calculateSummaries(samp_df, var_in_df)
Of course I could do:
"sd_plus_mean_of_{{variable}}" := mean({{variable}}) + sd({{variable}})
But with the real data this won't be practical.
Does anyone know how to do this?
CodePudding user response:
This case is indeed a little tricky. I think we have to construct the names first and then use !! sym() to evaluate the strings as objects.
library(dplyr)
samp_df <- tibble(var1 = c('a', 'b', 'c'),
                  var_in_df = c(3, 7, 9))
calculateSummaries <- function(df, variable){
  # Capture the column name as a string, then build the derived column names
  var_nm <- deparse(substitute(variable))
  mean_var_nm <- paste0("mean_of_", var_nm)
  sd_var_nm <- paste0("sd_of_", var_nm)
  df %>%
    mutate("mean_of_{{variable}}" := mean({{variable}}),
           "sd_of_{{variable}}" := sd({{variable}}),
           # !! sym() turns each name string into a reference to that column
           "sd_plus_mean_of_{{variable}}" := !! sym(mean_var_nm) + !! sym(sd_var_nm)
    )
}
calculateSummaries(samp_df, var_in_df)
#> # A tibble: 3 x 5
#> var1 var_in_df mean_of_var_in_df sd_of_var_in_df sd_plus_mean_of_var_in_df
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 3 6.33 3.06 9.39
#> 2 b 7 6.33 3.06 9.39
#> 3 c 9 6.33 3.06 9.39
An alternative is to use across(), but we still have to construct the variable names.
calculateSummaries <- function(df, variable){
  df %>%
    mutate("mean_of_{{variable}}" := mean({{variable}}),
           "sd_of_{{variable}}" := sd({{variable}}),
           # cur_column() gives the name of the column across() is working on
           across(c({{ variable }}),
                  list(sd_plus_mean_of = ~ get(paste0("mean_of_", cur_column())) + get(paste0("sd_of_", cur_column())))
           )
    )
}
calculateSummaries(samp_df, var_in_df)
#> # A tibble: 3 x 5
#> var1 var_in_df mean_of_var_in_df sd_of_var_in_df var_in_df_sd_plus_mean_of
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 3 6.33 3.06 9.39
#> 2 b 7 6.33 3.06 9.39
#> 3 c 9 6.33 3.06 9.39
Created on 2022-10-12 by the reprex package (v2.0.1)
CodePudding user response:
According to this tidyverse blog post, glue strings are only supported as result names, which IMHO means only on the LHS.
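To illustrate the point, here is a minimal sketch. The data frame `demo_df` and the string `nm` are made up for this example and are not part of the question. On the left-hand side of `:=` the glue string is interpolated into a column name, while the same string on the right-hand side stays an ordinary character value:

```r
library(dplyr)

# Hypothetical toy data and a column name held as a string
demo_df <- tibble(x = c(1, 2, 3))
nm <- "x"

demo_out <- demo_df %>%
  mutate(
    # LHS: glue interpolates {nm}, so the new column is called double_x
    "double_{nm}" := .data[[nm]] * 2,
    # RHS: a plain string, no interpolation happens here
    label = "double_{nm}"
  )

names(demo_out)
unique(demo_out$label)
```

So a quoted name on the RHS is never looked up as a column; it has to be turned into a column reference explicitly.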
Besides the options offered by @TimTeaFan, another option would be to use across to compute all desired values and name the columns using the .names argument:
library(dplyr)
calculateSummaries1 <- function(df, variable) {
  df <- df %>%
    mutate(across({{ variable }},
      .fns = list(
        mean = mean,
        sd = sd,
        sd_plus_mean = ~ mean(.x) + sd(.x)
      ),
      # {.fn} is the name from the list above, {.col} the input column name
      .names = "{.fn}_of_{.col}"
    ))
  df
}
calculateSummaries1(samp_df, var_in_df)
#> # A tibble: 3 × 5
#> var1 var_in_df mean_of_var_in_df sd_of_var_in_df sd_plus_mean_of_var_in_df
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 3 6.33 3.06 9.39
#> 2 b 7 6.33 3.06 9.39
#> 3 c 9 6.33 3.06 9.39
A second option is to use helper variable names for the mean and the sd, which avoids using glue syntax on the RHS but requires an additional rename step:
calculateSummaries2 <- function(df, variable) {
  df <- df %>%
    mutate(
      # Temporary helper columns with fixed names
      mean = mean({{ variable }}),
      sd = sd({{ variable }}),
      "sd_plus_mean_of_{{variable}}" := mean + sd
    ) |>
    rename("mean_of_{{variable}}" := mean, "sd_of_{{variable}}" := sd)
  df
}
calculateSummaries2(samp_df, var_in_df)
#> # A tibble: 3 × 5
#> var1 var_in_df mean_of_var_in_df sd_of_var_in_df sd_plus_mean_of_var_in_df
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 3 6.33 3.06 9.39
#> 2 b 7 6.33 3.06 9.39
#> 3 c 9 6.33 3.06 9.39
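A further variation on the "construct the names first" idea, offered only as a sketch: build the names once with rlang::englue() (this assumes rlang >= 1.0.0 is installed) and refer to the new columns through the .data pronoun. The function name calculateSummaries3 and the helpers mean_nm and sd_nm are made up for this example.

```r
library(dplyr)

samp_df <- tibble(var1 = c("a", "b", "c"),
                  var_in_df = c(3, 7, 9))

# Hypothetical variant; assumes rlang >= 1.0.0 for englue()
calculateSummaries3 <- function(df, variable) {
  # Build the result names once, using the same {{ }} syntax as on the LHS
  mean_nm <- rlang::englue("mean_of_{{ variable }}")
  sd_nm <- rlang::englue("sd_of_{{ variable }}")
  df %>%
    mutate(
      "{mean_nm}" := mean({{ variable }}),
      "{sd_nm}" := sd({{ variable }}),
      # .data[[name]] looks up a column by its name given as a string
      "sd_plus_mean_of_{{ variable }}" := .data[[mean_nm]] + .data[[sd_nm]]
    )
}

res3 <- calculateSummaries3(samp_df, var_in_df)
res3
```

This keeps everything in a single mutate() call and avoids both get() and the extra rename step, at the cost of defining the names up front.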