I am trying to calculate an indicator value per group in a dataframe, where the indicator value per group is the sum of one column divided by the sum of another column within that group. I want to pass the column names as numerator and denominator arguments. I have tried the following code to no avail.
library(tidyverse)
a = c(1,1,1,2,2)
b = 1:5
c = 6:10
d = 9:13
dummy_data = tibble(
  a, b, c, d
)
calc_indicator = function(numerator, denominator){
  data = dummy_data %>%
    group_by(a) %>%
    mutate(
      indicator_value = sum({{numerator}})/sum({{denominator}})
    )
  data
}
calc_indicator("b","d")
#> Error in `mutate()`:
#> ! Problem while computing `indicator_value = sum("b")/sum("d")`.
#> ℹ The error occurred in group 1: a = 1.
#> Caused by error in `sum()`:
#> ! invalid 'type' (character) of argument
Created on 2022-10-17 by the reprex package (v2.0.1)
I realize that if I don't quote the arguments (calling calc_indicator(b,d) instead of calc_indicator("b","d")), this code runs. However, the numerators and denominators for the different indicators are defined in an Excel file, so they arrive in the R environment as strings.
Any suggestions?
Answer:
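As per the Programming with dplyr vignette, {{ }} (embracing) is meant for column names passed unquoted. If the argument is a string, {{ }} injects the string itself, which is why the error shows sum("b"). For column names stored as strings (character vectors), use the .data pronoun instead, .data[[col]], e.g.,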
calc_indicator = function(numerator, denominator){
  data = dummy_data %>%
    group_by(a) %>%
    mutate(
      indicator_value = sum(.data[[numerator]])/sum(.data[[denominator]])
    )
  data
}
calc_indicator("b","d")
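A minimal side-by-side sketch of the two styles, assuming dplyr 1.0 or later. The names ratio_bare() and ratio_string() are hypothetical and used only for illustration, and the data are rebuilt from the question. Both should give the same indicator values; the only difference is whether the caller passes bare column names or strings.

```r
library(dplyr)

dummy_data <- tibble(a = c(1, 1, 1, 2, 2), b = 1:5, c = 6:10, d = 9:13)

# Embracing with {{ }}: the caller passes bare column names, e.g. ratio_bare(df, b, d)
ratio_bare <- function(df, numerator, denominator) {
  df %>%
    group_by(a) %>%
    mutate(indicator_value = sum({{ numerator }}) / sum({{ denominator }})) %>%
    ungroup()
}

# .data pronoun: the caller passes column names as strings, e.g. ratio_string(df, "b", "d")
ratio_string <- function(df, numerator, denominator) {
  df %>%
    group_by(a) %>%
    mutate(indicator_value = sum(.data[[numerator]]) / sum(.data[[denominator]])) %>%
    ungroup()
}

# Strings held in variables, as they would be after reading a spreadsheet
num_col <- "b"
den_col <- "d"

bare_result   <- ratio_bare(dummy_data, b, d)
string_result <- ratio_string(dummy_data, num_col, den_col)

all.equal(bare_result, string_result)
```

Another pattern you may come across is sum(!!rlang::sym(numerator)), which converts the string to a symbol before injecting it; .data[[ ]] covers the same case without the extra conversion step.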
I'd also recommend passing the data frame into the function as an argument. Right now, your function only works if a data frame named dummy_data exists in your global environment, and only on that one data frame, which makes it much less flexible. If you rewrite the function to take a data argument (and a group argument for the grouping column), you can use it on any data frame:
calc_indicator = function(data, group, numerator, denominator){
  data %>%
    group_by(.data[[group]]) %>%
    mutate(
      indicator_value = sum(.data[[numerator]])/sum(.data[[denominator]])
    )
}
## you can still use it on your dummy data
calc_indicator(dummy_data, "a", "b", "c")
## you can use it on other data too
calc_indicator(mtcars, "cyl", "hp", "wt")
# # A tibble: 32 × 12
# # Groups: cyl [3]
# mpg cyl disp hp drat wt qsec vs am gear carb indicator_value
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 39.2
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 39.2
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 36.2
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 39.2
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 52.3
# ...
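Since the indicator definitions come from an Excel file, here is a rough sketch of computing several indicators in one go. It assumes the spreadsheet has been read into a table with columns named indicator, numerator and denominator (hypothetical names; readxl::read_excel() would be one way to load it). The hypothetical summarise_indicator() helper uses summarise() instead of mutate(), so you get one row per group rather than the value repeated on every row.

```r
library(dplyr)

dummy_data <- tibble(a = c(1, 1, 1, 2, 2), b = 1:5, c = 6:10, d = 9:13)

# Hypothetical indicator definitions; in practice this table might come from
# a spreadsheet, so the column names arrive as plain strings
definitions <- tibble(
  indicator   = c("b_over_d", "b_over_c"),
  numerator   = c("b", "b"),
  denominator = c("d", "c")
)

# One row per group (summarise) rather than one row per observation (mutate)
summarise_indicator <- function(data, group, numerator, denominator) {
  data %>%
    group_by(.data[[group]]) %>%
    summarise(
      indicator_value = sum(.data[[numerator]]) / sum(.data[[denominator]]),
      .groups = "drop"
    )
}

# Compute every indicator and stack the results, tagging each row with its name
results <- bind_rows(lapply(seq_len(nrow(definitions)), function(i) {
  summarise_indicator(dummy_data, "a",
                      definitions$numerator[i], definitions$denominator[i]) %>%
    mutate(indicator = definitions$indicator[i])
}))

results
```

If you do want the value on every row (as in the mtcars output above), keep mutate() in the helper instead.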