Using tidyverse's curly-curly syntax to access data frame columns within a function

Time:10-18

I am trying to calculate an indicator value per group in a data frame, where the indicator value is the sum of one column divided by the sum of another column within that group. I want to pass the column names as numerator and denominator arguments. I tried the following code, to no avail.

library(tidyverse)

a = c(1,1,1,2,2)
b = 1:5
c = 6:10
d = 9:13

dummy_data = tibble(
  a,b,c,d
)

calc_indicator = function(numerator,denominator){
  data = dummy_data %>% 
    group_by(a) %>% 
    mutate(
      indicator_value = sum({{numerator}})/sum({{denominator}})
    )
  
  data
}

calc_indicator("b","d")
#> Error in `mutate()`:
#> ! Problem while computing `indicator_value = sum("b")/sum("d")`.
#> ℹ The error occurred in group 1: a = 1.
#> Caused by error in `sum()`:
#> ! invalid 'type' (character) of argument

Created on 2022-10-17 by the reprex package (v2.0.1)

I realize that this code runs if I pass the arguments unquoted, i.e. calc_indicator(b, d) rather than calc_indicator("b", "d"). However, the numerators and denominators for the different indicators are defined in an Excel file, so they arrive in the R environment as strings.

Any suggestions?

Answer:

As the "Programming with dplyr" vignette explains, {{ is for arguments that are passed as unquoted column names. When the column names arrive as strings (character vectors stored in objects), index the .data pronoun instead, as in .data[[col]]. For example:

calc_indicator = function(numerator,denominator){
  data = dummy_data %>% 
    group_by(a) %>% 
    mutate(
      indicator_value = sum(.data[[numerator]])/sum(.data[[denominator]])
    )
  
  data
}

calc_indicator("b","d")
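
To make the difference concrete, here is a minimal sketch that puts the two calling styles side by side. The helper names ratio_bare and ratio_string are hypothetical and exist only for this illustration. The sketch uses summarise() rather than mutate() so that each group yields a single row, which is a choice made here and not part of the original question.

```r
library(dplyr)

dummy_data <- tibble(a = c(1, 1, 1, 2, 2), b = 1:5, c = 6:10, d = 9:13)

# Hypothetical helper: the caller passes bare column names, so {{ }} applies.
ratio_bare <- function(data, numerator, denominator) {
  data %>%
    group_by(a) %>%
    summarise(indicator_value = sum({{ numerator }}) / sum({{ denominator }}))
}

# Hypothetical helper: the caller passes column names as strings,
# so the columns are looked up through the .data pronoun.
ratio_string <- function(data, numerator, denominator) {
  data %>%
    group_by(a) %>%
    summarise(indicator_value = sum(.data[[numerator]]) / sum(.data[[denominator]]))
}

res_bare   <- ratio_bare(dummy_data, b, d)        # unquoted names
res_string <- ratio_string(dummy_data, "b", "d")  # strings

# Both styles should give the same ratios per group.
all.equal(res_bare$indicator_value, res_string$indicator_value)
```

Which style to pick depends on where the column names come from: bare names are convenient for interactive use, while strings suit names read from a file.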

I'd also recommend passing the data frame to the function as an argument. Functions that rely on a data frame with a particular name (here dummy_data) existing in your global environment are much less flexible.

Right now, your function only works on a data frame named dummy_data. If you rewrite it to take a data argument, you can use it on any data frame:

calc_indicator = function(data, group, numerator, denominator){
  data %>% 
    group_by(.data[[group]]) %>% 
    mutate(
      indicator_value = sum(.data[[numerator]])/sum(.data[[denominator]])
    )
}

## you can still use it on your dummy data
calc_indicator(dummy_data, "a", "b", "c")

## you can use it on other data too
calc_indicator(mtcars, "cyl", "hp", "wt")
# # A tibble: 32 × 12
# # Groups:   cyl [3]
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb indicator_value
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>           <dbl>
#  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4            39.2
#  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4            39.2
#  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1            36.2
#  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1            39.2
#  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2            52.3
#  ...
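
Since the question mentions that the numerators and denominators are defined in an Excel file, the string interface also lends itself to looping over a table of definitions. The following is only a sketch under assumptions: indicator_defs is a made-up stand-in for whatever the Excel import produces, its column names are hypothetical, and calc_indicator is redefined here with summarise() to return one row per group.

```r
library(dplyr)
library(purrr)

dummy_data <- tibble(a = c(1, 1, 1, 2, 2), b = 1:5, c = 6:10, d = 9:13)

# Hypothetical stand-in for the definitions read from the Excel file.
indicator_defs <- tibble(
  indicator   = c("b_over_d", "c_over_d"),
  numerator   = c("b", "c"),
  denominator = c("d", "d")
)

# Variant of the function above that returns one row per group.
calc_indicator <- function(data, group, numerator, denominator) {
  data %>%
    group_by(.data[[group]]) %>%
    summarise(
      indicator_value = sum(.data[[numerator]]) / sum(.data[[denominator]]),
      .groups = "drop"
    )
}

# pmap() calls the function once per row of indicator_defs, matching
# the columns to the arguments by name; bind_rows() stacks the results.
results <- pmap(indicator_defs, function(indicator, numerator, denominator) {
  calc_indicator(dummy_data, "a", numerator, denominator) %>%
    mutate(indicator = .env$indicator, .before = 1)
}) %>%
  bind_rows()

results
```

The .env pronoun makes explicit that indicator refers to the function argument and not to a column of the summarised data.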