How to wrap a data.table code with custom function, allowing to name a new column by getting the nam

Time:10-13

The following question seems very basic in programming with data.table, so my apologies if it's a duplicate. I spent time researching but could not find an answer.

I want to create a "user-defined function" that wraps around a data.table wrangling procedure. In this procedure, a new column is created, and I want to let the user set the name of that new column.

Example

I want to take the following code that does work as is, and wrap it inside a function.

library(data.table)
library(magrittr)
library(tibble)

mtcars %>%
  as.data.table() %>%
  .[, .(max_mpg = max(mpg)), by = cyl] %>%
  as_tibble()
#> # A tibble: 3 x 2
#>     cyl max_mpg
#>   <dbl>   <dbl>
#> 1     6    21.4
#> 2     4    33.9
#> 3     8    19.2

Created on 2021-10-13 by the reprex package (v0.3.0)

Let's say that all I want my function to do is let the user set the name of the new column through the new_colname_of_choice argument:

my_wrapper <- function(new_colname_of_choice) {
  mtcars %>%
    as.data.table() %>%
    .[, .(new_colname_of_choice = max(mpg)), by = cyl] %>%
    as_tibble()
}


my_wrapper(new_colname_of_choice = "my_lovely_colname")
#> # A tibble: 3 x 2
#>     cyl new_colname_of_choice <---------- why isn't this called "my_lovely_colname"?
#>   <dbl>                 <dbl>
#> 1     6                  21.4
#> 2     4                  33.9
#> 3     8                  19.2

I've tried using curly braces which didn't work either (actually threw an error):

my_wrapper_2 <- function(new_colname_of_choice) {
  
  mtcars %>%
    as.data.table() %>%
    .[, .({new_colname_of_choice} = max(mpg)), by = cyl] %>%
    as_tibble()
  
}

Error: unexpected '=' in: " as.data.table() %>% .[, .({new_colname_of_choice} ="

This is surprising, because curly braces do give the desired naming behaviour in a different (yet similar) piece of code that uses := instead of =:

my_wrapper_3 <- function(new_colname_of_choice) {
  mtcars %>%
    as.data.table() %>%
    .[, {new_colname_of_choice} := max(mpg), by = cyl] %>%
    as_tibble()
}


my_wrapper_3(new_colname_of_choice = "my_lovely_colname")

## # A tibble: 32 x 12
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb my_lovely_colname <---- SUCCESS!
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>             <dbl>
##  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4              21.4
##  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4              21.4
##  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1              33.9
##  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1              21.4
##  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2              19.2
##  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1              21.4
##  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4              19.2
##  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2              33.9
##  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2              33.9
## 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4              21.4
## # ... with 22 more rows

Bottom line

My conclusion is that = inside .() does not accept {...} on its left-hand side, whereas := does. How else can I pass a name (from an argument) to the LHS in the initial my_wrapper() example?
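
The difference can be seen at the parser level with a minimal base-R sketch (data.table is not needed; the variable names walrus_call and named_arg are made up for illustration). As far as I can tell, R parses := as an ordinary binary operator, so any expression may sit on its left, whereas = inside a call is argument-naming syntax and the parser expects a name before it:

```r
# `:=` parses as a binary operator, so the left-hand side can be any
# expression, including {new_colname_of_choice}. Nothing is evaluated here.
walrus_call <- quote({new_colname_of_choice} := max(mpg))
as.character(walrus_call[[1]])  # the function being called is `:=`

# Inside a call such as .( ... ), `=` names an argument. The parser only
# accepts a name (a symbol or a string) before it, so this fails at parse
# time, before any data.table code runs.
named_arg <- tryCatch(
  parse(text = ".({new_colname_of_choice} = max(mpg))"),
  error = function(e) conditionMessage(e)
)
named_arg  # holds the parse error message
```

So the error in my_wrapper_2 comes from R's syntax, not from data.table, and the first my_wrapper() simply uses the literal argument name as the column name.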


EDIT


I'd like to add the dplyr solution for the same problem, taken from the programming with dplyr vignette:

library(dplyr)

my_wrapper_dplyr <- function(new_colname_of_choice) {
  mtcars %>%
    group_by(cyl) %>%
    summarise("{new_colname_of_choice}" := max(mpg))
}

my_wrapper_dplyr("another_lovely_colname")

This is pretty robust and works in all naming situations I've encountered. Is there a built-in/canonical practice in data.table similar to dplyr's?

CodePudding user response:

One thing you can do is separate the creation of the column and the naming of the column like so:

my_wrapper <- function(new_colname_of_choice) {
  mtcars %>%
    as.data.table() %>%
    .[, .(tempcol = max(mpg)), by = cyl] %>%
    setnames(., "tempcol", new_colname_of_choice) %>%
    as_tibble()
}

my_wrapper("my_lovely_colname")

With this method you can create the column with either .(tempcol = max(mpg)) or tempcol := max(mpg).
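
For the := variant, a minimal sketch could look like the following. The function name my_wrapper_by_ref is hypothetical, and the pipe is dropped only to keep the steps visible. It assumes no column named tempcol already exists in the data:

```r
library(data.table)

my_wrapper_by_ref <- function(new_colname_of_choice) {
  # as.data.table() makes a copy, so mtcars itself is left untouched
  dt <- as.data.table(mtcars)
  # add the per-group maximum under a fixed temporary name
  dt[, tempcol := max(mpg), by = cyl]
  # rename the temporary column in place to the user's choice
  setnames(dt, "tempcol", new_colname_of_choice)
  dt[]
}

res <- my_wrapper_by_ref("my_lovely_colname")
names(res)
```

Unlike the .() version, this keeps all 32 rows and adds the new column alongside the original ones.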

CodePudding user response:

Using setNames from stats:

my_wrapper <- function(new_colname_of_choice) {
  mtcars %>%
    as.data.table() %>%
    .[, setNames(list(max(mpg)), new_colname_of_choice), by = cyl] %>%
    as_tibble()
}

my_wrapper(new_colname_of_choice = "my_lovely_colname")
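
This works because j may be any expression that returns a list, and data.table takes the column names from that list. The same idea should extend to several new columns at once. A rough sketch follows, where my_wrapper_multi and its new_colnames argument are hypothetical names, not part of the original question:

```r
library(data.table)

my_wrapper_multi <- function(new_colnames) {
  dt <- as.data.table(mtcars)
  # build an unnamed list of summaries, then name it from the argument;
  # new_colnames is assumed to hold exactly two names, in this order
  dt[, setNames(list(max(mpg), min(mpg)), new_colnames), by = cyl]
}

res_multi <- my_wrapper_multi(c("highest_mpg", "lowest_mpg"))
names(res_multi)
```

The names are matched to the list elements by position, so the caller has to supply them in the same order as the summaries.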