The following question seems very basic in programming with data.table, so my apologies if it's a duplicate. I spent time researching but could not find an answer.
I want to create a "user-defined function" that wraps around a data.table wrangling procedure. In this procedure, a new column is created, and I want to let the user set the name of that new column.
Example
I want to take the following code that does work as is, and wrap it inside a function.
library(data.table)
library(magrittr)
library(tibble)
mtcars %>%
as.data.table() %>%
.[, .(max_mpg = max(mpg)), by = cyl] %>%
as_tibble()
#> # A tibble: 3 x 2
#> cyl max_mpg
#> <dbl> <dbl>
#> 1 6 21.4
#> 2 4 33.9
#> 3 8 19.2
Created on 2021-10-13 by the reprex package (v0.3.0)
Let's say that all I want my function to do is let the user set the name of the new column through the new_colname_of_choice argument:
my_wrapper <- function(new_colname_of_choice) {
mtcars %>%
as.data.table() %>%
.[, .(new_colname_of_choice = max(mpg)), by = cyl] %>%
as_tibble()
}
my_wrapper(new_colname_of_choice = "my_lovely_colname")
#> # A tibble: 3 x 2
#> cyl new_colname_of_choice <---------- why this isn't called "my_lovely_colname"?
#> <dbl> <dbl>
#> 1 6 21.4
#> 2 4 33.9
#> 3 8 19.2
I've tried using curly braces, which didn't work either (it actually threw an error):
my_wrapper_2 <- function(new_colname_of_choice) {
mtcars %>%
as.data.table() %>%
.[, .({new_colname_of_choice} = max(mpg)), by = cyl] %>%
as_tibble()
}
Error: unexpected '=' in: " as.data.table() %>% .[, .({new_colname_of_choice} ="
This is surprising, because curly braces do give the desired naming behavior in a different (yet similar) kind of code:
my_wrapper_3 <- function(new_colname_of_choice) {
mtcars %>%
as.data.table() %>%
.[, {new_colname_of_choice} := max(mpg), by = cyl] %>%
as_tibble()
}
my_wrapper_3(new_colname_of_choice = "my_lovely_colname")
## # A tibble: 32 x 12
## mpg cyl disp hp drat wt qsec vs am gear carb my_lovely_colname <---- SUCCESS!
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 21.4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 21.4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 33.9
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 21.4
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 19.2
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 21.4
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 19.2
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 33.9
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 33.9
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 21.4
## # ... with 22 more rows
Bottom line
My conclusion is that the = operator is sensitive to {...} on the LHS. How can I otherwise pass a name (from an argument) to the LHS in the initial my_wrapper() example?
EDIT
I'd like to add the dplyr solution for the same problem, taken from the programming with dplyr vignette:
library(dplyr)
my_wrapper_dplyr <- function(new_colname_of_choice) {
mtcars %>%
group_by(cyl) %>%
summarise("{new_colname_of_choice}" := max(mpg))
}
my_wrapper_dplyr("another_lovely_colname")
This is pretty robust and works in all naming situations I've encountered. Is there a built-in/canonical practice in data.table similar to {dplyr}'s?
CodePudding user response:
One thing you can do is separate the creation of the column and the naming of the column like so:
my_wrapper <- function(new_colname_of_choice) {
mtcars %>%
as.data.table() %>%
.[, .(tempcol = max(mpg)), by = cyl] %>%
setnames(., "tempcol", new_colname_of_choice) %>%
as_tibble()
}
my_wrapper("my_lovely_colname")
Using this method you can use either .(tempcol = max(mpg)) or tempcol := max(mpg).
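To make the two variants mentioned above concrete, here is a minimal sketch without the pipe. The function names summarise_then_rename() and add_then_rename() are hypothetical, chosen only for illustration; setnames() renames by reference, so each function ends with dt[] to make sure the result prints.

```r
library(data.table)

# Variant 1 (hypothetical helper): summarise with .(), then rename the placeholder.
summarise_then_rename <- function(new_colname_of_choice) {
  dt <- as.data.table(mtcars)
  out <- dt[, .(tempcol = max(mpg)), by = cyl]
  setnames(out, "tempcol", new_colname_of_choice)  # renames in place
  out[]
}

# Variant 2 (hypothetical helper): add the column with :=, then rename it.
add_then_rename <- function(new_colname_of_choice) {
  dt <- as.data.table(mtcars)  # as.data.table() copies, so mtcars is untouched
  dt[, tempcol := max(mpg), by = cyl]
  setnames(dt, "tempcol", new_colname_of_choice)
  dt[]
}

summary_res <- summarise_then_rename("my_lovely_colname")
added_res   <- add_then_rename("my_lovely_colname")
```

The first variant returns one row per cyl group; the second keeps all 32 rows and appends the renamed column.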
CodePudding user response:
Using setNames() from stats:
my_wrapper <- function(new_colname_of_choice) {
mtcars %>%
as.data.table() %>%
.[, setNames(list(max(mpg)), new_colname_of_choice), by = cyl] %>%
as_tibble()
}
my_wrapper(new_colname_of_choice = "my_lovely_colname")
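As a point of comparison with setNames(), and closer to the dplyr "{name}" := idiom from the question, newer data.table releases offer an env argument in [ that substitutes placeholders in the j expression, including argument names. The sketch below assumes data.table 1.15.0 or later is installed; the function name my_wrapper_env() and the placeholder out_col are hypothetical, chosen for illustration.

```r
library(data.table)

# Hypothetical wrapper: out_col is only a placeholder in the j expression.
# env maps the placeholder to the user-supplied name before j is evaluated.
# Assumes data.table >= 1.15.0, where the env argument is available.
my_wrapper_env <- function(new_colname_of_choice) {
  dt <- as.data.table(mtcars)
  dt[, .(out_col = max(mpg)), by = cyl,
     env = list(out_col = new_colname_of_choice)]
}

env_res <- my_wrapper_env("my_lovely_colname")
```

With this approach the column is given its final name during the grouped computation, so no temporary column and no rename step are needed.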