In other words, my question is about the j argument to data.table when the name of the new column is stored in a character vector. For example:
library(data.table)
dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = rnorm(6))
agg_col_name <- 'avg'
grouped_dt <- dt[, .(z = mean(y)), by = x]
setnames(grouped_dt, 'z', agg_col_name)
> grouped_dt
x avg
1: 1 -0.2554987
2: 2 -0.4245852
3: 3 -0.4881073
There should be a more elegant way to do the last two statements as one, yes?
Perhaps this is a question about how to create a suitable list for the j argument.
CodePudding user response:
Although this is probably not what you are looking for, you could use setNames inside j, wrapped around .(z = mean(y)).
library(data.table)
dt[, setNames(.(z = mean(y)), agg_col_name), by = x]
Or use setnames after doing the summary:
setnames(dt[, mean(y), by = x], 'V1', agg_col_name)[]
Output
x avg
1: 1 0.5626526
2: 2 0.3549653
3: 3 -0.2861405
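If the underlying question is how to build a suitable list for j, one possible sketch is to let lapply(.SD, ...) create the list and rename its elements with setNames. The data values below are made up so that the group means are easy to verify by hand; they are not the rnorm values from the question.

```r
library(data.table)

# Deterministic toy data (hypothetical values, not from the question)
dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = c(1, 3, 5, 7, 9, 11))
agg_col_name <- "avg"

# lapply(.SD, mean) returns a list with one element per column in .SDcols;
# setNames() renames those elements from the character vector.
grouped_dt <- dt[, setNames(lapply(.SD, mean), agg_col_name),
                 by = x, .SDcols = "y"]
print(grouped_dt)
```

Because the names come from a character vector, the same pattern should extend to several columns by passing vectors of equal length to .SDcols and setNames.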
However, as mentioned in the comments, this is easier to do with the dev version of data.table, which adds an env argument. You can read more about the development of this feature at [programming on data.table #4304](https://github.com/Rdatatable/data.table/pull/4304).
# Latest development version:
data.table::update.dev.pkg()
library(data.table)
# A character value in env is substituted as a name, so z becomes the value of agg_col_name
dt[, .(z = mean(y)), by = x, env = list(z = agg_col_name)]
# x avg
#1: 1 -0.1640783
#2: 2 0.5375794
#3: 3 0.1539785
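The env approach also lends itself to wrapping the aggregation in a function. The following is only a sketch: mean_by is a hypothetical helper, not part of data.table, the data values are made up, and it assumes an installed data.table version that supports the env argument.

```r
library(data.table)

# Deterministic toy data (hypothetical values, not from the question)
dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = c(1, 3, 5, 7, 9, 11))

# mean_by() is a hypothetical helper. The placeholders out, val and grp are
# replaced by the names given as character strings in env before evaluation.
mean_by <- function(data, value_col, group_col, out_name) {
  data[, .(out = mean(val)), by = grp,
       env = list(out = out_name, val = value_col, grp = group_col)]
}

res <- mean_by(dt, value_col = "y", group_col = "x", out_name = "avg")
print(res)
```

Passing all column names through env keeps the function free of get() and eval(parse(...)) calls.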