Nice way to group data in a `data.table` when the new column name is given as a character vector

Time:05-08

In other words, my question is about the j argument to data.table when the name of the new column is stored in a character vector. For example:

library(data.table)

dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = rnorm(6))
agg_col_name <- 'avg'

grouped_dt <- dt[, .(z = mean(y)), by = x]
setnames(grouped_dt, 'z', agg_col_name)
> grouped_dt
   x        avg
1: 1 -0.2554987
2: 2 -0.4245852
3: 3 -0.4881073

There should be a more elegant way to do the last two statements as one, yes?

Perhaps this is really a question about how to create a suitable list for the j argument.

Answer:

Although this is probably not what you are looking for, you could use setNames inside j, wrapping it around .(z = mean(y)). The name z is then overwritten by agg_col_name.

library(data.table)

dt[, setNames(.(z = mean(y)), agg_col_name), by = x]

Or use setnames after doing the summary:

setnames(dt[, mean(y), by = x], 'V1', agg_col_name)[]

Output

   x        avg
1: 1  0.5626526
2: 2  0.3549653
3: 3 -0.2861405
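
The two approaches above can be checked against each other with a small reproducible sketch. The fixed y values below are an assumption made for illustration (the original uses rnorm, so its numbers change on every run), and list() is used in place of the .() alias:

```r
library(data.table)

# Fixed y values instead of rnorm(6), so the group means are known in advance.
dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = c(1, 3, 2, 6, 5, 5))
agg_col_name <- "avg"

# Approach 1: name the list returned by j with setNames().
res1 <- dt[, setNames(list(mean(y)), agg_col_name), by = x]

# Approach 2: aggregate first (the unnamed result column defaults to V1),
# then rename it by reference with setnames().
res2 <- setnames(dt[, mean(y), by = x], "V1", agg_col_name)

names(res1)  # "x" "avg"
res1$avg     # group means: 2, 4, 5
```

Both calls return the same table here; the first keeps everything in a single `[` call, which is what the question asks for.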

However, as mentioned in the comments, this is easier to do with the dev version of data.table, which adds an env argument. You can read more about the development of this feature in "programming on data.table #4304" (https://github.com/Rdatatable/data.table/pull/4304).

# Latest development version:
data.table::update.dev.pkg()

library(data.table)

# Character values in env are substituted as names, so z becomes avg.
dt[, .(z = mean(y)), by = x, env = list(z = agg_col_name)]

#   x        avg
#1: 1 -0.1640783
#2: 2  0.5375794
#3: 3  0.1539785
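
As a further sketch of the env approach, the input column can be passed as a string in the same way. This assumes an installed data.table that supports the env argument (to my knowledge it is part of the released package since 1.15.0, but treat that as an assumption and check your version). The fixed y values, the placeholder names out and val, and the variable value_col are hypothetical and only used for illustration:

```r
library(data.table)  # assumes a version that supports the env argument

# Fixed y values so the result is reproducible.
dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = c(1, 3, 2, 6, 5, 5))
agg_col_name <- "avg"
value_col <- "y"  # hypothetical: the input column is also given as a string

# out and val are placeholders; env replaces them with the names avg and y
# before the query is evaluated.
res <- dt[, .(out = mean(val)), by = x,
          env = list(out = agg_col_name, val = value_col)]

names(res)  # "x" "avg"
res$avg     # group means: 2, 4, 5
```

Keeping the placeholders distinct from real column names makes it easier to see which parts of the query are substituted.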