Directly (re)naming summarized columns with data.table

I want to check the proportions of certain categories inside a column of a data.table, with respect to a grouping variable. The following working code does exactly that:

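library(data.table)

n_categories <- 5

# Simulated data: 1000 rows, random categories, 10 groups of 100 rows each
dt <- data.table(categories = sample(paste("Category", 1:n_categories), 
                                     1000, 
                                     replace = TRUE), 
                 group_var = rep(1:10, 100))

# For each group, divide the count of the x-th category by the group size (.N)
dt[, lapply(1:n_categories, function(x) table(categories)[[x]]/.N), by = group_var]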

The following output is produced:

    group_var   V1   V2   V3   V4   V5
 1:         1 0.22 0.19 0.18 0.24 0.17
 2:         2 0.17 0.23 0.18 0.23 0.19
 3:         3 0.17 0.22 0.21 0.17 0.23
 4:         4 0.17 0.17 0.24 0.19 0.23
 5:         5 0.26 0.19 0.16 0.19 0.20
 6:         6 0.14 0.19 0.24 0.21 0.22
 7:         7 0.12 0.14 0.28 0.30 0.16
 8:         8 0.13 0.19 0.19 0.17 0.32
 9:         9 0.23 0.26 0.24 0.17 0.10
10:        10 0.16 0.20 0.21 0.25 0.18
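Note that `table(categories)[[x]]` picks the x-th count by position, and `table()` orders its entries alphabetically by the values that actually occur in the group. The small base-R sketch below uses toy vectors (hypothetical, chosen only for illustration) to show two consequences: with ten or more categories, "Category 10" sorts before "Category 2", and a category that is missing from a group shifts every later position. With five categories and 100 rows per group, neither issue shows up in the output above, but both matter once the number of categories is arbitrary.

```r
# Toy vectors (hypothetical) illustrating how table() orders and drops categories
cats <- c("Category 1", "Category 2", "Category 10")

# table() sorts names alphabetically, so "Category 10" comes before "Category 2"
sorted_names <- names(table(cats))
print(sorted_names)

# A group in which "Category 2" never occurs: its table has only two entries,
# so position 2 now holds the count for "Category 10"
group_cats <- c("Category 1", "Category 10", "Category 10")
counts <- table(group_cats)
print(counts[[2]])
```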

Is there a way to modify this code so that it assigns the column names ("proportion_category_1", "proportion_category_2", etc.) directly, rather than renaming the columns in a separate line of code?

The resulting data.table object should look something like this:

    group_var proportion_category_1 proportion_category_2 proportion_category_3 proportion_category_4 proportion_category_5
 1:         1                  0.22                  0.19                  0.18                  0.24                  0.17
 2:         2                  0.17                  0.23                  0.18                  0.23                  0.19
 3:         3                  0.17                  0.22                  0.21                  0.17                  0.23
 4:         4                  0.17                  0.17                  0.24                  0.19                  0.23
 5:         5                  0.26                  0.19                  0.16                  0.19                  0.20
 6:         6                  0.14                  0.19                  0.24                  0.21                  0.22
 7:         7                  0.12                  0.14                  0.28                  0.30                  0.16
 8:         8                  0.13                  0.19                  0.19                  0.17                  0.32
 9:         9                  0.23                  0.26                  0.24                  0.17                  0.10
10:        10                  0.16                  0.20                  0.21                  0.25                  0.18

It is essential to keep the lapply call, or some other routine that can handle an arbitrary number of categories.

Answer:

You can wrap the lapply call in setNames(). The j expression then returns a named list, and data.table uses those names as the column names of the result:

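library(data.table)

# setNames() attaches the names to the list returned by lapply();
# data.table uses those names as the result's column names
dt[, setNames(
  lapply(1:n_categories, function(x)
    table(categories)[[x]] / .N),
  paste0("proportion_category_", 1:n_categories)
), by = group_var]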

Output (the values differ from the question's because the data are generated randomly with sample()):

    group_var proportion_category_1 proportion_category_2 proportion_category_3 proportion_category_4 proportion_category_5
 1:         1                  0.19                  0.19                  0.29                  0.20                  0.13
 2:         2                  0.26                  0.19                  0.19                  0.15                  0.21
 3:         3                  0.17                  0.16                  0.24                  0.22                  0.21
 4:         4                  0.19                  0.20                  0.21                  0.22                  0.18
 5:         5                  0.21                  0.19                  0.20                  0.21                  0.19
 6:         6                  0.24                  0.20                  0.11                  0.21                  0.24
 7:         7                  0.16                  0.29                  0.22                  0.12                  0.21
 8:         8                  0.14                  0.21                  0.23                  0.24                  0.18
 9:         9                  0.29                  0.22                  0.17                  0.16                  0.16
10:        10                  0.20                  0.27                  0.22                  0.19                  0.12
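If the number of categories can grow beyond nine, or a category may be absent from some group, positional indexing into table() can mislabel or fail. The sketch below keeps the setNames() idea but counts against an explicit, fixed set of category levels, so `proportion_category_x` always refers to "Category x". It also calls table() once per group instead of once per category. The `lvls` vector, the `{ ... }` block in `j`, the seed, and `n_categories <- 12` are assumptions made for this illustration, not part of the original answer.

```r
library(data.table)

set.seed(1)        # arbitrary seed, only for reproducibility
n_categories <- 12 # deliberately >= 10 to exercise the ordering issue

# Simulated data in the same shape as the question (values are random)
dt <- data.table(categories = sample(paste("Category", 1:n_categories),
                                     1000, replace = TRUE),
                 group_var = rep(1:10, 100))

# Fix the category order explicitly, so position x always means "Category x"
lvls <- paste("Category", 1:n_categories)

res <- dt[, {
  # Counting against fixed levels keeps zero counts for absent categories
  counts <- table(factor(categories, levels = lvls))
  # Return a named list; data.table turns the names into column names
  setNames(as.list(as.vector(counts) / .N),
           paste0("proportion_category_", 1:n_categories))
}, by = group_var]

print(res)
```

Because every group's counts are taken over the same levels, each row of proportions sums to 1, and a category that never occurs in a group simply gets 0.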