Home > Blockchain >  Directly (re)naming summarized columns with data.table
Directly (re)naming summarized columns with data.table

Time:06-24

I want to check the proportions of certain categories inside column of a data.table with respect to some grouping variable. A working chunk of code is attached that does just this:

require(data.table)

n_categories <- 5

dt <- data.table(categories = sample(paste("Category", 1:n_categories), 
                                     1000, 
                                     replace = TRUE), 
                 group_var = rep(1:10, 100))

dt[, lapply(1:n_categories, function(x) table(categories)[[x]]/.N), by = group_var]

The following output is produced:

    group_var   V1   V2   V3   V4   V5
 1:         1 0.22 0.19 0.18 0.24 0.17
 2:         2 0.17 0.23 0.18 0.23 0.19
 3:         3 0.17 0.22 0.21 0.17 0.23
 4:         4 0.17 0.17 0.24 0.19 0.23
 5:         5 0.26 0.19 0.16 0.19 0.20
 6:         6 0.14 0.19 0.24 0.21 0.22
 7:         7 0.12 0.14 0.28 0.30 0.16
 8:         8 0.13 0.19 0.19 0.17 0.32
 9:         9 0.23 0.26 0.24 0.17 0.10
10:        10 0.16 0.20 0.21 0.25 0.18

Now, I wonder if there is a way to modify the given code to directly assign column names ("proportion_category_1", "proportion_category_2", etc.) to the resulting data.table object other than assigning new column names in a new line of code.

The resulting data.table object should look something like this:

    group_var proportion_category_1 proportion_category_2 proportion_category_3 proportion_category_4 proportion_category_5
 1:         1                  0.22                  0.19                  0.18                  0.24                  0.17
 2:         2                  0.17                  0.23                  0.18                  0.23                  0.19
 3:         3                  0.17                  0.22                  0.21                  0.17                  0.23
 4:         4                  0.17                  0.17                  0.24                  0.19                  0.23
 5:         5                  0.26                  0.19                  0.16                  0.19                  0.20
 6:         6                  0.14                  0.19                  0.24                  0.21                  0.22
 7:         7                  0.12                  0.14                  0.28                  0.30                  0.16
 8:         8                  0.13                  0.19                  0.19                  0.17                  0.32
 9:         9                  0.23                  0.26                  0.24                  0.17                  0.10
10:        10                  0.16                  0.20                  0.21                  0.25                  0.18

The use of the lapply statement or some kind of routine that is able to group by an arbitrary number of categories is critical.

CodePudding user response:

You could do something like this with setNames, which you just wrap around the lapply statement.

library(data.table)

dt[, setNames(
  lapply(1:n_categories, function(x)
    table(categories)[[x]] / .N),
  paste0("proportion_category_", 1:n_categories)
), by = group_var]

Output

   group_var proportion_category_1 proportion_category_2 proportion_category_3 proportion_category_4 proportion_category_5
 1:         1                  0.19                  0.19                  0.29                  0.20                  0.13
 2:         2                  0.26                  0.19                  0.19                  0.15                  0.21
 3:         3                  0.17                  0.16                  0.24                  0.22                  0.21
 4:         4                  0.19                  0.20                  0.21                  0.22                  0.18
 5:         5                  0.21                  0.19                  0.20                  0.21                  0.19
 6:         6                  0.24                  0.20                  0.11                  0.21                  0.24
 7:         7                  0.16                  0.29                  0.22                  0.12                  0.21
 8:         8                  0.14                  0.21                  0.23                  0.24                  0.18
 9:         9                  0.29                  0.22                  0.17                  0.16                  0.16
10:        10                  0.20                  0.27                  0.22                  0.19                  0.12
  • Related