I want to compute the proportion of each category in a column of a data.table, per level of a grouping variable. The following working code does just that:
library(data.table)

n_categories <- 5
# 1000 random category labels spread evenly over 10 groups
dt <- data.table(categories = sample(paste("Category", 1:n_categories),
                                     1000,
                                     replace = TRUE),
                 group_var = rep(1:10, 100))
# Per group: share of rows that fall into each category
dt[, lapply(1:n_categories, function(x) table(categories)[[x]] / .N), by = group_var]
The following output is produced:
group_var V1 V2 V3 V4 V5
1: 1 0.22 0.19 0.18 0.24 0.17
2: 2 0.17 0.23 0.18 0.23 0.19
3: 3 0.17 0.22 0.21 0.17 0.23
4: 4 0.17 0.17 0.24 0.19 0.23
5: 5 0.26 0.19 0.16 0.19 0.20
6: 6 0.14 0.19 0.24 0.21 0.22
7: 7 0.12 0.14 0.28 0.30 0.16
8: 8 0.13 0.19 0.19 0.17 0.32
9: 9 0.23 0.26 0.24 0.17 0.10
10: 10 0.16 0.20 0.21 0.25 0.18
Is there a way to modify this code so that the column names ("proportion_category_1", "proportion_category_2", etc.) are assigned directly in the same call, rather than by renaming the columns in a separate line of code?
The resulting data.table object should look something like this:
group_var proportion_category_1 proportion_category_2 proportion_category_3 proportion_category_4 proportion_category_5
1: 1 0.22 0.19 0.18 0.24 0.17
2: 2 0.17 0.23 0.18 0.23 0.19
3: 3 0.17 0.22 0.21 0.17 0.23
4: 4 0.17 0.17 0.24 0.19 0.23
5: 5 0.26 0.19 0.16 0.19 0.20
6: 6 0.14 0.19 0.24 0.21 0.22
7: 7 0.12 0.14 0.28 0.30 0.16
8: 8 0.13 0.19 0.19 0.17 0.32
9: 9 0.23 0.26 0.24 0.17 0.10
10: 10 0.16 0.20 0.21 0.25 0.18
Using the lapply statement, or some other routine that works for an arbitrary number of categories, is critical.
CodePudding user response:
You could do something like this with setNames, which you just wrap around the lapply statement.
library(data.table)
dt[, setNames(
  lapply(1:n_categories, function(x)
    table(categories)[[x]] / .N),
  paste0("proportion_category_", 1:n_categories)
), by = group_var]
Output (the values differ from those in the question because sample() is called without a fixed seed):
group_var proportion_category_1 proportion_category_2 proportion_category_3 proportion_category_4 proportion_category_5
1: 1 0.19 0.19 0.29 0.20 0.13
2: 2 0.26 0.19 0.19 0.15 0.21
3: 3 0.17 0.16 0.24 0.22 0.21
4: 4 0.19 0.20 0.21 0.22 0.18
5: 5 0.21 0.19 0.20 0.21 0.19
6: 6 0.24 0.20 0.11 0.21 0.24
7: 7 0.16 0.29 0.22 0.12 0.21
8: 8 0.14 0.21 0.23 0.24 0.18
9: 9 0.29 0.22 0.17 0.16 0.16
10: 10 0.20 0.27 0.22 0.19 0.12
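One caveat with the approach above: table(categories)[[x]] indexes by position, so if a group happens to contain no rows for some category, the positions shift or the index is out of bounds. A minimal sketch of a more defensive variant follows, assuming the full set of category labels is known up front. The names lvls and res and the set.seed() call are illustrative additions, not part of the original question. It relies on the same mechanism as setNames above: data.table takes the result column names from the names of the list returned in j.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed only so the example is reproducible
n_categories <- 5
lvls <- paste("Category", 1:n_categories)  # hypothetical: all expected labels

dt <- data.table(
  categories = sample(lvls, 1000, replace = TRUE),
  group_var  = rep(1:10, 100)
)

# Converting to a factor with explicit levels makes table() return a count
# for every level (0 if absent), always in the order given by lvls.
res <- dt[, setNames(
  as.list(as.numeric(table(factor(categories, levels = lvls))) / .N),
  paste0("proportion_category_", seq_along(lvls))
), by = group_var]

print(res)
```

Because the levels are passed explicitly, the column order follows lvls rather than the alphabetical sort that table() applies to character vectors, which matters once there are ten or more categories ("Category 10" would otherwise sort before "Category 2").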