I am trying to create multiple columns in my datatable that represent the sum of columns that match the given pattern by row.
df <- data.frame(first_column = c("Alpha", "Beta", "Charlie", "Tango", "Alpha, Beta,Alpha", "Alpha,Beta,Charlie", 'Tango,Tango,Tango,Tango', 'Tango,Tango,Tango, Tango', 'Tango,Tango,Tango, Tango , Alpha,Beta,Charlie, Alpha, Alpha ,Alpha '),
number_1 = 1:9,
number_2 = 11:19,
number_3 = 2:10,
number_4 = 12:20)
testing <- df %>%
mutate(number_1 = as.numeric(number_1),
number_2 = as.numeric(number_2))%>%
as.data.table
testing3 = testing[,`:=` ("Total 1" = Reduce(` `, grep("number_1|number_2", names(testing), value = TRUE)),
"Total 2" = Reduce(` `, grep("number_3|number_4", names(testing), value = TRUE)))]
This is what I have tried to far to no avail. What I should see is Total 1 row 1, 12 and Total 2 row 1, 14
but I am unable to do so.
CodePudding user response:
The reason is that the grep
returns only the column names with value = TRUE
, we need the value of the columns, Use .SD
to subset the columns from the column names
library(data.table)
testing[,`:=` (
"Total 1" = Reduce(` `, .SD[, grep("number_1|number_2", names(.SD),
value = TRUE), with = FALSE]),
"Total 2" = Reduce(` `, .SD[, grep("number_3|number_4", names(.SD),
value = TRUE), with = FALSE]))]
-output
> testing
first_column number_1 number_2 number_3 number_4 Total 1 Total 2
<char> <num> <num> <int> <int> <num> <int>
1: Alpha 1 11 2 12 12 14
2: Beta 2 12 3 13 14 16
3: Charlie 3 13 4 14 16 18
4: Tango 4 14 5 15 18 20
5: Alpha, Beta,Alpha 5 15 6 16 20 22
6: Alpha,Beta,Charlie 6 16 7 17 22 24
7: Tango,Tango,Tango,Tango 7 17 8 18 24 26
8: Tango,Tango,Tango, Tango 8 18 9 19 26 28
9: Tango,Tango,Tango, Tango , Alpha,Beta,Charlie, Alpha, Alpha ,Alpha 9 19 10 20 28 30
If there are multiple sets, we may also create a named list
, Filter
the list
elements based on the occurence of names
lst_names <- list(c("number_1", "number_2"),
c("number_3", "number_4"),
c("number_5", "number_6"))
names(lst_names) <- paste("Total", seq_along(lst_names))
lst_names_sub <- Filter(length, lapply(lst_names, function(x)
intersect(x, names(testing))))
testing[, names(lst_names_sub) := lapply(lst_names_sub, function(x)
Reduce(` `, .SD[, x, with = FALSE]))]