Home > front end >  Reduce columns that whose names matches a pattern
Reduce columns that whose names matches a pattern

Time:05-11

I am trying to create multiple columns in my datatable that represent the sum of columns that match the given pattern by row.

df <- data.frame(first_column = c("Alpha", "Beta", "Charlie", "Tango", "Alpha, Beta,Alpha", "Alpha,Beta,Charlie", 'Tango,Tango,Tango,Tango', 'Tango,Tango,Tango, Tango', 'Tango,Tango,Tango, Tango , Alpha,Beta,Charlie, Alpha, Alpha ,Alpha '),
                 number_1 = 1:9,
                 number_2 = 11:19,
                 number_3 = 2:10,
                 number_4 = 12:20) 

testing <- df %>%
  mutate(number_1 = as.numeric(number_1),
         number_2 = as.numeric(number_2))%>%
  as.data.table 


testing3 = testing[,`:=` ("Total 1" = Reduce(` `, grep("number_1|number_2", names(testing), value = TRUE)),
                          "Total 2" = Reduce(` `, grep("number_3|number_4", names(testing), value = TRUE)))]

This is what I have tried to far to no avail. What I should see is Total 1 row 1, 12 and Total 2 row 1, 14

but I am unable to do so.

CodePudding user response:

The reason is that the grep returns only the column names with value = TRUE, we need the value of the columns, Use .SD to subset the columns from the column names

library(data.table)
testing[,`:=` (
   "Total 1" = Reduce(` `, .SD[, grep("number_1|number_2", names(.SD),
              value = TRUE), with = FALSE]),
    "Total 2" = Reduce(` `, .SD[, grep("number_3|number_4", names(.SD), 
     value = TRUE), with = FALSE]))]

-output

> testing
                                                          first_column number_1 number_2 number_3 number_4 Total 1 Total 2
                                                                <char>    <num>    <num>    <int>    <int>   <num>   <int>
1:                                                               Alpha        1       11        2       12      12      14
2:                                                                Beta        2       12        3       13      14      16
3:                                                             Charlie        3       13        4       14      16      18
4:                                                               Tango        4       14        5       15      18      20
5:                                                   Alpha, Beta,Alpha        5       15        6       16      20      22
6:                                                  Alpha,Beta,Charlie        6       16        7       17      22      24
7:                                             Tango,Tango,Tango,Tango        7       17        8       18      24      26
8:                                            Tango,Tango,Tango, Tango        8       18        9       19      26      28
9: Tango,Tango,Tango, Tango , Alpha,Beta,Charlie, Alpha, Alpha ,Alpha         9       19       10       20      28      30

If there are multiple sets, we may also create a named list , Filter the list elements based on the occurence of names

lst_names <- list(c("number_1", "number_2"), 
                 c("number_3", "number_4"), 
                 c("number_5", "number_6"))
names(lst_names) <- paste("Total", seq_along(lst_names))
lst_names_sub <- Filter(length, lapply(lst_names, function(x)
        intersect(x, names(testing))))
testing[, names(lst_names_sub) := lapply(lst_names_sub, function(x) 
         Reduce(` `, .SD[, x, with = FALSE]))]
  • Related