Home > Software design >  Subset groups in a data.table using conditions on two columns
Subset groups in a data.table using conditions on two columns

Time:02-11

I have a data.table with a high number of groups. I would like to subset whole groups (not just rows) based on the conditions on multiple columns. Consider the following data.table:

DT <- structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), 
                 group = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"), 
                 y = c(14, 19, 16, 10, 6, 8, 14, 19, 10, 9, 6, 8), 
                 x = c(3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3)), 
            row.names = c(NA, -12L),
            class = c("data.table", "data.frame"))
>DT

    id group  y x
 1:  1     A 14 3
 2:  2     A 19 3
 3:  3     A 16 2
 4:  4     A 10 3
 5:  5     B  6 3
 6:  6     B  8 3
 7:  7     B 14 3
 8:  8     B 19 2
 9:  9     C 10 2
10: 10     C  9 3
11: 11     C  6 3
12: 12     C  8 3

I would like to keep groups that have y=6 and x=3 in the same row. So that I would have only class B and C (preferably using data.table package in R):

    id group  y x
 1:  5     B  6 3
 2:  6     B  8 3
 3:  7     B 14 3
 4:  8     B 19 2
 5:  9     C 10 2
 6: 10     C  9 3
 7: 11     C  6 3
 8: 12     C  8 3

All my attempts gave me only those rows containing y=6 and x=3, which I do not want:

    id group  y x
 1:  5     B  6 3
 2: 11     C  6 3

CodePudding user response:

Try dplyr package

#select groups containing y and x
groups = DT %>% filter(y == 6, x == 3) %>% select(group) %>% unique() %>% unlist() %>% as.vector()   

# filter for selected groups
DT %>% filter(group %in% groups)

CodePudding user response:

With data.table:

DT[,.SD[any(x==3&y==6)],by=group]

    group    id     y     x
   <char> <int> <num> <num>
1:      B     5     6     3
2:      B     6     8     3
3:      B     7    14     3
4:      B     8    19     2
5:      C     9    10     2
6:      C    10     9     3
7:      C    11     6     3
8:      C    12     8     3

Another possibly faster option:

DT[, if (any(x==3&y==6)) .SD, by=group]
  • Related