I have a data.table with a high number of groups. I would like to subset whole groups (not just rows) based on the conditions on multiple columns. Consider the following data.table:
DT <- structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L),
group = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
y = c(14, 19, 16, 10, 6, 8, 14, 19, 10, 9, 6, 8),
x = c(3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3)),
row.names = c(NA, -12L),
class = c("data.table", "data.frame"))
>DT
id group y x
1: 1 A 14 3
2: 2 A 19 3
3: 3 A 16 2
4: 4 A 10 3
5: 5 B 6 3
6: 6 B 8 3
7: 7 B 14 3
8: 8 B 19 2
9: 9 C 10 2
10: 10 C 9 3
11: 11 C 6 3
12: 12 C 8 3
I would like to keep groups that have y=6
and x=3
in the same row. So that I would have only class B and C (preferably using data.table
package in R):
id group y x
1: 5 B 6 3
2: 6 B 8 3
3: 7 B 14 3
4: 8 B 19 2
5: 9 C 10 2
6: 10 C 9 3
7: 11 C 6 3
8: 12 C 8 3
All my attempts gave me only those rows containing y=6
and x=3
, which I do not want:
id group y x
1: 5 B 6 3
2: 11 C 6 3
CodePudding user response:
Try dplyr package
#select groups containing y and x
groups = DT %>% filter(y == 6, x == 3) %>% select(group) %>% unique() %>% unlist() %>% as.vector()
# filter for selected groups
DT %>% filter(group %in% groups)
CodePudding user response:
With data.table
:
DT[,.SD[any(x==3&y==6)],by=group]
group id y x
<char> <int> <num> <num>
1: B 5 6 3
2: B 6 8 3
3: B 7 14 3
4: B 8 19 2
5: C 9 10 2
6: C 10 9 3
7: C 11 6 3
8: C 12 8 3
Another possibly faster option:
DT[, if (any(x==3&y==6)) .SD, by=group]