I want to group a column and filter based on occurrence of certain values in another column. In my case, I have an ID variable and want to keep only instances where Time = 1 and = 2. This is a larger dataset so I'm not able to simply keep duplicates.
I tried using group_by
and filter
, but it has not produced the desired table.
df <- tibble(s1 = c(1001, 1001, 1002, 1002, 1003, 1004), s2 = c(1, 2, 1, 2, 2, 1))
df_filter <- df %>%
group_by(s1) %>%
filter(all(c(1, 2)) %in% s2)
Input:
ID | Time |
---|---|
1001 | 1 |
1001 | 2 |
1002 | 1 |
1002 | 2 |
1003 | 2 |
1004 | 1 |
Desired Output:
s1 | s2 |
---|---|
1001 | 1 |
1001 | 2 |
1002 | 1 |
1002 | 2 |
CodePudding user response:
You can use n()
from dplyr
library. It will give you the count of rows. Then you can compare it with 1.
Here's how you can do it:
df_filter <- df %>%
group_by(s1) %>%
filter(n() > 1)
And if you want only 1
or 2
in s2
, then you can extend your filter like this:
df_filter <- df %>%
group_by(s1) %>%
filter(s2 == 1 | s2 == 2) %>%
filter(n() > 1)