Home > Blockchain >  Filter column by group based on another column (R dplyr)
Filter column by group based on another column (R dplyr)

Time:03-18

I want to group a column and filter based on occurrence of certain values in another column. In my case, I have an ID variable and want to keep only instances where Time = 1 and = 2. This is a larger dataset so I'm not able to simply keep duplicates.

I tried using group_by and filter, but it has not produced the desired table.

df <- tibble(s1 = c(1001, 1001, 1002, 1002, 1003, 1004), s2 = c(1, 2, 1, 2, 2, 1))
df_filter <- df %>% 
  group_by(s1) %>% 
  filter(all(c(1, 2)) %in% s2)

Input:

ID Time
1001 1
1001 2
1002 1
1002 2
1003 2
1004 1

Desired Output:

s1 s2
1001 1
1001 2
1002 1
1002 2

CodePudding user response:

You can use n() from dplyr library. It will give you the count of rows. Then you can compare it with 1.

Here's how you can do it:

df_filter <- df %>% 
  group_by(s1) %>%
  filter(n() > 1)

And if you want only 1 or 2 in s2, then you can extend your filter like this:

df_filter <- df %>% 
  group_by(s1) %>%
  filter(s2 == 1 | s2 == 2) %>%
  filter(n() > 1)
  •  Tags:  
  • r
  • Related