r filter duplicate rows based on value in column


I have a dataset like this

  Id    Date      Col1   Col2   Col3
  1     1/1/1995  NA     1      A
  1     5/3/1887  2      2      B
  2     3/10/1992 0      1      B
  3     8/15/2002 1      1      B
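
For reproducibility, the data can be built roughly like this (column types are assumed from the printout):

 library(dplyr)

 # Rebuild the example data shown above
 df <- tibble(
   Id   = c(1L, 1L, 2L, 3L),
   Date = c("1/1/1995", "5/3/1887", "3/10/1992", "8/15/2002"),
   Col1 = c(NA, 2L, 0L, 1L),
   Col2 = c(1L, 2L, 1L, 1L),
   Col3 = c("A", "B", "B", "B")
 )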

I would like to select the row where Col3 == "A" only when there are repeated rows for that Id; Ids with a single row should be kept as they are.

Expected dataset

  Id    Date      Col1   Col2   Col3
  1     1/1/1995  NA     1      A
  2     3/10/1992 0      1      B
  3     8/15/2002 1      1      B

I tried

 df %>%
   group_by(Id) %>%
   filter(Col3 == "A")

This is not doing what I expected: I get only the single row where Id == 1 and Col3 == "A", and the rows for Id 2 and 3 are dropped. Any suggestions on where I am going wrong are much appreciated. Thanks in advance.

CodePudding user response:

Here is an option

df %>%
    group_by(Id) %>%
    filter(Col3 == "A" | n() == 1) %>%
    ungroup()
# A tibble: 3 x 5
#     Id Date       Col1  Col2 Col3 
#  <int> <chr>     <int> <int> <chr>
#1     1 1/1/1995     NA     1 A    
#2     2 3/10/1992     0     1 B    
#3     3 8/15/2002     1     1 B    

This keeps rows where Col3 == "A", plus every row belonging to an Id that appears only once. PS: I recommend always ending with ungroup() to avoid unwanted surprises downstream.
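
If you prefer not to group at all, an equivalent sketch uses add_count() to attach the group size as a helper column (n_rows is just an illustrative name):

 df %>%
   add_count(Id, name = "n_rows") %>%          # number of rows per Id
   filter(Col3 == "A" | n_rows == 1) %>%       # same condition as above
   select(-n_rows)                             # drop the helper column

Either way, the only row dropped is Id = 1 with Col3 == "B", because it belongs to a repeated Id but is not the "A" row.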
