I have a dataset like this
Id Date Col1 Col2 Col3
1 1/1/1995 NA 1 A
1 5/3/1887 2 2 B
2 3/10/1992 0 1 B
3 8/15/2002 1 1 B
I like to select rows where Col3=A only when the rows are repeated for that Id
Expected dataset
Id Date Col1 Col2 Col3
1 1/1/1995 NA 1 A
2 3/10/1992 0 1 B
3 8/15/2002 1 1 B
I tried
df %>%
group_by(Id)%>%
filter(Col3 == "A")
This is not doing what I expected. I am getting only one row where where ID=1 and Col3=A, this is not what I was expecting. Any suggestions on where I am going wrong is much appreciated. Thanks in advance.
CodePudding user response:
Here is an option
df %>%
group_by(Id) %>%
filter(Col3 == "A" | n() == 1) %>%
ungroup()
## A tibble: 3 x 5
# Id Date Col1 Col2 Col3
# <int> <chr> <int> <int> <chr>
#1 1 1/1/1995 NA 1 A
#2 2 3/10/1992 0 1 B
#3 3 8/15/2002 1 1 B
This keeps either rows where Col3 == "A"
or single rows per group. PS. I recommend always using ungroup()
to avoid unwanted surprises downstream.