This is my dataset:
Group Status From To
Blue No 1994 2000
Red No 1994 1997
Red Yes 1998 2002
Yellow No 1994 2014
Yellow Yes 2015 2021
Purple No 1994 1997
I would like to remove the rows with Status = No,
but only where they belong to a Group that appears more than once.
For instance, Red and Yellow each have two rows, so I would like to drop the Status = No row within those two groups. The final dataset should look like this:
Group Status From To
Blue No 1994 2000
Red Yes 1998 2002
Yellow Yes 2015 2021
Purple No 1994 1997
Any suggestions would be much appreciated. Thanks.
CodePudding user response:
You can keep only the rows with Status = 'Yes'
if the number of rows in the group is greater than 1, and keep every row otherwise.
library(dplyr)
df %>%
  group_by(Group) %>%
  filter(if (n() > 1) Status == 'Yes' else TRUE) %>%
  ungroup()
# Group Status From To
# <chr> <chr> <int> <int>
#1 Blue No 1994 2000
#2 Red Yes 1998 2002
#3 Yellow Yes 2015 2021
#4 Purple No 1994 1997
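The same rule can also be sketched in base R, without dplyr. This is only an illustrative alternative, not part of the original answer; the names `group_size` and `result` are made up for the example. Like the dplyr version, it would drop every row of a repeated group that contains no 'Yes' row.

```r
# Same data as in the question, rebuilt here so the snippet runs on its own.
df <- data.frame(
  Group  = c("Blue", "Red", "Red", "Yellow", "Yellow", "Purple"),
  Status = c("No", "No", "Yes", "No", "Yes", "No"),
  From   = c(1994L, 1994L, 1998L, 1994L, 2015L, 1994L),
  To     = c(2000L, 1997L, 2002L, 2014L, 2021L, 1997L)
)

# Number of rows in each row's group, aligned with the rows of df.
group_size <- ave(seq_along(df$Group), df$Group, FUN = length)

# Drop a row only when its group has several rows and its Status is "No".
result <- df[!(group_size > 1 & df$Status == "No"), ]
rownames(result) <- NULL
print(result)
```

Because the rows are selected with a logical index, the original row order is preserved.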
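For this data, since 'Yes' sorts after 'No' alphabetically,
we can also sort each group with 'Yes' first and keep only the first row per group. Note that the result comes back ordered by Group: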
df %>%
  arrange(Group, desc(Status)) %>%
  distinct(Group, .keep_all = TRUE)
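The sort-and-deduplicate idea relies on 'Yes' happening to sort after 'No' as text. A minimal base R sketch that makes the priority explicit instead is shown here; it is an assumption-laden illustration rather than the answer's own method, and `ord`, `sorted` and `result` are hypothetical names.

```r
# Same data as in the question, rebuilt here so the snippet runs on its own.
df <- data.frame(
  Group  = c("Blue", "Red", "Red", "Yellow", "Yellow", "Purple"),
  Status = c("No", "No", "Yes", "No", "Yes", "No"),
  From   = c(1994L, 1994L, 1998L, 1994L, 2015L, 1994L),
  To     = c(2000L, 1997L, 2002L, 2014L, 2021L, 1997L)
)

# Sort by Group, then by a logical key that is FALSE for "Yes" rows.
# FALSE sorts before TRUE, so "Yes" rows come first within each group.
ord <- order(df$Group, df$Status != "Yes")
sorted <- df[ord, ]

# Keep the first row of each group.
result <- sorted[!duplicated(sorted$Group), ]
rownames(result) <- NULL
print(result)
```

Unlike the filter approach, this always returns exactly one row per group, so a repeated group with only 'No' rows keeps one of them.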
Data
It is easier to help if you provide data in a reproducible format, such as the output of dput(df):
df <- structure(list(Group = c("Blue", "Red", "Red", "Yellow", "Yellow",
"Purple"), Status = c("No", "No", "Yes", "No", "Yes", "No"),
From = c(1994L, 1994L, 1998L, 1994L, 2015L, 1994L), To = c(2000L,
1997L, 2002L, 2014L, 2021L, 1997L)),
class = "data.frame", row.names = c(NA, -6L))
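As a rough sketch of how such a reproducible snippet can be produced, the plain-text table from the question can be read with read.table() and then printed with dput(). The object name `df_txt` is made up for this example.

```r
# Rebuild the question's table from its plain-text form.
df_txt <- read.table(text = "Group Status From To
Blue No 1994 2000
Red No 1994 1997
Red Yes 1998 2002
Yellow No 1994 2014
Yellow Yes 2015 2021
Purple No 1994 1997", header = TRUE, stringsAsFactors = FALSE)

# dput() prints an R expression that recreates the object,
# which can be pasted straight into a question or answer.
dput(df_txt)
```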