R: conditionally drop rows by group

This is my dataset

   Group     Status      From    To
   Blue      No          1994    2000
   Red       No          1994    1997
   Red       Yes         1998    2002
   Yellow    No          1994    2014
   Yellow    Yes         2015    2021
   Purple    No          1994    1997

I would like to remove the rows with Status=No, but only when they belong to a Group that appears more than once.

For instance, Group=Red and Group=Yellow each have 2 rows, so I would like to drop the row with Status=No within these two groups. The final dataset should look like this:

   Group     Status      From    To
   Blue      No          1994    2000
   Red       Yes         1998    2002
   Yellow    Yes         2015    2021
   Purple    No          1994    1997

Any suggestions would be much appreciated. Thanks.

CodePudding user response:

Within each group, keep only the rows with Status == 'Yes' if the group has more than one row; groups with a single row are kept unchanged.

library(dplyr)

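df %>% 
  group_by(Group) %>% 
  # n() is the number of rows in the current group;
  # a single-row group gets TRUE, so its row is always kept
  filter(if (n() > 1) Status == 'Yes' else TRUE) %>%
  ungroup()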

#  Group  Status  From    To
#  <chr>  <chr>  <int> <int>
#1 Blue   No      1994  2000
#2 Red    Yes     1998  2002
#3 Yellow Yes     2015  2021
#4 Purple No      1994  1997
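
If dplyr is not available, the same rule can be sketched in base R. This is only a minimal sketch, assuming the column names from the question; `group_size` and `result` are names introduced here for illustration.

```r
# Same data as in the question
df <- data.frame(
  Group  = c("Blue", "Red", "Red", "Yellow", "Yellow", "Purple"),
  Status = c("No", "No", "Yes", "No", "Yes", "No"),
  From   = c(1994L, 1994L, 1998L, 1994L, 2015L, 1994L),
  To     = c(2000L, 1997L, 2002L, 2014L, 2021L, 1997L),
  stringsAsFactors = FALSE
)

# For every row, the number of rows sharing its Group
group_size <- ave(seq_along(df$Group), df$Group, FUN = length)

# Keep a row if its group has a single row, or if its Status is "Yes"
result <- df[group_size == 1 | df$Status == "Yes", ]
rownames(result) <- NULL  # reset row names after subsetting

print(result)
```

Like the dplyr version, this keeps the original row order.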

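For this data, since 'Yes' sorts after 'No' ('Yes' > 'No'), we can also sort each group so that 'Yes' comes first and keep only the first row per group. Note that this reorders the rows by Group: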

df %>%
  arrange(Group, desc(Status)) %>%
  distinct(Group, .keep_all = TRUE)
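
The two approaches give the same rows for the data in the question, but they can differ on other inputs. The sketch below illustrates this with a made-up data frame `df2` that contains a hypothetical group "Green" with two 'No' rows and no 'Yes' row; this case does not occur in the original data.

```r
library(dplyr)

# Hypothetical data: "Green" repeats but has no 'Yes' row
df2 <- data.frame(
  Group  = c("Blue", "Red", "Red", "Green", "Green"),
  Status = c("No", "No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Approach 1: repeated groups keep only their 'Yes' rows,
# so "Green" disappears entirely
by_filter <- df2 %>%
  group_by(Group) %>%
  filter(if (n() > 1) Status == "Yes" else TRUE) %>%
  ungroup()

# Approach 2: exactly one row per group survives,
# so "Green" keeps one of its 'No' rows
by_distinct <- df2 %>%
  arrange(Group, desc(Status)) %>%
  distinct(Group, .keep_all = TRUE)

print(by_filter$Group)
print(by_distinct$Group)
```

Which behavior is wanted depends on how such groups should be treated; the question does not say, so check this before choosing one of the two.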

data

It is easier to help if you provide the data in a reproducible format, such as the output of dput(df):

df <- structure(list(Group = c("Blue", "Red", "Red", "Yellow", "Yellow", 
"Purple"), Status = c("No", "No", "Yes", "No", "Yes", "No"), 
    From = c(1994L, 1994L, 1998L, 1994L, 2015L, 1994L), To = c(2000L, 
    1997L, 2002L, 2014L, 2021L, 1997L)), 
   class = "data.frame", row.names = c(NA, -6L))