filter rows based on group with multiple conditions in R

Time:01-22

I am working on a data frame of plant scientific names, a sample of which is as follows:

plantlist <- data.frame(ID = c(1,1,2,2,2,2,2), 
                        SciName = c("Alkanna tuberculata", "Alkanna tuberculata", "Anchusa tinctoria", "Anchusa tinctoria", "Anchusa tinctoria", "Anchusa tinctoria", "Echium italicum"),
                        SciName.w.author = c("Alkanna tuberculata Greuter", "Alkanna tuberculata Meikle", "Anchusa tinctoria L", "Anchusa tinctoria Woodv", "Anchusa tinctoria Pall", "Anchusa tinctoria Meikle", "Echium italicum"),
                        Status = c("Unresolved", "Misapplied", "Accepted", "Synonym", "Unresolved", "Synonym", "Misapplied"))

What I need to do is group the rows by ID and SciName, and then keep the following rows:

  1. if there is only one row in the group keep it, no matter what the status is
  2. if there is more than one row, keep the accepted names and the synonyms
  3. if there are no accepted names or synonyms, keep the unresolved rows, and if there are no unresolved rows, keep the misapplied ones

I tried to accomplish this using case_when and grouping, but I'm stuck on the last part:

library(dplyr)

keep.plantlist <- plantlist %>% 
  group_by(ID, SciName) %>% 
  mutate(count = n()) %>% 
  ungroup() %>%
  mutate(keep = case_when(count == 1 ~ TRUE,
                          count > 1 & Status == "Accepted" ~ TRUE,
                          count > 1 & Status == "Synonym" ~ TRUE))

# expected keep column
plantlist$keep <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)

I also tried converting Status to a factor and arranging each group by the priority I need, but I don't know whether there is a function that can make use of that order.
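The priority idea can be made to work by turning the order into a number and keeping, within each group, only the rows that share the best rank present. The following is a minimal sketch rather than a definitive solution. It assumes Accepted and Synonym share the top rank, and it repeats the sample with the first two rows sharing ID 1, which is what the expected keep vector implies. The names `status_rank`, `prio` and `flag_keep` are made up for illustration.

```r
library(dplyr)

# Sample data; the first two rows share ID 1 (assumption, see above)
plantlist <- data.frame(
  ID = c(1, 1, 2, 2, 2, 2, 2),
  SciName = c("Alkanna tuberculata", "Alkanna tuberculata", "Anchusa tinctoria",
              "Anchusa tinctoria", "Anchusa tinctoria", "Anchusa tinctoria",
              "Echium italicum"),
  Status = c("Unresolved", "Misapplied", "Accepted", "Synonym",
             "Unresolved", "Synonym", "Misapplied")
)

# Hypothetical lookup table: a lower number means a higher priority
status_rank <- c(Accepted = 1, Synonym = 1, Unresolved = 2, Misapplied = 3)

# Hypothetical helper: flag the rows whose rank equals the best rank in their group
flag_keep <- function(df) {
  df %>%
    # as.character() guards against Status being stored as a factor
    mutate(prio = unname(status_rank[as.character(Status)])) %>%
    group_by(ID, SciName) %>%
    mutate(keep = prio == min(prio)) %>%
    ungroup() %>%
    select(-prio)
}

result <- flag_keep(plantlist)
kept <- filter(result, keep)  # only the rows to keep

print(result$keep)
# TRUE FALSE TRUE TRUE FALSE TRUE TRUE
```

A group with a single row is kept automatically, because its only rank is also its minimum. A Status value that is missing from `status_rank` would produce NA for its whole group, so real data with other statuses would need extra entries in the lookup.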

CodePudding user response:

I think this will work, but I would need a higher-quality test set to be sure.

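library(dplyr)

keep.plantlist <- plantlist %>% 
  group_by(ID, SciName) %>% 
  mutate(count = n()) %>% 
  # Still grouped here, so any() only looks at the rows of the current group
  mutate(keep = case_when(
    count == 1 ~ TRUE,
    count > 1 & Status == "Accepted" ~ TRUE,
    count > 1 & Status == "Synonym" ~ TRUE,
    # No Accepted or Synonym in the group: fall back to Unresolved
    !any(Status %in% c("Accepted", "Synonym")) &
      Status == "Unresolved" ~ TRUE,
    # Nothing better in the group: fall back to Misapplied
    !any(Status %in% c("Accepted", "Synonym", "Unresolved")) &
      Status == "Misapplied" ~ TRUE,
    TRUE ~ FALSE
  )) %>%
  ungroup()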
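One way to gain more confidence is a small test set with one group per branch of the rules. The sketch below wraps the same case_when logic in a hypothetical helper, `flag_keep_case_when`, and runs it on made-up data (`tests`, with invented IDs, placeholder names and a hand-written `expected` column). It is a sanity check under those assumptions, not proof that every combination in the real data is handled.

```r
library(dplyr)

# Hypothetical helper wrapping the case_when logic from the answer
flag_keep_case_when <- function(df) {
  df %>%
    group_by(ID, SciName) %>%
    mutate(count = n()) %>%
    mutate(keep = case_when(
      count == 1 ~ TRUE,
      count > 1 & Status %in% c("Accepted", "Synonym") ~ TRUE,
      !any(Status %in% c("Accepted", "Synonym")) &
        Status == "Unresolved" ~ TRUE,
      !any(Status %in% c("Accepted", "Synonym", "Unresolved")) &
        Status == "Misapplied" ~ TRUE,
      TRUE ~ FALSE
    )) %>%
    ungroup()
}

# Made-up test set: one group per branch of the rules
tests <- data.frame(
  ID = c(1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5),
  SciName = c("Species A",
              "Species B", "Species B", "Species B",
              "Species C", "Species C",
              "Species D", "Species D",
              "Species E", "Species E", "Species E"),
  Status = c("Misapplied",                            # single row: always kept
             "Accepted", "Unresolved", "Misapplied",  # Accepted wins
             "Unresolved", "Misapplied",              # falls back to Unresolved
             "Misapplied", "Misapplied",              # falls back to Misapplied
             "Synonym", "Synonym", "Unresolved"),     # both synonyms kept
  expected = c(TRUE,
               TRUE, FALSE, FALSE,
               TRUE, FALSE,
               TRUE, TRUE,
               TRUE, TRUE, FALSE)
)

checked <- flag_keep_case_when(tests)

# Stops with an error if any row disagrees with the hand-written expectation
stopifnot(identical(checked$keep, tests$expected))
```

If the check fails on a new case, comparing `checked$keep` with `checked$expected` row by row shows which branch of the rules is not covered.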