Filtering a specific combination or sequence-CodePudding

Using R, I am trying to filter the ID's that have a specific "Decision" sequence, but the have to be in the same "Application" number. The "Decision" order needs to be C,D,E. So What I am looking for in here, is to get

Because The ID "805" has the "Decision" sequence C,D,E and they are in the same "Application" number.

I tried using, for loops, if else, filter but nothing worked for me. Thank you in advance for your help.

CodePudding user response：

Maybe the following (it would be helpful if you provide the data to reproduce, as opposed to an image):

mydf <- data.frame(ID = c(804, rep(805, 6), rep(806, 3)),
           Application = c(3, 2, rep(3, 4), 4, 2, 3, 3),
           Decision = c(LETTERS[1:5], "A", LETTERS[1:3], "E"))

library(dplyr)
library(tidyr)
library(stringr)

mydf |> 
  group_by(ID, Application) |> 
  summarize(Decision = paste(Decision, collapse = ",")) |> 
  ungroup() |> 
  filter(str_detect(Decision, "C,D,E")) |> 
  separate_rows(Decision, sep = ",") |> 
  filter(Decision %in% c("C", "D", "E"))

# A tibble: 3 × 3
     ID Application Decision
  <dbl>       <dbl> <chr>   
1   805           3 C       
2   805           3 D       
3   805           3 E

CodePudding user response：

You could use nest() from the tidyverse to group by ID and Application. You just need to use map_lgl() to work with the list of dataframes inside the column that nest() creates. It might help to run the following code iteratively, adding one line at a time to see how nest() works:

library(tidyverse)
df = tibble(ID = c(804,
                   rep(805,6),
                   rep(806,3)),
            Application = c(3,2,3,3,3,3,4,2,3,3),
            Decision = c('A','B','C','D','E','A','A','B','C','E'))


df %>% 
  nest(Decision=Decision) %>% 
  filter(Decision %>% 
           map_lgl(~grepl("CDE", paste(.$Decision, collapse="")))) %>% 
  unnest(Decision) %>% 
  filter(Decision %in% c("C","D","E"))

#> # A tibble: 3 × 3
#>      ID Application Decision
#>   <dbl>       <dbl> <chr>   
#> 1   805           3 C       
#> 2   805           3 D       
#> 3   805           3 E