Home > database >  Filtering a specific combination or sequence
Filtering a specific combination or sequence

Time:07-17

Filter decision

Using R, I am trying to filter the ID's that have a specific "Decision" sequence, but the have to be in the same "Application" number. The "Decision" order needs to be C,D,E. So What I am looking for in here, is to get

Final filter

Because The ID "805" has the "Decision" sequence C,D,E and they are in the same "Application" number.

I tried using, for loops, if else, filter but nothing worked for me. Thank you in advance for your help.

CodePudding user response:

Maybe the following (it would be helpful if you provide the data to reproduce, as opposed to an image):

mydf <- data.frame(ID = c(804, rep(805, 6), rep(806, 3)),
           Application = c(3, 2, rep(3, 4), 4, 2, 3, 3),
           Decision = c(LETTERS[1:5], "A", LETTERS[1:3], "E"))

library(dplyr)
library(tidyr)
library(stringr)

mydf |> 
  group_by(ID, Application) |> 
  summarize(Decision = paste(Decision, collapse = ",")) |> 
  ungroup() |> 
  filter(str_detect(Decision, "C,D,E")) |> 
  separate_rows(Decision, sep = ",") |> 
  filter(Decision %in% c("C", "D", "E"))

# A tibble: 3 × 3
     ID Application Decision
  <dbl>       <dbl> <chr>   
1   805           3 C       
2   805           3 D       
3   805           3 E  

CodePudding user response:

You could use nest() from the tidyverse to group by ID and Application. You just need to use map_lgl() to work with the list of dataframes inside the column that nest() creates. It might help to run the following code iteratively, adding one line at a time to see how nest() works:

library(tidyverse)
df = tibble(ID = c(804,
                   rep(805,6),
                   rep(806,3)),
            Application = c(3,2,3,3,3,3,4,2,3,3),
            Decision = c('A','B','C','D','E','A','A','B','C','E'))


df %>% 
  nest(Decision=Decision) %>% 
  filter(Decision %>% 
           map_lgl(~grepl("CDE", paste(.$Decision, collapse="")))) %>% 
  unnest(Decision) %>% 
  filter(Decision %in% c("C","D","E"))

#> # A tibble: 3 × 3
#>      ID Application Decision
#>   <dbl>       <dbl> <chr>   
#> 1   805           3 C       
#> 2   805           3 D       
#> 3   805           3 E
  • Related