Using R, I am trying to filter the ID's that have a specific "Decision" sequence, but the have to be in the same "Application" number. The "Decision" order needs to be C,D,E. So What I am looking for in here, is to get
Because The ID "805" has the "Decision" sequence C,D,E and they are in the same "Application" number.
I tried using, for loops, if else, filter but nothing worked for me. Thank you in advance for your help.
CodePudding user response:
Maybe the following (it would be helpful if you provide the data to reproduce, as opposed to an image):
mydf <- data.frame(ID = c(804, rep(805, 6), rep(806, 3)),
Application = c(3, 2, rep(3, 4), 4, 2, 3, 3),
Decision = c(LETTERS[1:5], "A", LETTERS[1:3], "E"))
library(dplyr)
library(tidyr)
library(stringr)
mydf |>
group_by(ID, Application) |>
summarize(Decision = paste(Decision, collapse = ",")) |>
ungroup() |>
filter(str_detect(Decision, "C,D,E")) |>
separate_rows(Decision, sep = ",") |>
filter(Decision %in% c("C", "D", "E"))
# A tibble: 3 × 3
ID Application Decision
<dbl> <dbl> <chr>
1 805 3 C
2 805 3 D
3 805 3 E
CodePudding user response:
You could use nest()
from the tidyverse to group by ID
and Application
. You just need to use map_lgl()
to work with the list of dataframes inside the column that nest()
creates. It might help to run the following code iteratively, adding one line at a time to see how nest()
works:
library(tidyverse)
df = tibble(ID = c(804,
rep(805,6),
rep(806,3)),
Application = c(3,2,3,3,3,3,4,2,3,3),
Decision = c('A','B','C','D','E','A','A','B','C','E'))
df %>%
nest(Decision=Decision) %>%
filter(Decision %>%
map_lgl(~grepl("CDE", paste(.$Decision, collapse="")))) %>%
unnest(Decision) %>%
filter(Decision %in% c("C","D","E"))
#> # A tibble: 3 × 3
#> ID Application Decision
#> <dbl> <dbl> <chr>
#> 1 805 3 C
#> 2 805 3 D
#> 3 805 3 E