I'm trying to create a sub-data from a dataset where it has the start date, end date and the episode number of all individuals that ever was in the ICU.
Desired outcome
The outcome i would like to have look as follows, where episode 1 is the first period the individual was in ICU, episode 2 is the second time they were in ICU so on and so forth.
Patient | Start Date | End Date | Episode |
---|---|---|---|
Individual A | 3/5/2021 | 5/5/2021 | 1 |
Individual A | 10/5/2021 | 11/5/2021 | 2 |
Individual B | 6/6/2021 | 16/6/2021 | 1 |
DATASET
The dataset I have is in a long format, and we are only interested in whether the status of a patient is "in ICU".
Example of a few columns (mainly for individual A)
Patient | Status | as of Date |
---|---|---|
Individual A | In ICU | 3/5/2021 |
Individual A | In ICU | 4/5/2021 |
Individual A | In ICU | 5/5/2021 |
Individual A | Not in ICU | 6/5/2021 |
Individual A | Not in ICU | 7/5/2021 |
Individual A | Not in ICU | 8/5/2021 |
Individual A | Not in ICU | 9/5/2021 |
Individual A | In ICU | 10/5/2021 |
Individual A | In ICU | 11/5/2021 |
Individual A | Not in ICU | 12/5/2021 |
I have tried a groupby min, max and count by patient in Python but it only gives the earliest start and latest end date for any individual patients. Would appreciate if anyone can share insights on a feasible algorithm for this problem please! Solution can be in Python/R/Excel. Thanks alot.
CodePudding user response:
My solution relies on the fact that "In ICU" states are interleaved with "Not in ICU" states.
library(dplyr)
df |>
mutate(as.of.Date = as.Date(as.of.Date, "%d/%m/%y")) |>
arrange(as.of.Date) |>
group_by(Patient) |>
mutate(episode = {
r <- rle(Status)
rep(seq(1:length(r$values)), r$lengths)
}) |>
filter(Status == "In ICU") |>
mutate(episode = dense_rank(episode)) |>
group_by(Patient, episode) |>
summarize(`Start Date` = min(as.of.Date),
`End Date` = max(as.of.Date))
##> # Groups: Patient [1]
##> Patient episode `Start Date` `End Date`
##> <chr> <int> <date> <date>
##> 1 Individual A 1 2020-05-03 2020-05-05
##> 2 Individual A 2 2020-05-10 2020-05-11