Home > Mobile >  Getting the start and end dates of a patient, with the possibility of multiple "episodes"
Getting the start and end dates of a patient, with the possibility of multiple "episodes"

Time:03-23

I'm trying to create a sub-data from a dataset where it has the start date, end date and the episode number of all individuals that ever was in the ICU.

Desired outcome

The outcome i would like to have look as follows, where episode 1 is the first period the individual was in ICU, episode 2 is the second time they were in ICU so on and so forth.

Patient Start Date End Date Episode
Individual A 3/5/2021 5/5/2021 1
Individual A 10/5/2021 11/5/2021 2
Individual B 6/6/2021 16/6/2021 1

DATASET

The dataset I have is in a long format, and we are only interested in whether the status of a patient is "in ICU".

Example of a few columns (mainly for individual A)

Patient Status as of Date
Individual A In ICU 3/5/2021
Individual A In ICU 4/5/2021
Individual A In ICU 5/5/2021
Individual A Not in ICU 6/5/2021
Individual A Not in ICU 7/5/2021
Individual A Not in ICU 8/5/2021
Individual A Not in ICU 9/5/2021
Individual A In ICU 10/5/2021
Individual A In ICU 11/5/2021
Individual A Not in ICU 12/5/2021

I have tried a groupby min, max and count by patient in Python but it only gives the earliest start and latest end date for any individual patients. Would appreciate if anyone can share insights on a feasible algorithm for this problem please! Solution can be in Python/R/Excel. Thanks alot.

CodePudding user response:

My solution relies on the fact that "In ICU" states are interleaved with "Not in ICU" states.

library(dplyr)

df |>
  mutate(as.of.Date = as.Date(as.of.Date, "%d/%m/%y")) |>
  arrange(as.of.Date) |>
  group_by(Patient) |>
  mutate(episode = {
    r <- rle(Status)
    rep(seq(1:length(r$values)), r$lengths)
  }) |>
  filter(Status == "In ICU") |>
  mutate(episode = dense_rank(episode)) |>
  group_by(Patient, episode) |>
  summarize(`Start Date` = min(as.of.Date),
            `End Date` = max(as.of.Date))


##> # Groups:   Patient [1]
##>   Patient      episode `Start Date` `End Date`
##>   <chr>          <int> <date>       <date>    
##> 1 Individual A       1 2020-05-03   2020-05-05
##> 2 Individual A       2 2020-05-10   2020-05-11
  • Related