Home > front end >  Collapse rows and keep highest/lowest dates
Collapse rows and keep highest/lowest dates

Time:03-15

I want to collapse multiple rows with admission dates, discharge dates and causes of stay. But i only want to keep the earliest admission date and the latest discharge date grouped by the ID and the cause of stay.

My dataframe looks like this:

structure(list(AdmissionDate = structure(c(16709, 16710, 16713, 16714, 16715, 16729, 16730, 16731, 16735, 16741, 16742, 16787, 16790, 16792, 16797, 16798, 16799), class = "Date"), DischargeDate = structure(c(16709, 16710, 16713, 16714, 16715, 16729, 16730, 16731, 16735, 16741, 16742, 16787, 16790, 16792, 16797, 16798, 16799), class = "Date"), ID = c(1010091262, 1010091262, 1010091262, 1010091262, 1010091262, 1010091262, 1010091262, 1010091262, 1010091262, 1010091262, 1010091262, 1640300380, 1640300380, 1640300380, 1640300380, 1640300380, 1640300380), CauseOfStay = c("Folge: Hirninfarkt", "Folge: Hirninfarkt", "Folge: Hirninfarkt", "Folge: Hirninfarkt", "Folge: Hirninfarkt", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl", "Schädigung: Hirn, anoxisch, aonkl")), row.names = c(1L, 2L, 3L, 4L, 5L, 15L, 16L, 17L, 18L, 19L, 20L, 60L, 61L, 62L, 63L, 64L, 65L), class = "data.frame")

The output should contain 3 rows with the earliest admission date, latest discharge date grouped by the cause of stay and the ID.

CodePudding user response:

You can use group_by / summarize.. You can add your patient identifier (ID) to the group_by() clause to allow different patients to have different summarized rows, for the same Cause of Stay

dat %>%
  group_by(ID, CauseOfStay) %>%
  summarize(AdmissionDate = min(AdmissionDate, na.rm=T),
            DischargeDate = max(DischargeDate, na.rm=T))

Output:

          ID CauseOfStay                       AdmissionDate DischargeDate
       <dbl> <chr>                             <date>        <date>       
1 1010091262 Folge: Hirninfarkt                2015-10-01    2015-10-07   
2 1010091262 Schädigung: Hirn, anoxisch, aonkl 2015-10-21    2015-11-03   
3 1640300380 Schädigung: Hirn, anoxisch, aonkl 2015-12-18    2015-12-30 
  • Related