Home > database >  Create a study end date variable based on death, transfer and data collection dates in R
Create a study end date variable based on death, transfer and data collection dates in R

Time:09-14

I have longitudinal patient data in R. I would like to create an end_date variable (4th column in table below) from the three different dates I already have, whichever occurred first, and in case death or transfer didn't occur, end_date would equal to data_collection_end_date:

patid death_date transfer_date data_collection_end_date end_date
1 07/07/2018 07/07/2016 01/11/2021 07/07/2016
2 07/07/2019 07/07/2018 01/11/2021 07/07/2018
3 07/07/2020 07/07/2018 01/11/2021 07/07/2018
4 NA NA 01/12/2021 01/12/2021
5 NA NA 01/11/2021 01/11/2021
6 NA 07/09/2016 01/11/2021 07/09/2016

CodePudding user response:

If the dates are proper date type objects, we can use rowwise() and max(). We may have to transform those into dates in case they are strings.

library(dplyr)

data %>% rowwise() %>%
         mutate(end_date = max(c_across(death_rate:data_collection_end_date), na.rm = TRUE))

CodePudding user response:

Capture whichever occurred first:

library(dplyr)
library(lubridate)

# Your sample data
patid <- c(1:6)
death_date <- dmy(c("07/07/2018", "07/07/2019", "07/07/2020", NA, NA, NA))
transfer_date <- dmy(c("07/07/2016", "07/07/2018", "07/07/2018", NA, NA, "07/09/2016"))
data_collection_end_date <- dmy(c("01/11/2021", "01/11/2021", "01/11/2021", "01/11/2021", "01/11/2021", "01/11/2021"))

# Data frame
df <- data.frame(patid, death_date, transfer_date, data_collection_end_date)



# @GuedesBF's solution corrected to capture earliest date
df %>% rowwise() %>%
  mutate(end_date = min(c_across(death_date:data_collection_end_date), na.rm = TRUE))
  • Related