I have longitudinal patient data in R. I would like to create an end_date variable (4th column in table below) from the three different dates I already have, whichever occurred first, and in case death or transfer didn't occur, end_date would equal to data_collection_end_date:
patid | death_date | transfer_date | data_collection_end_date | end_date |
---|---|---|---|---|
1 | 07/07/2018 | 07/07/2016 | 01/11/2021 | 07/07/2016 |
2 | 07/07/2019 | 07/07/2018 | 01/11/2021 | 07/07/2018 |
3 | 07/07/2020 | 07/07/2018 | 01/11/2021 | 07/07/2018 |
4 | NA | NA | 01/12/2021 | 01/12/2021 |
5 | NA | NA | 01/11/2021 | 01/11/2021 |
6 | NA | 07/09/2016 | 01/11/2021 | 07/09/2016 |
CodePudding user response:
If the dates are proper date type objects, we can use rowwise() and max(). We may have to transform those into dates in case they are strings.
library(dplyr)
data %>% rowwise() %>%
mutate(end_date = max(c_across(death_rate:data_collection_end_date), na.rm = TRUE))
CodePudding user response:
Capture whichever occurred first:
library(dplyr)
library(lubridate)
# Your sample data
patid <- c(1:6)
death_date <- dmy(c("07/07/2018", "07/07/2019", "07/07/2020", NA, NA, NA))
transfer_date <- dmy(c("07/07/2016", "07/07/2018", "07/07/2018", NA, NA, "07/09/2016"))
data_collection_end_date <- dmy(c("01/11/2021", "01/11/2021", "01/11/2021", "01/11/2021", "01/11/2021", "01/11/2021"))
# Data frame
df <- data.frame(patid, death_date, transfer_date, data_collection_end_date)
# @GuedesBF's solution corrected to capture earliest date
df %>% rowwise() %>%
mutate(end_date = min(c_across(death_date:data_collection_end_date), na.rm = TRUE))