How to find the earliest date across multiple columns in R (Issue with NAs)

Time:03-28

I have three date columns (class Date) and I want to create a new column that holds the earliest of the three dates in each row. This is the code I used:

df1 <- df %>% mutate(timeout= pmin(date1, date2, end_date))

When date1 and date2 are both NA, I would like timeout to take the value of end_date, so that timeout never contains an NA. Instead, the code above returns NA for those rows. Any assistance will be greatly appreciated.

Answer:

Add na.rm = TRUE to the pmin() call. It then ignores the NAs within each row and returns the earliest of the remaining dates.

library(dplyr)

# na.rm = TRUE skips NAs row by row, so each row gets its earliest non-missing date
df %>%
  mutate(timeout = pmin(date1, date2, end_date, na.rm = TRUE))

Output

  id      date1      date2   end_date    timeout
1  1       <NA>       <NA> 2008-01-23 2008-01-23
2  1 2007-10-16 2007-11-01 2008-01-23 2007-10-16
3  2 2007-11-30 2007-11-30 2007-11-30 2007-11-30
4  3 2007-08-17 2007-12-17 2008-12-12 2007-08-17
5  3 2008-11-12 2008-12-12 2008-12-12 2008-11-12
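
To see why the original call returned NA, here is a minimal base R sketch that runs pmin() with and without na.rm. The vectors are small stand-ins built from the first two rows above, and the names with_na and without_na are only illustrative.

```r
# Stand-in Date vectors: in row 1, date1 and date2 are both NA.
date1    <- as.Date(c(NA, "2007-10-16"))
date2    <- as.Date(c(NA, "2007-11-01"))
end_date <- as.Date(c("2008-01-23", "2008-01-23"))

# Default (na.rm = FALSE): any NA in a row makes that row's result NA.
with_na <- pmin(date1, date2, end_date)

# na.rm = TRUE: NAs are skipped, so row 1 falls back to end_date.
without_na <- pmin(date1, date2, end_date, na.rm = TRUE)

print(with_na)
print(without_na)
```

Note that if every column is NA in a given row, pmin() still returns NA for that row even with na.rm = TRUE, so timeout is only NA-free as long as end_date is never missing.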

Data

df <- structure(list(id = c(1L, 1L, 2L, 3L, 3L), date1 = structure(c(NA, 
13802, 13847, 13742, 14195), class = "Date"), date2 = structure(c(NA, 
13818, 13847, 13864, 14225), class = "Date"), end_date = as.Date(c("2008-01-23", 
"2008-01-23", "2007-11-30", "2008-12-12", "2008-12-12"))), class = "data.frame", 
row.names = c("1", "2", "3", "4", "5"))
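
If the date columns are easier to pass as a vector of names, the same row-wise minimum can be sketched in base R with do.call(). This is an assumption-laden variant, not part of the original answer: the data frame is a two-row stand-in and date_cols is a hypothetical helper vector.

```r
# Two-row stand-in data frame using values from the example above.
df <- data.frame(
  id       = c(1L, 1L),
  date1    = as.Date(c(NA, "2007-10-16")),
  date2    = as.Date(c(NA, "2007-11-01")),
  end_date = as.Date(c("2008-01-23", "2008-01-23"))
)

# Hypothetical helper: names of the columns to compare.
date_cols <- c("date1", "date2", "end_date")

# do.call() hands each listed column to pmin() as a separate argument,
# together with na.rm = TRUE.
df$timeout <- do.call(pmin, c(df[date_cols], na.rm = TRUE))

print(df)
```

This avoids typing each column inside pmin() and gives the same result as the mutate() call for these rows.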