Home > OS >  R searching for minimum date across columns based on a condition
R searching for minimum date across columns based on a condition

Time:04-09

I have the following data frame with several date variables.

    x <- structure(list(id = c(1, 2, 3, 4), date = structure(c(18611, 
16801, 16801, 17532), class = "Date"), s1 = c(0, 1, 1, NA), date1 = structure(c(17880, 
16450, 16416, NA), class = "Date"), s2 = c(0, 0, 1, NA), date2 = structure(c(17880, 
NA, 15869, NA), class = "Date"), DN = structure(c(18611, 15869, 
15869, NA), class = "Date")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))

I would like to compare date1 and date2 and generate DN as the minimum of the two dates based on the conditions in case_when. I am currently using this code:

x <- mutate(date = as.Date(date),
           date1 =  as.Date(date1),
           date2 =  as.Date(date2),
           DN = case_when(
               s1 == 1 | s2 == 1 ~ min(date1, date2, na.rm = T),
               s1 == 0 | s2 == 0 ~ date,
               is.na(s1) & is.na(s2) ~ NA_real_
           ))

However, I get a strange result !! For id = 2 the value of DN is taken from id = 3, which I can not understand !!

Any ideas? Thanks in forward

enter image description here

CodePudding user response:

You can use pmin to select the first date of the assigned columns. You can use the following code:

library(dplyr)
x %>% 
  mutate(DN = case_when(
    s1 == 1 | s2 == 1 ~ pmin(date1, date2, na.rm = T),
    s1 == 0 | s2 == 0 ~ date,
    is.na(s1) & is.na(s2) ~ NA_real_
  ))

Output:

# A tibble: 4 × 7
     id date          s1 date1         s2 date2      DN        
  <dbl> <date>     <dbl> <date>     <dbl> <date>     <date>    
1     1 2020-12-15     0 2018-12-15     0 2018-12-15 2020-12-15
2     2 2016-01-01     1 2015-01-15     0 NA         2015-01-15
3     3 2016-01-01     1 2014-12-12     1 2013-06-13 2013-06-13
4     4 2018-01-01    NA NA            NA NA         NA 

CodePudding user response:

If you add rowwise() (i.e. grouping by rows) you will get the required row-minimum:

x %>% 
  rowwise() %>% 
  mutate(date = as.Date(date),
         date1 =  as.Date(date1),
         date2 =  as.Date(date2),
         DN = case_when(
           s1 == 1 | s2 == 1 ~ pmin(date1, date2, na.rm = T),
           s1 == 0 | s2 == 0 ~ date,
           is.na(s1) & is.na(s2) ~ NA_real_
         ))
  • Related