compare multiple columns to one column and return names of columns that match condition-CodePudding

I have an df like this:

df <- data.frame(
  Death = as.Date(c("2017-09-20")),
  First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
  Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")),
  new = c("Second_Date", "First_Date, Second_Date", NA),
  row_number = c(1,2,3))

And I want to create the column 'new' where if any columns that contain the word "Date" is after the 'Death' date column, then I want to return the names of these columns. For example, you can see:

in the first row, Second_Date is after Death so new = Second_date
in the second row, both First_Date and Second_Date are after Death so new = First_Date, Second_Date
in the third row, none of the dates are after death so new = NA

So far I have this code:

df2 <- df %>% mutate(new = Reduce(coalesce, across(contains("Date"), ~ ifelse(. > Death, cur_column(), NA_character_))))

but I'm only able to return the first column from left to right that meets this condition. Any help would be much appreciated.

CodePudding user response：

We loop across the columns that have '_Date' as suffix in column names, get the column name (cur_column()) if the values are greater than Death column, return as new column by modifying the .names, then use unite to join those _new column to a single one

library(dplyr)
library(tidyr)
df %>% 
   mutate(across(ends_with("_Date"),
    ~ case_when(.x > Death ~ cur_column()), .names = "{.col}_new")) %>% 
   unite(new, ends_with("_new"), na.rm = TRUE, sep = ", ") %>%
   na_if("")

-output

     Death First_Date Second_Date row_number                     new
1 2017-09-20 2016-09-09  2019-05-02          1             Second_Date
2 2017-09-20 2018-09-20  2019-09-20          2 First_Date, Second_Date
3 2017-09-20 2016-09-09  2016-09-09          3                    <NA>

NOTE: coalesce returns only the first non-NA value across rows

CodePudding user response：

Another possible solution, in base R:

df <- data.frame(
  Death = as.Date(c("2017-09-20")),
  First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
  Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")))
         
df$new <- apply(df, 1, \(x) if (any(x[1] < x[2:3])) 
             paste(names(df)[c(F, x[1] < x[2:3])], collapse = ", ") else NA)

df

#>        Death First_Date Second_Date                     new
#> 1 2017-09-20 2016-09-09  2019-05-02             Second_Date
#> 2 2017-09-20 2018-09-20  2019-09-20 First_Date, Second_Date
#> 3 2017-09-20 2016-09-09  2016-09-09                    <NA>