Home > OS >  compare multiple columns to one column and return names of columns that match condition
compare multiple columns to one column and return names of columns that match condition

Time:05-03

I have an df like this:

df <- data.frame(
  Death = as.Date(c("2017-09-20")),
  First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
  Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")),
  new = c("Second_Date", "First_Date, Second_Date", NA),
  row_number = c(1,2,3))
                      

And I want to create the column 'new' where if any columns that contain the word "Date" is after the 'Death' date column, then I want to return the names of these columns. For example, you can see:

  1. in the first row, Second_Date is after Death so new = Second_date
  2. in the second row, both First_Date and Second_Date are after Death so new = First_Date, Second_Date
  3. in the third row, none of the dates are after death so new = NA

So far I have this code:

df2 <- df %>% mutate(new = Reduce(coalesce, across(contains("Date"), ~ ifelse(. > Death, cur_column(), NA_character_))))

but I'm only able to return the first column from left to right that meets this condition. Any help would be much appreciated.

CodePudding user response:

We loop across the columns that have '_Date' as suffix in column names, get the column name (cur_column()) if the values are greater than Death column, return as new column by modifying the .names, then use unite to join those _new column to a single one

library(dplyr)
library(tidyr)
df %>% 
   mutate(across(ends_with("_Date"),
    ~ case_when(.x > Death ~ cur_column()), .names = "{.col}_new")) %>% 
   unite(new, ends_with("_new"), na.rm = TRUE, sep = ", ") %>%
   na_if("")

-output

     Death First_Date Second_Date row_number                     new
1 2017-09-20 2016-09-09  2019-05-02          1             Second_Date
2 2017-09-20 2018-09-20  2019-09-20          2 First_Date, Second_Date
3 2017-09-20 2016-09-09  2016-09-09          3                    <NA>

NOTE: coalesce returns only the first non-NA value across rows

CodePudding user response:

Another possible solution, in base R:

df <- data.frame(
  Death = as.Date(c("2017-09-20")),
  First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
  Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")))
         
df$new <- apply(df, 1, \(x) if (any(x[1] < x[2:3])) 
             paste(names(df)[c(F, x[1] < x[2:3])], collapse = ", ") else NA)

df

#>        Death First_Date Second_Date                     new
#> 1 2017-09-20 2016-09-09  2019-05-02             Second_Date
#> 2 2017-09-20 2018-09-20  2019-09-20 First_Date, Second_Date
#> 3 2017-09-20 2016-09-09  2016-09-09                    <NA>
  • Related