I have an df like this:
df <- data.frame(
Death = as.Date(c("2017-09-20")),
First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")),
new = c("Second_Date", "First_Date, Second_Date", NA),
row_number = c(1,2,3))
And I want to create the column 'new' where if any columns that contain the word "Date" is after the 'Death' date column, then I want to return the names of these columns. For example, you can see:
- in the first row, Second_Date is after Death so new = Second_date
- in the second row, both First_Date and Second_Date are after Death so new = First_Date, Second_Date
- in the third row, none of the dates are after death so new = NA
So far I have this code:
df2 <- df %>% mutate(new = Reduce(coalesce, across(contains("Date"), ~ ifelse(. > Death, cur_column(), NA_character_))))
but I'm only able to return the first column from left to right that meets this condition. Any help would be much appreciated.
CodePudding user response:
We loop across
the columns that have '_Date' as suffix in column names, get the column name (cur_column()
) if the values are greater than Death
column, return as new column by modifying the .names
, then use unite
to join those _new
column to a single one
library(dplyr)
library(tidyr)
df %>%
mutate(across(ends_with("_Date"),
~ case_when(.x > Death ~ cur_column()), .names = "{.col}_new")) %>%
unite(new, ends_with("_new"), na.rm = TRUE, sep = ", ") %>%
na_if("")
-output
Death First_Date Second_Date row_number new
1 2017-09-20 2016-09-09 2019-05-02 1 Second_Date
2 2017-09-20 2018-09-20 2019-09-20 2 First_Date, Second_Date
3 2017-09-20 2016-09-09 2016-09-09 3 <NA>
NOTE: coalesce
returns only the first non-NA value across rows
CodePudding user response:
Another possible solution, in base R:
df <- data.frame(
Death = as.Date(c("2017-09-20")),
First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")))
df$new <- apply(df, 1, \(x) if (any(x[1] < x[2:3]))
paste(names(df)[c(F, x[1] < x[2:3])], collapse = ", ") else NA)
df
#> Death First_Date Second_Date new
#> 1 2017-09-20 2016-09-09 2019-05-02 Second_Date
#> 2 2017-09-20 2018-09-20 2019-09-20 First_Date, Second_Date
#> 3 2017-09-20 2016-09-09 2016-09-09 <NA>