Is there an R function for finding and displaying the most recent date in a row among multiple colum-CodePudding

I'm sorting through invasive species data by site, and trying to figure out which record is the most recent per-site; this would allow me to categorize the site as either in-treatment, eradicated, or unknown.

Is there a way to populate a Recent_Date column by comparing the dates between multiple columns in one row?

TARGET_FID	Treatment_Date	Presence_Date	Non_Detection_Date	Recent_Date
1	2021-09-30	2016-07-10	2021-10-05	????-??-??
2	2019-05-31	2021-06-01	NA	????-??-??

CodePudding user response：

We could use

library(dplyr)
library(purrr)
df1 %>%
   mutate(Recent_Date = invoke(pmax, 
       c(across(ends_with('_Date'), as.Date), na.rm = TRUE)))

-output

TARGET_FID Treatment_Date Presence_Date Non_Detection_Date Recent_Date
1          1     2021-09-30    2016-07-10         2021-10-05  2021-10-05
2          2     2019-05-31    2021-06-01               <NA>  2021-06-01

Or use exec

df1 %>% 
  mutate(across(ends_with('_Date'), as.Date),
   Recent_Date = exec(pmax, !!! rlang::syms(names(.)[-1]), na.rm = TRUE ))
  TARGET_FID Treatment_Date Presence_Date Non_Detection_Date Recent_Date
1          1     2021-09-30    2016-07-10         2021-10-05  2021-10-05
2          2     2019-05-31    2021-06-01               <NA>  2021-06-01

Or using base R

df1$Recent_Date <- do.call(pmax, c(df1[-1], na.rm = TRUE))

data

df1 <- structure(list(TARGET_FID = 1:2, Treatment_Date = c("2021-09-30", 
"2019-05-31"), Presence_Date = c("2016-07-10", "2021-06-01"), 
    Non_Detection_Date = c("2021-10-05", NA)),
 class = "data.frame", row.names = c(NA, 
-2L))

CodePudding user response：

1) Define pmaxNA to be the same as pmax except na.rm=TRUE. Then we can apply it to the columns of df1, except the first, using Reduce.

library(dplyr)

pmaxNA <- function(...) pmax(..., na.rm = TRUE)
df1 %>% mutate(Recent_Date = Reduce(pmaxNA, across(-1)))

2) Another approach is that on each row apply max to all columns except the first.

library(dplyr)

df1 %>%
   rowwise %>%
   mutate(Recent_Date = max(c_across(-1), na.rm = TRUE)) %>%
   ungroup

3) Another way is to convert to long form and then use slice_max

library(dplyr)
library(tidyr)

df1 %>% 
  arrange(TARGET_FID) %>%
  mutate(Recent_Date = pivot_longer(., -1) %>% 
    drop_na %>% 
    group_by(TARGET_FID) %>% 
    slice_max(value) %>% 
    pull(value)
  )