I'm sorting through invasive species data by site, and trying to figure out which record is the most recent per-site; this would allow me to categorize the site as either in-treatment, eradicated, or unknown.
Is there a way to populate a Recent_Date
column by comparing the dates between multiple columns in one row?
TARGET_FID | Treatment_Date | Presence_Date | Non_Detection_Date | Recent_Date |
---|---|---|---|---|
1 | 2021-09-30 | 2016-07-10 | 2021-10-05 | ????-??-?? |
2 | 2019-05-31 | 2021-06-01 | NA | ????-??-?? |
CodePudding user response:
We could use
library(dplyr)
library(purrr)
df1 %>%
mutate(Recent_Date = invoke(pmax,
c(across(ends_with('_Date'), as.Date), na.rm = TRUE)))
-output
TARGET_FID Treatment_Date Presence_Date Non_Detection_Date Recent_Date
1 1 2021-09-30 2016-07-10 2021-10-05 2021-10-05
2 2 2019-05-31 2021-06-01 <NA> 2021-06-01
Or use exec
df1 %>%
mutate(across(ends_with('_Date'), as.Date),
Recent_Date = exec(pmax, !!! rlang::syms(names(.)[-1]), na.rm = TRUE ))
TARGET_FID Treatment_Date Presence_Date Non_Detection_Date Recent_Date
1 1 2021-09-30 2016-07-10 2021-10-05 2021-10-05
2 2 2019-05-31 2021-06-01 <NA> 2021-06-01
Or using base R
df1$Recent_Date <- do.call(pmax, c(df1[-1], na.rm = TRUE))
data
df1 <- structure(list(TARGET_FID = 1:2, Treatment_Date = c("2021-09-30",
"2019-05-31"), Presence_Date = c("2016-07-10", "2021-06-01"),
Non_Detection_Date = c("2021-10-05", NA)),
class = "data.frame", row.names = c(NA,
-2L))
CodePudding user response:
1) Define pmaxNA
to be the same as pmax
except na.rm=TRUE
. Then we can apply it to the columns of df1
, except the first, using Reduce
.
library(dplyr)
pmaxNA <- function(...) pmax(..., na.rm = TRUE)
df1 %>% mutate(Recent_Date = Reduce(pmaxNA, across(-1)))
2) Another approach is that on each row apply max
to all columns except the first.
library(dplyr)
df1 %>%
rowwise %>%
mutate(Recent_Date = max(c_across(-1), na.rm = TRUE)) %>%
ungroup
3) Another way is to convert to long form and then use slice_max
library(dplyr)
library(tidyr)
df1 %>%
arrange(TARGET_FID) %>%
mutate(Recent_Date = pivot_longer(., -1) %>%
drop_na %>%
group_by(TARGET_FID) %>%
slice_max(value) %>%
pull(value)
)