I have the following data frame with several date variables.
x <- structure(list(id = c(1, 2, 3, 4), date = structure(c(18611,
16801, 16801, 17532), class = "Date"), s1 = c(0, 1, 1, NA), date1 = structure(c(17880,
16450, 16416, NA), class = "Date"), s2 = c(0, 0, 1, NA), date2 = structure(c(17880,
NA, 15869, NA), class = "Date"), DN = structure(c(18611, 15869,
15869, NA), class = "Date")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))
I would like to compare date1
and date2
and generate DN
as the minimum of the two dates based on the conditions in case_when
. I am currently using this code:
x <- mutate(date = as.Date(date),
date1 = as.Date(date1),
date2 = as.Date(date2),
DN = case_when(
s1 == 1 | s2 == 1 ~ min(date1, date2, na.rm = T),
s1 == 0 | s2 == 0 ~ date,
is.na(s1) & is.na(s2) ~ NA_real_
))
However, I get a strange result !! For id = 2
the value of DN
is taken from id = 3
, which I can not understand !!
Any ideas? Thanks in forward
CodePudding user response:
You can use pmin
to select the first date of the assigned columns. You can use the following code:
library(dplyr)
x %>%
mutate(DN = case_when(
s1 == 1 | s2 == 1 ~ pmin(date1, date2, na.rm = T),
s1 == 0 | s2 == 0 ~ date,
is.na(s1) & is.na(s2) ~ NA_real_
))
Output:
# A tibble: 4 × 7
id date s1 date1 s2 date2 DN
<dbl> <date> <dbl> <date> <dbl> <date> <date>
1 1 2020-12-15 0 2018-12-15 0 2018-12-15 2020-12-15
2 2 2016-01-01 1 2015-01-15 0 NA 2015-01-15
3 3 2016-01-01 1 2014-12-12 1 2013-06-13 2013-06-13
4 4 2018-01-01 NA NA NA NA NA
CodePudding user response:
If you add rowwise()
(i.e. grouping by rows) you will get the required row-minimum:
x %>%
rowwise() %>%
mutate(date = as.Date(date),
date1 = as.Date(date1),
date2 = as.Date(date2),
DN = case_when(
s1 == 1 | s2 == 1 ~ pmin(date1, date2, na.rm = T),
s1 == 0 | s2 == 0 ~ date,
is.na(s1) & is.na(s2) ~ NA_real_
))