I have a df with many columns but for this issue am focusing on four date columns A, B, C, D
. I am using the following to create a column for the minimum date for that row.
mutate(MIN_DATE = pmin(A,B,C,D))
To create a column for the minimum date for that row.
A B C D MIN_DATE
1-1-20 1-2-20 1-10-20 2-10-20 1-1-20
1-2-20 1-4-20 1-11-20 2-1-20 1-2-20
1-3-20 1-5-20 1-20-20 2-3-20 1-3-20
What I would like to do is add another column which is the column name of the minimum date which would look like :
A B C D MIN_DATE MIN_COL
1-1-20 1-2-20 1-10-20 2-10-20 1-1-20 A
1-2-20 1-4-20 1-11-20 2-1-20 1-2-20 A
1-3-20 1-5-20 1-2-20 2-3-20 1-3-20 C
What is the easiest way to accomplish this, some combination of names
and which.min()
?
CodePudding user response:
We can use which.min
to get the index of minimum date and use that to subset the names
after doing rowwise
library(dplyr)
library(lubridate)
df %>%
# in case the dates are not Date class
mutate(across(c(A, B, C, D), mdy)) %>%
rowwise %>%
mutate(MIN_COL = names(.)[1:4][which.min(c_across(c(A, B, C, D)))]) %>%
ungroup
-output
# A tibble: 3 × 5
A B C D MIN_COL
<date> <date> <date> <date> <chr>
1 2020-01-01 2020-01-02 2020-01-10 2020-02-10 A
2 2020-01-02 2020-01-04 2020-01-11 2020-02-01 A
3 2020-01-03 2020-01-05 2020-01-20 2020-02-03 A
Or probably a faster option with max.col
from base R
df$MIN_COL <- c("A", "B", "C", "D")[max.col(-sapply(
df[c("A", "B", "C", "D")], \(x) as.Date(x, "%m-%d-%y")), "first")]