Home > OS >  Dplyr mutate minimum column name
Dplyr mutate minimum column name

Time:03-03

I have a df with many columns but for this issue am focusing on four date columns A, B, C, D. I am using the following to create a column for the minimum date for that row.

mutate(MIN_DATE = pmin(A,B,C,D))

To create a column for the minimum date for that row.

  A         B       C       D       MIN_DATE
1-1-20   1-2-20  1-10-20  2-10-20    1-1-20
1-2-20   1-4-20  1-11-20  2-1-20     1-2-20
1-3-20   1-5-20  1-20-20  2-3-20     1-3-20

What I would like to do is add another column which is the column name of the minimum date which would look like :

  A         B       C       D       MIN_DATE   MIN_COL
1-1-20   1-2-20  1-10-20  2-10-20    1-1-20       A
1-2-20   1-4-20  1-11-20  2-1-20     1-2-20       A
1-3-20   1-5-20  1-2-20   2-3-20     1-3-20       C

What is the easiest way to accomplish this, some combination of names and which.min()?

CodePudding user response:

We can use which.min to get the index of minimum date and use that to subset the names after doing rowwise

library(dplyr)
library(lubridate)
df %>%
   # in case the dates are not Date class
   mutate(across(c(A, B, C, D), mdy)) %>%
   rowwise %>%
   mutate(MIN_COL = names(.)[1:4][which.min(c_across(c(A, B, C, D)))]) %>%
   ungroup

-output

# A tibble: 3 × 5
  A          B          C          D          MIN_COL
  <date>     <date>     <date>     <date>     <chr>  
1 2020-01-01 2020-01-02 2020-01-10 2020-02-10 A      
2 2020-01-02 2020-01-04 2020-01-11 2020-02-01 A      
3 2020-01-03 2020-01-05 2020-01-20 2020-02-03 A   

Or probably a faster option with max.col from base R

df$MIN_COL <- c("A", "B", "C", "D")[max.col(-sapply(
    df[c("A", "B", "C", "D")], \(x) as.Date(x, "%m-%d-%y")), "first")]
  • Related