Get rank for every column using dplyr

Time:03-08

I'm trying to get a rank for every column in a dataframe.

Ideal output (using mtcars as an example) would be as below but with the rank filled in for each column:

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb mpg_rank cyl_rank disp_rank ...
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
....

I can produce a rank for each column, but I can't get the output in the above format... that's where I'm struggling.

cols <- colnames(mtcars)

get_rank <- function(col){
  
  df %>% mutate(rank=rank(.data[[col]]))
}

lapply(cols, get_rank)

CodePudding user response:

We can use across() to loop over the numeric columns, compute the rank of each one, and name the new columns by adding a suffix through the .names argument:

library(dplyr)
out <- mtcars %>% 
   mutate(across(where(is.numeric), rank, .names = "{.col}_rank"))

-output

> head(out, 2)
              mpg cyl disp  hp drat    wt  qsec vs am gear carb mpg_rank cyl_rank disp_rank hp_rank drat_rank wt_rank qsec_rank vs_rank
Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4     19.5       15      13.5      13      21.5       9       6.0     9.5
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4     19.5       15      13.5      13      21.5      12      10.5     9.5
              am_rank gear_rank carb_rank
Mazda RX4          26      21.5      25.5
Mazda RX4 Wag      26      21.5      25.5

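By default, rank() gives tied values the average of the ranks they span, which is why fractional ranks such as 19.5 appear in the output. The usage from ?rank is: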

rank(x, na.last = TRUE, ties.method = c("average", "first", "last", "random", "max", "min"))
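To make the effect of ties.method concrete, here is a small sketch on a toy vector (the vector x is made up for illustration and is not part of the question's data):

```r
# Hypothetical toy vector with one tie (the two 20s)
x <- c(10, 20, 20, 30)

rank(x)                         # 1.0 2.5 2.5 4.0  (default, "average")
rank(x, ties.method = "min")    # 1 2 2 4
rank(x, ties.method = "max")    # 1 3 3 4
rank(x, ties.method = "first")  # 1 2 3 4
```

With "average", the two tied values share the mean of positions 2 and 3; "min" and "max" give them the lowest or highest of those positions, and "first" breaks the tie by order of appearance.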

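So it may be better to specify ties.method explicitly, or to use dplyr's dense_rank(), which gives tied values the same integer rank and leaves no gaps between ranks: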

out <- mtcars %>% 
   mutate(across(where(is.numeric), dense_rank, .names = "{.col}_rank"))

-output

> head(out, 2)
              mpg cyl disp  hp drat    wt  qsec vs am gear carb mpg_rank cyl_rank disp_rank hp_rank drat_rank wt_rank qsec_rank vs_rank
Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4       16        2        13      11        16       9         6       1
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4       16        2        13      11        16      12        10       1
              am_rank gear_rank carb_rank
Mazda RX4           2         2         4
Mazda RX4 Wag       2         2         4
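The other option, setting ties.method explicitly, could look like the sketch below. It assumes a dplyr version that supports across() (1.0.0 or later) and wraps rank() in a lambda so the extra argument is passed per column; the choice of "min" and the name out_min are only for illustration.

```r
library(dplyr)

# Sketch: rank every numeric column, giving tied values the smallest
# of their ranks ("min") instead of the default average.
out_min <- mtcars %>%
  mutate(across(where(is.numeric),
                ~ rank(.x, ties.method = "min"),
                .names = "{.col}_rank"))

# Inspect a few of the original columns next to their ranks
head(out_min[, c("mpg", "mpg_rank", "cyl", "cyl_rank")], 2)
```

Unlike dense_rank(), ties.method = "min" leaves gaps after a tie, so the largest rank in a column can still be as high as the number of rows.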

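Regarding the OP's function: it uses df as the input dataset, but df is not an argument to the function, and unless a data frame named df exists in the calling environment, df resolves to the F-distribution density function df() from the stats package. Also, rank = names every new column rank, so the results cannot be told apart once they are combined. The function could be modified as: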

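cols <- colnames(mtcars)

# Take the data as an argument and name each new column "<col>_rank"
get_rank <- function(data, col) {
  data %>%
    transmute(!!stringr::str_c(col, "_rank") := rank(.data[[col]]))
}

# Build one single-column data frame per column, then bind them onto the original data
lapply(cols, get_rank, data = mtcars) %>%
  bind_cols(mtcars, .)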
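As a variant of the corrected function, the new column name can also be built with the glue-style name syntax that dplyr's tidy evaluation supports, which avoids the stringr dependency. This is only a sketch; the names get_rank_glue and out_lapply are made up for illustration.

```r
library(dplyr)

# Sketch: same idea as the modified function, but the column name is
# built with a glue string on the left-hand side of :=
get_rank_glue <- function(data, col) {
  data %>%
    transmute("{col}_rank" := rank(.data[[col]]))
}

# One single-column data frame per column, bound onto the original data
out_lapply <- lapply(colnames(mtcars), get_rank_glue, data = mtcars) %>%
  bind_cols(mtcars, .)

# The ranks should agree with calling rank() on the column directly
all.equal(out_lapply$mpg_rank, rank(mtcars$mpg))
```

The across() version stays the shorter option; a helper like this mainly makes sense when the per-column step needs to be reused elsewhere.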