I'm trying to get a rank for every column in a dataframe.
Ideal output (using mtcars as an example) would be as below but with the rank filled in for each column:
mpg cyl disp hp drat wt qsec vs am gear carb mpg_rank cyl_rank disp_rank ...
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
....
I can produce a rank for each column, but I can't get the output in the above format... that's where I'm struggling.
cols <- colnames(mtcars)
get_rank <- function(col){
  df %>% mutate(rank=rank(.data[[col]]))
}
lapply(cols, get_rank)
CodePudding user response:
We can use across() to loop over the numeric columns, compute the rank of each, and build the new column names by adding a suffix via the .names argument.
library(dplyr)
out <- mtcars %>%
  mutate(across(where(is.numeric), rank, .names = "{.col}_rank"))
-output
> head(out, 2)
mpg cyl disp hp drat wt qsec vs am gear carb mpg_rank cyl_rank disp_rank hp_rank drat_rank wt_rank qsec_rank vs_rank
Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4 19.5 15 13.5 13 21.5 9 6.0 9.5
Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4 19.5 15 13.5 13 21.5 12 10.5 9.5
am_rank gear_rank carb_rank
Mazda RX4 26 21.5 25.5
Mazda RX4 Wag 26 21.5 25.5
By default, tied values receive the average of their ranks, since "average" is the first ties.method option in the signature of rank():
rank(x, na.last = TRUE, ties.method = c("average", "first", "last", "random", "max", "min"))
So it may be better to specify ties.method explicitly, or to use dense_rank(), which gives tied values the same rank and leaves no gaps between ranks:
out <- mtcars %>%
  mutate(across(where(is.numeric), dense_rank, .names = "{.col}_rank"))
-output
> head(out, 2)
mpg cyl disp hp drat wt qsec vs am gear carb mpg_rank cyl_rank disp_rank hp_rank drat_rank wt_rank qsec_rank vs_rank
Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4 16 2 13 11 16 9 6 1
Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4 16 2 13 11 16 12 10 1
am_rank gear_rank carb_rank
Mazda RX4 2 2 4
Mazda RX4 Wag 2 2 4
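To make the tie handling concrete, here is a small sketch on a made-up vector (x is hypothetical, not taken from mtcars), followed by one way to pass ties.method through across() with a lambda. The names r_avg, r_min, r_first, r_dense and out_min are only illustrative.

```r
library(dplyr)

# Hypothetical vector with one tie (the two 20s)
x <- c(10, 20, 20, 30)

r_avg   <- rank(x)                         # 1.0 2.5 2.5 4.0 (ties get the average rank)
r_min   <- rank(x, ties.method = "min")    # 1 2 2 4 (ties get the lowest rank)
r_first <- rank(x, ties.method = "first")  # 1 2 3 4 (ties broken by position)
r_dense <- dense_rank(x)                   # 1 2 2 3 (ties share a rank, no gaps)

# Passing ties.method explicitly inside across()
out_min <- mtcars %>%
  mutate(across(where(is.numeric),
                ~ rank(.x, ties.method = "min"),
                .names = "{.col}_rank"))
```

Which method fits depends on whether tied rows should share a rank and whether gaps after a tie matter for the analysis.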
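Regarding the OP's function: it uses df as the input dataset, but df is not an argument to the function, and unless a dataset with that name has been created, df refers to a function in base R (stats::df, the density of the F distribution). Also, rank= gives the new column the same name, rank, in every call, so the results cannot be told apart by column. The function could be modified as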
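cols <- colnames(mtcars)
# Take the data as an argument and name each new column <col>_rank
get_rank <- function(data, col){
  data %>%
    transmute(!! stringr::str_c(col, "_rank") := rank(.data[[col]]))
}
# Rank every column, then bind the rank columns onto the original data
lapply(cols, get_rank, data = mtcars) %>%
  bind_cols(mtcars, .)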
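For comparison, the same result can probably be reached in base R without dplyr. This is a minimal sketch of an alternative approach, not the method shown above, and rank_df and out_base are illustrative names.

```r
# Rank every column; lapply() over a data frame iterates over its columns
rank_df <- as.data.frame(lapply(mtcars, rank))

# Add the "_rank" suffix to each column name
names(rank_df) <- paste0(names(mtcars), "_rank")

# Bind the rank columns next to the original columns
out_base <- cbind(mtcars, rank_df)
```

This assumes every column is numeric, as in mtcars; with mixed column types the columns would need to be filtered first.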