How to rank numeric data by rows in a data frame in R?

I have a data frame with nearly 5000 columns. Here is a snippet of it:

df <- data.frame(a = c(13, 17, 19, 7, 9),
                 b = c(1, 3, 50, NA, 3),
                 c = c(NA, NA, NA, NA, 9))

I want to rank the values within each row of the data frame, so that the largest value in a row gets rank 1 and NA values stay NA.

EXPECTED OUTPUT

df= data.frame(a=c(1,1,2,1,1),
               b=c(2,2,1,NA,2),
               c=c(NA,NA,NA,NA,1))

CodePudding user response：

We may use pmap to loop over the rows (this is typically faster than rowwise) and apply dense_rank to the negated values, so that the largest value in each row gets rank 1:

library(purrr)
library(dplyr)
df %>% 
    pmap_dfr(~ setNames(dense_rank(-c(...)), names(c(...))))

-output

# A tibble: 5 x 3
      a     b     c
  <int> <int> <int>
1     1     2    NA
2     1     2    NA
3     2     1    NA
4     1    NA    NA
5     1     2     1
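To see why the values are negated, here is a minimal base R sketch on a single row (the first row of the example data, written out by hand; the variable names are only for illustration). By default the smallest value gets rank 1, and negating the input flips that:

```r
# First row of the example data, written out by hand
x <- c(a = 13, b = 1, c = NA)

# Default direction: the smallest value gets rank 1; NA stays NA
asc <- rank(x, na.last = "keep")    # a = 2, b = 1, c = NA

# Negating the values makes the largest value rank 1
desc <- rank(-x, na.last = "keep")  # a = 1, b = 2, c = NA

asc
desc
```

dense_rank follows the same ascending convention, which is why the answers in this thread pass the negated row to it.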

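Or a faster option may be dapply from collapse, with frank from data.table. Negate the data so that the largest value in each row gets rank 1, as in the expected output:

library(collapse)
library(data.table)
dapply(-df, MARGIN = 1, FUN = frank, ties.method = 'dense', na.last = "keep")
  a  b  c
1 1  2 NA
2 1  2 NA
3 2  1 NA
4 1 NA NA
5 1  2  1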

CodePudding user response：

df <- data.frame(a=c(13,17,19,7,9), b=c(1,3,50,NA,3), c=c(NA,NA,NA,NA,9))
apply(X = -df, MARGIN = 1, FUN = rank, ties.method = "min", na.last = "keep")
#>   [,1] [,2] [,3] [,4] [,5]
#> a    1    1    2    1    1
#> b    2    2    1   NA    3
#> c   NA   NA   NA   NA    1

Transposed, so that rows and columns match the original data frame:

t(apply(X = -df, MARGIN = 1, FUN = rank, ties.method = "min", na.last = "keep"))
#>      a  b  c
#> [1,] 1  2 NA
#> [2,] 1  2 NA
#> [3,] 2  1 NA
#> [4,] 1 NA NA
#> [5,] 1  3  1

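Note that ties are handled differently from the expected output: with ties.method = "min", tied values share the lowest rank and the next rank is skipped, so the fifth row gives 1, 3, 1 rather than 1, 2, 1.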
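If you want to stay in base R but still get dense ranks (no gaps after ties), one possible sketch is below. The helper name dense_rank_desc is made up for this example and is not part of any package; it relies on sort() dropping NA so that match() returns NA for missing values.

```r
df <- data.frame(a = c(13, 17, 19, 7, 9),
                 b = c(1, 3, 50, NA, 3),
                 c = c(NA, NA, NA, NA, 9))

# Row 5 has a tie: a and c are both 9
row5 <- unlist(df[5, ])

# "min" ties: tied values share rank 1, and rank 2 is skipped
min_ranks <- rank(-row5, ties.method = "min", na.last = "keep")  # a = 1, b = 3, c = 1

# Hypothetical helper: dense rank in descending order, NA stays NA.
# sort() drops NA, so match() finds no position for NA and returns NA.
dense_rank_desc <- function(x) {
  match(x, sort(unique(x), decreasing = TRUE))
}
dense_ranks <- dense_rank_desc(row5)  # 1, 2, 1

# Apply row by row; apply() returns one column per row, so transpose back
ranked <- t(apply(df, 1, dense_rank_desc))
colnames(ranked) <- names(df)
ranked
```

The result is an integer matrix; wrap it in as.data.frame() if you need a data frame back.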

CodePudding user response：

df= data.frame(a=c(13,17,19,7,9),
               b=c(1,3,50,NA,3),
               c=c(NA,NA,NA,NA,9))

library(tidyverse)
out <- df %>% 
  rowwise() %>% 
  transmute(res = list(dense_rank(-c_across(a:c)))) %>% 
  unnest_wider(res) 

names(out) <- names(df)
out
#> # A tibble: 5 x 3
#>       a     b     c
#>   <int> <int> <int>
#> 1     1     2    NA
#> 2     1     2    NA
#> 3     2     1    NA
#> 4     1    NA    NA
#> 5     1     2     1

Created on 2021-09-20 by the reprex package (v2.0.1)