Home > Blockchain >  R dataframe - Top n values in row with column names
R dataframe - Top n values in row with column names

Time:02-18

I want to sort rowwise values in specific columns, get top 'n' values, and get corresponding column names in new columns.

The output would look something like this:

      SL   SW    PL    PW   Species high1 high2 high3 col1 col2  col3
      dbl> <dbl> <dbl> <dbl> <fct>  <dbl> <dbl>  <dbl>
 1     5.1  3.5  1.4    0.2   setosa   3.5  1.4   0.2  SW   PL     PW
 2     4.9  3    1.4    0.2   setosa   3    1.4   0.2  SW   PL     PW
 3     4.7  3.2  1.3    0.2   setosa   3.2  1.3   0.2  SW   PL     PW

Tried something like code below, but unable to get column names. What I'm hoping to achieve is to compare the highest 'n' values (rows[n]) with values in dataframe for each row, and then extract corresponding column name of matching value. For eg. rows[1] == 3.5 (from column 'SW'). Is this feasible? Help appreciated.

 iris %>% 
      rowwise() %>%
      mutate(rows = list(sort(c( Sepal.Width, Petal.Length, Petal.Width), decreasing = TRUE))) %>%
      mutate(high1 = rows[1], col1 = list(colnames(~.)[~. ==rows[1]]),
             high2 = rows[2], col2 = list(colnames(~.)[~. ==rows[2]]),
             high3 = rows[3], col3 = list(colnames(~.)[~. ==rows[3]])
             ) %>%
      select(-rows)

CodePudding user response:

You could pivot to long, group by the corresponding original row, use slice_max to get the top values, then pivot back to wide and bind that output to the original table.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

iris %>% 
  group_by(rn = row_number()) %>% 
  pivot_longer(-c(Species, rn), 'col', values_to = 'high') %>% 
  slice_max(col, n = 2) %>% 
  mutate(nm = row_number()) %>% 
  pivot_wider(values_from = c(high, col), 
              names_from = nm) %>% 
  ungroup() %>% 
  select(-c(Species, rn)) %>% 
  bind_cols(iris)
#> # A tibble: 150 × 9
#>    high_1 high_2 col_1   col_2 Sepal.Length Sepal.Width Petal.Length Petal.Width
#>     <dbl>  <dbl> <chr>   <chr>        <dbl>       <dbl>        <dbl>       <dbl>
#>  1    5.1    3.5 Sepal.… Sepa…          5.1         3.5          1.4         0.2
#>  2    4.9    3   Sepal.… Sepa…          4.9         3            1.4         0.2
#>  3    4.7    3.2 Sepal.… Sepa…          4.7         3.2          1.3         0.2
#>  4    4.6    3.1 Sepal.… Sepa…          4.6         3.1          1.5         0.2
#>  5    5      3.6 Sepal.… Sepa…          5           3.6          1.4         0.2
#>  6    5.4    3.9 Sepal.… Sepa…          5.4         3.9          1.7         0.4
#>  7    4.6    3.4 Sepal.… Sepa…          4.6         3.4          1.4         0.3
#>  8    5      3.4 Sepal.… Sepa…          5           3.4          1.5         0.2
#>  9    4.4    2.9 Sepal.… Sepa…          4.4         2.9          1.4         0.2
#> 10    4.9    3.1 Sepal.… Sepa…          4.9         3.1          1.5         0.1
#> # … with 140 more rows, and 1 more variable: Species <fct>

Created on 2022-02-16 by the reprex package (v2.0.1)

Edited to remove the unnecessary rename and mutate, thanks to tip from @Onyambu!

CodePudding user response:

My approach is to make a function that takes any dataframe (df), any set of columns that you want to focus on (cols), and any value for top n (n)

# load data.table and magrittr (I only use %>% for illustration here)
library(data.table)
library(magrittr)

# define function
get_high_vals_cols <- function(df, cols, n=3) {
  
  setDT(df)[, `_rowid`:=.I]
  
  df_l <- melt(df,id = "_rowid",measure.vars = cols, variable.name = "col",value.name = "high") %>% 
    .[order(-high), .SD[1:n], by="_rowid"] %>% 
    .[,id:=1:.N, by="_rowid"]
  
  dcast(df_l, `_rowid`~id, value.var = list("col", "high"))[,`_rowid`:=NULL]
}

Then, you can feed any dataframe to this function, along with any columns of interest

cols= c("Sepal.Width", "Petal.Length", "Petal.Width")
get_high_vals_cols(iris,cols,3)

Output

            col_1        col_2       col_3 high_1 high_2 high_3
  1:  Sepal.Width Petal.Length Petal.Width    3.5    1.4    0.2
  2:  Sepal.Width Petal.Length Petal.Width    3.0    1.4    0.2
  3:  Sepal.Width Petal.Length Petal.Width    3.2    1.3    0.2
  4:  Sepal.Width Petal.Length Petal.Width    3.1    1.5    0.2
  5:  Sepal.Width Petal.Length Petal.Width    3.6    1.4    0.2
 ---                                                           
146: Petal.Length  Sepal.Width Petal.Width    5.2    3.0    2.3
147: Petal.Length  Sepal.Width Petal.Width    5.0    2.5    1.9
148: Petal.Length  Sepal.Width Petal.Width    5.2    3.0    2.0
149: Petal.Length  Sepal.Width Petal.Width    5.4    3.4    2.3
150: Petal.Length  Sepal.Width Petal.Width    5.1    3.0    1.8
  • Related