Home > OS >  How to get the column name of selected max/min value (i.e. second largest/smallest) across row by dp
How to get the column name of selected max/min value (i.e. second largest/smallest) across row by dp

Time:10-20

As the title, How do I get the column name of the second/third largest/smallest value across rows by dplyr?

One way is to use apply in base R:

a <- data.frame(gp1=c(3:11), gp2=c(1:9), gp3=c(8,8,2,6,6,6,12,12,6))
##
##   gp1 gp2 gp3
## 1   3   1   8
## 2   4   2   8
## 3   5   3   2
## 4   6   4   6
## 5   7   5   6
## 6   8   6   6
## 7   9   7  12
## 8  10   8  12
## 9  11   9   6

a$max1_colname <- apply(a, 1, function(t) colnames(a)[which.max(t)])
##
##   gp1 gp2 gp3 max1_colname
## 1   3   1   8          gp3
## 2   4   2   8          gp3
## 3   5   3   2          gp1
## 4   6   4   6          gp1
## 5   7   5   6          gp1
## 6   8   6   6          gp1
## 7   9   7  12          gp3
## 8  10   8  12          gp3
## 9  11   9   6          gp1

How could I accomplish it by using dplyr (ignore the 4th row has two max values) and how about the second largest column name?

Extra: More complex, if there are two max largest value (such as 4th row), how could I get the result as follow:

##
##   gp1 gp2 gp3 max1_colname
## 1   3   1   8          gp3
## 2   4   2   8          gp3
## 3   5   3   2          gp1
## 4   6   4   6          gp1 gp3
## 5   7   5   6          gp1
## 6   8   6   6          gp1
## 7   9   7  12          gp3
## 8  10   8  12          gp3
## 9  11   9   6          gp1

Thanks.

CodePudding user response:

You can use max.col inside a mutate call. Note that this will only one value, and ties can be handled with ties.method. By default, it picks one tied maximum randomly.

library(dplyr)
a %>%
  mutate(max1_colname = names(.)[max.col(.)])

For your second, more complex case, you can do this:

a %>%
  rowwise() %>% 
  mutate(max1_colname = paste(names(.)[c_across(everything()) == max(c_across(everything()))], collapse = " "))

#     gp1   gp2   gp3 max1_colname
# 1     3     1     8 gp3         
# 2     4     2     8 gp3         
# 3     5     3     2 gp1         
# 4     6     4     6 gp1 gp3     
# 5     7     5     6 gp1         
# 6     8     6     6 gp1         
# 7     9     7    12 gp3         
# 8    10     8    12 gp3         
# 9    11     9     6 gp1         

CodePudding user response:

The first part can be accomplished by:

library(tidyverse)

a %>% 
  rowwise() %>% 
  mutate(max1_colname = names(.)[which.max(c_across(everything()))])

gp1 gp2 gp3 max1_colname
3   1   8   gp3
4   2   8   gp3
5   3   2   gp1
6   4   6   gp1
7   5   6   gp1
8   6   6   gp1
9   7   12  gp3
10  8   12  gp3
11  9   6   gp1

A solution for the second part:

a1 <- a %>% mutate(id = row_number()) %>% 
  pivot_longer(-id)

a2 <- a1 %>% 
  group_by(id) %>% 
  top_n(value, n = 1) %>% 
  mutate(max1_colname = paste0(name, collapse = " ")) %>% 
  select(id, max1_colname) %>% 
  distinct()
  
a1 %>% 
  left_join(a2, by = c("id")) %>% 
  pivot_wider() %>% 
  relocate(max1_colname, .after = gp3) %>% 
  select(-id) 

gp1 gp2 gp3 max1_colname
3   1   8   gp3
4   2   8   gp3
5   3   2   gp1
6   4   6   gp1 gp3
7   5   6   gp1
8   6   6   gp1
9   7   12  gp3
10  8   12  gp3
11  9   6   gp1

CodePudding user response:

Also note there is a base max.col function, the only thing is it returns one index (can be set to first, last, random).

a %>%
  mutate(max1_colname = paste("grp", max.col(.), sep = ""))

#   gp1 gp2 gp3 max1_colname
# 1   3   1   8         grp3
# 2   4   2   8         grp3
# 3   5   3   2         grp1
# 4   6   4   6         grp1
# 5   7   5   6         grp1
# 6   8   6   6         grp1
# 7   9   7  12         grp3
# 8  10   8  12         grp3
# 9  11   9   6         grp1
  • Related