As the title, How do I get the column name of the second/third largest/smallest value across rows by dplyr
?
One way is to use apply
in base R:
a <- data.frame(gp1=c(3:11), gp2=c(1:9), gp3=c(8,8,2,6,6,6,12,12,6))
##
## gp1 gp2 gp3
## 1 3 1 8
## 2 4 2 8
## 3 5 3 2
## 4 6 4 6
## 5 7 5 6
## 6 8 6 6
## 7 9 7 12
## 8 10 8 12
## 9 11 9 6
a$max1_colname <- apply(a, 1, function(t) colnames(a)[which.max(t)])
##
## gp1 gp2 gp3 max1_colname
## 1 3 1 8 gp3
## 2 4 2 8 gp3
## 3 5 3 2 gp1
## 4 6 4 6 gp1
## 5 7 5 6 gp1
## 6 8 6 6 gp1
## 7 9 7 12 gp3
## 8 10 8 12 gp3
## 9 11 9 6 gp1
How could I accomplish it by using dplyr
(ignore the 4th row has two max values) and how about the second largest column name?
Extra: More complex, if there are two max largest value (such as 4th row), how could I get the result as follow:
##
## gp1 gp2 gp3 max1_colname
## 1 3 1 8 gp3
## 2 4 2 8 gp3
## 3 5 3 2 gp1
## 4 6 4 6 gp1 gp3
## 5 7 5 6 gp1
## 6 8 6 6 gp1
## 7 9 7 12 gp3
## 8 10 8 12 gp3
## 9 11 9 6 gp1
Thanks.
CodePudding user response:
You can use max.col
inside a mutate
call. Note that this will only one value, and ties can be handled with ties.method
. By default, it picks one tied maximum randomly.
library(dplyr)
a %>%
mutate(max1_colname = names(.)[max.col(.)])
For your second, more complex case, you can do this:
a %>%
rowwise() %>%
mutate(max1_colname = paste(names(.)[c_across(everything()) == max(c_across(everything()))], collapse = " "))
# gp1 gp2 gp3 max1_colname
# 1 3 1 8 gp3
# 2 4 2 8 gp3
# 3 5 3 2 gp1
# 4 6 4 6 gp1 gp3
# 5 7 5 6 gp1
# 6 8 6 6 gp1
# 7 9 7 12 gp3
# 8 10 8 12 gp3
# 9 11 9 6 gp1
CodePudding user response:
The first part can be accomplished by:
library(tidyverse)
a %>%
rowwise() %>%
mutate(max1_colname = names(.)[which.max(c_across(everything()))])
gp1 gp2 gp3 max1_colname
3 1 8 gp3
4 2 8 gp3
5 3 2 gp1
6 4 6 gp1
7 5 6 gp1
8 6 6 gp1
9 7 12 gp3
10 8 12 gp3
11 9 6 gp1
A solution for the second part:
a1 <- a %>% mutate(id = row_number()) %>%
pivot_longer(-id)
a2 <- a1 %>%
group_by(id) %>%
top_n(value, n = 1) %>%
mutate(max1_colname = paste0(name, collapse = " ")) %>%
select(id, max1_colname) %>%
distinct()
a1 %>%
left_join(a2, by = c("id")) %>%
pivot_wider() %>%
relocate(max1_colname, .after = gp3) %>%
select(-id)
gp1 gp2 gp3 max1_colname
3 1 8 gp3
4 2 8 gp3
5 3 2 gp1
6 4 6 gp1 gp3
7 5 6 gp1
8 6 6 gp1
9 7 12 gp3
10 8 12 gp3
11 9 6 gp1
CodePudding user response:
Also note there is a base max.col
function, the only thing is it returns one index (can be set to first, last, random).
a %>%
mutate(max1_colname = paste("grp", max.col(.), sep = ""))
# gp1 gp2 gp3 max1_colname
# 1 3 1 8 grp3
# 2 4 2 8 grp3
# 3 5 3 2 grp1
# 4 6 4 6 grp1
# 5 7 5 6 grp1
# 6 8 6 6 grp1
# 7 9 7 12 grp3
# 8 10 8 12 grp3
# 9 11 9 6 grp1