As the title, How do I get the second/third largest/smallest value across rows by dplyr
? Is there an elegant way to achieve it?
a <- data.frame(gp1=c(3:11), gp2=c(1:9), gp3=c(8,8,2,6,6,6,12,12,6))
## the max/min value is very simple
a %>%
rowwise() %>%
mutate(max1=max(gp1, gp2, gp3))
#
# # A tibble: 9 × 4
# # Rowwise:
# gp1 gp2 gp3 max1
# <int> <int> <dbl> <dbl>
# 1 3 1 8 8
# 2 4 2 8 8
# 3 5 3 2 5
# 4 6 4 6 6
# 5 7 5 6 7
# 6 8 6 6 8
# 7 9 7 12 12
# 8 10 8 12 12
# 9 11 9 6 11
The result should be similar to this:
#
# # A tibble: 9 × 4
# # Rowwise:
# gp1 gp2 gp3 max1 max2
# <int> <int> <dbl> <dbl> <dbl>
# 1 3 1 8 8 3
# 2 4 2 8 8 4
# 3 5 3 2 5 3
# 4 6 4 6 6 6
# 5 7 5 6 7 6
# 6 8 6 6 8 6
# 7 9 7 12 12 9
# 8 10 8 12 12 12
# 9 11 9 6 11 9
CodePudding user response:
You can use c_across
along with sort
. The use of rev
here reverses the sorted data, making it easy to select the largest value with index 1, the second-largest with index 2, etc.
Note that column "max2" in your example output makes errors in certain rows (I think you may have been including the "max1" column in some cases).
a %>%
rowwise() %>%
mutate(
max1 = max(gp1, gp2, gp3),
max2 = rev(sort(c_across(c(gp1, gp2, gp3))))[2]
)
gp1 gp2 gp3 max1 max2
<int> <int> <dbl> <dbl> <dbl>
1 3 1 8 8 3
2 4 2 8 8 4
3 5 3 2 5 3
4 6 4 6 6 6
5 7 5 6 7 6
6 8 6 6 8 6
7 9 7 12 12 9
8 10 8 12 12 10
9 11 9 6 11 9
CodePudding user response:
I am sure there is a shorter way to automate it, but here is a quick solution for now:
library(dplyr)
library(slider)
a %>%
rowwise() %>%
mutate(output = list(slide_dfc(sort(c_across(everything()), decreasing = TRUE), max, .before = 1, .complete = TRUE))) %>%
unnest_wider(output) %>%
rename_with(~ sub('\\. (\\d)', 'Max_\\1', .), contains('.')) %>%
suppressMessages()
# A tibble: 9 × 5
gp1 gp2 gp3 Max_1 Max_2
<int> <int> <dbl> <dbl> <dbl>
1 3 1 8 8 3
2 4 2 8 8 4
3 5 3 2 5 3
4 6 4 6 6 6
5 7 5 6 7 6
6 8 6 6 8 6
7 9 7 12 12 9
8 10 8 12 12 10
9 11 9 6 11 9
CodePudding user response:
An option with pmax
library(dplyr)
a %>%
mutate(max1 = do.call(pmax, across(everything())),
across(starts_with('gp'), ~ replace(.x, .x == max1, NA))) %>%
transmute(max2 = do.call(pmax, c(across(starts_with('gp')), na.rm = TRUE))) %>%
bind_cols(a, .)
-output
gp1 gp2 gp3 max2
1 3 1 8 3
2 4 2 8 4
3 5 3 2 3
4 6 4 6 4
5 7 5 6 6
6 8 6 6 6
7 9 7 12 9
8 10 8 12 10
9 11 9 6 9
Or in base R
a$max2 <- do.call(pmax, c(replace(a, cbind(seq_len(nrow(a)),
max.col(a, 'first')), NA), na.rm = TRUE))
a$max2
[1] 3 4 3 6 6 6 9 10 9
CodePudding user response:
A solution with pmap
which does not involve rowwise
:
library(purrr)
a %>%
mutate(max1 = pmax(gp1, gp2, gp3),
max2 = pmap(., ~ rev(sort(c(..1, ..2, ..3)))[2]))
gp1 gp2 gp3 max1 max2
1 3 1 8 8 3
2 4 2 8 8 4
3 5 3 2 5 3
4 6 4 6 6 6
5 7 5 6 7 6
6 8 6 6 8 6
7 9 7 12 12 9
8 10 8 12 12 10
9 11 9 6 11 9