Home > Software engineering >  How to get the selected max/min value (i.e. second largest/smallest) across row by dplyr
How to get the selected max/min value (i.e. second largest/smallest) across row by dplyr

Time:10-20

As the title, How do I get the second/third largest/smallest value across rows by dplyr? Is there an elegant way to achieve it?

a <- data.frame(gp1=c(3:11), gp2=c(1:9), gp3=c(8,8,2,6,6,6,12,12,6))

## the max/min value is very simple
a %>%
    rowwise() %>%
    mutate(max1=max(gp1, gp2, gp3))
#
# # A tibble: 9 × 4
# # Rowwise: 
#     gp1   gp2   gp3  max1
#   <int> <int> <dbl> <dbl>
# 1     3     1     8     8
# 2     4     2     8     8
# 3     5     3     2     5
# 4     6     4     6     6
# 5     7     5     6     7
# 6     8     6     6     8
# 7     9     7    12    12
# 8    10     8    12    12
# 9    11     9     6    11

The result should be similar to this:

#
# # A tibble: 9 × 4
# # Rowwise: 
#     gp1   gp2   gp3  max1   max2  
#   <int> <int> <dbl> <dbl>  <dbl>
# 1     3     1     8     8      3
# 2     4     2     8     8      4
# 3     5     3     2     5      3
# 4     6     4     6     6      6
# 5     7     5     6     7      6
# 6     8     6     6     8      6
# 7     9     7    12    12      9
# 8    10     8    12    12     12
# 9    11     9     6    11      9

CodePudding user response:

You can use c_across along with sort. The use of rev here reverses the sorted data, making it easy to select the largest value with index 1, the second-largest with index 2, etc.

Note that column "max2" in your example output makes errors in certain rows (I think you may have been including the "max1" column in some cases).

a %>% 
  rowwise() %>% 
  mutate(
    max1 = max(gp1, gp2, gp3),
    max2 = rev(sort(c_across(c(gp1, gp2, gp3))))[2]
  )

    gp1   gp2   gp3  max1  max2
  <int> <int> <dbl> <dbl> <dbl>
1     3     1     8     8     3
2     4     2     8     8     4
3     5     3     2     5     3
4     6     4     6     6     6
5     7     5     6     7     6
6     8     6     6     8     6
7     9     7    12    12     9
8    10     8    12    12    10
9    11     9     6    11     9

CodePudding user response:

I am sure there is a shorter way to automate it, but here is a quick solution for now:

library(dplyr)
library(slider)

a %>%
  rowwise() %>%
  mutate(output = list(slide_dfc(sort(c_across(everything()), decreasing = TRUE), max, .before = 1, .complete = TRUE))) %>%
  unnest_wider(output) %>%
  rename_with(~ sub('\\. (\\d)', 'Max_\\1', .), contains('.')) %>%
  suppressMessages()

# A tibble: 9 × 5
    gp1   gp2   gp3 Max_1 Max_2
  <int> <int> <dbl> <dbl> <dbl>
1     3     1     8     8     3
2     4     2     8     8     4
3     5     3     2     5     3
4     6     4     6     6     6
5     7     5     6     7     6
6     8     6     6     8     6
7     9     7    12    12     9
8    10     8    12    12    10
9    11     9     6    11     9

CodePudding user response:

An option with pmax

library(dplyr)
a %>%
  mutate(max1 = do.call(pmax, across(everything())),
   across(starts_with('gp'), ~ replace(.x, .x == max1, NA))) %>% 
  transmute(max2 = do.call(pmax, c(across(starts_with('gp')), na.rm = TRUE))) %>% 
  bind_cols(a, .)

-output

   gp1 gp2 gp3 max2
1   3   1   8    3
2   4   2   8    4
3   5   3   2    3
4   6   4   6    4
5   7   5   6    6
6   8   6   6    6
7   9   7  12    9
8  10   8  12   10
9  11   9   6    9

Or in base R

a$max2 <- do.call(pmax, c(replace(a, cbind(seq_len(nrow(a)), 
     max.col(a, 'first')), NA), na.rm = TRUE))
a$max2
[1]  3  4  3  6  6  6  9 10  9

CodePudding user response:

A solution with pmap which does not involve rowwise:

library(purrr)
a %>% 
  mutate(max1 = pmax(gp1, gp2, gp3),
         max2 = pmap(., ~ rev(sort(c(..1, ..2, ..3)))[2]))

  gp1 gp2 gp3 max1 max2
1   3   1   8    8    3
2   4   2   8    8    4
3   5   3   2    5    3
4   6   4   6    6    6
5   7   5   6    7    6
6   8   6   6    8    6
7   9   7  12   12    9
8  10   8  12   12   10
9  11   9   6   11    9
  • Related