Select the row with the maximum value in each group based on multiple columns in R dplyr-CodePudding

My data frame looks like this one

library(tidyverse)

df1 <- tibble(col1= c("apple","apple","banana","banana"), 
              col2 = c("appl","aple","banan","bananb"),
              count_col1=c(1,1,4,4), count_col2=c(3,4,1,1))
df1
#> # A tibble: 4 × 4
#>   col1   col2   count_col1 count_col2
#>   <chr>  <chr>       <dbl>      <dbl>
#> 1 apple  appl            1          3
#> 2 apple  aple            1          4
#> 3 banana banan           4          1
#> 4 banana bananb          4          1

^{Created on 2022-02-17 by the reprex package (v2.0.1)}

I want to select after grouping_by col1 the row that has the maximum value based on count_col1 and count_col2.

I want my data to look like this

      col1   col2   count_col1 count_col2
      apple  aple            1          4
      banana banan           4          1
      banana bananb          4          1

for one column you can write something

df1 %>% 
  slice(which.max(count_col1))

but not for two

CodePudding user response：

We may get rowwise max of the 'count' columns with pmax, grouped by 'col1', filter the rows where the max value of 'Max' column is.

library(dplyr)
df1 %>% 
 mutate(Max = pmax(count_col1, count_col2) ) %>%
 group_by(col1) %>%
 filter(Max == max(Max)) %>%
 ungroup %>%
 select(-Max)

-output

# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1

We may also use slice_max

library(purrr)
df1 %>%
  group_by(col1) %>%
  slice_max(invoke(pmax, across(starts_with("count")))) %>%
  ungroup
# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1