select n-th largest row per group

Time:12-12

I am trying to select the n-th largest row per group in a dataset. For example, using the iris dataset, I found this code online that returns the second largest value of Sepal.Length for each flower species:

library(dplyr)
myfun <-  function(x) {
    u <- unique(x)
    sort(u, decreasing = TRUE)[2L]
}

iris %>% 
    group_by(Species) %>% 
    summarise(result = myfun(Sepal.Length))

I just want to confirm that I have understood this correctly. If I want the 3rd largest value, do I just make the change below? And how can I select all the rows from the original data?

library(dplyr)
myfun <-  function(x) {
  u <- unique(x)
  sort(u, decreasing = TRUE)[3L]
}

iris %>% 
  group_by(Species) %>% 
  summarise(result = myfun(Sepal.Length))

CodePudding user response:

Just modify the function to take an extra argument n, which makes it dynamic:

myfun <-  function(x, n) {
    u <- unique(x)
    sort(u, decreasing = TRUE)[n]
}

and then call as

library(dplyr)
iris %>% 
  group_by(Species) %>% 
  summarise(result = myfun(Sepal.Length, 3))

-output

# A tibble: 3 × 2
  Species    result
  <fct>       <dbl>
1 setosa        5.5
2 versicolor    6.8
3 virginica     7.6
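
As a small sketch of how this helper behaves at the edges, the toy vector below (made up for illustration, not taken from iris) shows that duplicates are counted once, that `sort()` drops `NA` by default, and that asking for a rank beyond the number of distinct values gives `NA` rather than an error:

```r
# Same helper as in the answer above
myfun <- function(x, n) {
  u <- unique(x)
  sort(u, decreasing = TRUE)[n]
}

# Hypothetical toy data: one duplicate (7) and one missing value
x <- c(5, 7, 7, 3, NA)

first  <- myfun(x, 1)  # 7: the duplicate 7 is counted once
second <- myfun(x, 2)  # 5
third  <- myfun(x, 3)  # 3
fourth <- myfun(x, 4)  # NA: only three distinct non-missing values exist
```

So a group with fewer than n distinct values would show `NA` in the summarised result.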

To get the result for all the numeric columns, loop across them:

iris %>%
   group_by(Species) %>%
   summarise(across(where(is.numeric), ~ myfun(.x, 3)))
   # or use nth
   # summarise(across(where(is.numeric), ~ nth(unique(.x),
   #    order_by = -unique(.x), 3)))

-output

# A tibble: 3 × 5
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl>
1 setosa              5.5         4.1          1.6         0.4
2 versicolor          6.8         3.2          4.9         1.6
3 virginica           7.6         3.4          6.6         2.3
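
The question also asks how to get the rows of the original data rather than a single summary value. One possible sketch, assuming "the rows" means every row whose Sepal.Length equals the group's n-th largest distinct value, is to swap `summarise()` for `filter()`. The names `third_rows` and `third_rows_rank` are only illustrative, and ties mean a group can return more than one row:

```r
library(dplyr)

# Same helper as above: n-th largest distinct value of x
myfun <- function(x, n) {
  u <- unique(x)
  sort(u, decreasing = TRUE)[n]
}

# Keep all columns of every row whose Sepal.Length matches the
# 3rd largest distinct value within its Species group
third_rows <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length == myfun(Sepal.Length, 3)) %>%
  ungroup()

# Alternative without the helper: dense_rank() gives tied values the same
# rank and leaves no gaps, so rank 3 is the 3rd largest distinct value
third_rows_rank <- iris %>%
  group_by(Species) %>%
  filter(dense_rank(desc(Sepal.Length)) == 3) %>%
  ungroup()
```

If a group has fewer than n distinct values, `myfun()` returns `NA` and `filter()` keeps no rows for that group.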

CodePudding user response:

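We could use nth from the dplyr package after grouping and arranging. Note that the rows are sorted by Sepal.Length only, so for the other columns this returns the third distinct value in that row order, not necessarily their own third largest value: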

library(dplyr)

iris %>% 
  group_by(Species) %>% 
  arrange(-Sepal.Length, .by_group = TRUE) %>% 
  summarise(across(everything(), ~ nth(unique(.x), 3)))

-output

  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl>
1 setosa              5.5         3.8          1.7         0.3
2 versicolor          6.8         2.8          4.8         1.7
3 virginica           7.6         2.8          6.9         2.3
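
If the goal is the third largest distinct value of each column on its own, one possible variation is to sort inside the lambda instead of arranging the rows first. This is a sketch rather than the answerer's method, and `per_column` is just an illustrative name:

```r
library(dplyr)

# Sort each column's distinct values separately, then take the 3rd one
per_column <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric),
                   ~ nth(sort(unique(.x), decreasing = TRUE), 3)))
```

With this form every column is ranked independently, so the result should match the `myfun(.x, 3)` version from the first answer.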