I am trying to select the n-th largest row per group in a dataset. For example, using the iris dataset, I found this code online that returns the second-largest value of Sepal.Length for each flower species:
library(dplyr)
myfun <- function(x) {
  u <- unique(x)
  sort(u, decreasing = TRUE)[2L]
}
iris %>%
  group_by(Species) %>%
  summarise(result = myfun(Sepal.Length))
I just want to confirm that I have understood this correctly. If I want the third largest, do I just change the index like this? And how can I select all rows from the original data?
library(dplyr)
myfun <- function(x) {
  u <- unique(x)
  sort(u, decreasing = TRUE)[3L]
}
iris %>%
  group_by(Species) %>%
  summarise(result = myfun(Sepal.Length))
CodePudding user response:
Just modify the function to take an extra argument, n, to make it dynamic:
myfun <- function(x, n) {
  u <- unique(x)
  sort(u, decreasing = TRUE)[n]
}
and then call it as
library(dplyr)
iris %>%
  group_by(Species) %>%
  summarise(result = myfun(Sepal.Length, 3))
Output:
# A tibble: 3 × 2
  Species    result
  <fct>       <dbl>
1 setosa        5.5
2 versicolor    6.8
3 virginica     7.6
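One thing worth knowing about this helper: indexing past the end of a vector in R returns NA, so a group with fewer than n distinct values gives NA rather than an error. A minimal sketch with made-up input vectors (the numbers are illustrative, not taken from iris):

```r
# Same helper as in the answer: n-th largest distinct value of x
myfun <- function(x, n) {
  u <- unique(x)
  sort(u, decreasing = TRUE)[n]
}

# Three distinct values (5.1, 4.9, 4.7): the second largest is 4.9
second <- myfun(c(5.1, 4.9, 5.1, 4.7), 2)

# Only two distinct values, so asking for the third yields NA
missing_rank <- myfun(c(5.1, 4.9, 5.1), 3)
```

If NA is not acceptable for small groups, it may be worth checking `length(unique(x)) >= n` before calling the helper.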
To get all the numeric columns, loop across the numeric columns:
iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), ~ myfun(.x, 3)))
# or use nth
# summarise(across(where(is.numeric),
#   ~ nth(unique(.x), 3, order_by = -unique(.x))))
Output:
# A tibble: 3 × 5
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl>
1 setosa              5.5         4.1          1.6         0.4
2 versicolor          6.8         3.2          4.9         1.6
3 virginica           7.6         3.4          6.6         2.3
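If "all rows" in the question means the complete original rows whose Sepal.Length is the n-th largest in their group, filtering instead of summarising is one option. The sketch below assumes that interpretation and uses dplyr's dense_rank() so that ties share a rank; the names n and nth_rows are only illustrative.

```r
library(dplyr)

n <- 3  # which rank to keep; 3 means third largest

# Keep every original row whose Sepal.Length is the n-th largest
# distinct value within its Species group (ties are all kept).
nth_rows <- iris %>%
  group_by(Species) %>%
  filter(dense_rank(desc(Sepal.Length)) == n) %>%
  ungroup()

nth_rows
```

Because tied rows are all kept, a group can contribute more than one row. To keep exactly one row per group, something like slice_head(n = 1) could be added before ungroup().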
CodePudding user response:
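We could use nth() from the dplyr package after grouping and arranging. Note that only Sepal.Length is sorted here, so for the other columns this returns the third distinct value in that row order rather than their own third-largest value: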
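library(dplyr)
iris %>%
  group_by(Species) %>%
  # sort each group by Sepal.Length, largest first
  arrange(-Sepal.Length, .by_group = TRUE) %>%
  # name the columns explicitly; an empty .cols argument is deprecated in recent dplyr
  summarise(across(everything(), ~ nth(unique(.x), 3)))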
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl>
1 setosa              5.5         3.8          1.7         0.3
2 versicolor          6.8         2.8          4.8         1.7
3 virginica           7.6         2.8          6.9         2.3