Filter the values between the 20% and 80% quantiles of each group

I want to go from the original data

fruit price
apple 12
apple 13
apple 14
apple 15
apple 16
banana 3
banana 5
banana 1
banana 4
banana 2

to the new data

fruit price 
apple 13 
apple 14 
apple 15 
banana 3 
banana 4 
banana 2

(apple 12 and 16 and banana 1 and 5 have been removed)

Thanks for your help.

CodePudding user response：

Here's an approach using tidyverse and the base R function quantile:

library(tidyverse)

df %>% 
  group_by(fruit) %>%
  filter(price > quantile(price, 0.2) & price < quantile(price, 0.8))
#> # A tibble: 6 x 2
#> # Groups:   fruit [2]
#>   fruit  price
#>   <chr>  <int>
#> 1 apple     13
#> 2 apple     14
#> 3 apple     15
#> 4 banana     3
#> 5 banana     4
#> 6 banana     2

Created on 2022-04-09 by the reprex package (v2.0.1)

Data (taken from question) in reproducible format

df <- structure(list(fruit = c("apple", "apple", "apple", "apple", 
"apple", "banana", "banana", "banana", "banana", "banana"), price = c(12L, 
13L, 14L, 15L, 16L, 3L, 5L, 1L, 4L, 2L)), class = "data.frame", row.names = c(NA, 
-10L))
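
If the tidyverse is not available, the same strict filter can be sketched in base R. This is only an illustration of the idea above, not part of the original answer: the names lo, hi and res are made up here, and ave() is used to spread each group's cutoff back over its rows.

```r
# Same data as in the question
df <- data.frame(
  fruit = rep(c("apple", "banana"), each = 5),
  price = c(12L, 13L, 14L, 15L, 16L, 3L, 5L, 1L, 4L, 2L)
)

# Per-row copy of each group's 20% and 80% quantile
# (apple: 12.8 and 15.2, banana: 1.8 and 4.2)
lo <- ave(as.numeric(df$price), df$fruit,
          FUN = function(x) quantile(x, 0.2, names = FALSE))
hi <- ave(as.numeric(df$price), df$fruit,
          FUN = function(x) quantile(x, 0.8, names = FALSE))

# Keep rows strictly between the two cutoffs, as in the filter() call above
res <- df[df$price > lo & df$price < hi, ]
res$price
# 13 14 15  3  4  2
```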

CodePudding user response：

A slightly different way, using dplyr's percent_rank function:

library(dplyr)

df_new <- df %>%
  group_by(fruit) %>%
  # keep rows whose within-group percent rank lies in [0.2, 0.8]
  filter(percent_rank(price) %>% between(0.2, 0.8)) %>%
  ungroup()
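
To see why this keeps the middle three rows, it may help to look at the percent ranks themselves. The sketch below re-creates the idea in base R with a hypothetical helper pct_rank (minimum rank rescaled to the range 0 to 1, which is how dplyr documents percent_rank); it is an illustration, not dplyr's own code, and it ignores missing values.

```r
# Hypothetical base R stand-in for the percent-rank idea:
# minimum rank, rescaled so the smallest value gets 0 and the largest gets 1
pct_rank <- function(x) (rank(x, ties.method = "min") - 1) / (length(x) - 1)

apple <- c(12, 13, 14, 15, 16)
pr <- pct_rank(apple)
pr
# 0.00 0.25 0.50 0.75 1.00

# Only the ranks inside [0.2, 0.8] survive, i.e. 13, 14 and 15
keep <- apple[pr >= 0.2 & pr <= 0.8]
keep
# 13 14 15
```

Note that this approach filters on ranks rather than on the values themselves, so on other data it will not necessarily select the same rows as the quantile-based filter.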

CodePudding user response：

A combination of @Allan Cameron's and @Joe Erinjeri's answers:

library(dplyr)

df %>% 
  group_by(fruit) %>% 
  filter(between(price, quantile(price, .2), quantile(price, .8)))

  fruit  price
  <chr>  <int>
1 apple     13
2 apple     14
3 apple     15
4 banana     3
5 banana     4
6 banana     2
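
One thing to keep in mind: between() is inclusive (it behaves like x >= left & x <= right), while the first answer uses strict > and <. On the question's data both give the same rows, but they can differ when a quantile lands exactly on a data value. A small base R sketch with a made-up group of six prices shows the difference:

```r
# Hypothetical group of six prices, chosen so that the 20% and 80%
# quantiles coincide with actual data values
price <- 1:6
lo <- quantile(price, 0.2, names = FALSE)  # 2
hi <- quantile(price, 0.8, names = FALSE)  # 5

# Strict comparison, as in the first answer: boundary values are dropped
strict <- price[price > lo & price < hi]

# Inclusive comparison, equivalent to between(price, lo, hi)
inclusive <- price[price >= lo & price <= hi]

strict
# 3 4
inclusive
# 2 3 4 5
```

Which of the two is wanted depends on whether values sitting exactly on the 20% or 80% cutoff should be kept.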