Get quantiles within groups within a dplyr chain

Time:04-12

What I would like to do:

library(tidyverse)
diamonds |> 
  group_by(cut) |> 
  mutate(qt_25 = some_ideal_func_25_pctile(price))

I want to add a new column with mutate() that holds, for each group, the 25th percentile of price.

E.g. for the cut 'Ideal':

diamonds |> filter(cut == 'Ideal') |> pull(price) |> quantile()
     0%     25%     50%     75%    100% 
  326.0   878.0  1810.0  4678.5 18806.0 

I would then want 878.0 repeated across all rows in the Ideal cut group.

How can I do this within a dplyr chain per my first block of code?

CodePudding user response:

You can simply use the probs argument of quantile() (thanks to Axeman):

library(tidyverse)

diamonds %>%  
  group_by(cut) %>% 
  mutate(qt_25 = quantile(price, probs = 0.25))

This returns

# A tibble: 53,940 x 11
# Groups:   cut [5]
   carat cut       color clarity depth table price     x     y     z qt_25
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43  878 
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31 1046 
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31 1145 
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63 1046 
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75 1145 
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48  912 
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47  912 
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53  912 
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49 2050.
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39  912 
# ... with 53,930 more rows
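
As a possible extension of the answer above, the same pattern can produce several quantile columns at once. This is only a sketch: the names diamonds_q and qt_75 are made up for illustration, names = FALSE is used so that quantile() does not attach its "25%" label to the result, and ungroup() is added on the assumption that later steps should not stay grouped by cut.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

diamonds_q <- diamonds |>
  group_by(cut) |>
  mutate(
    # names = FALSE drops the "25%" / "75%" label that quantile() adds by default
    qt_25 = quantile(price, probs = 0.25, names = FALSE),
    qt_75 = quantile(price, probs = 0.75, names = FALSE)
  ) |>
  ungroup()  # remove the grouping so later steps act on the whole table

# Quick check: each cut should map to exactly one pair of quantile values
diamonds_q |> distinct(cut, qt_25, qt_75)
```

If price could contain missing values, passing na.rm = TRUE to quantile() avoids an error; the diamonds data has none, so it is left out here.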