Home > Net >  dplyr: mean extraction inside 5-95 quantiles in a specific output
dplyr: mean extraction inside 5-95 quantiles in a specific output

Time:09-26

I'd like to calculate the mean, max, min and sd of the mpg variable but just only in 5% to 95% mpg quantiles interval by cyl and am inside mtcars data set and not for all mpg values. In my example:

library(dplyr)
mtcars %>%
 group_by(cyl,am) %>%
 filter (mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>% 
 summarise(mean = mean(mpg),
 max = max(mpg),
 min = mean(mpg),
 sd = sd(mpg))

But, I'd like too the results not in the wide format output with just a single operation, the output was something like:

 cly  am  mpg  varType
 <dbl> <dbl> <dbl> <dbl>
1 4 1 28.2 mean
2 4 1 32.4 max
3 4 1 28.2 min
4 4 1 3.53 sd

Please, any help with it? Thanks in advance!

CodePudding user response:

Just add a pivot_longer to your pipe.

library(tidyverse)
mtcars %>%
 group_by(cyl,am) %>%
 filter (mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>% 
 summarise(mean = mean(mpg),
 max = max(mpg),
 min = mean(mpg),
 sd = sd(mpg)) %>%
 pivot_longer(-c(cyl,am))

CodePudding user response:

Within summarise, you can create a tibble with as many rows and columns as you'd like the output to have (as long as number of columns and column types match across groups)

library(dplyr, warn.conflicts = FALSE)

mtcars %>%
  group_by(cyl, am) %>%
  filter(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>%
  summarise({
    stats <-
      c(
        mean = mean(mpg),
        max = max(mpg),
        min = min(mpg),
        sd = sd(mpg)
      )
    tibble(mpg = stats, varType = names(stats))
  })
#> `summarise()` has grouped output by 'cyl', 'am'. You can override using the `.groups` argument.
#> # A tibble: 20 × 4
#> # Groups:   cyl, am [5]
#>      cyl    am    mpg varType
#>    <dbl> <dbl>  <dbl> <chr>  
#>  1     4     0 22.8   mean   
#>  2     4     0 22.8   max    
#>  3     4     0 22.8   min    
#>  4     4     0 NA     sd     
#>  5     4     1 28.2   mean   
#>  6     4     1 32.4   max    
#>  7     4     1 22.8   min    
#>  8     4     1  3.53  sd     
#>  9     6     0 18.6   mean   
#> 10     6     0 19.2   max    
#> 11     6     0 18.1   min    
#> 12     6     0  0.778 sd     
#> 13     6     1 21     mean   
#> 14     6     1 21     max    
#> 15     6     1 21     min    
#> 16     6     1  0     sd     
#> 17     8     0 14.7   mean   
#> 18     8     0 18.7   max    
#> 19     8     0 10.4   min    
#> 20     8     0  2.57  sd

Created on 2021-09-25 by the reprex package (v2.0.1)

  • Related