I'd like to calculate the mean, max, min and sd of the mpg
variable but just only in 5% to 95% mpg
quantiles interval by cyl
and am
inside mtcars data set and not for all mpg
values. In my example:
library(dplyr)
mtcars %>%
group_by(cyl,am) %>%
filter (mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>%
summarise(mean = mean(mpg),
max = max(mpg),
min = mean(mpg),
sd = sd(mpg))
But, I'd like too the results not in the wide format output with just a single operation, the output was something like:
cly am mpg varType
<dbl> <dbl> <dbl> <dbl>
1 4 1 28.2 mean
2 4 1 32.4 max
3 4 1 28.2 min
4 4 1 3.53 sd
Please, any help with it? Thanks in advance!
CodePudding user response:
Just add a pivot_longer to your pipe.
library(tidyverse)
mtcars %>%
group_by(cyl,am) %>%
filter (mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>%
summarise(mean = mean(mpg),
max = max(mpg),
min = mean(mpg),
sd = sd(mpg)) %>%
pivot_longer(-c(cyl,am))
CodePudding user response:
Within summarise, you can create a tibble with as many rows and columns as you'd like the output to have (as long as number of columns and column types match across groups)
library(dplyr, warn.conflicts = FALSE)
mtcars %>%
group_by(cyl, am) %>%
filter(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>%
summarise({
stats <-
c(
mean = mean(mpg),
max = max(mpg),
min = min(mpg),
sd = sd(mpg)
)
tibble(mpg = stats, varType = names(stats))
})
#> `summarise()` has grouped output by 'cyl', 'am'. You can override using the `.groups` argument.
#> # A tibble: 20 × 4
#> # Groups: cyl, am [5]
#> cyl am mpg varType
#> <dbl> <dbl> <dbl> <chr>
#> 1 4 0 22.8 mean
#> 2 4 0 22.8 max
#> 3 4 0 22.8 min
#> 4 4 0 NA sd
#> 5 4 1 28.2 mean
#> 6 4 1 32.4 max
#> 7 4 1 22.8 min
#> 8 4 1 3.53 sd
#> 9 6 0 18.6 mean
#> 10 6 0 19.2 max
#> 11 6 0 18.1 min
#> 12 6 0 0.778 sd
#> 13 6 1 21 mean
#> 14 6 1 21 max
#> 15 6 1 21 min
#> 16 6 1 0 sd
#> 17 8 0 14.7 mean
#> 18 8 0 18.7 max
#> 19 8 0 10.4 min
#> 20 8 0 2.57 sd
Created on 2021-09-25 by the reprex package (v2.0.1)