Summarizing Multiple Columns of Data Using Pipes

I'm looking to report the min, max, and mean of certain columns (price, age, and dist) from the houses data set using pipes, in a concise tibble. For now, I have the following code, which produces a rather inelegant 1x9 tibble:

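library(dplyr)

# summarize_each() is deprecated; summarise(across(...)) is the current equivalent.
# Result: one row, nine columns (price_min, price_max, price_mean, age_min, ...)
houses %>% 
  select(price, age, dist) %>%
  summarise(across(everything(), list(min = min, max = max, mean = mean)))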

I was hoping for a more organized solution using pipes, with the selected variables as rows and the summary statistics (min, max, mean) as columns, resulting in a 3x3 tibble. Any ideas?

Answer:

One option is to reshape the data to long format first and then calculate the summary statistics for each original column. Here is an example with the built-in mtcars dataset:

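library(dplyr)
library(tidyr)

mtcars %>% 
  select(mpg, disp, cyl) %>%
  # Stack the columns into two: `name` (original column) and `value`
  pivot_longer(cols = everything()) %>%
  # One group, and therefore one output row, per original column
  group_by(name) %>%
  summarise(min = min(value, na.rm = TRUE), 
            max = max(value, na.rm = TRUE), 
            mean = mean(value, na.rm = TRUE))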

#  name    min   max   mean
#  <chr> <dbl> <dbl>  <dbl>
#1 cyl     4     8     6.19
#2 disp   71.1 472   231.  
#3 mpg    10.4  33.9  20.1
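One detail visible in the output above: group_by() sorts the groups alphabetically, so the rows come out as cyl, disp, mpg rather than in the order the columns were selected. A minimal sketch, assuming you want to keep the selection order, converts name to a factor whose levels follow the selected columns (the vars vector is a hypothetical helper introduced only for this example):

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: the columns to summarise, in the desired row order
vars <- c("mpg", "disp", "cyl")

res <- mtcars %>%
  select(all_of(vars)) %>%
  pivot_longer(cols = everything()) %>%
  # Factor levels (not the alphabet) now decide the group order
  mutate(name = factor(name, levels = vars)) %>%
  group_by(name) %>%
  summarise(min = min(value, na.rm = TRUE),
            max = max(value, na.rm = TRUE),
            mean = mean(value, na.rm = TRUE))

print(res)
```

The name column stays a factor in the result; wrap it in as.character() if you need plain strings.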

Answer:

A possible solution that returns a data frame:

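library(dplyr)
houses %>% 
  # 1 x 9: price_1, price_2, price_3, age_1, ... (max, min, mean per variable)
  summarise(across(c(price, age, dist), c(max, min, mean))) %>% 
  # Flatten to a numeric vector so matrix() does not build list columns
  unlist() %>% 
  # Fill row-wise: one row per variable, one column per statistic
  matrix(ncol = 3, byrow = TRUE,
         dimnames = list(c("price", "age", "dist"), c("Max", "Min", "Mean"))) %>% 
  as.data.frame()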

A possible solution that returns a tibble:

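library(dplyr)
houses %>% 
  summarise(across(c(price, age, dist), c(max, min, mean))) %>% 
  unlist() %>%                       # numeric vector of length 9
  matrix(ncol = 3, byrow = TRUE) %>% 
  # Braces stop %>% from also passing the matrix as tibble()'s first argument
  {tibble(variable = c("price", "age", "dist"),
          Max = .[, 1], Min = .[, 2], Mean = .[, 3])}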
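A further variant, not taken from either answer, reshapes the question's own 1x9 tibble with tidyr::pivot_longer(): the special ".value" entry in names_to sends the part of each column name after the underscore (min, max, mean) to its own column, while the part before it becomes the variable column. This is a sketch that assumes the original column names contain no underscores; the small houses tibble below is invented stand-in data, since the real data set is not shown.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the real `houses` data (values invented for illustration)
houses <- tibble(
  price = c(250, 310, 180, 420),
  age   = c(10, 3, 25, 7),
  dist  = c(1.2, 4.5, 0.8, 2.0)
)

res <- houses %>%
  select(price, age, dist) %>%
  # 1 x 9 tibble: price_min, price_max, price_mean, age_min, ...
  summarise(across(everything(), list(min = min, max = max, mean = mean))) %>%
  # Split "price_min" into variable = "price" and a new column named "min"
  pivot_longer(everything(),
               names_to = c("variable", ".value"),
               names_sep = "_")

print(res)
```

The result has one row per variable (price, age, dist) and the columns variable, min, max, and mean.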