Summarizing Multiple Columns of Data Using Pipes

Time:10-01

I'm looking to report the min, max, and mean of certain columns (price, age, and dist) from the houses data set in a concise tibble, using pipes. So far I have the following code, which produces a rather inelegant 1x9 tibble:

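library(dplyr)

houses %>% 
  select(price, age, dist) %>%
  # summarize_each() is deprecated in current dplyr; across() replaces it
  summarise(across(everything(), list(min = min, max = max, mean = mean)))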

I was hoping for a more organized result, still built with pipes, with the selected variables as rows and the summary statistics (min, max, mean) as columns, i.e. a 3x3 tibble. Any ideas?

Answer:

You can first reshape the data into long format and then calculate the summary statistics for each original column. Here is an example with the built-in mtcars dataset.

library(dplyr)
library(tidyr)

mtcars %>% 
  select(mpg, disp, cyl) %>%
  # long format: one row per (name, value) pair
  pivot_longer(cols = everything()) %>%
  # one group per original column
  group_by(name) %>%
  summarise(min = min(value, na.rm = TRUE), 
            max = max(value, na.rm = TRUE), 
            mean = mean(value, na.rm = TRUE))

#  name    min   max   mean
#  <chr> <dbl> <dbl>  <dbl>
#1 cyl     4     8     6.19
#2 disp   71.1 472   231.  
#3 mpg    10.4  33.9  20.1 
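A related variant keeps the one-row `summarise(across(...))` result from the question and reshapes it afterwards. This is a sketch, not part of the original answer, and it assumes the selected column names contain no underscores: `names_sep = "_"` splits each generated name such as `mpg_min` into a variable part and a statistic part, and the special `.value` name turns each statistic back into its own column.

```r
library(dplyr)
library(tidyr)

# Step 1: one row, one column per variable/statistic pair
# (names like mpg_min, mpg_max, mpg_mean).
wide <- mtcars %>%
  summarise(across(c(mpg, disp, cyl),
                   list(min = min, max = max, mean = mean)))

# Step 2: split each name at "_" into the variable (kept as rows)
# and the statistic (spread into columns via ".value").
# Assumes the selected column names themselves contain no "_".
summary_tbl <- wide %>%
  pivot_longer(everything(),
               names_to = c("name", ".value"),
               names_sep = "_")

summary_tbl
```

For the houses data, replacing `c(mpg, disp, cyl)` with `c(price, age, dist)` should give the 3x3 layout asked for, plus the `name` column identifying each row.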

Answer:

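A possible solution that outputs a data frame:

library(dplyr)
houses %>% 
  summarise(across(c(price, age, dist), list(max = max, min = min, mean = mean))) %>% 
  unlist() %>%                        # flatten the 1x9 result to a numeric vector
  matrix(ncol = 3, byrow = TRUE) %>%  # one row per variable, one column per statistic
  as.data.frame() %>% 
  rename(Max = V1, Min = V2, Mean = V3) %>% 
  mutate(variable = c("price", "age", "dist"), .before = 1)  # label the rows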

A possible solution to output a tibble:

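library(dplyr)
library(tibble)
houses %>% 
  summarise(across(c(price, age, dist), list(max = max, min = min, mean = mean))) %>% 
  unlist() %>%                        # flatten the 1x9 result to a numeric vector
  # one row per variable; name the statistic columns directly
  matrix(ncol = 3, byrow = TRUE,
         dimnames = list(NULL, c("Max", "Min", "Mean"))) %>% 
  as_tibble() %>% 
  mutate(variable = c("price", "age", "dist"), .before = 1)  # label the rows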
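If the same summary is needed for several column sets, the long-format pipeline from the first answer can be wrapped in a function. `summarise_columns()` below is a hypothetical helper, not part of either answer; it forwards `...` to `select()`, so any tidyselect specification should work, and it assumes every selected column is numeric, since `pivot_longer()` has to combine them into a single `value` column.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: one row per selected column, one column per statistic.
# Assumes all selected columns are numeric.
summarise_columns <- function(data, ...) {
  data %>%
    select(...) %>%
    pivot_longer(cols = everything(), names_to = "variable") %>%
    group_by(variable) %>%
    summarise(min  = min(value, na.rm = TRUE),
              max  = max(value, na.rm = TRUE),
              mean = mean(value, na.rm = TRUE),
              .groups = "drop")
}

# Usage with the built-in mtcars data; with the asker's data this would be
# summarise_columns(houses, price, age, dist).
result <- summarise_columns(mtcars, mpg, disp, cyl)
result
```

Because `group_by()` sorts the groups, the rows come back in alphabetical order of column name rather than selection order.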