Nest ggplot histograms per variable

Time:10-12

I want to create a data frame that summarizes each variable (number of observations, mean, and median) and also nests that variable's ggplot histogram in a list column. I am using the iris dataset for this.

This is my first attempt:

iris %>%
  pivot_longer(-Species, 
               names_to = "Vars", 
               values_to = "Values") %>%
  group_by(Vars) %>%
  summarise(obs = n(),
            mean = round(mean(Values),2),
            median = round(median(Values),2))

So it gives me:

# A tibble: 4 x 4
  Vars           obs  mean median
  <chr>        <int> <dbl>  <dbl>
1 Petal.Length   150  3.76   4.35
2 Petal.Width    150  1.2    1.3 
3 Sepal.Length   150  5.84   5.8 
4 Sepal.Width    150  3.06   3   

This is the expected table:

# A tibble: 4 x 5
  Vars           obs  mean median plot
  <chr>        <int> <dbl>  <dbl> <list>
1 Petal.Length   150  3.76   4.35 <gg>
2 Petal.Width    150  1.2    1.3  <gg>
3 Sepal.Length   150  5.84   5.8  <gg>
4 Sepal.Width    150  3.06   3    <gg>

This is what I have tried:

iris %>%
  pivot_longer(-Species, 
               names_to = "Vars", 
               values_to = "Values") %>%
  group_by(Vars) %>%
  nest() %>%
  mutate(metrics = lapply(data, function(df) df %>% summarise(obs = n(), mean = mean(Values), median = median(Values))),
         plots = lapply(data, function(df) df %>% ggplot(aes(Values)) + geom_histogram()))

Almost there, I see this:

# A tibble: 4 x 4
# Groups:   Vars [4]
  Vars         data               metrics          plots 
  <chr>        <list>             <list>           <list>
1 Sepal.Length <tibble [150 × 2]> <tibble [1 × 3]> <gg>  
2 Sepal.Width  <tibble [150 × 2]> <tibble [1 × 3]> <gg>  
3 Petal.Length <tibble [150 × 2]> <tibble [1 × 3]> <gg>  
4 Petal.Width  <tibble [150 × 2]> <tibble [1 × 3]> <gg>  

But I don't know how to get the expected tibble, with the obs, mean, median and plots columns and without the data and metrics columns. Any help will be greatly appreciated.

CodePudding user response:

We can call cur_data() inside summarise() to get the current group's data, and wrap the ggplot call in list() so that each plot is stored in a list column:

library(dplyr)
library(ggplot2)
library(tidyr)
out <- iris %>%
  pivot_longer(-Species, 
               names_to = "Vars", 
               values_to = "Values") %>%
  group_by(Vars) %>%
  summarise(obs = n(),
            mean = round(mean(Values),2),
            median = round(median(Values),2), 
    plots = list(ggplot(cur_data(), aes(Values)) + geom_histogram()))

-output

out
# A tibble: 4 × 5
  Vars           obs  mean median plots 
  <chr>        <int> <dbl>  <dbl> <list>
1 Petal.Length   150  3.76   4.35 <gg>  
2 Petal.Width    150  1.2    1.3  <gg>  
3 Sepal.Length   150  5.84   5.8  <gg>  
4 Sepal.Width    150  3.06   3    <gg>  
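
The nested attempt from the question can also be finished directly. Below is a minimal sketch, assuming tidyr's unnest() is used to spread the metrics column into regular columns and dplyr's select() to drop data. The name out2, the bins = 30 argument and the final arrange() are illustrative choices, not part of the original post.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

out2 <- iris %>%
  pivot_longer(-Species,
               names_to = "Vars",
               values_to = "Values") %>%
  group_by(Vars) %>%
  nest() %>%
  mutate(
    # one-row summary tibble per variable
    metrics = lapply(data, function(df) {
      summarise(df,
                obs = n(),
                mean = round(mean(Values), 2),
                median = round(median(Values), 2))
    }),
    # one histogram per variable; bins = 30 is an assumed value
    plots = lapply(data, function(df) {
      ggplot(df, aes(Values)) + geom_histogram(bins = 30)
    })
  ) %>%
  ungroup() %>%
  unnest(metrics) %>%   # expand obs, mean, median into columns
  select(-data) %>%     # drop the nested raw data
  arrange(Vars)         # same row order as the summarise() result
```

With either version, a single histogram can be drawn by extracting it from the list column, for example out2$plots[[1]]. On dplyr 1.1.0 or later, where cur_data() is deprecated, pick(Values) can take its place inside the summarise() call.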