Home > database >  How can I create histograms from nested dataframes and store them as objects in a list in r?
How can I create histograms from nested dataframes and store them as objects in a list in r?

Time:03-15

I am a novice coder. I am trying to create a Shiny App for work that takes large datasets of fisheries data, calculates some metrics, and then spits out all required plots and metrics in rMarkdown files. These datasets are filled with numerous observations of multiple different species within multiple different lakes. We want to create plots for each species for each lake.

To get to the desired outputs, I believe I need to nest the dataframes, create geom_histograms for each lake_species combination (cyl_gear combination in my example below), and then store them as objects in a list/column in the primary dataframe so that I can pass the objects into rMarkdown for printing.

Here is an example of what I am asking:

library(dplyr)
library(ggplot2)
library(tidyr)
library(purrr)

nested <- mtcars %>%
  mutate(uniqueID=paste(mtcars$cyl, sep = "_", mtcars$gear),
         gear2=gear) %>%
  group_by(uniqueID, gear) %>%
    nest()

histyfun <- function(x){   ## I know this set of case_when code does not work, but this 
                           ## is my most recent attempt at it.
  
  case_when(x$gear=="3" ~ 
              
              ggplot(data=x$data, aes(x=wt, fill=hp))   
              geom_histogram(binwidth = 0.2, color="black", 
              position = position_stack(reverse=TRUE),
                             breaks=seq(min(data$wt)-0.2, max(data$wt) 0.2, 0.2))  
              scale_fill_continuous(type = "gradient")  
              scale_x_continuous(name="Weight", 
              breaks = seq(min(data$wt)-0.2, max(data$wt) 0.2, 0.2))   
              aes(y=stat(count)/sum(stat(count)))  
              scale_y_continuous(name="Percent Frequency", labels = scales::percent, 
                                 breaks = seq(0, 1, 0.02))  
              labs(fill="")  
              theme(panel.grid.major = element_blank(), 
               panel.grid.minor = element_blank(),
               panel.background = element_blank(), 
               axis.line = element_line(colour = "black")),
            
            
            x$gear=="4" ~ 
              
              ggplot(data=x$data, aes(x=wt, fill=hp))   
              geom_histogram(binwidth = 0.1, color="black", 
              position = position_stack(reverse=TRUE),
                             breaks=seq(min(data$wt)-0.2, max(data$wt) 0.2, 0.2))  
              scale_fill_continuous(type = "gradient")  
              scale_x_continuous(name="Weight", 
              breaks = seq(min(data$wt)-0.2, max(data$wt) 0.2, 0.2))   
              aes(y=stat(count)/sum(stat(count)))  
              scale_y_continuous(name="Percent Frequency", labels = scales::percent, 
                                 breaks = seq(0, 1, 0.02))  
              labs(fill="")  
              theme(panel.grid.major = element_blank(), 
               panel.grid.minor = element_blank(),
               panel.background = element_blank(), 
               axis.line = element_line(colour = "black")),
              
            x$gear=="5" ~ 
              
              ggplot(data=x$data, aes(x=wt, fill=hp))   
              geom_histogram(binwidth = 0.3, color="black", 
              position = position_stack(reverse=TRUE),
                             breaks=seq(min(data$wt)-0.2, max(data$wt) 0.2, 0.2))  
              scale_fill_continuous(type = "gradient")  
              scale_x_continuous(name="Weight", 
              breaks = seq(min(data$wt)-0.2, max(data$wt) 0.2, 0.2))   
              aes(y=stat(count)/sum(stat(count)))  
              scale_y_continuous(name="Percent Frequency", labels = scales::percent, 
                                 breaks = seq(0, 1, 0.02))  
              labs(fill="")  
              theme(panel.grid.major = element_blank(), 
               panel.grid.minor = element_blank(),
               panel.background = element_blank(), 
               axis.line = element_line(colour = "black")),

            TRUE ~ 0
  )
}

mutate(nested, histogram = nested %>% map(histyfun))

I know the above code does not work, but it should hopefully illustrate what I am attempting to create.

I am struggling with how to: A) create my geom_histograms by calling the appropriate column (wt in the example here) inside the nested dataframe and then B) how to store those histograms as objects in the new column/list. I have no idea what I am doing and appreciate any pointers/tips you can give me. Thanks!

CodePudding user response:

The tidyverse packages are incredibly useful for most data manipulation, but they aren't really designed for implementing functions. While this approach is admittedly inelegant and old-school, I think it will give you what you are after. I reworked your function to be called on a list. Instead of using the case_when() function, which is meant for altering values within a tibble or dataframe, I used if() and else() statements. Also, your function didn't have a return() call, so I added that in. Give it a look, hopefully it is what you are after.

library(dplyr)
library(ggplot2)
library(tidyr)
library(purrr)

nested <- mtcars %>%
  mutate(uniqueID=paste(mtcars$cyl, sep = "_", mtcars$gear),
         gear2=gear) %>%
  group_by(uniqueID, gear) %>%
  nest()

histyfun <- function(x){   ## I know this set of case_when code does not work, but this is my most
  ## recent attempt at it.
  
  if(unique(x$gear2)==3){ 
              
              Y<-ggplot(data=x, aes(x=wt, fill=hp))   
              geom_histogram(binwidth = 0.2, color="black", position = position_stack(reverse=TRUE),
                             breaks=seq(min(x$wt)-0.2, max(x$wt) 0.2, 0.2))  
              scale_fill_continuous(type = "gradient")  
              scale_x_continuous(name="Weight", breaks = seq(min(x$wt)-0.2, max(x$wt) 0.2, 0.2))   
              aes(y=stat(count)/sum(stat(count)))  
              scale_y_continuous(name="Percent Frequency", labels = scales::percent, 
                                 breaks = seq(0, 1, 0.02))  
              labs(fill="")  
              theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
                    panel.background = element_blank(), axis.line = element_line(colour = "black"))
            
  }else{
    if(unique(x$gear2)==4){ 
              
              Y<-ggplot(data=x, aes(x=wt, fill=hp))   
              geom_histogram(binwidth = 0.1, color="black", position = position_stack(reverse=TRUE),
                             breaks=seq(min(x$wt)-0.2, max(x$wt) 0.2, 0.2))  
              scale_fill_continuous(type = "gradient")  
              scale_x_continuous(name="Weight", breaks = seq(min(x$wt)-0.2, max(x$wt) 0.2, 0.2))   
              aes(y=stat(count)/sum(stat(count)))  
              scale_y_continuous(name="Percent Frequency", labels = scales::percent, 
                                 breaks = seq(0, 1, 0.02))  
              labs(fill="")  
              theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
                    panel.background = element_blank(), axis.line = element_line(colour = "black"))
    }
    else{
      if(unique(x$gear2)==5){ 
              
              Y<-ggplot(data=x, aes(x=wt, fill=hp))   
              geom_histogram(binwidth = 0.3, color="black", position = position_stack(reverse=TRUE),
                             breaks=seq(min(x$wt)-0.2, max(x$wt) 0.2, 0.2))  
              scale_fill_continuous(type = "gradient")  
              scale_x_continuous(name="Weight", breaks = seq(min(x$wt)-0.2, max(x$wt) 0.2, 0.2))   
              aes(y=stat(count)/sum(stat(count)))  
              scale_y_continuous(name="Percent Frequency", labels = scales::percent, 
                                 breaks = seq(0, 1, 0.02))  
              labs(fill="")  
              theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
                    panel.background = element_blank(), axis.line = element_line(colour = "black"))
      }
    }
  }
  return(Y)
}
nest_list<-as.list(nested$data)
tmp<-lapply(nest_list, as.data.frame)
par(mfrow=c(2,4))
lapply(tmp, histyfun)

CodePudding user response:

A tidyverse approach may look like so.

  1. Make your function a function of two (or ...) arguments, e.g. gear and a dataset x
  2. Instead of purrr::map you could use purrr::pmap (or map2) to loop over both the gear and the data column of your nested dataset
  3. You could probably also simplify your function considerably. Instead of duplicating the plotting code use an if or switch to conditionally set the parameters which are varying depending on the number of gears, e.g. in case of your reprex the binwidth argument.

BTW: After a group_by it's always a good idea to ungroup (especially with nesting).

library(dplyr)
library(ggplot2)
library(tidyr)
library(purrr)

histyfun <- function(gear, x) { ## I know this set of case_when code does not work, but this
  binwidth <- switch(as.character(gear), "3" = .2, "4" = 0.1, .3)
  breaks_x <- seq(min(x$wt) - 0.2, max(x$wt)   0.2, 0.2)

  ggplot(data = x, aes(x = wt, fill = hp))  
    geom_histogram(
      binwidth = binwidth, color = "black",
      position = position_stack(reverse = TRUE)
    )  
    scale_fill_continuous(type = "gradient")  
    scale_x_continuous(
      name = "Weight",
      breaks = breaks_x
    )  
    aes(y = stat(count) / sum(stat(count)))  
    scale_y_continuous(
      name = "Percent Frequency", labels = scales::percent,
      breaks = seq(0, 1, 0.02)
    )  
    labs(fill = "")  
    theme(
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.background = element_blank(),
      axis.line = element_line(colour = "black")
    )
}

nested <- mtcars %>%
  mutate(
    uniqueID = paste(mtcars$cyl, sep = "_", mtcars$gear),
    gear2 = gear
  ) %>%
  group_by(uniqueID, gear) %>%
  nest() %>%
  ungroup()

mutate(nested, histogram = pmap(list(gear = gear, x = data), histyfun))
#> # A tibble: 8 × 4
#>    gear uniqueID data               histogram
#>   <dbl> <chr>    <list>             <list>   
#> 1     4 6_4      <tibble [4 × 11]>  <gg>     
#> 2     4 4_4      <tibble [8 × 11]>  <gg>     
#> 3     3 6_3      <tibble [2 × 11]>  <gg>     
#> 4     3 8_3      <tibble [12 × 11]> <gg>     
#> 5     3 4_3      <tibble [1 × 11]>  <gg>     
#> 6     5 4_5      <tibble [2 × 11]>  <gg>     
#> 7     5 8_5      <tibble [2 × 11]>  <gg>     
#> 8     5 6_5      <tibble [1 × 11]>  <gg>
  • Related