How to create multiple density plots (by number of column) with for loop in R

Time:09-18

Apologies, I'm going to do my best to create a reproducible example, but I am not sure if it's good enough, so excuse me if it isn't. I am trying to create multiple density plots to inspect the distribution of counts in 300 variables (species), separated by participant_group. I have a data frame with dimensions 97 (participants) x 320. The first 2 columns are participant_id and participant_group respectively, and the remaining 318 columns are named after the species and hold their respective counts. I want to create a density plot for each of them.

participant_id <- c("P01","P02","P03","P04","P05","P06","P07","P08","P09","P10")
participant_group <- c("control", "responsive", "non-responsive", "control", "responsive", "non-responsive", "non-responsive", "control", "responsive", "non-responsive")
A <- c (0, 54, 0, 35, 76, 890, 45, 0, 1, 99)
B <- c (10, 504, 1, 52, 76, 90, 15, 20, 21, 9)
C <- c (460, 54, 5, 35, 7, 9, 45, 0, 1, 0)
D <- c (870, 654, 40, 5, 760, 80, 45, 0, 1, 76)
example_df <- data.frame(participant_id, participant_group, A, B, C, D)

So in this example, I would like to create density plots for column 3 through ncol(example_df). I have tried the following code, looping over either the column numbers or the column names:

library(ggplot2)

# looping number of columns
loop.vector <- 3:6
plot_by_number <- for (i in loop.vector) { 
  taxa <- example_df[,i]
  ggplot(example_df, aes(x=taxa, group=participant_group, fill=participant_group)) +
          geom_density(adjust=1.5, alpha=.4)
}

# OR

# looping species names
species_names<-colnames(example_df[,3:ncol(example_df)])
plot_by_name <- for (i in species_names) { 
  ggplot(example_df, aes(x=i, group=participant_group, fill=participant_group)) +
          geom_density(adjust=1.5, alpha=.4)
}


However, I get NULL for the plots. When I run the code directly with a column name or a column number, it works:

ggplot(example_df, aes(x=A, group=participant_group, fill=participant_group)) +
          geom_density(adjust=1.5, alpha=.4)
# OR

ggplot(example_df, aes(x=example_df[,3], group=participant_group, fill=participant_group)) +
          geom_density(adjust=1.5, alpha=.4)

I would also like to add the name of every column (species) as the title of its plot and save them all in a PDF file; however, I am still far from that.

I would really appreciate any help. Thanks for reading!

CodePudding user response:

Here is one way of doing it.

Create a custom function that uses aes_string() for the one aesthetic that changes (the column to plot):

f_ggplot <- function(v_column){
    ggplot(data = example_df, 
           aes(group = participant_group, 
               fill = participant_group)) +
    geom_density(aes_string(x = v_column),
                 adjust = 1.5, 
                 alpha = 0.4) +
    labs(title = paste("Title for variable", v_column))
}

You can use the function on a single column:

f_ggplot("A")


Or pass a vector of column names (strings) to lapply():

l_cols <- c(LETTERS[1:4])
lapply(l_cols, f_ggplot)


PS: To make a report with all these results, use R Markdown with PDF as the output format.
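If a full report is more than is needed, the plots can also be written straight to a multi-page PDF with base R's pdf() device. The following is a minimal sketch under stated assumptions, not part of the original answer: the helper name make_density_plot and the output path are hypothetical, and the column is looked up with rlang::sym() and !! so that each plot is tied to its own species.

```r
library(ggplot2)

# Example data from the question
example_df <- data.frame(
  participant_id = sprintf("P%02d", 1:10),
  participant_group = c("control", "responsive", "non-responsive", "control",
                        "responsive", "non-responsive", "non-responsive",
                        "control", "responsive", "non-responsive"),
  A = c(0, 54, 0, 35, 76, 890, 45, 0, 1, 99),
  B = c(10, 504, 1, 52, 76, 90, 15, 20, 21, 9),
  C = c(460, 54, 5, 35, 7, 9, 45, 0, 1, 0),
  D = c(870, 654, 40, 5, 760, 80, 45, 0, 1, 76)
)

# Hypothetical helper: one density plot per column name (a string)
make_density_plot <- function(v_column) {
  ggplot(example_df,
         aes(x = !!rlang::sym(v_column),
             group = participant_group,
             fill = participant_group)) +
    geom_density(adjust = 1.5, alpha = 0.4) +
    labs(title = v_column)  # species name as the plot title
}

# Every column from the third onwards is a species
species_names <- colnames(example_df)[3:ncol(example_df)]
plots <- lapply(species_names, make_density_plot)

# Hypothetical output location; replace with a real path as needed
out_file <- file.path(tempdir(), "density_plots.pdf")

# Each print() call inside an open pdf() device adds one page
pdf(out_file, width = 7, height = 5)
for (p in plots) print(p)
dev.off()
```

The explicit print() matters here: ggplot objects are only drawn automatically at the top level of the console, not inside a loop.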

A note on tidy evaluation. Since ggplot2 3.0:

aes() now supports quasiquotation so that you can use !!, !!!, and :=. This replaces aes_() and aes_string() which are now soft-deprecated (but will remain around for a long time)

With that in mind, you could use sym() and !!, and rewrite the function as:

f_ggplot <- function(v_column){
    v_column2 <- sym(v_column)
    ggplot(data = example_df, 
           aes(x = !!v_column2,
               group = participant_group, 
               fill = participant_group)) +
    geom_density(adjust = 1.5, 
                 alpha = 0.4) +
    labs(title = paste("Title for variable", v_column2))
}

Usage is the same as before. More options are discussed in a related question.
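One further option, not taken from the answer above, is the .data pronoun, which ggplot2 exports for looking up a column by its name as a string. The sketch below assumes the example_df from the question. The function name f_ggplot_data is made up for illustration.

```r
library(ggplot2)

# Example data from the question
example_df <- data.frame(
  participant_id = sprintf("P%02d", 1:10),
  participant_group = c("control", "responsive", "non-responsive", "control",
                        "responsive", "non-responsive", "non-responsive",
                        "control", "responsive", "non-responsive"),
  A = c(0, 54, 0, 35, 76, 890, 45, 0, 1, 99),
  B = c(10, 504, 1, 52, 76, 90, 15, 20, 21, 9),
  C = c(460, 54, 5, 35, 7, 9, 45, 0, 1, 0),
  D = c(870, 654, 40, 5, 760, 80, 45, 0, 1, 76)
)

# Hypothetical variant of f_ggplot: .data[[v_column]] picks the column
# whose name is stored in the string v_column
f_ggplot_data <- function(v_column) {
  ggplot(example_df,
         aes(x = .data[[v_column]],
             group = participant_group,
             fill = participant_group)) +
    geom_density(adjust = 1.5, alpha = 0.4) +
    labs(title = paste("Title for variable", v_column),
         x = v_column)  # set the axis label explicitly
}

# Same usage as before: one plot per column name
plots_data <- lapply(c("A", "B", "C", "D"), f_ggplot_data)
```

Because each call to the function has its own environment, every plot keeps its own value of v_column, so this form is safe to use with lapply().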

More on quasiquotation in Advanced R.
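As for why the loops in the question give NULL: a for loop in R always evaluates to NULL, so assigning the loop itself to a variable stores nothing, and ggplot objects created inside a loop are neither kept nor printed unless that is done explicitly. In addition, aes(x = i) with a character i maps the literal string, not the column of that name. If a loop is still preferred over lapply(), a minimal sketch (assuming the example_df from the question; the list name plots is illustrative) could look like this:

```r
library(ggplot2)

# Example data from the question
example_df <- data.frame(
  participant_id = sprintf("P%02d", 1:10),
  participant_group = c("control", "responsive", "non-responsive", "control",
                        "responsive", "non-responsive", "non-responsive",
                        "control", "responsive", "non-responsive"),
  A = c(0, 54, 0, 35, 76, 890, 45, 0, 1, 99),
  B = c(10, 504, 1, 52, 76, 90, 15, 20, 21, 9),
  C = c(460, 54, 5, 35, 7, 9, 45, 0, 1, 0),
  D = c(870, 654, 40, 5, 760, 80, 45, 0, 1, 76)
)

# A for loop itself always evaluates to NULL
result <- for (i in 1:3) i
is.null(result)  # TRUE

species_names <- colnames(example_df)[3:ncol(example_df)]

# Pre-allocate a named list and fill it inside the loop
plots <- vector("list", length(species_names))
names(plots) <- species_names

for (col in species_names) {
  # !!rlang::sym(col) inserts the column symbol immediately, so each plot
  # keeps its own column even though `col` changes on every iteration
  plots[[col]] <- ggplot(example_df,
                         aes(x = !!rlang::sym(col),
                             group = participant_group,
                             fill = participant_group)) +
    geom_density(adjust = 1.5, alpha = 0.4) +
    labs(title = col)
}

# Plots are not drawn automatically inside or after a loop; print one explicitly
print(plots[["A"]])
```

The immediate unquoting with !! is a deliberate choice: an aesthetic that refers to the loop variable without unquoting is only evaluated when the plot is drawn, by which time the variable holds its last value.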
