Create a boxplot for each column of a data frame, according to two label columns from another data frame

Time:01-25

I have a data frame, call it df1, with 25 features as columns. In another data frame, call it df2, there are two label columns that I want to use. The first label column, cancertype, has only three cancer types. The second, stage, has 4 different stages. Both df1 and df2 have exactly the same samples in the rows, in the same order.
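Since df1 and df2 list the same samples in the same order, the two label columns can simply be attached column-wise. This is a minimal sketch under that assumption; the helper name `combine_features_labels` and the toy data frames are hypothetical, and the row-name check only helps if both frames actually carry sample IDs as row names.

```r
# Attach the label columns of df2 to the feature columns of df1.
# Assumes both frames hold the same samples in the same row order;
# the checks stop early if that assumption is clearly violated.
combine_features_labels <- function(df1, df2,
                                    label_cols = c("cancertype", "stage")) {
  # A column-wise bind needs the same number of samples in both frames
  stopifnot(nrow(df1) == nrow(df2))
  # If row names hold sample IDs, they must match exactly (same order)
  stopifnot(identical(rownames(df1), rownames(df2)))
  cbind(df1, df2[, label_cols, drop = FALSE])
}

# Hypothetical toy inputs, only to show the call
df1 <- data.frame(featA = c(0.1, 0.2, 0.3), featB = c(1, 2, 3),
                  row.names = c("Pt1", "Pt2", "Pt3"))
df2 <- data.frame(cancertype = c("Melanoma", "Melanoma",
                                 "Renal Clear Cell Carcinoma"),
                  stage = c("Stage1", "Stage2", "Stage1"),
                  row.names = c("Pt1", "Pt2", "Pt3"))
dat <- combine_features_labels(df1, df2)
```

Reordering either frame (for example `df2[3:1, ]`) makes the row-name check fail instead of silently pairing features with the wrong labels.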

I want to create a boxplot for each feature, where the x-axis shows the three cancer types and, for each cancer type, there are 4 boxes, one per stage. So in the end, I'll have 25 boxplots, one for each feature.

Here is a plot as an example:

[Example plot: boxplots of Tooth Length at doses 0.5, 1 and 2, with separate boxes for the two supplement types]

So here, instead of 0.5, 1, and 2, I'd have the three cancer types; instead of orange juice and ascorbic acid, I'd have the 4 cancer stages; and instead of Tooth Length, I'd have the values of the feature. How can I do that in R?
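The example plot appears to be the classic grouped boxplot of R's built-in `ToothGrowth` data (an assumption based on the doses and supplement types described). A sketch of that kind of plot with ggplot2, to show which aesthetic plays which role:

```r
library(ggplot2)

# Assumed reconstruction of the reference plot: dose on the x-axis,
# tooth length on the y-axis, one box per supplement type (fill)
p_ref <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = supp)) +
  geom_boxplot() +
  labs(x = "Dose", y = "Tooth Length", fill = "Supplement")

# Same layout for the cancer data:
#   x = cancertype, y = <feature values>, fill = stage
print(p_ref)
```

Swapping `dose` for `cancertype`, `supp` for `stage`, and `len` for a feature column gives the layout asked for here.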

Here is an example of the data. I combined the two label columns from df2, stage and cancertype, with some of the features of df1, just so you can see what the data looks like:

dat <- structure(list(`CD4-T-cells` = c(-0.126653261025908, -0.146944662222103, 
-0.0115148964617506, -0.215846341670589, -0.146791213061172, 
-0.219857803332179, -0.17081620282989, -0.0891711254320417, -0.0874442820512679, 
-0.220366749282219, -0.133508075707128, -0.193271241796752, -0.201211825907853, 
-0.125225983272556, -0.115043218071495), `CD8-T-cells` = c(-0.157666852910288, 
-0.232728699747416, -0.0160616032665578, -0.209410515957277, 
-0.202956751240547, -0.175764455863453, -0.323439446327317, -0.0338852795521566, 
-0.135834830888521, -0.141126851397699, -0.255009952032843, -0.213879070793134, 
-0.183317103095461, -0.189072789833267, -0.259816019334781), 
    `T-helpers` = c(-0.22607814653843, -0.215166591267584, -0.00118837391747009, 
    -0.312608050611363, -0.341117207247747, -0.294448125003604, 
    -0.292393762983215, -0.118961302821571, -0.253841509210292, 
    -0.183748456919948, -0.260611567417525, -0.265980836082014, 
    -0.200401095547006, -0.189306997555047, -0.284547076718132
    ), `NK-cells` = c(-0.0768979654779382, -0.156845903079279, 
    -0.00175695677961327, 0.0435753727359533, -0.113414879053929, 
    -0.0719972487895254, -0.173138898156026, -0.0363772958157232, 
    -0.113696954024299, -0.0452152425828402, -0.169035653930283, 
    -0.0850518565032839, -0.0992324194948109, -0.0820755296254414, 
    -0.158947253410155), cancertype = c("Melanoma", "Urothelial Bladder Carcinoma", 
    "Urothelial Bladder Carcinoma", "Renal Clear Cell Carcinoma", 
    "Renal Clear Cell Carcinoma", "Renal Clear Cell Carcinoma", 
    "Renal Clear Cell Carcinoma", "Melanoma", "Renal Clear Cell Carcinoma", 
    "Melanoma", "Renal Clear Cell Carcinoma", "Melanoma", "Melanoma", 
    "Renal Clear Cell Carcinoma", "Urothelial Bladder Carcinoma"
    ), stage = c("stage1", "stage2", "stage4", "Stage3", " Stage2", 
    "Stage4", "Stage1", "Stage1", "Stage1", " Stage1", "Stage3", 
    "Stage1", "Stage3", "Stage3", "Stage4")), class = "data.frame", row.names = c("Pt1", 
"Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt2", 
"Pt24", "Pt26", "Pt27", "Pt28", "Pt29", "Pt3"))
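The `stage` column in this sample mixes spellings (`stage1`, ` Stage2` with a leading space, `Stage3`). A base-R sketch that normalises them and fixes the level order, so the boxes within each cancer type are drawn Stage1 to Stage4. `clean_stage` is a hypothetical helper name, and it assumes every label has the form "stage" followed by a digit 1 to 4.

```r
# Normalise stage labels and give them a fixed level order.
# Assumes labels look like "stage<digit>" up to case and surrounding spaces.
clean_stage <- function(x) {
  x <- trimws(x)                                      # drop stray spaces
  x <- sub("^stage", "Stage", x, ignore.case = TRUE)  # unify capitalisation
  # Explicit levels keep the boxes ordered Stage1 .. Stage4;
  # any label outside these levels becomes NA
  factor(x, levels = paste0("Stage", 1:4))
}

s <- clean_stage(c("stage1", " Stage2", "Stage4", "Stage3", " Stage1"))
```

On the combined data this would be `dat$stage <- clean_stage(dat$stage)`; any label that does not fit the assumed pattern shows up as `NA`, which makes unexpected spellings easy to spot.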

EDIT:

Apparently, I have too many features, so presenting the plots together with facet_wrap would make a mess. Therefore, what I need is to save each plot individually in a list, so that I can then, for example, call mylist[…

EDIT: To output one plot per feature as a list, you could do:

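library(tidyverse)

# `dat` is the combined data frame (feature columns + cancertype + stage)
plot_list <- dat |> 
  # long format: one row per sample x feature
  pivot_longer(-c(cancertype, stage), names_to = "feature") |> 
  # harmonise stage labels ("stage1", " Stage2" -> "Stage1", "Stage2")
  mutate(stage = stringr::str_to_title(stage),
         stage = stringr::str_trim(stage)) |> 
  # one data frame per feature
  group_split(feature) |>
  lapply(function(d) {
    ggplot(d, aes(cancertype, value, fill = stage)) +
      geom_boxplot(position = position_dodge(preserve = "single")) +
      # wrap the long cancer-type names on the x-axis
      scale_x_discrete(labels = ~str_wrap(.x, 25)) +
      scale_fill_brewer(palette = "Set1") +
      labs(title = unique(d$feature)) +
      theme_bw()
  })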
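`group_split()` returns an unnamed list, so plots can only be picked out by position. A variant that uses base `split()` instead, which names each list element after its feature, is sketched here; `make_feature_plots` is a hypothetical helper, and the input is assumed to be the combined data frame with numeric feature columns plus `cancertype` and `stage` (the native pipe `|>` needs R 4.1 or later).

```r
library(ggplot2)
library(tidyr)
library(dplyr)
library(stringr)

# Build one grouped boxplot per feature, returned as a list named by feature.
make_feature_plots <- function(dat) {
  long_dat <- dat |>
    pivot_longer(-c(cancertype, stage), names_to = "feature") |>
    mutate(stage = str_trim(str_to_title(stage)))

  # split() names each element after its feature, unlike group_split()
  lapply(split(long_dat, long_dat$feature), function(d) {
    ggplot(d, aes(cancertype, value, fill = stage)) +
      geom_boxplot(position = position_dodge(preserve = "single")) +
      labs(title = unique(d$feature)) +
      theme_bw()
  })
}
```

With the question's data assigned to `dat`, `plot_list <- make_feature_plots(dat)` followed by `plot_list[["CD4-T-cells"]]` would display a single feature's plot.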