How to generate multiple colour scales in ggplot, one for each variable

Time:01-15

I am trying to generate a ggplot with a separate colour scale for each variable, on a relatively small data set (x = case number; y1, y2, y3, etc. = multiple variables describing different characteristics of the cases):

library(tibble)
library(tidyr)
library(ggplot2)

Case <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
Age <- c(53, 46, 72, 68, 45)
Tumor_Stage <- c(1, 2, 3, 1, 2)
Tumor_Grade <- c(3, 1, 2, 2, 1)
Smoking_Status <- c(0, 1, 1, 0, 1)
CD3 <- c(0, 1, 0, 0, 1)
df <- tibble(Case, Age, Tumor_Stage, Tumor_Grade, Smoking_Status, CD3)
# Reshape to long format: one row per case/variable pair, values in `value`
df1 <- df %>% pivot_longer(cols = c(Age, Tumor_Stage, Tumor_Grade, Smoking_Status, CD3),
                           names_to = "Variables")

ggplot(df1, aes(x = Case,
                y = Variables,
                col = value,
                fill = value)) +
  geom_tile()

I am getting the following plot:

[Image: the resulting plot]
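A quick look at the value ranges suggests why one shared scale struggles here. This is a minimal sketch in base R that rebuilds the example data so it stands alone; the name `value_ranges` is only for illustration and is not part of the code above.

```r
# Rebuild the example data in base R so the snippet stands alone.
df <- data.frame(
  Case = paste("Case", 1:5),
  Age = c(53, 46, 72, 68, 45),
  Tumor_Stage = c(1, 2, 3, 1, 2),
  Tumor_Grade = c(3, 1, 2, 2, 1),
  Smoking_Status = c(0, 1, 1, 0, 1),
  CD3 = c(0, 1, 0, 0, 1)
)

# Minimum and maximum of every variable (all columns except Case).
value_ranges <- sapply(df[-1], range)
rownames(value_ranges) <- c("min", "max")
print(value_ranges)
```

Age runs from 45 to 72 while every other variable stays between 0 and 3, so on a single gradient the non-Age tiles end up with nearly the same colour.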

CodePudding user response:

One option to get an individual fill scale and legend for each variable is ggnewscale: use multiple geom_tile layers, one per variable. To this end, split your data frame by Variables, then use e.g. purrr::imap to add the layers one at a time:

library(ggplot2)
library(ggnewscale)

df1_split <- split(df1, df1$Variables)

legend_order <- rev(seq_along(df1_split))
names(legend_order) <- names(df1_split)

ggplot(df1, aes(
  x = Case,
  y = Variables
)) +
  # One tile layer and one fill scale per variable
  purrr::imap(df1_split, function(x, y) {
    order <- legend_order[[y]]
    list(
      geom_tile(aes(fill = value), data = x),
      scale_fill_gradient(
        name = y,
        guide = guide_colorbar(direction = "horizontal", title.position = "top", order = order)
      ),
      # Start a fresh fill scale for the next layer
      new_scale_fill()
    )
  })
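To see what split() and legend_order produce, here is a small base R sketch. It uses placeholder values (an assumption, since only the grouping matters for this check) and otherwise mirrors the names used above.

```r
variables <- c("Age", "Tumor_Stage", "Tumor_Grade", "Smoking_Status", "CD3")

# Long-format stand-in for df1; `value` is a placeholder here.
df1 <- data.frame(
  Case = rep(paste("Case", 1:5), each = length(variables)),
  Variables = rep(variables, times = 5),
  value = 0
)

# split() returns one data frame per variable, named and sorted alphabetically.
df1_split <- split(df1, df1$Variables)
print(names(df1_split))

# Reverse the positions so the last variable gets legend order 1.
legend_order <- rev(seq_along(df1_split))
names(legend_order) <- names(df1_split)
print(legend_order)
```

As far as I can tell, the reversal is there because a discrete y axis puts the first category (Age) at the bottom and the last one (Tumor_Stage) at the top, so giving Tumor_Stage order 1 lists the legends in the same top-to-bottom order as the rows.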

A second option would be to create a separate plot for each variable and glue them together using e.g. patchwork. This approach takes some more effort to make the patchwork look like one plot, i.e. setting the plot margins and removing the x axis from all but the bottom plot, but it has one advantage: the legends are nicely aligned with the y axis categories. I would guess that this approach was used for the example plot you added as an image.


library(patchwork)

plot_fun <- function(x, y) {
  theme_adjust <- if (y != "Tumor_Stage") {
    theme(
      axis.line.x = element_blank(),
      axis.text.x = element_blank(),
      axis.title.x = element_blank(),
      axis.ticks.x = element_blank(),
      axis.ticks.length.x = unit(0, "pt")
    )  
  }
  
  plot_margin <- if (y == "Tumor_Stage") {
    theme(plot.margin = margin(0, 5.5, 5.5, 5.5))
  } else if (y == "Age") {
    theme(plot.margin = margin(5.5, 5.5, 0, 5.5))
  } else {
    theme(plot.margin = margin(0, 5.5, 0, 5.5))
  }
  
  # One single-row plot per variable; only the bottom plot keeps its x axis
  ggplot(df1, aes(
    x = Case,
    y = Variables
  )) +
    geom_tile(aes(fill = value), data = x) +
    scale_fill_gradient(
      name = y,
      guide = guide_colorbar(direction = "horizontal", title.position = "top")
    ) +
    scale_y_discrete(expand = c(0, 0)) +
    theme_adjust +
    plot_margin +
    labs(y = NULL)
}

purrr::imap(df1_split, plot_fun) |> 
  wrap_plots(ncol = 1)
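The function above hard-codes "Age" as the top panel and "Tumor_Stage" as the bottom one, which only holds because split() sorts the names alphabetically. A possible way to derive the position instead is sketched below; `panel_position()` and `panel_margin()` are hypothetical helpers, not part of patchwork or ggplot2, and the margin vectors are just the top/right/bottom/left numbers you would pass to margin().

```r
vars <- sort(c("Age", "Tumor_Stage", "Tumor_Grade", "Smoking_Status", "CD3"))

# Hypothetical helper: where does a variable sit in the stacked layout?
panel_position <- function(var, vars) {
  if (var == vars[1]) {
    "top"
  } else if (var == vars[length(vars)]) {
    "bottom"
  } else {
    "middle"
  }
}

# Hypothetical helper: c(top, right, bottom, left) margins in points,
# matching the values used in plot_fun.
panel_margin <- function(position) {
  switch(position,
    top    = c(5.5, 5.5, 0, 5.5),
    bottom = c(0, 5.5, 5.5, 5.5),
    middle = c(0, 5.5, 0, 5.5)
  )
}

positions <- vapply(vars, panel_position, character(1), vars = vars)
print(positions)
print(panel_margin(positions[["Age"]]))
```

Inside plot_fun you could then test `panel_position(y, names(df1_split)) != "bottom"` instead of comparing against a fixed variable name, so the layout keeps working if variables are added or renamed.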

EDIT: Concerning your second question. If you have a mix of categorical and numerical variables, I would suggest using the data in wide format. For the example below I slightly altered the example data and converted Smoking_Status and Tumor_Stage to factors. As for the fill scales, there are various approaches in general. An easy, but probably not the most elegant, approach is to create a list of fill scales, i.e. a list containing the desired fill scale for each variable. I also opted for the patchwork approach. Note that, as I now use the wide dataset, it is no longer necessary to split the dataset. Instead we loop over the column names.

Case <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
Age <- c(53, 46, 72, 68, 45)
Tumor_Stage <- c(1, 2, 3, 1, 2)
Tumor_Grade <- c(3, 1, 2, 2, 1)
Smoking_Status <- c(0, 1, 1, 0, 1)
CD3 <- c(0, 1, 0, 0, 1)
df <- data.frame(Case, Age, Tumor_Stage, Tumor_Grade, Smoking_Status, CD3)

df$Smoking_Status <- factor(df$Smoking_Status)
df$Tumor_Stage <- factor(df$Tumor_Stage)

library(ggplot2)
library(patchwork)

cols <- c("Age", "Tumor_Stage", "Tumor_Grade", "Smoking_Status", "CD3")
cols <- sort(cols)

scale_fill <- lapply(cols, function(x) {
  if (x == "Smoking_Status") {
    scale_fill_brewer(type = "div", name = x, palette = "BrBG",
                      guide = guide_legend(direction = "horizontal", title.position = "top"))
  } else if (x == "Tumor_Stage") {
    scale_fill_brewer(type = "div", name = x, palette = "PiYG",
                      guide = guide_legend(direction = "horizontal", title.position = "top"))
  } else if (x == "Age") {
    scale_fill_gradient(name = x, low = "lightgreen", high = "darkgreen",
                        guide = guide_colorbar(direction = "horizontal", title.position = "top")
    )
  } else {
    scale_fill_gradient(name = x, 
                        guide = guide_colorbar(direction = "horizontal", title.position = "top")
    ) 
  }
})
names(scale_fill) <- cols

plot_fun <- function(x) {
  theme_adjust <- if (x != "Tumor_Stage") {
    theme(
      axis.line.x = element_blank(),
      axis.text.x = element_blank(),
      axis.title.x = element_blank(),
      axis.ticks.x = element_blank(),
      axis.ticks.length.x = unit(0, "pt")
    )  
  }
  
  plot_margin <- if (x == "Tumor_Stage") {
    theme(plot.margin = margin(0, 5.5, 5.5, 5.5))
  } else if (x == "Age") {
    theme(plot.margin = margin(5.5, 5.5, 0, 5.5))
  } else {
    theme(plot.margin = margin(0, 5.5, 0, 5.5))
  }
  
  scale_fill <- scale_fill[[x]]
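  # `y = x` maps y to the column name itself (a constant string),
  # while the fill uses the values of that column via .data[[x]]
  ggplot(df, aes(
    x = Case,
    y = x
  )) +
    geom_tile(aes(fill = .data[[x]])) +
    scale_fill +
    scale_y_discrete(expand = c(0, 0)) +
    theme_adjust +
    plot_margin +
    labs(y = NULL)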
}

purrr::map(cols, plot_fun) |> 
  wrap_plots(ncol = 1)
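If branching on every column name gets tedious, one possible variation is to decide the kind of scale from the column type. This is only a sketch in base R; `scale_kind` is a made-up name, and the discrete/continuous split is my assumption about how you would want to group the variables.

```r
# Wide example data with the two categorical variables stored as factors.
df <- data.frame(
  Age = c(53, 46, 72, 68, 45),
  Tumor_Stage = factor(c(1, 2, 3, 1, 2)),
  Tumor_Grade = c(3, 1, 2, 2, 1),
  Smoking_Status = factor(c(0, 1, 1, 0, 1)),
  CD3 = c(0, 1, 0, 0, 1)
)
cols <- sort(names(df))

# Factors need a discrete fill scale, numeric columns a continuous one.
scale_kind <- vapply(df[cols], function(col) {
  if (is.factor(col)) "discrete" else "continuous"
}, character(1))
print(scale_kind)
```

In the lapply() call that builds the list of scales you could then pick scale_fill_brewer for the "discrete" entries and scale_fill_gradient for the "continuous" ones, and keep name-based branches only where a variable needs its own palette.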
