How do I create a non-stacked barplot with data labels using ggplot2 in R?

Time:03-22

I am processing this dataset (bottom of the page) in R for a project. First I load in the data:

count_data <- read.table(file = "../data/GSE156388_read_counts.tsv", header = TRUE, sep = "",
                         row.names = 1)

I then melt the data using reshape2:

library(reshape2)
melted_count_data <- melt(count_data)

Then I create a factor for colouring graphs by group:

color_groups <- factor(melted_count_data$variable, labels = rep(c("siTFIP11", "siGl3"), each = 3))

Now we get to the barplot I'm trying to make:

ggplot(melted_count_data, aes(x = variable, y = value / 1e6, fill = color_groups)) +
  geom_bar(stat = "identity") +
  labs(title = "Read counts", y = "Sequencing depth (millions of reads)")

The problem is that this creates a barplot with a bunch of stripes, leading me to believe it is trying to stack a ton of bars on top of each other instead of just creating one solid block.
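
One way to check that suspicion is to count how many rows share each x value, since geom_bar(stat = "identity") draws one stacked segment per row. A minimal sketch on made-up data (the toy data frame and its sample names are hypothetical, not the GEO dataset):

```r
# Hypothetical toy data: 3 genes x 2 samples, already in long ("melted") form.
toy <- data.frame(
  variable = rep(c("sample_A", "sample_B"), each = 3),
  value    = c(10, 20, 30, 5, 15, 25)
)

# Rows per sample: each row becomes one stacked segment in that sample's bar.
rows_per_sample <- table(toy$variable)
print(rows_per_sample)
```

On the real data the same call would be table(melted_count_data$variable). A count in the thousands per sample would be consistent with the striped appearance.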

I also wanted to add data labels to the plot:

  + geom_text(label = value / 1e6)

but this seemed to just put a bunch of values on top of each other.

For the stacked bars problem I tried to use y = sum(values) but this just made all the bars the same height. I also tried using y = colSums(values) but this obviously didn't work because it needs "an array of at least two dimensions".
I tried figuring it out using the unmelted data but to no avail.
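
A likely reason for the equal heights is that sum() collapses the whole column into a single number, which is then reused for every row, whereas the plot needs one total per sample. A small sketch of the difference, again on hypothetical toy data rather than the real counts:

```r
# Hypothetical toy data in long form (not the GEO dataset).
toy <- data.frame(
  variable = rep(c("sample_A", "sample_B"), each = 3),
  value    = c(10, 20, 30, 5, 15, 25)
)

# One number for the entire column: identical for every bar.
overall_total <- sum(toy$value)

# One number per sample: what the bar heights should be.
per_sample_total <- tapply(toy$value, toy$variable, sum)

print(overall_total)
print(per_sample_total)
```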

I just kind of gave up on the labels since I wasn't even able to fix the bars problem.

EDIT:
I found a thread suggesting this:

ggplot(melted_count_data, aes(x = variable, y = value / 1e6, color = color_groups)) +
  geom_bar(stat = "identity") +
  labs(title = "Read counts", y = "Sequencing depth (millions of reads)")

This changes fill to color. It fixes the white lines but results in some (fewer) black lines. Looking at this new chart leads me to believe it might actually be pasting a bunch of charts on top of each other?

CodePudding user response:

You could do:

library(tidyverse)

url <- paste0( "https://www.ncbi.nlm.nih.gov/geo/download/", 
               "?acc=GSE156388&format=file&file=GSE156388%5", 
               "Fread_counts.tsv.gz")

tmpfile <- tempfile()
download.file(url, tmpfile)

count_data <- readr::read_tsv(gzfile(tmpfile),
                              show_col_types = FALSE)

count_data %>% 
  pivot_longer(-1) %>%
  mutate(color_groups = factor(name, 
                          labels = rep(c("siTFIP11", "siGl3"), each = 3))) %>%
  group_by(name) %>%
  summarise(value = sum(value)/1e6, color_groups = first(color_groups)) %>%
  ggplot(aes(name, value, fill = color_groups)) +
  geom_col() +
  geom_text(aes(label = round(value, 2)), nudge_y = 0.5) +
  labs(title = "Read counts", x = "", fill = "Type",
       y = "Sequencing depth (millions of reads)") +
  scale_fill_manual(values = c("gold", "deepskyblue3")) +
  theme_minimal()

Created on 2022-03-21 by the reprex package (v2.0.1)
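
As a possible alternative to summarising before plotting, ggplot2 can do the per-sample sum itself with stat_summary(). This is only a sketch on hypothetical toy data (the toy data frame and sample names are made up), and it assumes ggplot2 3.3.0 or later for after_stat():

```r
library(ggplot2)

# Hypothetical toy data in long form (not the GEO dataset).
toy <- data.frame(
  name  = rep(c("sample_A", "sample_B"), each = 3),
  value = c(10, 20, 30, 5, 15, 25)
)

p <- ggplot(toy, aes(name, value)) +
  # Sum all rows that share an x value and draw one solid bar per sample.
  stat_summary(fun = sum, geom = "bar") +
  # Same summary drawn as text; after_stat(y) refers to the computed sum.
  stat_summary(aes(label = after_stat(y)), fun = sum, geom = "text",
               vjust = -0.5) +
  labs(x = "", y = "Total")

print(p)
```

Summarising first, as in the pipeline above, keeps the totals available as an ordinary data frame; the stat_summary() route keeps the data untouched and leaves the aggregation to the plot.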
