I am processing the GEO dataset GSE156388 (the read counts file at the bottom of its page) in R for a project. First I load the data:
count_data <- read.table(file = "../data/GSE156388_read_counts.tsv", header = TRUE,
                         sep = "", row.names = 1)
I then melt the data using reshape2:
library(reshape2)
melted_count_data <- melt(count_data)
Then I create a factor for colouring graphs by group:
color_groups <- factor(melted_count_data$variable, labels = rep(c("siTFIP11", "siGl3"), each = 3))
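The labels trick works because factor() (since R 3.4.0) lets duplicated labels map several levels to the same group. It also silently assumes that the first three sample levels are the siTFIP11 replicates and the last three are siGl3, as the code above implies. A small check with hypothetical sample names S1 to S6:

```r
# Toy stand-in for melted_count_data$variable: six samples (names are
# hypothetical), each appearing twice as if there were two genes.
variable <- factor(rep(c("S1", "S2", "S3", "S4", "S5", "S6"), times = 2),
                   levels = c("S1", "S2", "S3", "S4", "S5", "S6"))

# Duplicated labels collapse the six sample levels into two groups,
# following level order: S1-S3 -> siTFIP11, S4-S6 -> siGl3.
color_groups <- factor(variable, labels = rep(c("siTFIP11", "siGl3"), each = 3))

print(table(variable, color_groups))
```

If the real column order differs, the groups end up swapped, so it is worth printing this table on the actual data.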
Now we get to the barplot I'm trying to make:
library(ggplot2)
ggplot(melted_count_data, aes(x = variable, y = value / 1e6, fill = color_groups)) +
  geom_bar(stat = "identity") +
  labs(title = "Read counts", y = "Sequencing depth (millions of reads)")
The problem is that the bars come out full of thin white stripes, which makes me think ggplot is stacking lots of small bars on top of each other instead of drawing one solid block per sample.
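The stripes are consistent with how melt() shapes the data: it returns one row per gene per sample, and geom_bar(stat = "identity") draws one segment per row and stacks them. A minimal sketch with made-up numbers (sample names S1 and S2 are hypothetical) shows the repeated rows and how summing per sample first leaves one row per bar:

```r
# Hypothetical toy data in the same long shape that melt() produces:
# one row per (gene, sample) pair, with columns `variable` and `value`.
long <- data.frame(
  variable = factor(rep(c("S1", "S2"), each = 3)),
  value    = c(100, 250, 50, 80, 300, 120)
)

# Each sample appears once per gene (three times here), so an
# identity-stat bar chart stacks three segments per bar.
rows_per_sample <- table(long$variable)
print(rows_per_sample)

# Summing per sample first leaves one row, and thus one solid bar, per sample.
totals <- aggregate(value ~ variable, data = long, FUN = sum)
print(totals)
```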
I also wanted to add data labels to the plot:
geom_text(aes(label = value / 1e6))
but this just printed many overlapping values on top of each other.
For the stacked bars problem I tried y = sum(value), but this just made all the bars the same height. I also tried y = colSums(value), but that obviously didn't work because colSums() needs "an array of at least two dimensions".
I tried figuring it out using the unmelted data but to no avail.
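On the unmelted (wide) table, colSums() is the right tool: the error above comes from passing it the one-dimensional value column, whereas the wide table is two-dimensional. A minimal sketch, assuming a table shaped like count_data (genes as row names, one column per sample); the gene names, sample names and counts are made up:

```r
# Hypothetical wide table shaped like count_data.
count_data <- data.frame(
  S1 = c(1e6, 2e6, 5e5),
  S2 = c(8e5, 3e6, 1.2e6),
  row.names = c("geneA", "geneB", "geneC")
)

# colSums() works on the 2-D table and gives one total per sample column.
depth <- colSums(count_data) / 1e6

# One row per sample; the factor keeps the original column order on the x axis.
depth_df <- data.frame(
  sample   = factor(names(depth), levels = names(depth)),
  millions = unname(depth)
)
print(depth_df)
```

A data frame like depth_df has one row per sample, so geom_col() and geom_text() would each draw exactly one bar and one label per sample.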
I just kind of gave up on the labels since I wasn't even able to fix the bars problem.
EDIT:
I found a thread suggesting this:
ggplot(melted_count_data, aes(x = variable, y = value / 1e6, color = color_groups)) +
  geom_bar(stat = "identity") +
  labs(title = "Read counts", y = "Sequencing depth (millions of reads)")
i.e. changing fill to color. This removes the white lines but leaves some (fewer) black lines. Looking at this new chart, I suspect it might actually be drawing a bunch of charts on top of each other?
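Switching to color only changes the outline colour; the stacked per-gene segments are still there, and each one gets its own outline. One alternative (not from the thread) is to let ggplot2 do the summing with stat_summary(fun = sum), which collapses all rows for a sample into one bar and one label. This is a sketch assuming ggplot2 3.3.0 or later (for the fun and after_stat() arguments), on hypothetical toy data shaped like melted_count_data:

```r
library(ggplot2)

# Hypothetical long-format data shaped like melted_count_data plus a group column.
long <- data.frame(
  variable = factor(rep(c("S1", "S2"), each = 3)),
  value    = c(1e6, 2e6, 5e5, 8e5, 3e6, 1.2e6),
  group    = rep(c("siTFIP11", "siGl3"), each = 3)
)

p <- ggplot(long, aes(x = variable, y = value / 1e6, fill = group)) +
  # stat_summary() sums all rows per sample before drawing,
  # so each sample gets one solid bar instead of stacked segments.
  stat_summary(fun = sum, geom = "col") +
  # Same summary for the text layer: one rounded label per bar, just above it.
  stat_summary(aes(label = after_stat(round(y, 2))),
               fun = sum, geom = "text", vjust = -0.5) +
  labs(title = "Read counts", x = NULL,
       y = "Sequencing depth (millions of reads)")

print(p)
```

Using vjust rather than a fixed nudge keeps the labels just above the bars whatever the y scale is.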
Answer:
You could do:
library(tidyverse)
url <- paste0( "https://www.ncbi.nlm.nih.gov/geo/download/",
"?acc=GSE156388&format=file&file=GSE156388%5",
"Fread_counts.tsv.gz")
tmpfile <- tempfile()
download.file(url, tmpfile)
count_data <- readr::read_tsv(gzfile(tmpfile),
show_col_types = FALSE)
count_data %>%
pivot_longer(-1) %>%
mutate(color_groups = factor(name,
labels = rep(c("siTFIP11", "siGl3"), each = 3))) %>%
group_by(name) %>%
summarise(value = sum(value)/1e6, color_groups = first(color_groups)) %>%
  ggplot(aes(name, value, fill = color_groups)) +
  geom_col() +
  geom_text(aes(label = round(value, 2)), nudge_y = 0.5) +
  labs(title = "Read counts", x = "", fill = "Type",
       y = "Sequencing depth (millions of reads)") +
  scale_fill_manual(values = c("gold", "deepskyblue3")) +
  theme_minimal()
Created on 2022-03-21 by the reprex package (v2.0.1)
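The answer downloads the gzipped file and reads it with readr::read_tsv(). If you would rather keep the base-R read.table() style from the question, a gzfile() connection works there too. This sketch writes a small made-up table to a temporary file instead of downloading, so it runs offline; the gene and sample names are placeholders:

```r
# Write a tiny, made-up count table to a gzipped temporary file.
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, "w")
writeLines(c("gene\tS1\tS2",
             "geneA\t10\t20",
             "geneB\t5\t7"), con)
close(con)

# read.delim() is read.table() with sep = "\t" and header = TRUE; it accepts
# a connection, so gzfile() decompresses on the fly. row.names = 1 uses the gene column.
counts <- read.delim(gzfile(tmp), row.names = 1)
print(colSums(counts))
```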