I am a new user of R (and Stack Overflow, so excuse my formatting in advance) and am having trouble making a bar plot (using ggplot2 from the tidyverse).
I need to make a bar plot with multiple columns on the x-axis and two subgroups, and a mean value on the y-axis.
My data looks like the following:
# A tibble: 6 x 5
     Id Baseline  Tyr1  Tyr2 Time
  <dbl>    <dbl> <dbl> <dbl> <chr>
1     1    0.536 0.172 0.141 pre
2     2    0.428 0.046 0.084 post
3     3    0.077 0.015 0.063 pre
4     4    0.2   0.052 0.041 post
5     5    0.161 0.058 0.039 pre
6     6    0.219 0.059 0.05  post
I want to plot a bar graph with x-axis = Baseline, Tyr1, Tyr2, with subgroups of Time, and y-axis = means. I believe I can use the fill aesthetic to make the subgroups, but I can't find a way to get all my columns on the x-axis.
The goal is to make it look something like the following (I am not sure if the picture is getting uploaded?):
CodePudding user response:
The best way to achieve this is to reshape your dataset to long format, and then use position_dodge to separate the bars corresponding to different times. So:
library(ggplot2)
library(tidyr)
dat %>%
  pivot_longer(cols = -c(Id, Time)) %>%
  ggplot(aes(x = name, y = value, fill = Time, group = Time)) +
  stat_summary(geom = "col", width = 0.8, position = position_dodge()) +
  stat_summary(geom = "errorbar", width = 0.2, position = position_dodge(width = 0.8))
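When stat_summary() is called without a summary function, ggplot2 falls back to mean_se() and prints a message saying so. The sketch below spells that out: the bars use the group mean and the error bars use the mean plus or minus one standard error. The tribble() call only recreates the six rows from the question; storing them under the name dat is an assumption about how your data is held.

```r
library(ggplot2)
library(tidyr)
library(tibble)

# Recreate the six rows shown in the question (assumed to be stored as `dat`)
dat <- tribble(
  ~Id, ~Baseline, ~Tyr1, ~Tyr2, ~Time,
  1, 0.536, 0.172, 0.141, "pre",
  2, 0.428, 0.046, 0.084, "post",
  3, 0.077, 0.015, 0.063, "pre",
  4, 0.200, 0.052, 0.041, "post",
  5, 0.161, 0.058, 0.039, "pre",
  6, 0.219, 0.059, 0.050, "post"
)

# One row per Id and measurement column
long <- pivot_longer(dat, cols = -c(Id, Time))

p <- ggplot(long, aes(x = name, y = value, fill = Time, group = Time)) +
  # Bar height is the mean of each name/Time group
  stat_summary(fun = mean, geom = "col", width = 0.8,
               position = position_dodge()) +
  # Error bars span the mean plus or minus one standard error
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2,
               position = position_dodge(width = 0.8))

print(p)
```

Because Time is a character column, the groups are ordered alphabetically, so "post" is drawn before "pre". Converting it with factor(Time, levels = c("pre", "post")) before plotting should put "pre" first.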
Consider also adding the data points for extra transparency. It will be easier for your readers to understand the data and judge its meaning if they can see the individual points:
dat %>%
  pivot_longer(cols = -c(Id, Time)) %>%
  ggplot(aes(x = name, y = value, fill = Time, group = Time)) +
  stat_summary(geom = "col", width = 0.8, position = position_dodge()) +
  stat_summary(geom = "errorbar", width = 0.2, position = position_dodge(width = 0.8)) +
  geom_point(position = position_dodge(width = 0.8))
To understand what is happening with the reshape, the intermediate dataset looks like this:
> dat %>%
pivot_longer(cols=-c(Id,Time))
# A tibble: 18 x 4
      Id Time  name     value
   <int> <chr> <chr>    <dbl>
 1     1 pre   Baseline 0.536
 2     1 pre   Tyr1     0.172
 3     1 pre   Tyr2     0.141
 4     2 post  Baseline 0.428
 5     2 post  Tyr1     0.046
 6     2 post  Tyr2     0.084
 7     3 pre   Baseline 0.077
 8     3 pre   Tyr1     0.015
 9     3 pre   Tyr2     0.063
10     4 post  Baseline 0.2
11     4 post  Tyr1     0.052
12     4 post  Tyr2     0.041
13     5 pre   Baseline 0.161
14     5 pre   Tyr1     0.058
15     5 pre   Tyr2     0.039
16     6 post  Baseline 0.219
17     6 post  Tyr1     0.059
18     6 post  Tyr2     0.05
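To check the numbers behind the bars and error bars, you can summarise the long data by hand. This is a minimal sketch using dplyr; dat is again rebuilt from the question's six rows, and the names summary_tbl, mean_value and se are made up for illustration. The standard error here is sd / sqrt(n), which is what mean_se() uses.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Recreate the six rows shown in the question (assumed to be stored as `dat`)
dat <- tribble(
  ~Id, ~Baseline, ~Tyr1, ~Tyr2, ~Time,
  1, 0.536, 0.172, 0.141, "pre",
  2, 0.428, 0.046, 0.084, "post",
  3, 0.077, 0.015, 0.063, "pre",
  4, 0.200, 0.052, 0.041, "post",
  5, 0.161, 0.058, 0.039, "pre",
  6, 0.219, 0.059, 0.050, "post"
)

# One row per bar: the mean and standard error of each name/Time group
summary_tbl <- dat %>%
  pivot_longer(cols = -c(Id, Time)) %>%
  group_by(name, Time) %>%
  summarise(
    mean_value = mean(value),
    se = sd(value) / sqrt(n()),
    .groups = "drop"
  )

print(summary_tbl)
```

With only three observations per group, the standard errors are rough estimates, which is one more reason to show the individual points alongside the bars.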
CodePudding user response:
First gather the Baseline, Tyr1 and Tyr2 columns into key/value pairs and calculate the mean for each group:
library(dplyr)
library(tidyr)
long <- gather(df, key, value, -Id, -Time) %>%
  group_by(key, Time) %>%
  summarise(value = mean(value), .groups = "drop")
Then plot using Time as the group, dodging the columns:
library(ggplot2)
ggplot(long, aes(x = key, y = value, group = Time, fill = Time)) +
  geom_col(position = "dodge") +
  labs(y = "mean")
Results in: