R bar-plot with different columns on the x-axis

Time:04-09

I am a new user of R (and Stack Overflow, so excuse my formatting in advance) and am having trouble making a bar plot with ggplot2 from the tidyverse.

I need to make a bar plot with multiple columns on the x-axis and two subgroups, and a mean value on the y-axis.

My data looks like the following

# A tibble: 6 x 5
     Id Baseline  Tyr1  Tyr2 Time 
  <dbl>    <dbl> <dbl> <dbl> <chr>
1     1    0.536 0.172 0.141 pre  
2     2    0.428 0.046 0.084 post 
3     3    0.077 0.015 0.063 pre  
4     4    0.2   0.052 0.041 post 
5     5    0.161 0.058 0.039 pre  
6     6    0.219 0.059 0.05  post  

I want to plot a bar graph with x-axis = Baseline, Tyr1, Tyr2, with subgroups of Time, and y-axis = means. I believe I can use the fill aesthetic to make the subgroups, but I can't find a way to get all my columns on the x-axis.

Goal is to make it look something like the following (I am not sure if the picture is getting uploaded?):

CodePudding user response:

The best way to achieve this is to reshape your dataset to long format, and then use position_dodge to separate the bars corresponding to different times. So:

library(ggplot2)
library(tidyr)
dat %>% 
  pivot_longer(cols=-c(Id,Time)) %>% 
  ggplot(aes(x=name, y=value, fill=Time, group=Time)) + 
    stat_summary(geom="col",width=0.8,position=position_dodge()) + 
    stat_summary(geom="errorbar",width=0.2,position=position_dodge(width=0.8))


Consider also adding the data points for extra transparency. It will be easier for your readers to understand the data and judge its meaning if they can see the individual points:

dat %>% 
  pivot_longer(cols=-c(Id,Time)) %>% 
  ggplot(aes(x=name, y=value, fill=Time, group=Time)) + 
    stat_summary(geom="col",width=0.8,position=position_dodge()) + 
    stat_summary(geom="errorbar",width=0.2,position=position_dodge(width=0.8)) + 
    geom_point(position = position_dodge(width=0.8))
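
When stat_summary is called without a summary function, ggplot2 falls back to mean_se(), so the bars above show group means and the error bars show the mean plus or minus one standard error. A minimal sketch that spells this out is below. It assumes ggplot2 3.3.0 or later (for the `fun` argument) and rebuilds `dat` from the values in the question; the names `dat_long` and `p` are only illustrative.

```r
library(ggplot2)
library(tidyr)

# Data copied from the question (assumed to match the asker's `dat`)
dat <- data.frame(
  Id       = 1:6,
  Baseline = c(0.536, 0.428, 0.077, 0.2, 0.161, 0.219),
  Tyr1     = c(0.172, 0.046, 0.015, 0.052, 0.058, 0.059),
  Tyr2     = c(0.141, 0.084, 0.063, 0.041, 0.039, 0.05),
  Time     = rep(c("pre", "post"), 3)
)

dat_long <- pivot_longer(dat, cols = -c(Id, Time))

p <- ggplot(dat_long, aes(x = name, y = value, fill = Time, group = Time)) +
  # bar height = group mean, stated explicitly
  stat_summary(fun = mean, geom = "col", width = 0.8,
               position = position_dodge()) +
  # error bars = mean +/- one standard error, stated explicitly
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2,
               position = position_dodge(width = 0.8))

print(p)
```

Naming the summary functions also avoids the "No summary function supplied" message and makes it easy to swap in something else, such as median.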


To understand what is happening with the reshape, the intermediate dataset looks like this:

> dat %>% 
    pivot_longer(cols=-c(Id,Time)) 
# A tibble: 18 x 4
      Id Time  name     value
   <int> <chr> <chr>    <dbl>
 1     1 pre   Baseline 0.536
 2     1 pre   Tyr1     0.172
 3     1 pre   Tyr2     0.141
 4     2 post  Baseline 0.428
 5     2 post  Tyr1     0.046
 6     2 post  Tyr2     0.084
 7     3 pre   Baseline 0.077
 8     3 pre   Tyr1     0.015
 9     3 pre   Tyr2     0.063
10     4 post  Baseline 0.2  
11     4 post  Tyr1     0.052
12     4 post  Tyr2     0.041
13     5 pre   Baseline 0.161
14     5 pre   Tyr1     0.058
15     5 pre   Tyr2     0.039
16     6 post  Baseline 0.219
17     6 post  Tyr1     0.059
18     6 post  Tyr2     0.05
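
The column names `name` and `value` in that output are the defaults of pivot_longer. As a small sketch, the same reshape can be written with the pivoted columns and the output names stated explicitly, which may be easier to read. The data frame is rebuilt here from the question's values, and `dat_long` is an illustrative name.

```r
library(tidyr)

# Data copied from the question (assumed to match the asker's `dat`)
dat <- data.frame(
  Id       = 1:6,
  Baseline = c(0.536, 0.428, 0.077, 0.2, 0.161, 0.219),
  Tyr1     = c(0.172, 0.046, 0.015, 0.052, 0.058, 0.059),
  Tyr2     = c(0.141, 0.084, 0.063, 0.041, 0.039, 0.05),
  Time     = rep(c("pre", "post"), 3)
)

dat_long <- pivot_longer(
  dat,
  cols      = c(Baseline, Tyr1, Tyr2),  # columns to stack
  names_to  = "name",                   # default; holds the old column names
  values_to = "value"                   # default; holds the measurements
)

print(dat_long)
```

Changing `names_to` or `values_to` renames the resulting columns, in which case the aes() mapping in the plot has to use the new names.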

CodePudding user response:

First gather the Baseline, Tyr1 and Tyr2 columns into key/value pairs and calculate the mean by group (column names are case-sensitive, so use Id and Time as they appear in the data):

library(dplyr)
library(tidyr)
long <- gather(df, key, value, -Id, -Time) %>%
  group_by(key, Time) %>%
  summarise(value = mean(value))

Then plot using Time as the group, and dodge the columns:

library(ggplot2)
ggplot(long, aes(x = key, y = value, group = Time, fill = Time)) +
  geom_col(position = "dodge") +
  labs(y = "mean")

Results in:

plot
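
The tidyr documentation marks gather() as superseded and recommends pivot_longer() for new code. A sketch of the same approach with pivot_longer is below. It assumes dplyr 1.0.0 or later (for `.groups`) and rebuilds `df` from the values in the question; `p` is an illustrative name.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data copied from the question (assumed to match the asker's data)
df <- data.frame(
  Id       = 1:6,
  Baseline = c(0.536, 0.428, 0.077, 0.2, 0.161, 0.219),
  Tyr1     = c(0.172, 0.046, 0.015, 0.052, 0.058, 0.059),
  Tyr2     = c(0.141, 0.084, 0.063, 0.041, 0.039, 0.05),
  Time     = rep(c("pre", "post"), 3)
)

# Reshape to long format, then compute one mean per column/Time combination
long <- df %>%
  pivot_longer(cols = -c(Id, Time), names_to = "key", values_to = "value") %>%
  group_by(key, Time) %>%
  summarise(value = mean(value), .groups = "drop")

p <- ggplot(long, aes(x = key, y = value, fill = Time)) +
  geom_col(position = "dodge") +
  labs(y = "mean")

print(p)
```

Because the means are computed before plotting, `long` has one row per bar, which makes it easy to check the plotted values against the data.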
