ggplot crossbar mislocalization in log-scaled data

Time:05-03

I want to plot my data as a scatter plot with mean ± SEM error bars using ggplot2. I am using stat_summary to add the mean bar and the error bar. Because the variation is huge, I used scale_y_continuous to put the y axis on a log10 scale for better visualization.

Here is the example data:

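library(ggplot2)
library(dplyr)   # for %>%
library(scales)  # for log10_trans() and trans_breaks()
Value <- c(815,2467,4130,32588,171,68,582,476)
Treatment <- c(rep("Ctl",4),rep("Mutant",4))
dat <- data.frame(Value, Treatment)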

It works fine when I use a linear y axis. The crossbar sits at 10000, the mean of the Ctl group.

plot_linear <- dat %>% 
  ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
  geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
               binaxis='y', stackdir='center', dotsize = 1,
               position=position_dodge(0.9)) +
  stat_summary(fun = mean, geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
  theme_bw()

However, if I log-transform the y axis, the crossbar for the mean of Ctl always lands on the second-highest point (4130) rather than on the mean (10000).

plot_log <- dat %>% 
        ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
        geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
                     binaxis='y', stackdir='center', dotsize = 1,
                     position=position_dodge(0.9)) +
        stat_summary(fun = mean, geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
        stat_summary(fun.data = mean_se, geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
        theme_bw() +
        # log scaled y axis
        scale_y_continuous(trans = log10_trans(),
                           breaks = trans_breaks("log10", function(x) 10^x))

I don't understand the logic behind this weird placement of the crossbar.

Is there a way to plot the mean bar for the log-scaled data?

Thanks a lot!

CodePudding user response:

Could the issue be that you add the log scale after everything else, so the error bars are computed for the non-log axis and the axis is then overwritten with the log scale? I would try putting the scale_y_continuous() call first, right after ggplot().

CodePudding user response:

The stat_summary steps are performed after the scale transformation, which is why the mean looks different with a log scale: the crossbar is drawn at the mean of log10(Value), which is the geometric mean of the raw values, not their arithmetic mean.
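To see why the bar ends up near 4130, it may help to redo the arithmetic for the Ctl group by hand. This is only a base-R sketch of what the transformed scale appears to do; the names `ctl`, `arith_mean` and `geo_mean` are made up for illustration.

```r
# Ctl values from the question
ctl <- c(815, 2467, 4130, 32588)

# Arithmetic mean of the raw values: where the bar sits on a linear axis
arith_mean <- mean(ctl)

# Mean taken on the log10 scale, then mapped back to data units.
# This is the geometric mean, which is what a summary computed
# after a log10 scale transformation corresponds to.
geo_mean <- 10^mean(log10(ctl))

print(c(arithmetic = arith_mean, geometric = geo_mean))
```

The geometric mean comes out a little below 4130, so the crossbar only looks as if it were pinned to the second-highest point.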

  1. Apply the inverse transform to your summary variables. This is probably more brittle and more likely to confuse others reading the code, but it does keep the calculation within the ggplot2 chain.

dat %>% 
  ggplot(aes(x=Treatment, y=Value, color = Treatment)) +
  geom_dotplot(aes(color = Treatment), fill = "white", stroke = 2,
               binaxis='y', stackdir='center', dotsize = 1,
               position=position_dodge(0.9)) +
  # x arrives already log10-transformed: undo it, summarise, then re-apply log10
  stat_summary(fun = function(x) log10(mean(10^x)), geom = "crossbar", size = 1, width = 0.6, position=position_dodge(0.9)) +
  stat_summary(fun.data = function(x) log10(mean_se(10^x)), geom = "errorbar", size = 0.5, width = 0.3, position=position_dodge(0.9)) +
  theme_bw() +
  scale_y_continuous(trans = log10_trans(),
                     breaks = trans_breaks("log10", function(x) 10^x))
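If you would rather keep the summary out of the ggplot2 chain, one possibility is to compute the mean and SEM on the raw values first and hand them to geom_crossbar and geom_errorbar directly. The sketch below is an assumption about how that could look, not a tested drop-in replacement: `summ`, `grp_mean` and `grp_se` are hypothetical names, geom_point stands in for the geom_dotplot layer to keep it short, and scale_y_log10() is used as shorthand for the log10 scale.

```r
# Example data from the question
Value <- c(815, 2467, 4130, 32588, 171, 68, 582, 476)
Treatment <- c(rep("Ctl", 4), rep("Mutant", 4))
dat <- data.frame(Value, Treatment)

# Per-group mean and standard error on the raw (untransformed) values
grp_mean <- tapply(dat$Value, dat$Treatment, mean)
grp_se   <- tapply(dat$Value, dat$Treatment, function(v) sd(v) / sqrt(length(v)))
summ <- data.frame(Treatment = names(grp_mean),
                   mean = as.numeric(grp_mean),
                   se   = as.numeric(grp_se))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(x = Treatment, y = Value, color = Treatment)) +
    geom_point(shape = 1, size = 3, stroke = 2) +
    # ymin = ymax = mean draws the crossbar as a single line at the mean
    geom_crossbar(data = summ, inherit.aes = FALSE, width = 0.6,
                  aes(x = Treatment, y = mean, ymin = mean, ymax = mean,
                      color = Treatment)) +
    geom_errorbar(data = summ, inherit.aes = FALSE, width = 0.3,
                  aes(x = Treatment, ymin = mean - se, ymax = mean + se,
                      color = Treatment)) +
    theme_bw() +
    scale_y_log10()
}
```

Because the summary is computed before anything is transformed, the scale only changes where the precomputed numbers are drawn. Note that mean - se must stay positive for the lower whisker to be drawable on a log axis, which holds for this data.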