ggplot2 Error Bars only 1 direction (dependent on group)

Time:11-27

I apologize if this question has been asked already somewhere. I have found some forum posts but with no great solutions for my current situation.

I have the following made-up example data set:


     Subject <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
     Condition <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C")
     Time <- c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)
     Value1 <- c(600,550,450,300,325,250,610,545,453,323,299,280,575,560,475,100,140,85)

     DF1 <- data.frame(Subject, Condition, Time, Value1)

I am using ggplot to draw this data as a line graph with error bars. The goal is an academic, publication-ready figure. Because my standard deviations are large (they are even larger in the real data), I would like to show only the upper error bar for the condition with the highest line, and only the lower error bar for the conditions with lower lines, to clean the figure up visually.

My example data frame has only one Value variable; in reality, however, I have 9 Value variables for each subject, in each condition, at each time. I would therefore like to avoid (if possible) manually calculating the mean and SD for each of these combinations.

I am currently using the following ggplot code:

     library(ggplot2)  # mean_cl_normal() also requires the Hmisc package to be installed

     PublicationPlot <- ggplot(DF1, aes(Time, Value1, shape = Condition))
     PublicationPlot + stat_summary(fun = mean,
                                    geom = "point",
                                    size = 2,
                                    aes(group = Condition)) +
                       stat_summary(fun = mean,
                                    geom = "line",
                                    aes(group = Condition,
                                        linetype = Condition)) +
                       stat_summary(fun.data = mean_cl_normal,
                                    geom = "errorbar",
                                    width = 0.075,
                                    aes(group = Condition)) +
                       xlab("Measurement Times") +
                       ylab("Value 1 (Units)") +
                       theme(panel.grid.major = element_blank(),
                             panel.grid.minor = element_blank(),
                             panel.background = element_blank(),
                             axis.line = element_line(color = "black"),
                             legend.key = element_rect(fill = "white"),
                             axis.title.x = element_text(size = 15),
                             axis.text.x = element_text(size = 13),
                             axis.title.y = element_text(size = 15),
                             axis.text.y = element_text(size = 13),
                             legend.title = element_text(size = 12),
                             legend.text = element_text(size = 12))


Any help on this problem would be incredible. Thank you for your time and expertise. I look forward to learning from you.

Answer:

Although ggplot's summary functions are great for quickly generating commonly used data manipulations, I often find that people tie themselves in knots trying to get the built-in ggplot functions to do things they weren't designed for. As I often say, work out what you want to plot, then plot it.

In your case, it sounds as though you want a separate data frame holding, for each time point, the lowest lower limit and the highest upper limit across all conditions. You could build it like this:

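library(ggplot2)  # mean_cl_normal() also requires the Hmisc package to be installed

# For each time point, compute mean_cl_normal() for every condition ...
times <- lapply(split(DF1, DF1$Time),
                function(x) do.call(rbind,
                  lapply(split(x$Value1, x$Condition), mean_cl_normal)))

# ... then keep the lowest ymin and the highest ymax across conditions
DF2 <- do.call(rbind,
               lapply(times,
                      function(x) data.frame(ymin = min(x$ymin), ymax = max(x$ymax))))

DF2$Time <- as.numeric(names(times))  # time points taken from split()'s group names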

Which then allows you to draw the error bars directly rather than relying on stat_summary:

 PublicationPlot <- ggplot(DF1, aes(Time, Value1, shape = Condition))
 PublicationPlot + stat_summary(fun = mean,
                                geom = "point",
                                size = 2,
                                aes(group = Condition)) +
                   stat_summary(fun = mean,
                                geom = "line",
                                aes(group = Condition,
                                    linetype = Condition)) +
                   geom_errorbar(data = DF2,
                                 aes(x = Time, ymin = ymin, ymax = ymax),
                                 inherit.aes = FALSE,
                                 width = 0.075) +
                   xlab("Measurement Times") +
                   ylab("Value 1 (Units)") +
                   theme(panel.grid.major = element_blank(),
                         panel.grid.minor = element_blank(),
                         panel.background = element_blank(),
                         axis.line = element_line(color = "black"),
                         legend.key = element_rect(fill = "white"),
                         axis.title.x = element_text(size = 15),
                         axis.text.x = element_text(size = 13),
                         axis.title.y = element_text(size = 15),
                         axis.text.y = element_text(size = 13),
                         legend.title = element_text(size = 12),
                         legend.text = element_text(size = 12))
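mean_cl_normal() is a wrapper around the Hmisc package, so the DF2 step above fails if Hmisc is not installed. As a sketch that assumes only base R, the same per-time limits can be built with t.test(), whose one-sample confidence interval uses the same mean +/- qt(0.975, n - 1) * SD / sqrt(n) formula. The names keys and cells are illustrative, not from the answer.

```r
# Base-R sketch of DF2 without Hmisc (assumes the question's example data).
Subject <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
Condition <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C")
Time <- c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)
Value1 <- c(600,550,450,300,325,250,610,545,453,323,299,280,575,560,475,100,140,85)
DF1 <- data.frame(Subject, Condition, Time, Value1)

# One row per Time x Condition cell (illustrative name)
keys <- unique(DF1[c("Time", "Condition")])

# 95% t-interval per cell: mean +/- qt(0.975, n - 1) * sd / sqrt(n)
cells <- do.call(rbind, Map(function(t, cond) {
  x <- DF1$Value1[DF1$Time == t & DF1$Condition == cond]
  ci <- t.test(x, conf.level = 0.95)$conf.int
  data.frame(Time = t, Condition = cond, y = mean(x), ymin = ci[1], ymax = ci[2])
}, keys$Time, keys$Condition))

# Per time point: lowest lower limit and highest upper limit across conditions
DF2 <- do.call(rbind, lapply(split(cells, cells$Time), function(d) {
  data.frame(Time = d$Time[1], ymin = min(d$ymin), ymax = max(d$ymax))
}))
DF2
```

The resulting DF2 has the same ymin, ymax and Time columns, so it can be passed to geom_errorbar(data = DF2, ..., inherit.aes = FALSE) in the same way.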


Answer:

In addition to the solution by @allan-cameron, you could also leave the whole summary shebang out of the ggplot call and aggregate your data earlier in the pipeline:

library(tidyverse)

Subject <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
Condition <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C")
Time <- c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)
Value1 <- c(600,550,450,300,325,250,610,545,453,323,299,280,575,560,475,100,140,85)

DF1 <- data.frame(Subject, Condition, Time, Value1)

DF1 %>%
  group_by(Condition, Time) %>%
  summarise(
    m = mean(Value1),
    SD = sd(Value1),
    upper = m + SD,
    lower = m - SD,
    .groups = "drop"
  ) %>%
  group_by(Time) %>%
  # every condition at a time point gets the overall max/min limit of that time point
  mutate(upper = max(upper),
         lower = min(lower)) %>%
  ggplot(aes(x = Time, y = m)) +
  geom_point(aes(shape = Condition)) +
  geom_line(aes(lty = Condition)) +
  geom_errorbar(aes(ymin = lower,
                    ymax = upper),
                width = .075) +
  xlab("Measurement Times") +
  ylab("Value 1 (Units)") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(color = "black"),
        legend.key = element_rect(fill = "white"),
        axis.title.x = element_text(size = 15),
        axis.text.x = element_text(size = 13),
        axis.title.y = element_text(size = 15),
        axis.text.y = element_text(size = 13),
        legend.title = element_text(size = 12),
        legend.text = element_text(size = 12))

This solution uses dplyr (part of the tidyverse) to replace the limits of all error bars at a time point by the overall maximum and minimum at that time point. From an academic point of view, though, the SD of Condition C at the second time point gets lost this way, so you might want a slightly different solution that keeps error bars that do not overlap.

Here is a solution that does just this:

DF1 <- data.frame(Subject, Condition, Time, Value1)

DF1 %>%
  group_by(Condition, Time) %>%
  summarise(
    m = mean(Value1),
    SD = sd(Value1),
    upper = m + SD,
    lower = m - SD,
    .groups = "drop"
  ) %>%
  group_by(Time) %>%
  mutate(
    # keep an upper limit if a higher one exists and it does not reach into another
    # condition's upper bar; otherwise use the highest upper limit among the
    # conditions whose mean is at or below it (lower limits: mirror image)
    upper = map_dbl(upper, ~ if (any(upper > .) && !any(. > m & . < upper)) {
      .
    } else {
      max(upper[m <= .])
    }),
    lower = map_dbl(lower, ~ if (any(lower < .) && !any(. < m & . > lower)) {
      .
    } else {
      min(lower[m >= .])
    })
  ) %>%
  ggplot(aes(x = Time, y = m)) +
  geom_point(aes(shape = Condition)) +
  geom_line(aes(lty = Condition)) +
  geom_errorbar(aes(ymin = lower,
                    ymax = upper),
                width = .075) +
  xlab("Measurement Times") +
  ylab("Value 1 (Units)") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(color = "black"),
        legend.key = element_rect(fill = "white"),
        axis.title.x = element_text(size = 15),
        axis.text.x = element_text(size = 13),
        axis.title.y = element_text(size = 15),
        axis.text.y = element_text(size = 13),
        legend.title = element_text(size = 12),
        legend.text = element_text(size = 12))

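In essence, an upper limit is kept if a higher upper limit exists and it does not reach into another condition's upper bar; otherwise it is replaced by the highest upper limit among the conditions whose mean is not above it. Lower limits are handled symmetrically.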
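To see what this rule does at a single time point, here is a small sketch that pulls the two map_dbl() expressions from the code above out into standalone functions and applies them to made-up means and limits for three conditions. The names adjust_upper() and adjust_lower() are hypothetical, not part of any package, and the numbers are invented purely for illustration.

```r
# Hypothetical helpers reproducing the map_dbl() logic for ONE time point.
# m: condition means; upper/lower: their limits, in the same order as m.
adjust_upper <- function(m, upper) {
  vapply(upper, function(u) {
    # keep u if a higher limit exists and u does not fall inside another
    # condition's upper bar (between that condition's mean and upper limit)
    if (any(upper > u) && !any(u > m & u < upper)) u else max(upper[m <= u])
  }, numeric(1))
}

adjust_lower <- function(m, lower) {
  vapply(lower, function(l) {
    # mirror image of adjust_upper() for the lower limits
    if (any(lower < l) && !any(l < m & l > lower)) l else min(lower[m >= l])
  }, numeric(1))
}

# Made-up values: conditions 1 and 2 overlap, condition 3 lies well above them
m     <- c(10, 12, 30)
upper <- c(20, 15, 35)
lower <- c(5, 9, 25)

adjust_upper(m, upper)  # 20 20 35: condition 2's upper limit is raised to 20
adjust_lower(m, lower)  # 5 5 25: condition 2's lower limit is lowered to 5
```

So overlapping bars are merged into one shared extent rather than dropped, while a condition that stands clear of the others keeps its own limits.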


Edit: I just noticed that your summary uses the t-based confidence interval produced by mean_cl_normal(); here is a solution that computes that interval instead of mean ± SD:

library(tidyverse)

Subject <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
Condition <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C")
Time <- c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)
Value1 <- c(600,550,450,300,325,250,610,545,453,323,299,280,575,560,475,100,140,85)

DF1 <- data.frame(Subject, Condition, Time, Value1)

DF1 %>%
  group_by(Condition, Time) %>%
  summarise(
    m = mean(Value1),
    SD = sd(Value1),
    # 95% t-interval: mean +/- t(0.975, n - 1) * SD / sqrt(n)
    upper = m + qt((1 + .95) / 2, n() - 1) * SD / sqrt(n()),
    lower = m - qt((1 + .95) / 2, n() - 1) * SD / sqrt(n()),
    .groups = "drop"
  ) %>%
  group_by(Time) %>%
  mutate(
    upper = map_dbl(upper, ~ if (any(upper > .) && !any(. > m & . < upper)) {
      .
    } else {
      max(upper[m <= .])
    }),
    lower = map_dbl(lower, ~ if (any(lower < .) && !any(. < m & . > lower)) {
      .
    } else {
      min(lower[m >= .])
    })
  ) %>%
  ggplot(aes(x = Time, y = m)) +
  geom_point(aes(shape = Condition)) +
  geom_line(aes(lty = Condition)) +
  geom_errorbar(aes(ymin = lower,
                    ymax = upper),
                width = .075) +
  xlab("Measurement Times") +
  ylab("Value 1 (Units)") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(color = "black"),
        legend.key = element_rect(fill = "white"),
        axis.title.x = element_text(size = 15),
        axis.text.x = element_text(size = 13),
        axis.title.y = element_text(size = 15),
        axis.text.y = element_text(size = 13),
        legend.title = element_text(size = 12),
        legend.text = element_text(size = 12))
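The solutions above widen or merge bars rather than following the rule from the question literally (upper bar only for the highest line, lower bar only for the others). As a hedged sketch of that literal rule, assuming "highest line" is decided separately at each time point and using mean ± SD, the following pivots every Value column into long format so that all of them are summarised in one pass. Value2 is a hypothetical extra column added only to show several outcomes being handled; theme_classic() is swapped in for the long theme() call purely for brevity.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

Subject <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
Condition <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C")
Time <- c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)
Value1 <- c(600,550,450,300,325,250,610,545,453,323,299,280,575,560,475,100,140,85)
DF1 <- data.frame(Subject, Condition, Time, Value1)
# Hypothetical second outcome, only to show several Value columns at once
DF1$Value2 <- DF1$Value1 / 10 + DF1$Subject

summ <- DF1 %>%
  pivot_longer(starts_with("Value"), names_to = "variable", values_to = "value") %>%
  group_by(variable, Time, Condition) %>%
  summarise(m = mean(value), SD = sd(value), .groups = "drop") %>%
  group_by(variable, Time) %>%
  # Assumption: the "highest line" is judged separately at each time point
  mutate(top  = m == max(m),
         ymin = if_else(top, m, m - SD),   # other conditions: bar points down only
         ymax = if_else(top, m + SD, m)) %>% # highest condition: bar points up only
  ungroup()

p <- ggplot(summ, aes(Time, m, shape = Condition, linetype = Condition)) +
  geom_line() +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin = ymin, ymax = ymax), width = 0.075, linetype = "solid") +
  facet_wrap(~ variable, scales = "free_y") +
  labs(x = "Measurement Times", y = "Mean (Units)") +
  theme_classic()
p
```

If two conditions tie for the highest mean at a time point, both get an upper bar only. Swapping m - SD / m + SD for the t-interval limits from the edit above would give the mean_cl_normal() version of the same rule.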
