Connect medians with lines in dodged boxplots within subgroups?

Time:08-15

The code below uses an interaction of two variables to define a group. Why does ggplot2 correctly plot separate points for the medians, but complain that each group consists of only one observation when I try to draw lines connecting the medians within each group?

How can I draw these lines?

library(ggplot2)

dat <- 
    data.frame(method = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
               var = c("X01","X01","X01","X01","X01","X02","X02","X02","X02","X02","X01","X01","X01","X01","X01","X02","X02","X02","X02","X02"), 
               value = c(-1.1,0.3,0.3,-0.6,-0.2,0.7,1.2,0.2,0.3,1.8,0.5,1.1,-0.8,0.8,0.6,1.0,0.5,0.6,1.5,1.3))

ggplot(dat, aes(var, value)) +
    geom_boxplot(aes(fill = method), 
                 position = position_dodge(width = .9), 
                 outlier.color = NA) +
    # median points: one per method-by-var group (this layer works)
    stat_summary(
        fun = median,
        geom = "point",
        aes(group = interaction(method, var)),
        position = position_dodge(width = .9)
    ) +
    # median lines: same grouping, triggers the message below
    stat_summary(
        fun = median,
        geom = "line",
        aes(group = interaction(method, var)),
        position = position_dodge(width = .9)
    )
geom_path: Each group consists of only one observation. Do you need to adjust
the group aesthetic?

[Image: current output]

Desired output:

[Image: desired output]

Answer:

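This happens because ggplot2 treats each combination of method and var as its own group. After the summary, every such group holds exactly one median, and a line needs at least two points. ggplot2 doesn't know that, once the medians have been calculated, you'd like to drop the method part of the grouping. I don't think there is an elegant way to get the lines you want.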
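To make the grouping problem concrete, here is a minimal base R sketch (not part of the original code) that computes the same medians with aggregate() and counts how many of them fall into each group. The names med, per_interaction and per_var are made up for illustration, and the data frame is rebuilt with rep() purely for brevity.

```r
# Same data as in the question, built with rep() for brevity
dat <- data.frame(
  method = rep(c("A", "B"), each = 10),
  var    = rep(rep(c("X01", "X02"), each = 5), times = 2),
  value  = c(-1.1, 0.3, 0.3, -0.6, -0.2, 0.7, 1.2, 0.2, 0.3, 1.8,
             0.5, 1.1, -0.8, 0.8, 0.6, 1.0, 0.5, 0.6, 1.5, 1.3)
)

# One median per method-by-var combination, as stat_summary() computes them
med <- aggregate(value ~ method + var, data = dat, FUN = median)

# Grouping by interaction(method, var): every group holds a single median,
# so there is nothing for a line to connect
per_interaction <- table(interaction(med$method, med$var))

# Grouping the already-computed medians by var alone: two medians per group,
# which is enough to draw a line segment
per_var <- table(med$var)

print(med)
print(per_interaction)
print(per_var)
```

The point of the comparison is that the line needs a coarser grouping than the summary, which a single stat_summary() layer cannot express.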

Below is an inelegant but pragmatic solution: you first calculate the medians, then feed that layer's data to a new line geom.

library(ggplot2)

dat <- 
  data.frame(method = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
             var = c("X01","X01","X01","X01","X01","X02","X02","X02","X02","X02","X01","X01","X01","X01","X01","X02","X02","X02","X02","X02"), 
             value = c(-1.1,0.3,0.3,-0.6,-0.2,0.7,1.2,0.2,0.3,1.8,0.5,1.1,-0.8,0.8,0.6,1.0,0.5,0.6,1.5,1.3))

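p <- ggplot(dat, aes(var, value)) +
  geom_boxplot(aes(fill = method), 
               position = position_dodge(width = .9), 
               outlier.color = NA) +
  # layer 2: dodged median points, one per method-by-var group
  stat_summary(
    fun = median,
    geom = "point",
    aes(group = interaction(method, var)),
    position = position_dodge(width = .9)
  )

# layer_data(p, 2) returns the computed medians with their dodged x positions;
# round(x) maps each dodged position back to its var category, so the medians
# are grouped (and connected) per category
p + geom_line(
  data = layer_data(p, 2),
  aes(x = x, group = round(x), y = y)
)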

Created on 2022-08-14 by the reprex package (v2.0.1)
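As a possible alternative to reading the positions back from the plot, the medians and their dodged x positions can be computed by hand before plotting. This is only a sketch under stated assumptions: it assumes position_dodge() places n equally wide boxes across the dodge width, centred on the category, and the names med, dodge_width, n_methods and xpos are hypothetical helpers, not ggplot2 API.

```r
library(ggplot2)

dat <- data.frame(
  method = rep(c("A", "B"), each = 10),
  var    = rep(rep(c("X01", "X02"), each = 5), times = 2),
  value  = c(-1.1, 0.3, 0.3, -0.6, -0.2, 0.7, 1.2, 0.2, 0.3, 1.8,
             0.5, 1.1, -0.8, 0.8, 0.6, 1.0, 0.5, 0.6, 1.5, 1.3)
)

# Medians per method-by-var combination
med <- aggregate(value ~ method + var, data = dat, FUN = median)

# Assumed dodge geometry: n boxes share dodge_width, centred on the category
dodge_width <- 0.9
n_methods   <- nlevels(factor(med$method))
med$xpos <- as.numeric(factor(med$var)) +
  (as.numeric(factor(med$method)) - (n_methods + 1) / 2) * dodge_width / n_methods

p2 <- ggplot(dat, aes(var, value)) +
  geom_boxplot(aes(fill = method),
               position = position_dodge(width = dodge_width),
               outlier.color = NA) +
  # points and lines both use the precomputed positions, grouped by var
  geom_point(data = med, aes(x = xpos, y = value)) +
  geom_line(data = med, aes(x = xpos, y = value, group = var))

p2
```

The trade-off is that the offsets must be kept in sync with the dodge width by hand, whereas layer_data() always reflects what the plot actually drew.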
