Connect medians within nested groups in ggplot?

Time:12-14

I am trying to plot these different groups and connect the medians of every series (A,B,C) within each group with a line.

    n <- 60
    data <- data.frame(series=rep(LETTERS[1:3], n/3), 
                   sex=rep(c("F","M"),each=30), 
                   setting=rep(c("wild","rural"),n/2),
                   fit=rnorm(n))

    ggplot(data,aes(x=sex, y=fit, fill=series)) +
    geom_boxplot(width=.3,aes( alpha=.5,color=sex),
              lwd=0.8, position = position_dodge(width = 0.6)) +
    facet_grid(~setting) +
    stat_summary(fun.y=median, geom="point", shape=23, size=2,
               position=position_dodge(width = 0.6)) +
    geom_text(aes(y=-2.5, label=series),  position=position_dodge(width=0.6)) +
    geom_point(shape=20,alpha=0.2,position=position_jitterdodge(dodge.width = 0.6,jitter.width = 0.25)) +
    theme_blank()

This is what it looks like so far-

I have been playing around with geom_line and stat_summary, but not very successfully, as the grouping always seems wrong. Here is an example of what I tried with stat_summary. It produces what looks like the 4 lines that I actually want (connecting A to C in each group), but for some reason they are not overlaid on the boxplots...

    stat_summary(fun.y=median, colour="red", geom="line",aes(x=series, group=interaction(setting,sex)),
             position=position_dodge(width = 0.6))

I have also found a very similar answered question here, but I could not get that solution working for me. I get an error message when running the second part (Error in geom_line(): Problem while computing aesthetics. ℹ Error occurred in the 4th layer. Caused by error in FUN(): ! object 'series' not found).

At this point I'd also be happy to accept any type of workaround, but couldn't come up with it myself. Thanks!

CodePudding user response:

Basically, this is an adaptation of the approach in the answer you referenced, applied to your case. It uses "manual dodging", which means computing the x (and of course the y) positions for your median lines manually. To this end we first have to convert the sex column, which is mapped on x, to a numeric. Second, we have to shift the positions depending on series (taking into account the width by which you dodge the box plots and the points), i.e. for series A we shift the position to the left and for series C to the right.

set.seed(123)

library(ggplot2)
library(dplyr)

data_line <- data %>%
  group_by(setting, sex, series) %>%
  summarise(fit = median(fit)) |>
  ungroup() |> 
  # Manual dodging 
  mutate(x_num = as.numeric(factor(sex)) + seq(-1, 1, length.out = 3) * .6 / 3)
#> `summarise()` has grouped output by 'setting', 'sex'. You can override using
#> the `.groups` argument.

ggplot(data, aes(x = sex, y = fit, fill = series)) +
  geom_boxplot(
    width = .3, aes(alpha = .5, color = sex),
    lwd = 0.8, position = position_dodge(width = 0.6)
  ) +
  facet_grid(~setting) +
  stat_summary(
    fun = median, geom = "point", shape = 23, size = 2,
    position = position_dodge(width = 0.6)
  ) +
  geom_line(data = data_line,
    aes(x = x_num, group = sex),
    colour = "red"
  ) +
  geom_text(aes(y = -2.5, label = series), position = position_dodge(width = 0.6)) +
  geom_point(shape = 20, alpha = 0.2, position = position_jitterdodge(dodge.width = 0.6, jitter.width = 0.25))
#> Warning: `position_dodge()` requires non-overlapping x intervals
#> `position_dodge()` requires non-overlapping x intervals
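
The manual dodging arithmetic can be checked in isolation with base R. The sketch below is only an illustration: the helper name dodge_offsets is hypothetical (it is not part of ggplot2), and it assumes that the dodged groups sit in equal slots of width / n_groups, which is consistent with the offsets used in the answer for three series and a dodge width of 0.6.

```r
# Hypothetical helper (not part of ggplot2): centre offsets of n_groups
# elements placed in equal slots across a total dodge width of `width`.
dodge_offsets <- function(n_groups, width) {
  (seq_len(n_groups) - (n_groups + 1) / 2) * width / n_groups
}

# Three series (A, B, C) with the dodge width of 0.6 used in the plot
offsets <- dodge_offsets(3, 0.6)

# The expression used in the answer; for three groups it gives the same values
answer_expr <- seq(-1, 1, length.out = 3) * .6 / 3

# x positions for sex = "F" (1) and "M" (2), series A, B, C within each.
# The length-3 offsets are recycled, which relies on the rows being
# ordered with series cycling fastest (as they are after summarise()).
x_num <- rep(1:2, each = 3) + offsets

print(offsets)
print(x_num)
```

Note that the answer's seq(-1, 1, length.out = 3) shortcut matches these slot centres for three groups; with a different number of series the offsets would need to be recomputed rather than reused.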

CodePudding user response:

This is a similar approach to Stefan's, but with the calculations done on-the-fly within ggplot:

library(ggplot2)

n <- 60
data <- data.frame(series=rep(LETTERS[1:3], n/3), 
                   sex=rep(c("F","M"),each=30), 
                   setting=rep(c("wild","rural"),n/2),
                   fit=rnorm(n))

ggplot(data, aes(x = as.numeric(interaction(series, sex)) +
                   1.5 * ((as.numeric(interaction(series, sex)) - 1) %/% 3), 
                 y = fit, fill = series)) +
  geom_boxplot(width = 0.5, aes(color = sex), alpha = 0.5) +
  geom_point(fun = median, stat = "summary", shape = 23, size = 2) +
  geom_text(aes(y = -2.5, label = series)) +
  geom_point(shape = 20, alpha = 0.2, position = position_jitter(0.25)) +
  geom_line(stat = "summary", fun = median, aes(group = sex)) +
  facet_grid(~setting) +
  scale_x_continuous(NULL, breaks = c(2, 6.5), labels = c("F", "M"),
                     limits = c(0, 8.5)) +
  ggnetwork::theme_blank() +
  theme(axis.text.x.bottom = element_text(), strip.background = element_blank())

Created on 2022-12-14 with reprex v2.0.2
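
To see why the axis breaks in this answer are 2 and 6.5, it may help to evaluate the x expression on its own. The following base R sketch uses a small made-up data set (one row per series and sex combination, purely for illustration) and the same arithmetic as the aes() call above.

```r
# Illustrative input: one row per series/sex combination
series <- rep(LETTERS[1:3], 2)
sex <- rep(c("F", "M"), each = 3)

# interaction() orders its levels with the first factor varying fastest:
# A.F, B.F, C.F, A.M, B.M, C.M, so idx runs from 1 to 6
idx <- as.numeric(interaction(series, sex))

# Shift the second block of three (sex = "M") right by 1.5 to leave a gap
x <- idx + 1.5 * ((idx - 1) %/% 3)

positions <- data.frame(series, sex, idx, x)
print(positions)

# The centre of each block is where the "F" and "M" labels are placed
centres <- tapply(x, sex, mean)
print(centres)
```

The block centres come out at 2 for F and 6.5 for M, which is where scale_x_continuous() puts the labels.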
