How to create ggplot box plot which add data over time

Time:10-19

I have a data frame that gives values for different courses over a series of weeks.

     Course Week        m
1   UGS200H    1 44.33333
2   CMSE201    1 73.66667
3   CMSE201    2 88.16667
4   CMSE201    2 88.16667
5   PHY215     2 73.66667
6   PHY215     3 86.33333
7   CMSE201    3 84.00000
8   UGS200H    4 60.66667
9   UGS200H    4 76.66667

I would like to create a series of box plots that show the m values over the weeks for each course. However, I would like the box plots to build on each other: week 1 contains only the data from Week = 1, week 2 contains the data from Week = 1 and 2, week 3 contains the data from Week = 1, 2 and 3, and so on. I have created the following code, which creates the box plots but without the build-up over the weeks.

library(ggplot2)

d <- data_manual  # subset() with no condition returns the data frame unchanged
a <- ggplot(data = d, aes(x = Week, fill = Course, y = m, group = interaction(Course, Week))) +
  geom_boxplot() +
  scale_y_continuous(limits = c(-2, 100), breaks = seq(0, 100, by = 20)) +
  xlab('Week') +
  ylab('Course-Level SE')
print(a) # show us the plot!!

This gives plots like this:

[image: box plots produced by the code above]

But these are just individual weeks, not the cumulative version I would like. Is there a way to build up the weeks and plot the cumulative boxes on one plot?
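A minimal sketch of the cumulative layout described above, using the nine rows shown in the question. It only reshapes the data; plotting is left to the answers below. The names `cum_weeks` and `plot_week` are illustrative choices, not part of the original code. With these rows, weeks 1 to 4 should hold 2, 5, 7 and 9 rows respectively.

```r
# The nine rows shown in the question
data_manual <- data.frame(
  Course = c("UGS200H", "CMSE201", "CMSE201", "CMSE201", "PHY215",
             "PHY215", "CMSE201", "UGS200H", "UGS200H"),
  Week   = c(1, 1, 2, 2, 2, 3, 3, 4, 4),
  m      = c(44.33333, 73.66667, 88.16667, 88.16667, 73.66667,
             86.33333, 84.00000, 60.66667, 76.66667)
)

# For each week k, keep every row with Week <= k and tag it with plot_week = k
cum_weeks <- do.call(rbind, lapply(sort(unique(data_manual$Week)), function(k) {
  rows <- data_manual[data_manual$Week <= k, ]
  rows$plot_week <- k
  rows
}))

# Number of rows in each cumulative week
table(cum_weeks$plot_week)
```

Plotting `cum_weeks` with `x = factor(plot_week)` instead of `x = Week` then gives one cumulative box per course and week.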

Answer:

How about this:

# The question's data, for reference (requires the tibble package):
# dat <- tibble::tribble(
#   ~Course, ~Week,        ~m,
# "UGS200H",    1, 44.33333,
# "CMSE201",    1, 73.66667,
# "CMSE201",    2, 88.16667,
# "CMSE201",    2, 88.16667,
# "PHY215",     2, 73.66667,
# "PHY215",     3, 86.33333,
# "CMSE201",    3, 84.00000,
# "UGS200H",    4, 60.66667,
# "UGS200H",    4, 76.66667)

# Larger simulated data: 3 courses x 10 weeks x 100 observations each
dat <- data.frame(
  Course = rep(c("A", "B", "C"), each=1000), 
  Week = rep(rep(1:10, each=100), 3), 
  m = runif(3000, 50, 100)
)

library(ggplot2)

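# For each week i, stack every row with Week <= i and tag it with plot_week = i
# (the \(i) shorthand for function(i) needs R >= 4.1)
dats <- lapply(1:max(dat$Week), \(i){
  tmp <- subset(dat, Week <= i)
  tmp$plot_week <- i
  tmp
})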

dats <- do.call(rbind, dats)
table(dat$Week)
#> 
#>   1   2   3   4   5   6   7   8   9  10 
#> 300 300 300 300 300 300 300 300 300 300
table(dats$plot_week)
#> 
#>    1    2    3    4    5    6    7    8    9   10 
#>  300  600  900 1200 1500 1800 2100 2400 2700 3000

ggplot(data = dats, aes(x = as.factor(plot_week), fill = Course, y = m, group = interaction(Course, plot_week))) +
  geom_boxplot() +
  scale_y_continuous(limits = c(-2, 100), breaks = seq(0, 100, by = 20)) +
  xlab('Week') +
  ylab('Course-Level SE')

Created on 2022-10-18 by the reprex package (v2.0.1)
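The `lapply()` / `do.call(rbind, ...)` step can also be written as a cross join followed by a filter. This is a swapped-in technique, not what the answer above uses: `merge()` with `by = NULL` returns every combination of rows from its two arguments, and the filter keeps only the pairs where `Week <= plot_week`. The check at the end confirms that each cumulative week holds the running total (`cumsum()`) of the per-week counts, matching the `table()` output above.

```r
# Same simulated data as in the answer: 3 courses x 10 weeks x 100 obs
dat <- data.frame(
  Course = rep(c("A", "B", "C"), each = 1000),
  Week   = rep(rep(1:10, each = 100), 3),
  m      = runif(3000, 50, 100)
)

# Cross join: every row of dat paired with every plot_week 1..max(Week)
weeks <- data.frame(plot_week = seq_len(max(dat$Week)))
cross <- merge(dat, weeks, by = NULL)

# Keep only the rows that belong in each cumulative window
dats <- cross[cross$Week <= cross$plot_week, ]

# Each window's size should equal the running total of the weekly counts
counts <- as.vector(table(dats$plot_week))
stopifnot(identical(counts, as.vector(cumsum(table(dat$Week)))))
counts
```

The trade-off is memory: the cross join materialises `nrow(dat) * max(Week)` rows before filtering, whereas the `lapply()` version only ever builds the rows it keeps. For data of this size either is fine.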

Answer:

This is basically the same idea as @DaveArmstrong's answer, but it uses lapply() to add one geom_boxplot() layer per week.

Note 1: To make the example a bit more realistic, I use some randomly generated example data.

Note 2: I added a geom_point() layer only to check that the number of observations actually increases from week to week.

set.seed(123)

d <- data.frame(
  Course = rep(c("UGS200H", "CMSE201", "PHY215"), each = 40),
  Week = rep(1:4, 30),
  m = runif(120, 40, 100)
)

library(ggplot2)

ggplot(data = d, aes(x = factor(Week), fill = Course, y = m)) +
  # For each week x, add a boxplot layer and a point layer built from all rows
  # with Week <= x, relabelled as Week = x so they sit at position x
  lapply(unique(d$Week), function(x) {
    list(
      geom_boxplot(data = subset(d, Week <= x) |> transform(Week = x), position = "dodge"),
      geom_point(data = subset(d, Week <= x) |> transform(Week = x), position = position_dodge(.9), alpha = .2)
    )
  }) +
  labs(x = 'Week', y = 'Course-Level SE')
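As a numeric counterpart to the geom_point() check in Note 2, the sketch below counts how many observations each course contributes to every cumulative week, using the same simulated data. In this design each course has 10 observations per week, so each course's counts should read 10, 20, 30, 40. The `counts` matrix and its `week_` column labels are illustrative names, not part of the answer.

```r
set.seed(123)

d <- data.frame(
  Course = rep(c("UGS200H", "CMSE201", "PHY215"), each = 40),
  Week = rep(1:4, 30),
  m = runif(120, 40, 100)
)

# Rows = courses, columns = cumulative week x; each cell counts rows with Week <= x
weeks <- sort(unique(d$Week))
counts <- sapply(weeks, function(x) table(subset(d, Week <= x)$Course))
colnames(counts) <- paste0("week_", weeks)
counts
```

If any row failed to grow by 10 from one column to the next, the cumulative subsetting inside the `lapply()` call would be dropping or duplicating data.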
