I have a data frame that gives values of m for different courses over a series of weeks:
     Course Week        m
1   UGS200H    1 44.33333
2   CMSE201    1 73.66667
3   CMSE201    2 88.16667
4   CMSE201    2 88.16667
5    PHY215    2 73.66667
6    PHY215    3 86.33333
7   CMSE201    3 84.00000
8   UGS200H    4 60.66667
9   UGS200H    4 76.66667
I would like to create a series of box plots that plot the m values over the weeks for each course. However, I would like the box plots to build on each other, so that week 1 contains only the data from Week = 1, week 2 contains the data from Week = 1 and 2, week 3 contains the data from Week = 1, 2 and 3, and so on. I have created the following code, which draws the box plots but without the cumulative build-up over the weeks.
library(ggplot2)
d <- data_manual  # data frame with columns Course, Week, m
a <- ggplot(data = d, aes(x = Week, fill = Course, y = m, group = interaction(Course, Week))) +
  geom_boxplot() +
  scale_y_continuous(limits = c(-2, 100), breaks = seq(0, 100, by = 20)) +
  xlab('Week') +
  ylab('Course-Level SE')
print(a) # show us the plot!!
But these boxes show individual weeks only, not the cumulative version I want. Is there a way to build up the weeks cumulatively and plot all of them on one plot?
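One way to get the cumulative data is to pair every row with every plotting week and keep only the pairs where the row's Week is at or before that plotting week. The sketch below does this in base R with a cross join, `merge(..., by = NULL)`, which returns the Cartesian product of two data frames. The data is typed in from the printout above, and the names `weeks`, `crossed`, `cumulative`, `counts` and `plot_week` are illustrative choices, not part of the original question.

```r
# Sample data from the question, typed in from the printout above
dat <- data.frame(
  Course = c("UGS200H", "CMSE201", "CMSE201", "CMSE201", "PHY215",
             "PHY215", "CMSE201", "UGS200H", "UGS200H"),
  Week = c(1, 1, 2, 2, 2, 3, 3, 4, 4),
  m = c(44.33333, 73.66667, 88.16667, 88.16667, 73.66667,
        86.33333, 84.00000, 60.66667, 76.66667)
)

# Cartesian product: every row paired with every plotting week
weeks <- data.frame(plot_week = sort(unique(dat$Week)))
crossed <- merge(dat, weeks, by = NULL)

# Keep a row for plot_week i only if it was observed in week i or earlier
cumulative <- crossed[crossed$Week <= crossed$plot_week, ]

# Rows behind each cumulative box: one row per course, one column per plot_week
counts <- table(cumulative$Course, cumulative$plot_week)
print(counts)
```

The resulting `cumulative` data frame can then be plotted with `x = factor(plot_week)` and `group = interaction(Course, plot_week)`. With this small sample, PHY215 has no rows in week 1, and several boxes are built from only one or two values.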
Answer:
How about this:
# dat <- tibble::tribble(
# ~Course, ~Week, ~m,
# "UGS200H", 1, 44.33333,
# "CMSE201", 1, 73.66667,
# "CMSE201", 2, 88.16667,
# "CMSE201", 2, 88.16667,
# "PHY215", 2, 73.66667,
# "PHY215", 3, 86.33333,
# "CMSE201", 3, 84.00000,
# "UGS200H", 4, 60.66667,
# "UGS200H", 4, 76.66667)
dat <- data.frame(
Course = rep(c("A", "B", "C"), each=1000),
Week = rep(rep(1:10, each=100), 3),
m = runif(3000, 50, 100)
)
library(ggplot2)
dats <- lapply(1:max(dat$Week), \(i){
tmp <- subset(dat, Week <= i)
tmp$plot_week <- i
tmp})
dats <- do.call(rbind, dats)
table(dat$Week)
#>
#> 1 2 3 4 5 6 7 8 9 10
#> 300 300 300 300 300 300 300 300 300 300
table(dats$plot_week)
#>
#> 1 2 3 4 5 6 7 8 9 10
#> 300 600 900 1200 1500 1800 2100 2400 2700 3000
ggplot(data = dats, aes(x = as.factor(plot_week), fill = Course, y = m, group = interaction(Course, plot_week))) +
  geom_boxplot() +
  scale_y_continuous(limits = c(-2, 100), breaks = seq(0, 100, by = 20)) +
  xlab('Week') +
  ylab('Course-Level SE')
Created on 2022-10-18 by the reprex package (v2.0.1)
Answer:
Basically the same idea as @DaveArmstrong's answer, but using lapply to add multiple geom_boxplot layers.
Note 1: To make the example a bit more realistic, I use some random fake example data.
Note 2: I added an additional geom_point layer just to check that the number of observations actually increases with each week.
set.seed(123)
d <- data.frame(
Course = rep(c("UGS200H", "CMSE201", "PHY215"), each = 40),
Week = rep(1:4, 30),
m = runif(120, 40, 100)
)
library(ggplot2)
ggplot(data = d, aes(x = factor(Week), fill = Course, y = m)) +
  # one boxplot + point layer per week, each using all rows up to that week
  lapply(unique(d$Week), function(x) {
    list(
      geom_boxplot(data = subset(d, Week <= x) |> transform(Week = x), position = "dodge"),
      geom_point(data = subset(d, Week <= x) |> transform(Week = x), position = position_dodge(.9), alpha = .2)
    )
  }) +
  labs(x = 'Week', y = 'Course-Level SE')
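The geom_point layer gives a visual check of the growing sample sizes; the same check can be done numerically by rebuilding the data each layer receives and counting rows per course and week. This is a minimal base-R sketch that reuses the fake data above. The native pipe from the answer is written as nested `transform(subset(...))` calls, which is equivalent and also runs on R versions older than 4.1, and the names `layer_dfs` and `n_per_box` are illustrative.

```r
# Same fake data as in this answer
set.seed(123)
d <- data.frame(
  Course = rep(c("UGS200H", "CMSE201", "PHY215"), each = 40),
  Week = rep(1:4, 30),
  m = runif(120, 40, 100)
)

# Rebuild the data each geom_boxplot() layer receives:
# all rows with Week <= x, relabelled as Week = x
layer_dfs <- lapply(unique(d$Week), function(x) {
  transform(subset(d, Week <= x), Week = x)
})

# Count the observations behind each box (Course x cumulative week)
n_per_box <- table(do.call(rbind, layer_dfs)[, c("Course", "Week")])
print(n_per_box)
```

Because every course has 10 rows per week in this fake data, each course's box should be built from 10, 20, 30 and 40 values in weeks 1 to 4.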