Cumulative mean line chart in ggplot

Time:09-22

Clarification: I want to count the number of events that have occurred for each individual up to each age, group by gender, and calculate the mean. For example, there are four women in the data set. By age 20 the women have had 4 events in total, and by age 21 they have had 6 events in total. Since there are 4 women in the data set, the cumulative mean (if that is the word for it) is 1 at age 20, 1.5 at age 21, and so on. This is what I want to visualise in a line chart.
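
To make the arithmetic concrete, here is a minimal sketch using only the two totals quoted in the example. The names n_women, cum_total and cum_mean are made up for illustration and are not part of the data set.

```r
# Cumulative event totals for the women, as quoted in the example
# (illustrative values only, not computed from the simulated data)
n_women <- 4
cum_total <- c(`20` = 4, `21` = 6)

# Cumulative mean = cumulative total / number of women
cum_mean <- cum_total / n_women
print(cum_mean)  # 1 at age 20, 1.5 at age 21
```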

I have data that look like this:

set.seed(123)

id <- rep(1:8, each = 5)
female <- (c(rep(1,20), rep(0, 20)))
age <- c(rep(20:24, 8))

dat <- data.frame(id, female, age)

dat$event <- sample(rep(c(1, 0), nrow(dat) / 2))

Here the id column identifies a unique individual, who is a woman if female equals 1. event equals 1 if the event occurred and 0 if it didn't.

I want to create a line chart with the cumulative mean of events occurring at every age, grouped by gender. That is, I want to visualise the average cumulative number of events experienced by men at each age from 20 to 24, and likewise for women.

I hope it makes sense and is possible to achieve.

CodePudding user response:

I suppose this is what you want - calculating the means for the groups via dplyr.

library(dplyr)
library(ggplot2)

set.seed(123)

id <- rep(1:8, each = 5)
female <- (c(rep(1,20), rep(0, 20)))
age <- c(rep(20:24, 8))

dat <- data.frame(id, female, age)

dat$event <- sample(rep(c(1, 0), nrow(dat) / 2))

dat <- dat %>% group_by(female, age) %>%                        
  summarise_at(vars(event),
               list(mean = mean))

ggplot(dat, aes(x = age, y = mean, group = female)) +
  geom_line(aes(linetype = as.character(female))) +
  geom_point()


CodePudding user response:

I hope I understood you correctly.

library(dplyr)
library(ggplot2)

set.seed(123)

id <- rep(1:8, each = 5)
female <- (c(rep(1,20), rep(0, 20)))
age <- c(rep(20:24, 8))

dat <- data.frame(id, female, age)

dat$event <- sample(rep(c(1, 0), nrow(dat) / 2))
    
dats <- dat %>%
  arrange(age) %>%
  group_by(age, female) %>%
  mutate(cumul = cumsum(event)) %>%
  # name the summary column so it can be referenced directly in aes()
  summarise(mean_cumul = mean(cumul))

ggplot(dats, aes(x = age, y = mean_cumul, group = female)) +
  geom_line(aes(linetype = as.character(female)))
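
Since the clarification asks for events counted per individual up to each age, one possible variant takes the running sum within each id first and only then averages by gender and age. This is a sketch under that reading of the question, not a definitive answer. The names cum_events, mean_cum_events and cum_means are made up for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

set.seed(123)

id <- rep(1:8, each = 5)
female <- c(rep(1, 20), rep(0, 20))
age <- rep(20:24, 8)

dat <- data.frame(id, female, age)
dat$event <- sample(rep(c(1, 0), nrow(dat) / 2))

cum_means <- dat %>%
  arrange(id, age) %>%
  # running count of events within each individual, in age order
  group_by(id) %>%
  mutate(cum_events = cumsum(event)) %>%
  # average that running count across individuals of the same gender and age
  group_by(female, age) %>%
  summarise(mean_cum_events = mean(cum_events), .groups = "drop")

# mapping a discrete variable to linetype also defines the line groups
p <- ggplot(cum_means,
            aes(x = age, y = mean_cum_events,
                linetype = factor(female, labels = c("male", "female")))) +
  geom_line() +
  geom_point() +
  labs(linetype = "gender", y = "mean cumulative events")
print(p)
```

Each individual's running count can only stay flat or go up, so both lines are non-decreasing, and the value at age 24 equals the total number of events in a gender divided by its 4 individuals.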
