How to add number of valid observations of each group at each timepoint to my linechart

Time:05-05

I have the evolution of the mean values for two groups. Because the number of valid observations changes at each timepoint, I want to add to the graph, at each timepoint, the number of valid values for each group. The aim is to show the reader that the means over time are not calculated on the same number of individuals.

mydata<-data.frame(
ID=1:10,
groupe=c(rep("A",5),rep("B",5)),
value1=c(50,49,47,46,44,39,37,36,30,30),
value2=c(43,40,42,36,25,37,36,35,30,28),
value3=c(32,30,38,32,NA,34,36,32,27,NA),
value4=c(24,25,30,NA,NA,30,32,28,NA,28),
value5=c(24,22,NA,NA,NA,25,27,NA,NA,NA)
)

library(dplyr)
library(tidyr)    # for pivot_longer()
library(ggplot2)  # for ggplot()

mydata2<-mydata %>% 
  group_by(groupe) %>% 
  summarise(mean_value1 = mean(value1),
            mean_value2 = mean(value2),
            mean_value3 = mean(value3, na.rm = TRUE),
            mean_value4 = mean(value4, na.rm = TRUE),
            mean_value5 = mean(value5, na.rm = TRUE))

mydata2Lg<-mydata2%>%pivot_longer(
  cols = mean_value1 :mean_value5,
  names_to = "time",values_to = "mean",
  names_prefix = "mean_value"
)

mydata2Lg$groupe<-as.factor(mydata2Lg$groupe)


ggplot(mydata2Lg, aes(x = time, y = mean, group = groupe, color = groupe)) +
  geom_line(aes(linetype = groupe), size = 1) +
  geom_point(aes(shape = groupe))

I'm sorry for not giving a clear visual indication of what I want. I hope you understand what I mean.

Below is the current graph:

[image: line chart of the group means over time]

CodePudding user response:

A typical way to show the uncertainty caused by differing sample sizes is to use error bars or a ribbon to indicate the standard error. This gives a nice visual intuition of the uncertainty introduced by both the spread of the data and the sample size. However, you can also add labels showing the counts. You just need to summarize your data appropriately.
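The standard error mentioned above is the standard deviation of the valid values divided by the square root of their count. As a minimal sketch, here it is for group A at the third time point of the question's data (the variable names are only illustrative):

```r
# Group A at time point 3, copied from the question's data
x <- c(32, 30, 38, 32, NA)

n  <- sum(!is.na(x))                  # number of valid observations
m  <- mean(x, na.rm = TRUE)           # mean of the valid values
se <- sd(x, na.rm = TRUE) / sqrt(n)   # standard error of the mean

# Count, mean, and the band a ribbon or error bar would span
c(n = n, mean = m, se = se, lower = m - se, upper = m + se)
```

Here n is 4, the mean is 33, and the standard error is about 1.73. With fewer valid values the same spread gives a wider band, which is what makes the ribbon informative about sample size.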

For completeness, here is your data represented with both a standard error ribbon and labels of the number of samples at each time point:

library(tidyverse)

mydata %>% 
  pivot_longer(value1:value5) %>%
  group_by(groupe, name) %>%
  summarize(count = sum(!is.na(value)),
            mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE)) %>%
  mutate(time = as.numeric(gsub("\\D", "", name)),
         upper = mean + sd/sqrt(count),
         lower = mean - sd/sqrt(count)) %>%
  ggplot(aes(time, mean, color = groupe)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = groupe), 
              color = NA, alpha = 0.2) +
  geom_point() +
  geom_line() +
  geom_label(aes(label = paste0("n = ", count),
                 y = mean + ifelse(groupe == "A", 1, -1)),
             key_glyph = draw_key_blank) +
  scale_color_manual(values = c("orangered3", "deepskyblue4")) +
  scale_fill_manual(values = c("orangered3", "deepskyblue4")) +
  labs(title = 'Mean values for each group over time \u00B1 standard error',
       subtitle = expression(italic("Labels show sample size at each point"))) +
  theme_light(base_size = 16)

[image: resulting plot with standard error ribbons and sample-size labels]
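If you only want the counts on your original chart, without the ribbon, something along these lines might be enough. This is a sketch that assumes the `mydata` frame from the question. The names `plot_data` and `n_valid` are made up for illustration, and the `vjust` offset is an arbitrary choice that may need adjusting where the two lines run close together.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data copied from the question
mydata <- data.frame(
  ID = 1:10,
  groupe = c(rep("A", 5), rep("B", 5)),
  value1 = c(50, 49, 47, 46, 44, 39, 37, 36, 30, 30),
  value2 = c(43, 40, 42, 36, 25, 37, 36, 35, 30, 28),
  value3 = c(32, 30, 38, 32, NA, 34, 36, 32, 27, NA),
  value4 = c(24, 25, 30, NA, NA, 30, 32, 28, NA, 28),
  value5 = c(24, 22, NA, NA, NA, 25, 27, NA, NA, NA)
)

# One row per group and time point, with the mean and the count of non-missing values
plot_data <- mydata %>%
  pivot_longer(value1:value5, names_to = "time", names_prefix = "value") %>%
  group_by(groupe, time) %>%
  summarise(n_valid = sum(!is.na(value)),
            mean = mean(value, na.rm = TRUE),
            .groups = "drop")

# Original line chart plus a text label with the count above each point
p <- ggplot(plot_data, aes(x = time, y = mean, group = groupe, color = groupe)) +
  geom_line(aes(linetype = groupe)) +
  geom_point(aes(shape = groupe)) +
  geom_text(aes(label = n_valid), vjust = -1, show.legend = FALSE)

p
```

Computing the count in the same `summarise()` call as the mean keeps the label and the point on the same row, so no join is needed before plotting.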
