ggplot cumulative sum of y variable per line

Time:08-24

Here's a sample of the relevant part of my data:

Name Year.published Name.meaning
a 2021 Location
b 2021 Location
c 2019 Location
d 2015 Location
e 2015 Person
f 2015 Conceptual
g 2010 Location

I used the code below to create the first graph, but it plots n per year rather than a cumulative count of n that adds up over time.

I tried wrapping the y variable in cumsum() to get a cumulative count, but this seems to compute the cumulative sum across all the lines (second image), so each line climbs to around 70, which is far too high for any single group's total.

library(ggplot2)
library(dplyr)

# Count papers per year and per name meaning
df <- paperstats %>% dplyr::count(Year.published, Name.meaning)

# Plot n per year, one line per Name.meaning
ggplot(df,
       aes(x = Year.published,
           y = n,
           color = Name.meaning)) +
  geom_line()

[Image: line plot of n per year, one line per Name.meaning]

[Image: line plot with cumsum() applied to y, each line rising to around 70]

Is there any way to apply cumsum() per line, or an alternative method of counting items per group over time in a line plot?

CodePudding user response:

I think you need to group on Name.meaning, arrange by Name.meaning and Year.published, then calculate the cumsum before plotting.

It's difficult to test without seeing your data, but something like this:

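paperstats %>%
  count(Year.published, Name.meaning) %>%
  group_by(Name.meaning) %>%
  # order rows by year within each group so cumsum() runs chronologically
  arrange(Name.meaning, Year.published) %>%
  # cumsum() is applied per Name.meaning because the data is grouped
  mutate(cs = cumsum(n)) %>%
  ungroup() %>%
  ggplot(aes(Year.published, cs)) +
  geom_line(aes(color = Name.meaning))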
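
To make this testable, here is a minimal sketch that rebuilds the seven sample rows from the question as a data frame and runs the same grouped cumsum() on them. The object names cumulative and p are only illustrative, and the real paperstats is assumed to have the same column names as the sample. geom_point() is an addition here, so that groups with a single year (Person and Conceptual in the sample) still show up on the plot.

```r
library(dplyr)
library(ggplot2)

# Sample rows copied from the question; the full paperstats data is
# assumed to use the same column names.
paperstats <- data.frame(
  Name = c("a", "b", "c", "d", "e", "f", "g"),
  Year.published = c(2021, 2021, 2019, 2015, 2015, 2015, 2010),
  Name.meaning = c("Location", "Location", "Location", "Location",
                   "Person", "Conceptual", "Location")
)

cumulative <- paperstats %>%
  count(Year.published, Name.meaning) %>%   # n per year and group
  arrange(Name.meaning, Year.published) %>% # chronological within group
  group_by(Name.meaning) %>%
  mutate(cs = cumsum(n)) %>%                # running total per group
  ungroup()

# For the sample, Location accumulates as 1, 2, 3, 5 over
# 2010, 2015, 2019, 2021; Person and Conceptual each stay at 1.
p <- ggplot(cumulative,
            aes(x = Year.published, y = cs, color = Name.meaning)) +
  geom_line() +
  geom_point()  # keeps single-observation groups visible
print(p)
```

The key point is that cumsum() inside mutate() respects group_by(), whereas calling cumsum(n) directly inside aes() runs over the whole column at once, which would explain every line climbing towards the overall total.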