Here's a sample of the relevant part of my data:
| Name | Year.published | Name.meaning |
|---|---|---|
| a | 2021 | Location |
| b | 2021 | Location |
| c | 2019 | Location |
| d | 2015 | Location |
| e | 2015 | Person |
| f | 2015 | Conceptual |
| g | 2010 | Location |
I used the code below to draw a line graph, but it plots n for each year separately rather than a cumulative count that adds up over time.
I tried wrapping the y variable in cumsum() to get a cumulative count of n, but that seems to accumulate across all the groups at once instead of within each group (second image): every line climbs to around 70, which is far too high for any single group's total.
library(ggplot2)
library(dplyr)
# Count papers per year and Name.meaning group
df <- paperstats %>% dplyr::count(Year.published, Name.meaning)
ggplot(df,
       aes(x = Year.published,
           y = n,
           color = Name.meaning)) +
  geom_line()
Is there any way to apply cumsum() per line, or an alternative method of counting items per group over time in a line plot?
Answer:
I think you need to group by Name.meaning, arrange by Name.meaning and Year.published, then calculate the cumsum() within each group before plotting.
It's difficult to test without seeing your data, but something like this:
paperstats %>%
  count(Year.published, Name.meaning) %>%
  group_by(Name.meaning) %>%
  arrange(Name.meaning, Year.published) %>%
  # cumsum() runs within each Name.meaning group because the data is grouped
  mutate(cs = cumsum(n)) %>%
  ungroup() %>%
  ggplot(aes(Year.published, cs)) +
  geom_line(aes(color = Name.meaning))
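
To try the approach without the full dataset, here is a minimal sketch built only from the seven sample rows in the question. The object names cumulative and p are made up for illustration, and the added geom_point() layer is my own assumption about what is useful here, since Person and Conceptual have a single year each in the sample and a line alone would not show them.

```r
library(dplyr)
library(ggplot2)

# Sample rows copied from the table in the question
paperstats <- data.frame(
  Name = c("a", "b", "c", "d", "e", "f", "g"),
  Year.published = c(2021, 2021, 2019, 2015, 2015, 2015, 2010),
  Name.meaning = c("Location", "Location", "Location", "Location",
                   "Person", "Conceptual", "Location")
)

# "cumulative" is a hypothetical name for the intermediate result
cumulative <- paperstats %>%
  count(Year.published, Name.meaning) %>%    # n per year and group
  arrange(Name.meaning, Year.published) %>%  # earliest year first within each group
  group_by(Name.meaning) %>%
  mutate(cs = cumsum(n)) %>%                 # running total restarts for each group
  ungroup()

p <- ggplot(cumulative,
            aes(x = Year.published, y = cs, color = Name.meaning)) +
  geom_line() +
  geom_point()  # assumption: makes single-year groups visible

print(p)
```

On these sample rows the Location line steps through 1, 2, 3 and 5 (2010, 2015, 2019, 2021), while Person and Conceptual each stay at 1, so no line can exceed its own group's total.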