I have a dataset like this:
Year Type Return
1900 A 2
1900 B 4
1901 A 7
1901 A 9
1901 B 6
1901 B 5
1903 B 5
1906 A 5
I have yearly information about two types and the corresponding return. A year can contain more than one type, and a type can also repeat within the same year. A year can also have just one type, and some years are missing entirely.
I would like to plot a line graph with colors (maybe ggplot) that shows how the returns of A and B evolve over time (x axis Year, y axis Return). When there is more than one observation for a type in a year (such as in 1901, when we have two A's), the returns should be averaged (for A: mean of 7 and 9).
The real database has >10k lines of info.
Bonus question: it would be great if I could also have a separate version that, instead of averaging the returns per year, sums the returns in each year (for A in 1901: 7 + 9).
Thanks!
Answer:
You may try
library(dplyr)
library(ggplot2)

dummy <- read.table(text = "Year Type Return
1900 A 2
1900 B 4
1901 A 7
1901 A 9
1901 B 6
1901 B 5
1903 B 5
1906 A 5", header = TRUE)

# Mean (solid) and sum (dashed) per Year and Type in a single plot
dummy %>%
  dplyr::group_by(Year, Type) %>%
  dplyr::summarize(m = mean(Return),
                   s = sum(Return),
                   .groups = "drop") %>%
  ggplot(aes(color = Type)) +
  geom_line(aes(Year, m)) +
  geom_line(aes(Year, s), linetype = 2)
# Store the per-year summary so it can be reused for separate plots
dummy1 <- dummy %>%
  dplyr::group_by(Year, Type) %>%
  dplyr::summarize(m = mean(Return),
                   s = sum(Return),
                   .groups = "drop")
Mean
dummy1 %>%
  ggplot(aes(color = Type)) +
  geom_line(aes(Year, m))
Sum
dummy1 %>%
  ggplot(aes(color = Type)) +
  geom_line(aes(Year, s))
Bar
dummy1 %>%
  ggplot(aes(Year, s, fill = Type)) +
  geom_col()  # geom_col() already uses stat = "identity"
Bar x axis
dummy1 %>%
  ggplot(aes(Year, s, fill = Type)) +
  geom_col() +  # geom_col() already uses stat = "identity"
  # One axis break per year, including years with no data
  scale_x_continuous(breaks = seq(min(dummy1$Year), max(dummy1$Year)))
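Missing years
The question mentions that some years are missing. The line plots above connect the available points straight across those gaps. If you would rather see a break in the line, one possible approach is sketched below. It assumes the tidyr package is installed (it is not used elsewhere in this answer), and the names yearly and yearly_full are only illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

dummy <- read.table(text = "Year Type Return
1900 A 2
1900 B 4
1901 A 7
1901 A 9
1901 B 6
1901 B 5
1903 B 5
1906 A 5", header = TRUE)

# Mean and sum per Year and Type, as in the answer above
yearly <- dummy %>%
  group_by(Year, Type) %>%
  summarize(m = mean(Return),
            s = sum(Return),
            .groups = "drop")

# Add a row for every Year/Type combination over the full year range;
# m and s are NA where the original data has no observation
yearly_full <- yearly %>%
  complete(Year = full_seq(Year, 1), Type)

# geom_line() breaks the line at NA values; geom_point() keeps isolated
# years visible (ggplot2 warns about the rows it drops because of NA)
p <- ggplot(yearly_full, aes(Year, m, color = Type)) +
  geom_line() +
  geom_point()
print(p)
```

Replace m with s in the aes() call to get the same plot for the yearly sums.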