Plot line graph with different colors of specific average values of a column

Time:11-03

I have a dataset like this:

Year  Type Return
1900   A   2
1900   B   4
1901   A   7
1901   A   9
1901   B   6
1901   B   5
1903   B   5
1906   A   5

I have yearly information about two types and the corresponding return. A year can contain both types, and the same type can appear more than once in the same year. Some years have only one type, and some years are missing entirely.

I would like to plot a line graph (maybe with ggplot) with one colored line per type, showing the evolution of the returns of A and B over time (x axis: Year, y axis: Return). When a type appears more than once in a year (such as the two A rows in 1901), the returns should be averaged (for A in 1901: the mean of 7 and 9).

The real dataset has more than 10k rows.

Bonus question: it would be great to also have a separate version that, instead of averaging the returns per year, sums them (for A in 1901: 7 + 9).

Thanks!

Answer:

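You may try the following. It computes the mean (m) and the sum (s) of Return for each Year and Type, then draws the means as solid lines and the sums as dashed lines: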

library(dplyr)
library(ggplot2)

dummy <- read.table(text = "Year  Type Return
1900   A   2
1900   B   4
1901   A   7
1901   A   9
1901   B   6
1901   B   5
1903   B   5
1906   A   5", header = TRUE)

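dummy %>%
  dplyr::group_by(Year, Type) %>%
  # m = mean return, s = summed return, per Year and Type
  dplyr::summarize(m = mean(Return),
                   s = sum(Return),
                   .groups = "drop") %>%
  ggplot(aes(color = Type)) +
  geom_line(aes(Year, m)) +                # solid lines: mean
  geom_line(aes(Year, s), linetype = 2)    # dashed lines: sum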
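In the combined plot the dashed style is set as a constant (linetype = 2), so the legend only explains the colors. One way to get a legend for mean vs. sum, sketched here assuming tidyr is installed, is to reshape the summary to long format with tidyr::pivot_longer() and map the statistic to linetype. The column names Mean, Sum, Statistic and Value are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(ggplot2)
library(tidyr)

# Sample data from the question
dummy <- read.table(text = "Year Type Return
1900 A 2
1900 B 4
1901 A 7
1901 A 9
1901 B 6
1901 B 5
1903 B 5
1906 A 5", header = TRUE)

summary_long <- dummy %>%
  group_by(Year, Type) %>%
  summarize(Mean = mean(Return), Sum = sum(Return), .groups = "drop") %>%
  # one row per Year, Type and statistic, so the statistic can be mapped
  pivot_longer(c(Mean, Sum), names_to = "Statistic", values_to = "Value")

# color distinguishes A/B, linetype distinguishes Mean/Sum; both get a legend
p_both <- ggplot(summary_long,
                 aes(Year, Value, color = Type, linetype = Statistic)) +
  geom_line()
p_both
```

Because both color and linetype are discrete mappings, ggplot draws one line per Type and Statistic combination (four lines here).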


dummy1 <- dummy %>%
  dplyr::group_by(Year, Type) %>%
  dplyr::summarize(m = mean(Return),   # mean return per Year and Type
                   s = sum(Return),    # summed return per Year and Type
                   .groups = "drop")
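If you want to check dummy1, or do the aggregation step without dplyr, base R's aggregate() with a formula gives the same per-year means and sums. This is a minimal sketch on the sample data from the question; the object names avg_return and sum_return are just illustrative.

```r
# Sample data from the question
dummy <- read.table(text = "Year Type Return
1900 A 2
1900 B 4
1901 A 7
1901 A 9
1901 B 6
1901 B 5
1903 B 5
1906 A 5", header = TRUE)

# One row per (Year, Type) combination present in the data;
# repeated combinations such as 1901/A are collapsed with FUN
avg_return <- aggregate(Return ~ Year + Type, data = dummy, FUN = mean)
sum_return <- aggregate(Return ~ Year + Type, data = dummy, FUN = sum)

avg_return  # 1901/A is 8 (mean of 7 and 9)
sum_return  # 1901/A is 16 (7 + 9)
```

Only combinations that occur in the data are returned, so missing years (such as 1902) simply have no row.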

Mean only:

dummy1 %>%
  ggplot(aes(color = Type)) +
  geom_line(aes(Year, m))

Sum only (the bonus question):

dummy1 %>%
  ggplot(aes(color = Type)) +
  geom_line(aes(Year, s))
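Note that geom_line() connects consecutive points, so a missing year (e.g. B has no data for 1902) is bridged by a straight line. If you would rather show missing years as gaps, one option, sketched here assuming tidyr is installed, is to add explicit NA rows with tidyr::complete() and tidyr::full_seq() before plotting; the object name yearly is illustrative.

```r
library(dplyr)
library(ggplot2)
library(tidyr)

# Sample data from the question
dummy <- read.table(text = "Year Type Return
1900 A 2
1900 B 4
1901 A 7
1901 A 9
1901 B 6
1901 B 5
1903 B 5
1906 A 5", header = TRUE)

yearly <- dummy %>%
  group_by(Year, Type) %>%
  summarize(m = mean(Return), s = sum(Return), .groups = "drop") %>%
  # add a row with NA m and s for every Type in every year between the
  # first and last year, so missing years become NA instead of being skipped
  complete(Type, Year = full_seq(Year, 1))

p_gaps <- ggplot(yearly, aes(Year, m, color = Type)) +
  geom_line() +   # NA values break the line instead of bridging the gap
  geom_point()    # isolated years (e.g. A in 1906) still appear as points
p_gaps
```

ggplot will warn that it removed rows containing missing values; that is expected here, since the NA rows are what create the gaps.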

Sum as a stacked bar chart:

dummy1 %>%
  ggplot(aes(Year, s, fill = Type)) +
  geom_col()   # geom_col() already uses stat = "identity"; bars stack by default
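Since geom_col() stacks bars by default, each year shows the combined total of A and B. To compare the two types side by side instead, a dodged position can be used. This is a minimal sketch; position_dodge(preserve = "single") keeps bars the same width in years where only one type is present (1903 and 1906 in the sample).

```r
library(dplyr)
library(ggplot2)

# Sample data from the question
dummy <- read.table(text = "Year Type Return
1900 A 2
1900 B 4
1901 A 7
1901 A 9
1901 B 6
1901 B 5
1903 B 5
1906 A 5", header = TRUE)

sums <- dummy %>%
  group_by(Year, Type) %>%
  summarize(s = sum(Return), .groups = "drop")

# A and B bars next to each other within each year instead of stacked
p_dodge <- ggplot(sums, aes(Year, s, fill = Type)) +
  geom_col(position = position_dodge(preserve = "single"))
p_dodge
```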


Bar chart with a tick for every year on the x axis:

dummy1 %>%
  ggplot(aes(Year, s, fill = Type)) +
  geom_col() +
  # one tick per year, from the first to the last year in the data
  scale_x_continuous(breaks = seq(min(dummy1$Year), max(dummy1$Year)))

