Plot time series of different years together

Time:10-22

I am trying to compare a variable across different years, but I am having trouble plotting the years together. The data is a temperature time series.

If someone can give some advice or point me in the right direction, I would really appreciate it.

CodePudding user response:

You can try it this way.

The first chart shows every available temperature reading; the second chart shows monthly averages.

In the first chart, we force every observation into the same year so that ggplot aligns the lines on a common x axis, and we separate the years by colour.

For the second one, we just use month as the x variable and year as the colour variable.

Note that:

  • with scale_x_datetime we can hide the year, so that no one can see that we forced every observation into the year 2020
  • with scale_x_continuous we can show the names of the months instead of the numbers

[just try to run the charts with and without scale_x_... to understand what I'm talking about]

month.abb is a built-in R constant holding the abbreviated month names.

# read data
df <- readr::read_csv2("https://raw.githubusercontent.com/gonzalodqa/timeseries/main/temp.csv")


# libraries
library(ggplot2)
library(dplyr)


# line chart by datetime
df %>% 
  # make datetime: force unique year
  mutate(datetime = lubridate::make_datetime(2020, month, day, hour, minute, second)) %>% 
  
  ggplot() +
  geom_line(aes(x = datetime, y = T42, colour = factor(year))) +
  scale_x_datetime(breaks = lubridate::make_datetime(2020, 1:12), labels = month.abb) +
  labs(title = "Temperature by Datetime", colour = "Year")

# line chart by month
df %>% 
  
  # average by year-month
  group_by(year, month) %>% 
  summarise(T42 = mean(T42, na.rm = TRUE), .groups = "drop") %>% 
  
  ggplot() +
  geom_line(aes(x = month, y = T42, colour = factor(year))) +
  scale_x_continuous(breaks = 1:12, labels = month.abb, minor_breaks = NULL) +
  labs(title = "Average Temperature by Month", colour = "Year")
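
If the forced-year trick from the first chart is not obvious, here is a minimal sketch on a made-up two-row data frame. The `toy` data and its values are assumptions for illustration only, not taken from temp.csv, and it uses base R's ISOdatetime instead of lubridate so that it runs without extra packages.

```r
# Toy data: the same calendar day in two different years (made-up values)
toy <- data.frame(
  year  = c(2019, 2020),
  month = c(3, 3),
  day   = c(15, 15),
  T42   = c(11.2, 12.8)
)

# Replace the real year with a fixed leap year (2020) so both rows share one x position
toy$datetime <- ISOdatetime(2020, toy$month, toy$day, 0, 0, 0, tz = "UTC")

# Both observations now fall on the same x value;
# mapping colour = factor(year) is what keeps the two lines apart
same_x <- toy$datetime[1] == toy$datetime[2]

# month.abb is a built-in character vector of abbreviated English month names
mar <- month.abb[3]
```

Using a leap year as the common year matters: a 29 Feb reading would otherwise turn into an invalid date.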


In case you want your chart to start from July, you can use this code instead:

months_order <- c(7:12,1:6)

# line chart by month
df %>% 
  
  # average by year-month
  group_by(year, month) %>% 
  summarise(T42 = mean(T42, na.rm = TRUE), .groups = "drop") %>% 
    
  # create new groups starting from each July
  group_by(neworder = cumsum(month == 7)) %>% 
    
  # keep only complete years
  filter(n() == 12) %>% 
    
  # give new names to groups
  mutate(years = paste(unique(year), collapse = " / ")) %>% 
  ungroup() %>% 
  
  # reorder months
  mutate(month = factor(month, levels = months_order, labels = month.abb[months_order], ordered = TRUE)) %>% 
      
  # plot
  ggplot() +
  geom_line(aes(x = month, y = T42, colour = years, group = years)) +
  labs(title = "Average Temperature by Month", colour = "Year")
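
To see what the cumsum(month == 7) grouping does, here is a small sketch on made-up vectors. The range (January 2018 to December 2019) is an assumption chosen for illustration, and the rows are assumed to be sorted by year and month, as they are after summarise().

```r
# Months sorted by year then month (made-up range: Jan 2018 to Dec 2019)
month <- rep(1:12, times = 2)
year  <- rep(c(2018, 2019), each = 12)

# The counter increases by one at every July, so each Jul-Jun span gets its own id
neworder <- cumsum(month == 7)

# Size of each group: only groups with 12 months are complete Jul-Jun seasons
group_sizes <- table(neworder)

# Label for the complete season (group 1): the calendar years it spans
label <- paste(unique(year[neworder == 1]), collapse = " / ")
```

Here the first and last groups hold only six months each, which is why the filter(n() == 12) step drops them and keeps the single complete season.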


EDIT

To have something similar to the first plot but starting from July, you could use the following code:

# libraries
library(ggplot2)
library(dplyr)
library(lubridate)


# custom months order
months_order <- c(7:12,1:6)

# fake dates for plot
# note: year 4 is used because it is a leap year, so 29 Feb is a valid date
dates <- make_datetime(c(rep(3,6), rep(4,6)), months_order)

# line chart by datetime
df %>%
  
  # create date time
  mutate(datetime = make_datetime(year, month, day, hour, minute, second)) %>%
  
  # filter years of interest
  filter(datetime >= make_datetime(2018,7), datetime < make_datetime(2020,7)) %>%
  
  # create increasing group after each july
  group_by(year, month) %>%
  mutate(dummy = month(datetime) == 7 & datetime == min(datetime)) %>%
  ungroup() %>%
  mutate(dummy = cumsum(dummy)) %>%
  
  # force unique years and create custom name
  group_by(dummy) %>%
  mutate(datetime = datetime - years(year - 4) - years(month>=7),
         years = paste(unique(year), collapse = " / ")) %>%
  ungroup() %>%
  
  # plot
  ggplot() +
  geom_line(aes(x = datetime, y = T42, colour = years)) +
  scale_x_datetime(breaks = dates, labels = month.abb[months_order]) +
  labs(title = "Temperature by Datetime", colour = "Year")
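
The date shift in the mutate step can be hard to read, so here is a rough sketch of the arithmetic behind it using plain numbers. `fake_year` and `is_leap` are hypothetical helpers written for this illustration; they are not part of lubridate or of the code above.

```r
# Hypothetical helper mirroring `datetime - years(year - 4) - years(month >= 7)`:
# July-December land in fake year 3, January-June in fake year 4
fake_year <- function(month) ifelse(month >= 7, 3, 4)

months_order <- c(7:12, 1:6)
fy <- fake_year(months_order)

# A sortable "year-month" index: it increases strictly along Jul, Aug, ..., Jun
idx <- fy * 12 + months_order

# Proleptic Gregorian leap-year rule: year 4 qualifies, year 3 does not
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0
```

Because January to June end up in the leap year 4, a 29 Feb observation keeps a valid date after the shift, and the x axis runs continuously from July to June.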

CodePudding user response:

To order the months differently and aggregate the values over pairs of years, you have to reshape the data a bit before plotting:

library(dplyr)     # work data
library(ggplot2)   # plots
library(lubridate) # date
library(readr)     # fetch data

# your data
df <- read_csv2("https://raw.githubusercontent.com/gonzalodqa/timeseries/main/temp.csv")


df %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(
    # reorder months: July first, June last
    # (month.abb gives English labels regardless of the system locale)
    month_2 = factor(month.abb[month(date)],
                     levels = c('Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb','Mar','Apr','May','Jun')),
    # group years as you like
    year_2 = ifelse(year(date) %in% 2018:2019, '2018/2019', '2020/2021')) %>%
  # you can use whatever aggregation function you need
  summarise(val = mean(T42, na.rm = TRUE), .groups = "drop") %>%
  # plot it!
  ggplot(aes(x = month_2, y = val, color = year_2, group = year_2)) +
  geom_line() +
  ylab('T42') +
  xlab('month') +
  theme_light()
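
The month reordering relies on factor levels, which can be checked in isolation. This is a minimal sketch with a made-up vector of month numbers (the sample values are an assumption, not from the data):

```r
# Month numbers as they might come out of the data (made-up sample)
month <- c(1, 7, 12, 6)

# Custom level order: July first, June last
lvls <- month.abb[c(7:12, 1:6)]
month_2 <- factor(month.abb[month], levels = lvls)

# Sorting follows the level order, which is also the order
# ggplot uses for a factor on a discrete axis
sorted <- as.character(sort(month_2))
```

Since the x variable is a factor, the group aesthetic is what tells geom_line which points to connect across the discrete axis.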
