R forecast flatline

I want to create a forecasting plot via ggplot2. I need some guidance with the output plot. The code works fine, it's just that I was expecting the blue forecast line to have wiggles similar to the past data. However, the plot is returning a flat forecast line.

What could be the reason behind this flat forecast line (which I guess means no change in future temperature) or am I interpreting it wrong?

All the examples of forecasting in R I have seen so far show forecast lines with wiggles, thus I am confused as to why my plot is different.

The purpose is to forecast the next 10 years.

Data (AvgTMeanYear):

structure(list(year = 1980:2021, AvgTMean = c(24.2700686838937, 
23.8852956598276, 25.094446596092, 24.1561175050287, 24.157183605977, 
24.3047482638362, 24.7899738481466, 24.5756232655603, 24.5833086228592, 
24.7344695534483, 25.3094451071121, 25.2100615173707, 24.3651692293534, 
24.5423890611494, 25.2492166633908, 24.7005097837931, 24.2491591827443, 
25.0912281781322, 25.0779264303305, 24.403294248319, 24.4983991453592, 
24.4292324356466, 24.8179824927011, 24.7243948463075, 24.5086534543966, 
24.2818632071983, 24.4567195220259, 24.8402224356034, 24.6574465515086, 
24.5440715673563, 23.482670620977, 24.9979594684914, 24.5452453980747, 
24.9271462811494, 24.7443215819253, 25.8929839790805, 25.1801908261063, 
25.2079308058908, 25.0722425561207, 25.4554644289799, 25.4548979078736, 
25.0756772250287)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-42L))

Code:

library(tidyverse)
library(forecast)

p = ggplot(aes(x = year, y = AvgTMean), data = AvgTMeanYear)
p = p + geom_line()
p + geom_forecast()

Output: (plot not included; it showed the flat blue forecast line described above)

CodePudding user response：

Using the example in the docs, we can see the 'wiggly' forecast to which you refer:

autoplot(USAccDeaths) + geom_forecast()

The reason the line is wiggly is because of the underlying periodicity in the data, which is stored in the attributes of the time series to be predicted:

frequency(USAccDeaths)
#> [1] 12

The data you are using has no periodicity stored, since it is annual data. Unless you are testing a hypothesis that there is some periodic change over the years, you might as well use a simple linear regression for your forecast.

ggplot(AvgTMeanYear, aes(year, AvgTMean)) +
  geom_smooth(method = 'lm', fullrange = TRUE) +
  annotate('rect', xmin = -Inf, xmax = 2021, ymin = -Inf, ymax = Inf,
           fill = 'gray92') +
  geom_vline(xintercept = seq(1980, 2020, 5), color = 'white') +
  geom_hline(yintercept = seq(23.5, 25.5, 0.5), color = 'white') +
  geom_line() +
  scale_x_continuous(limits = c(1980, 2030))
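
If numeric forecasts are wanted rather than just the plotted line, the same straight-line trend can be fitted with lm() and extended with predict(). This is only a minimal sketch: it uses simulated data in a hypothetical data frame called toy (not the data from the question), and the trend and noise levels are assumptions made for the example.

```r
# Simulated annual series standing in for the real data.
# Assumption: a weak upward trend plus noise; `toy` is a hypothetical name.
set.seed(1)
toy <- data.frame(year = 1980:2021)
toy$AvgTMean <- 24 + 0.02 * (toy$year - 1980) + rnorm(nrow(toy), sd = 0.4)

# Fit a straight-line trend over time.
fit <- lm(AvgTMean ~ year, data = toy)

# Predict the next 10 years with 95% prediction intervals.
future <- data.frame(year = 2022:2031)
pred   <- predict(fit, newdata = future, interval = "prediction")

# Combine into one table with columns year, fit, lwr, upr.
result <- cbind(future, pred)
print(result)
```

The fit column increases by the same amount every year, which is the numeric counterpart of the straight line drawn by geom_smooth(method = 'lm').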

CodePudding user response：

The wiggles that you see in forecast examples are due to seasonality. Monthly, daily, and weekly time series often have seasonality. Sales of sunblock are higher during the summer, and possibly higher on Saturdays than on Mondays.

Yearly data is unlikely to have seasonality. It might if it's tied to something that recurs regularly over multiple years, like a presidential election cycle (4 years) or cicada broods (13 years). I don't imagine that's the case for your data.

In order to ask the forecast package to detect seasonality and try to include it in an arima or ets model, you have to specify the frequency of the seasonality in the time series object. Declaring a time series object with ts can include a frequency = 12 argument to specify a monthly time series with annual seasonality for example. forecast::ets or forecast::auto.arima will detect if including a seasonal term at that frequency improves the model information criteria.

In your problem, you'd have to specify a frequency as well. The forecast package will not automatically test for frequency. You could specify frequency = 10 if you feel like there's a pattern over 10 year periods (probably not the case!). You can look at plots of the autocorrelation function to visually detect seasonality.
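
As a rough illustration of both points, the sketch below uses only base R and simulated data (the series annual and monthly are made up for the example). It shows that a series built without a frequency argument reports a frequency of 1, and how a seasonal pattern shows up in the autocorrelation function at the seasonal lag.

```r
set.seed(42)

# Annual series: no frequency given, so ts() defaults to frequency = 1.
annual <- ts(rnorm(42, mean = 24.7, sd = 0.4), start = 1980)
frequency(annual)
#> [1] 1

# Monthly series with a 12-month cycle, declared with frequency = 12.
monthly <- ts(sin((1:48) * 2 * pi / 12) + 5 + runif(48),
              frequency = 12, start = c(2017, 1))
frequency(monthly)
#> [1] 12

# Autocorrelations for gaps of 0 to 24 observations (plot suppressed).
# Element 1 is lag 0, so element 7 is a 6-month gap and element 13 a 12-month gap.
ac <- acf(monthly, lag.max = 24, plot = FALSE)$acf[, 1, 1]

# For a 12-month cycle, expect a negative value at half a cycle (6 months)
# and a positive value at a full cycle (12 months).
round(ac[c(7, 13)], 2)
```

Calling acf(monthly) without plot = FALSE draws the usual autocorrelation plot, where a regular spike at the seasonal lag is the visual sign of seasonality.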

There is nothing wrong with a forecast that has no seasonality. Perhaps that's the best forecast possible given the available information.

However, I think it's safe to generalize that for time series with a lot of variation, no detectable seasonality, and no significant exogenous variables, the forecast will be linear and not very accurate.
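
To see why a non-seasonal model draws a flat line, here is a minimal sketch using base R's HoltWinters() restricted to simple exponential smoothing (no trend, no seasonal term). This model is a stand-in chosen for illustration, not necessarily the one geom_forecast() selected for the data in the question, and the series x is simulated.

```r
set.seed(123)

# Simulated annual series: noise around a constant level (hypothetical data).
x <- ts(24.7 + rnorm(42, sd = 0.4), start = 1980)

# Simple exponential smoothing: beta = FALSE drops the trend component,
# gamma = FALSE drops the seasonal component.
ses <- HoltWinters(x, beta = FALSE, gamma = FALSE)

# With only a level in the model, every step ahead is forecast
# at the last estimated level, so all 10 values are identical.
fc <- predict(ses, n.ahead = 10)
print(fc)
```

With no trend or seasonal term to extrapolate, the best guess for every future year is the same number, which plots as a horizontal line.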

Example: monthly time series with annual seasonality and seasonal ets model forecast

library(tidyverse)
library(forecast)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo

dat   <- sin((1:48)*2*pi/12) + 5 + runif(48)
model <- ets(ts(dat, frequency = 12, start = c(2017, 1)))
fcst  <- forecast(model, 12)
autoplot(fcst)

Created on 2022-06-28 by the reprex package (v2.0.1)