I am having trouble predicting future values from my input set. I am fairly new to statsmodels, so I am not sure whether this is even possible with this amount of input data.
This is the DataFrame that I am using. (Note: Starts at index 5 since I had to filter some data)
year suicides_no
5 1990 193361
6 1991 198020
7 1992 211473
8 1993 221565
9 1994 232063
10 1995 243544
11 1996 246725
12 1997 240745
13 1998 249591
14 1999 256119
15 2000 255832
16 2001 250652
17 2002 256095
18 2003 256079
19 2004 240861
20 2005 234375
21 2006 233361
22 2007 233408
23 2008 235447
24 2009 243487
25 2010 238702
26 2011 236484
27 2012 230160
28 2013 223199
29 2014 222984
30 2015 203640
From this, I'd like to get a prediction for the years 2016-2022 and plot it on a graph like this one.
CodePudding user response:
This is a rather open-ended problem. I can certainly show you how you might write some code to make a prediction, but I think discussing how to make a good prediction is beyond the scope of StackOverflow. It will be very dependent on a good understanding of the problem domain.
But with that caveat aside, on with the show. You've suggested you'd like to see a statsmodels example.
Statsmodels is certainly capable of these sorts of forecasts. There are lots of approaches, but yes, you can take a 1D time series and use it to make future predictions.
There's also a detailed tutorial of state space models here - this is a common approach, or rather, family of approaches. Different state-space models would be used depending on e.g. whether you feel seasonality (cyclic behaviour), or certain exogenous variables (contextual drivers of behaviour) are important or not.
I adapted a simple example from there:
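import pandas as pd
import statsmodels.api as sm
# df = your DataFrame
# astype returns a new Series, so re-indexing it does not touch df
endog = df["suicides_no"].astype(float)
endog.index = pd.period_range("1990", "2015", freq="Y")
# Construct the (very simple) AR(1) model with a constant term
mod = sm.tsa.SARIMAX(endog, order=(1, 0, 0), trend='c')
# Estimate the parameters
res = mod.fit()
# Forecast the next 7 years (2016-2022)
print(res.forecast(steps=7))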
The order parameter determines exactly what sort of model you get. Here order=(1, 0, 0) is about as simple as it gets: a first-order autoregressive model, which predicts each value from the one before it (plus a constant) and extrapolates forward.
As I said, I cannot guarantee it will give you a good forecast here (it's definitely a reach to use 26 samples to predict the next 7), but you could test different parameters and read up on this type of model.