How do I find the structural breaks in this time series data?

Time:08-16

I have yearly average closing values for an asset in a dataframe, and I need to find the structural breaks in the time series. I intended to do this using the statsmodels 'seasonal_decompose' function, but I am having trouble implementing it.

Example data below

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

data = {'Year':['1991','1992','1993','1994','1995','1996','1997','1998','1999','2000','2001','2002','2003','2004'],'Close':[11,22,55,34,447,85,99,86,83,82,81,34,33,36]}
df = pd.DataFrame(data)

df['Year'] = df['Year'].astype(str)
sd = seasonal_decompose(df)  # this call raises the ValueError shown below
plt.show()

ValueError: You must specify a period or x must be a pandas object with a DatetimeIndex with a freq not set to None

When I change the 'Year' column to date time, I get the following issue:

TypeError: float() argument must be a string or a number, not 'Timestamp'

I do not know what the issue is; the data has no missing values. Secondly, does anybody know a more efficient method to identify structural breaks in time series data?

Thanks

CodePudding user response:

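The problem is that you need to set the Year column as the index after converting its string values to datetime. The ValueError message says as much: it asks for a pandas object with a DatetimeIndex whose freq is not None. Converting Year to datetime but leaving it as a regular column is what triggers the TypeError, because the Timestamp values are then treated as data and cannot be cast to float.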
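The point about the index can be checked with pandas alone. The snippet below is only a sketch on a shortened copy of the question's data; the names range_freq and yearly_freq are made up for illustration, and the exact frequency alias printed depends on the installed pandas version.

```python
import pandas as pd

# Shortened copy of the question's data, for illustration only.
df = pd.DataFrame({'Year': ['1991', '1992', '1993', '1994'],
                   'Close': [11, 22, 55, 34]})

# Default RangeIndex: there is no frequency that could be inferred.
range_freq = getattr(df.index, 'inferred_freq', None)
print(range_freq)  # None

# Year parsed to datetime and moved into the index: pandas can infer a
# yearly frequency (a year-start alias whose spelling varies by version).
indexed = df.assign(Year=pd.to_datetime(df['Year'], format='%Y')).set_index('Year')
yearly_freq = indexed.index.inferred_freq
print(yearly_freq)
```

A non-None inferred frequency on the index is what the ValueError message is asking for when no period is passed.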

So, e.g.:

from statsmodels.tsa.seasonal import seasonal_decompose
import pandas as pd
import matplotlib.pyplot as plt

data = {'Year':['1991','1992','1993','1994','1995','1996','1997','1998','1999','2000','2001','2002','2003','2004'],'Close':[11,22,55,34,447,85,99,86,83,82,81,34,33,36]}
df = pd.DataFrame(data)

# Parse the year strings and use them as a DatetimeIndex
df['Year'] = pd.to_datetime(df['Year'])
df.set_index('Year', drop=True, inplace=True)
sd = seasonal_decompose(df)
sd.plot()
plt.show()  # needed when running as a script rather than in a notebook

Plot: [image: output of sd.plot()]
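On the second part of the question: seasonal_decompose splits a series into trend, seasonal, and residual components, but it does not flag structural breaks by itself. As a rough illustration of one simple alternative, the sketch below searches for a single shift in the mean by trying every split point and keeping the one with the smallest summed squared error. The function best_single_break and its min_size parameter are hypothetical names introduced here, not part of statsmodels or pandas, and this is a least-squares heuristic rather than a formal statistical test.

```python
import numpy as np

def best_single_break(values, min_size=2):
    """Return (k, sse): the split index k that minimizes the summed squared
    error when values[:k] and values[k:] are each modeled by their own mean."""
    y = np.asarray(values, dtype=float)
    best_k, best_sse = None, np.inf
    # Keep at least min_size observations on each side of the split.
    for k in range(min_size, len(y) - min_size + 1):
        left, right = y[:k], y[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse

# Data from the question.
years = list(range(1991, 2005))
close = [11, 22, 55, 34, 447, 85, 99, 86, 83, 82, 81, 34, 33, 36]

k, sse = best_single_break(close)
print(f"Candidate break: segment starting in {years[k]} (SSE={sse:.1f})")
```

With only 14 observations and one extreme value (447 in 1995), the result should be read with caution, since a single outlier can dominate the squared-error criterion. For multiple breaks or formal testing, a dedicated change point library such as ruptures may be a better fit.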
