I'm experimenting with time series predictions, something like this:
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(data.values,
                order=order,
                seasonal_order=seasonal_order)
result = model.fit()
train = data.sample(frac=0.8,random_state=0)
test = data.drop(train.index)
start = len(train)
end = len(train) + len(test) - 1
# Predictions for one-year against the test set
predictions = result.predict(start, end,
                             typ='levels')
Here, predictions is a NumPy array. How do I add it to my test
pandas df? If I try this:
test['predicted'] = predictions.tolist()
This won't concat properly. I was hoping to add the predictions as another column in my df, but instead it looks like this:
hour
2021-06-07 17:00:00 75726.57143
2021-06-07 20:00:00 62670.06667
2021-06-08 00:00:00 16521.65
2021-06-08 14:00:00 71628.1
2021-06-08 17:00:00 62437.16667
...
2021-09-23 22:00:00 7108.533333
2021-09-24 02:00:00 13325.2
2021-09-24 04:00:00 13322.31667
2021-09-24 13:00:00 37941.65
predicted [13605.31231433516, 12597.907337725523, 13484.... <--- not coming in as another df column
Would anyone have any advice? I am hoping to ultimately plot the predicted values against the test values and also calculate the RMSE, maybe something like:
from sklearn.metrics import mean_squared_error
from statsmodels.tools.eval_measures import rmse
# Calculate root mean squared error
rmse(test, predictions)
# Calculate mean squared error
mean_squared_error(test, predictions)
EDIT
train = data.sample(frac=0.8,random_state=0)
test = data.drop(train.index)
start = len(train)
end = len(train) + len(test) - 1
CodePudding user response:
You should be able to add it as a column directly, without any additional conversion. The output from result.predict() should be a pandas Series if the model was fit on pandas data, or a NumPy array if it was fit on data.values as in the question. Either way, you can assign it directly to the DataFrame as long as it has the same length and order as the rows. For example:
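import numpy as np
import pandas as pd
# Small example frame with a datetime index
test = pd.DataFrame({'date': ['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020', '01-05-2020'],
                     'value': [15, 25, 35, 45, 55]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')
# The array is assigned by position, one value per row
predictions = np.array([10, 20, 30, 40, 50])
test['predictions'] = predictions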
Output:
            value  predictions
date
2020-01-01     15           10
2020-01-02     25           20
2020-01-03     35           30
2020-01-04     45           40
2020-01-05     55           50
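The output in the question, where predicted shows up as one more row holding the whole list, suggests that test there is a pandas Series rather than a DataFrame. That is an inference from the printout, not something the question states. If so, a minimal sketch of one way around it is to promote the Series to a one-column DataFrame first and then assign the array. The sample values, the column name actual, and the name test_df are made up for illustration, and the predictions array stands in for the output of result.predict().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `test`: a Series indexed by hour.
idx = pd.to_datetime(["2021-06-07 17:00", "2021-06-07 20:00", "2021-06-08 00:00"])
test = pd.Series([75726.57, 62670.07, 16521.65], index=idx, name="actual")
test.index.name = "hour"

# Made-up values standing in for result.predict(...); same length as test.
predictions = np.array([70000.0, 60000.0, 20000.0])

# A Series has no columns, so turn it into a one-column DataFrame first.
test_df = test.to_frame()

# The array is assigned by position, so it must match test_df row for row.
test_df["predicted"] = predictions

# Root mean squared error computed directly with NumPy on the two columns.
rmse = float(np.sqrt(np.mean((test_df["actual"] - test_df["predicted"]) ** 2)))

print(test_df)
print(rmse)
```

With both columns in one frame, test_df.plot() should draw the actual and predicted values on the same axes, and the rmse and mean_squared_error helpers from the question can be given test_df["actual"] and test_df["predicted"] instead of the raw test object. The positional assignment only lines up if the predictions were produced for the same rows, in the same order, as test.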