I've written a user-defined function to calculate the RMSE of my model's predictions. The function is:
def rmse(result):
    forecast = result.forecast(point)
    t = test['X']
    y = forecast
    mse = np.mean((t - y)**2)
    return np.sqrt(mse)
'point' is just the test-train split index that I defined earlier (it's 20 in the current program).
The problem is that whenever I train and fit my model and pass it to the function, the function returns NaN.
The test values are:
YEAR
2000 11327.2
2001 10494.8
2002 10863.3
2003 15471.8
2004 11689.8
2005 12620.2
2006 11500.0
2007 11529.3
2008 13736.2
2009 10428.8
2010 11000.0
2011 12250.6
2012 11085.1
2013 15585.5
2014 13348.4
2015 12000.0
2016 11490.1
2017 12793.2
2018 10421.8
2019 14761.3
and the predicted values are these:
100 13369.005272
101 14896.559807
102 13576.774285
103 13808.247991
104 13464.385955
105 14945.066492
106 12996.661601
107 14605.002956
108 14698.833373
109 14142.829314
110 14939.133219
111 13950.538418
112 13993.636520
113 15191.622044
114 14067.356824
115 15013.353349
116 15184.201982
117 14713.896434
118 14801.679892
119 14986.230462
Both 't' and 'y' in the function have a datatype of float64.
I have no idea why it's returning NaN. I've even tried returning only the MSE (without the square root) and the simple mean error (without squaring), to no avail. Any help would be appreciated.
CodePudding user response:
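Simply replace:
    np.mean((t-y)**2)
with:
    np.mean((t.values-y.values)**2)
assuming t and y are both pandas Series. Arithmetic between two Series aligns them on index labels first. Here t is indexed by YEAR (2000 to 2019) and y by 100 to 119, so no labels match and every difference is NaN. Using .values subtracts the underlying arrays by position instead.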
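To see the effect in isolation, here is a minimal sketch using only the first three rows of each series from the question. The variable names `diff`, `nan_count`, `aligned_rmse`, and `positional_rmse` are made up for illustration, and `to_numpy()` is used as the equivalent of `.values`.

```python
import numpy as np
import pandas as pd

# Stand-ins for the question's data (first three rows only).
t = pd.Series([11327.2, 10494.8, 10863.3], index=[2000, 2001, 2002])  # indexed by YEAR
y = pd.Series([13369.005272, 14896.559807, 13576.774285], index=[100, 101, 102])

# Series arithmetic aligns on index labels. No label is shared here,
# so the result covers the union of both indexes and every entry is NaN.
diff = t - y
nan_count = int(diff.isna().sum())
aligned_rmse = np.sqrt(np.mean(diff ** 2))  # NaN, as in the question

# Converting to NumPy arrays subtracts element by element, by position.
positional_rmse = np.sqrt(np.mean((t.to_numpy() - y.to_numpy()) ** 2))

print(diff)
print(aligned_rmse, positional_rmse)
```

Printing `diff` shows six rows, one per label from either index, all NaN. That is why taking the mean, with or without squaring, could not help.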
CodePudding user response:
As Mustafa Aydin pointed out in the comments, the indexing was the cause of the issue. Once that was fixed, the function returned the RMSE values properly.
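The thread does not show the exact fix that was applied, so the following is only a sketch of two possible ways to handle the indexing. The two-argument `rmse(actual, predicted)` signature and the names `test_x`, `forecast`, and `forecast_aligned` are assumptions for illustration, not the original code.

```python
import numpy as np
import pandas as pd


def rmse(actual, predicted):
    """RMSE between two equal-length sequences, compared by position."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Stand-ins for the question's data (first two rows only).
test_x = pd.Series([11327.2, 10494.8], index=pd.Index([2000, 2001], name="YEAR"))
forecast = pd.Series([13369.005272, 14896.559807], index=[100, 101])

# Option 1: ignore the labels and compare by position.
score_by_position = rmse(test_x, forecast)

# Option 2: relabel the forecast with the test index so pandas can align it.
forecast_aligned = pd.Series(forecast.to_numpy(), index=test_x.index)
score_by_label = float(np.sqrt(((test_x - forecast_aligned) ** 2).mean()))

print(score_by_position, score_by_label)
```

Both options give the same number. The second keeps the YEAR labels on the forecast, which can be convenient for plotting or later joins.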