Python function for RMSE keeps returning NaN

Time:09-23

I've written a user-defined function to calculate the RMSE of my model's predictions. The function is:

import numpy as np

def rmse(result):
    # 'point' and 'test' are defined earlier in the program
    forecast = result.forecast(point)
    t = test['X']
    y = forecast
    mse = np.mean((t-y)**2)
    return np.sqrt(mse)

'point' is just the train-test split index that I defined earlier (it's 20 in the current program).

The problem is that whenever I train and fit my model and pass it to the function, the function returns NaN.

The test values are:

YEAR
2000    11327.2
2001    10494.8
2002    10863.3
2003    15471.8
2004    11689.8
2005    12620.2
2006    11500.0
2007    11529.3
2008    13736.2
2009    10428.8
2010    11000.0
2011    12250.6
2012    11085.1
2013    15585.5
2014    13348.4
2015    12000.0
2016    11490.1
2017    12793.2
2018    10421.8
2019    14761.3

and the predicted values are these:

100    13369.005272
101    14896.559807
102    13576.774285
103    13808.247991
104    13464.385955
105    14945.066492
106    12996.661601
107    14605.002956
108    14698.833373
109    14142.829314
110    14939.133219
111    13950.538418
112    13993.636520
113    15191.622044
114    14067.356824
115    15013.353349
116    15184.201982
117    14713.896434
118    14801.679892
119    14986.230462

Both 't' and 'y' in the function have a datatype of float64.

I have no idea why it's returning NaN. I've even tried returning only the MSE (without the square root) and the simple mean error (with no squaring), to no avail. Any help would be appreciated.

CodePudding user response:

Simply replace:

np.mean((t-y)**2)

With:

np.mean((t.values-y.values)**2)

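This assumes t and y are both pandas Series. pandas aligns two Series on their index labels before subtracting them, and here the indexes (years 2000 to 2019 for t, integers 100 to 119 for y) have no labels in common, so every element of t-y is NaN and so is the mean. Using .values subtracts the underlying NumPy arrays by position instead.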
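As a minimal sketch of that behaviour, the snippet below uses only the first three values of each series from the question, with their original index labels. The names t and y follow the question; the shortened data and the variable names diff, all_nan and rmse_value are just for illustration.

```python
import numpy as np
import pandas as pd

# First three values of each series from the question, with their original indexes.
t = pd.Series([11327.2, 10494.8, 10863.3], index=[2000, 2001, 2002])
y = pd.Series([13369.005272, 14896.559807, 13576.774285], index=[100, 101, 102])

# Subtraction aligns on index labels; no label matches, so every entry is NaN.
diff = t - y
all_nan = bool(diff.isna().all())

# .values drops the index, so the subtraction is purely positional.
rmse_value = float(np.sqrt(np.mean((t.values - y.values) ** 2)))

print(diff)
print(all_nan, rmse_value)
```

Printing diff shows six rows (the union of both indexes), which is a quick way to spot this kind of mismatch.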

CodePudding user response:

As Mustafa Aydin pointed out in the comments, the indexing was the cause of the issue. Once that was fixed, the function returned the RMSE values properly.
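The post does not say exactly how the index was fixed. One possible way, sketched here under the assumption that the forecast and the test values are in the same order and have the same length, is to relabel the forecast with the test set's index before subtracting. The helper name rmse_aligned and the two-value sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def rmse_aligned(actual, forecast):
    """RMSE after relabelling the forecast with the index of the actual values.

    Hypothetical helper: assumes both inputs are in the same order
    and have the same length.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    # Reuse the actual values' index so pandas aligns element by element.
    relabelled = pd.Series(np.asarray(forecast, dtype=float), index=actual.index)
    return float(np.sqrt(np.mean((actual - relabelled) ** 2)))

# First two values of each series from the question.
actual = pd.Series([11327.2, 10494.8], index=[2000, 2001])
forecast = pd.Series([13369.005272, 14896.559807], index=[100, 101])
result = rmse_aligned(actual, forecast)
print(result)
```

Keeping the result as index-aware Series arithmetic (rather than raw arrays) means any later mismatch in labels shows up as NaN instead of being silently paired by position, at the cost of having to keep the indexes consistent.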
