Calculating MSE and RMSE over consecutive 20-row blocks of a data frame

I have a data frame df with two columns, True and Prediction, and 1000 rows. I want to calculate MSE and RMSE using sklearn's mean_squared_error(y_test, y_pred), but in consecutive blocks: the first MSE is calculated on the first 20 rows of the True and Prediction columns, the next on rows 21-40, and so on. In other words, I want one MSE and one RMSE for every 20 consecutive rows of the 1000, collected in a new data frame. I can't work out the loop condition for this. I tried for i in range(0, len(df), 20), but it's not working. How can I solve this? For example, the data frame is

>df
    True  Prediction
0     5      5
1     6      4
2     7      2
3     2      3
..
999   1      3

The output will be a data frame like this

   MSE   RMSE
0  1.5   2.5
1   1    0.5
2   1    1.2
...
49  2    3.7

Since each MSE and RMSE is based on 20 rows of True and Prediction values, the new data frame of MSE and RMSE will have only 50 rows.

CodePudding user response：

This should work for MSE:

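from sklearn.metrics import mean_squared_error

# Group rows into blocks of 20 by integer-dividing the index label (assumes a default 0..n-1 index)
mse = df.groupby(lambda i: i // 20).agg({'True': lambda x: list(x), 'Prediction': lambda x: list(x)})
# Compute one MSE per block from the collected values
mse = mse.apply(lambda r: mean_squared_error(r['True'], r['Prediction']), axis=1)
mse.head(5)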
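
If both MSE and RMSE are wanted in one data frame, something along these lines might work. It is only a sketch under a few assumptions: the random data is a hypothetical stand-in for the real df, the block label is built from row position rather than the index, and the MSE is computed directly as the mean of squared differences (the quantity mean_squared_error returns for a single output) instead of calling sklearn.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 1000 rows with the question's column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "True": rng.integers(1, 10, size=1000),
    "Prediction": rng.integers(1, 10, size=1000),
})

# Squared error for every row.
squared_error = (df["True"] - df["Prediction"]) ** 2

# Block label by row position: 0 for rows 0-19, 1 for rows 20-39, and so on.
block = np.arange(len(df)) // 20

# Mean squared error within each block, then its square root.
mse = squared_error.groupby(block).mean()
result = pd.DataFrame({"MSE": mse, "RMSE": np.sqrt(mse)})

print(result.head())
```

Because the block label depends on position and not on the index values, this should also behave sensibly when df has a non-default index.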


CodePudding user response：

Just to let you know, I later found that this code also works to some extent, though I'm still unsure about it.

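from sklearn.metrics import mean_squared_error

x = 0
for i in range(1000 // 20):  # 50 blocks of 20 rows; range() needs an int, so use //
    df_x = df.iloc[x:x + 20, :]  # rows x through x + 19
    k = mean_squared_error(df_x['True'], df_x['Prediction'])
    x = x + 20  # move to the start of the next block
    print(f'The MSE of prediction is: {k}')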
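
The range(0, len(df), 20) pattern from the question can probably be made to work too, by using each value as the start of a slice and collecting the results in a list. The following is a rough sketch, not a definitive version: the random data, chunk_size, and rows are hypothetical names introduced for illustration, and RMSE is taken as the square root of the MSE returned by sklearn.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# Hypothetical stand-in data with the same column names as the question.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "True": rng.integers(1, 10, size=1000),
    "Prediction": rng.integers(1, 10, size=1000),
})

chunk_size = 20  # assumed block length from the question
rows = []
for start in range(0, len(df), chunk_size):
    # Positional slice: rows start .. start + chunk_size - 1.
    chunk = df.iloc[start:start + chunk_size]
    mse = mean_squared_error(chunk["True"], chunk["Prediction"])
    rows.append({"MSE": mse, "RMSE": np.sqrt(mse)})

# One row per block, in block order.
result = pd.DataFrame(rows)
print(result.shape)
```

If len(df) is not a multiple of chunk_size, the last slice is simply shorter, so the final MSE would be based on fewer rows.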