I have a data frame df with two columns, True and Prediction, and 1000 rows. I want to calculate MSE and RMSE using sklearn's mean_squared_error(y_test, y_pred), but in chunks: the first MSE should be calculated on the first 20 rows of the True and Prediction columns, the next on rows 21-40, and so on. In other words, I want an MSE and RMSE for every 20 consecutive rows of the 1000, arranged in a data frame. I can't work out the loop condition for that. I tried for i in range(0, len(df), 20) but it's not working. How can I solve this?
For example, the data frame is
>df
True Prediction
0 5 5
1 6 4
2 7 2
3 2 3
..
999 1 3
The output will be a data frame like this
MSE RMSE
0 1.5 2.5
1 1 0.5
2 1 1.2
...
49 2 3.7
Since each MSE and RMSE is based on 20 rows of True and Prediction values, the new data frame will have only 50 rows.
CodePudding user response:
This should work for MSE:
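from sklearn.metrics import mean_squared_error
# Group rows into blocks of 20 by index (0-19 -> 0, 20-39 -> 1, ...) and collect each block's values into lists.
mse = df.groupby(lambda i: i // 20).agg({'True': lambda x: list(x), 'Prediction': lambda x: list(x)})
# Compute one MSE per block from the two lists.
mse = mse.apply(lambda r: mean_squared_error(r['True'], r['Prediction']), axis=1)
mse.head(5)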
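To get RMSE alongside MSE in one data frame, something like the following might work. It is only a sketch: it swaps mean_squared_error for plain pandas arithmetic (the mean of the squared differences, which is the same quantity), labels blocks by row position rather than by index value, and uses randomly generated stand-in data because the real df is not shown. The names block, squared_error and result are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 1000 rows with the question's column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'True': rng.integers(1, 10, size=1000),
    'Prediction': rng.integers(1, 10, size=1000),
})

# Label each row with its block number by position:
# rows 0-19 -> 0, rows 20-39 -> 1, and so on.
block = np.arange(len(df)) // 20

# Squared error per row; its mean within each block is that block's MSE.
squared_error = (df['True'] - df['Prediction']) ** 2
mse = squared_error.groupby(block).mean()

# RMSE is the square root of MSE, block by block.
result = pd.DataFrame({'MSE': mse, 'RMSE': np.sqrt(mse)})
print(result.head())
```

Because the blocks are labelled by position, this should also work when the index is not a plain 0..999 range.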
CodePudding user response:
Just to let you know, I later found that the following code also works to some extent, but I am not sure it is right.
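from sklearn.metrics import mean_squared_error

x = 0
for i in range(1000 // 20):  # 50 chunks of 20 rows; range() needs an integer
    df_x = df.iloc[x:(20 + x), :]
    k = mean_squared_error(df_x['True'], df_x['Prediction'])
    x = 20 + x
    print(f'The MSE of prediction is: {k}')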
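A loop like this can also step through the rows with range(0, len(df), 20), as tried in the question, and collect the results in a data frame instead of printing them. Here is a rough sketch of that idea, assuming the column names from the question; the data is randomly generated as a stand-in, and chunk_size, rows and result are names made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# Hypothetical stand-in data: 1000 rows with the question's column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'True': rng.integers(1, 10, size=1000),
    'Prediction': rng.integers(1, 10, size=1000),
})

chunk_size = 20
rows = []
# start takes the values 0, 20, 40, ..., 980.
for start in range(0, len(df), chunk_size):
    chunk = df.iloc[start:start + chunk_size]
    mse = mean_squared_error(chunk['True'], chunk['Prediction'])
    # RMSE is taken here as the square root of the chunk's MSE.
    rows.append({'MSE': mse, 'RMSE': np.sqrt(mse)})

# One row per chunk, so 50 rows for 1000 input rows.
result = pd.DataFrame(rows)
print(result.head())
```

If len(df) is not a multiple of 20, the last chunk is simply shorter, since iloc slicing stops at the end of the frame.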