I have some data that includes the width and weight of a certain species of fish. I'm using a linear regression model to predict the weight from the width, and I want to compute the mean squared error of the model.
If I use the function mean_squared_error provided by the scikit-learn library on the values of the y_test list and the predictions, like so:
mse = metrics.mean_squared_error(y_test, preds)
I get a mean squared error of about 5679.0812, which is super high. However, if I normalize the values of both arrays before computing the MSE I get a much more acceptable value of about 7.3843e-05.
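The effect can be reproduced with a small sketch. The fish values below are made up for illustration, and the MSE is computed directly with NumPy (scikit-learn's `mean_squared_error` computes the same quantity):

```python
import numpy as np

# Hypothetical targets and predictions on the original scale (grams).
y_test = np.array([250.0, 340.0, 600.0, 700.0, 1000.0])
preds = np.array([300.0, 400.0, 550.0, 780.0, 905.0])

# MSE on the original scale: its units are grams squared.
mse_raw = np.mean((y_test - preds) ** 2)

# Min-max normalising both arrays before scoring shrinks the number,
# but only because the values themselves now lie in [0, 1].
def min_max(a):
    return (a - a.min()) / (a.max() - a.min())

mse_norm = np.mean((min_max(y_test) - min_max(preds)) ** 2)

print(mse_raw, mse_norm)  # the second value is orders of magnitude smaller
```

The numbers differ so much only because of the change of scale, not because the model got any better.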
Is it a good practice to normalize the values before computing the MSE?
Thanks.
CodePudding user response:
It is good practice to normalize the features before you train the model.
Normalizing the values before computing the mean squared error, however, just manipulates the result: you are not actually getting a better score that way.
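To illustrate the distinction, here is a sketch with made-up width/weight data: scaling the inputs before fitting is harmless, because the error can still be reported in the original units, while the target y is never touched. For ordinary least squares, rescaling the features does not change the fitted values at all:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(2.0, 8.0, size=(50, 1))           # hypothetical fish widths (cm)
y = 120.0 * X[:, 0] + rng.normal(0, 30, size=50)  # hypothetical weights (g)

# Fit on raw features and on standardised features; only X is scaled, never y.
raw = LinearRegression().fit(X, y)
Xs = StandardScaler().fit_transform(X)
scaled = LinearRegression().fit(Xs, y)

mse_raw = mean_squared_error(y, raw.predict(X))
mse_scaled = mean_squared_error(y, scaled.predict(Xs))

# Ordinary least squares is equivariant to feature scaling, so the MSE,
# still in grams squared, is identical either way.
print(np.isclose(mse_raw, mse_scaled))  # True
```

Shrinking the reported MSE requires rescaling y itself, which is exactly the manipulation described above.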
CodePudding user response:
The short answer is that you shouldn't standardize your data when computing the error for linear regression, especially not the y values, because doing so changes the scale of the error.
The square root of the MSE is the standard error of prediction, which is an estimate of the average error you will make when you use the model in practice. When you normalize the y values, you simply rescale that standard error of prediction, so it loses its meaning.
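The rescaling is exact: dividing both the targets and the predictions by a factor c divides the RMSE by that same c. A quick check, using made-up numbers:

```python
import numpy as np

y = np.array([250.0, 340.0, 600.0, 700.0, 1000.0])    # hypothetical weights (g)
preds = np.array([300.0, 400.0, 550.0, 780.0, 905.0])

# RMSE = standard error of prediction, in the units of y (grams).
rmse = np.sqrt(np.mean((y - preds) ** 2))

# Rescaling both arrays by any factor c rescales the RMSE by the same c,
# so the apparent "improvement" from normalising is a pure unit change.
c = 1000.0
rmse_scaled = np.sqrt(np.mean((y / c - preds / c) ** 2))
print(np.isclose(rmse_scaled, rmse / c))  # True
```

So a normalized MSE of 7.3843e-05 and a raw MSE of 5679.0812 can describe exactly the same predictions.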