Should the data preparation phase for ML include both: fitting data to right distribution followed b

I used the distfit library to find the distribution that best fits my data, in order to reduce skewness. Let us say I have transformed my data to an approximately normal distribution using the Box-Cox method.

After this, should I also scale my data, for example with RobustScaler, which handles outliers well?

I am confused about whether I should follow both steps or just one.

I am not sure whether I am heading in the right direction in the data prep phase. Please share your thoughts on this. Thanks!

CodePudding user response：

You might or might not have to do scaling after normalization.

The answer depends on what we are going to do with this data. For example, are we planning to fit a model, or do something else?

One concrete example:

If we want to train a neural network, then:

For faster convergence of training, each feature should have mean = 0 and sigma = 1 (normalization needed).
For effective regularization, all the features must be on a similar scale (scaling needed).
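
To make the difference between the two steps concrete, here is a minimal standard-library sketch. The helper names (`box_cox`, `robust_scale`, `skewness`), the sample values, and the fixed lambda of 0 are all assumptions made for illustration. In practice you would more likely use `scipy.stats.boxcox` and scikit-learn's `RobustScaler`, which estimate their parameters from the data.

```python
import math
import statistics


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical right-skewed feature with one large outlier.
raw = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 8.0, 13.0, 40.0]

# Step 1: change the shape. lam=0 (a log transform) is an assumed value;
# normally lambda would be estimated from the data.
transformed = box_cox(raw, lam=0.0)

# Step 2: change the location and spread, leaving the shape alone.
scaled = robust_scale(transformed)

print("skew raw:        ", round(skewness(raw), 3))
print("skew transformed:", round(skewness(transformed), 3))
print("skew scaled:     ", round(skewness(scaled), 3))
print("median scaled:   ", statistics.median(scaled))
```

In this sketch the transform reduces the skew, while the scaler leaves the skew unchanged and only moves the median to 0 and the interquartile range to 1. That is why the two steps are not substitutes for each other: one addresses the shape of the distribution, the other its location and spread.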

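In contrast, if you want to fit, say, a decision tree, then neither of these steps is needed, because tree splits depend on the ordering of the feature values rather than on their scale or distribution.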
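
The decision tree case can be illustrated with a toy example. The sketch below is a hand-written single-split "stump", not scikit-learn's `DecisionTreeClassifier`; the function name `best_split`, the feature values, and the labels are all hypothetical. It shows that a log transform and a rescaling, both of which preserve the ordering of the values, lead to the same partition of the training samples.

```python
import math


def best_split(xs, ys):
    """Return the indices sent to the left branch by the best single split.

    Every cut between two consecutive distinct sorted values is a candidate.
    The best cut minimizes the number of misclassified samples when each
    side predicts its majority label (labels are 0 or 1).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_errors, best_left = None, None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # no threshold can separate equal values
        left, right = order[:k], order[k:]
        errors = 0
        for side in (left, right):
            ones = sum(ys[i] for i in side)
            errors += min(ones, len(side) - ones)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, frozenset(left)
    return best_left


# Hypothetical skewed feature and binary labels.
feature = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 8.0, 13.0, 40.0]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]

logged = [math.log(v) for v in feature]          # monotonic transform
rescaled = [(v - 4.0) / 6.0 for v in feature]    # arbitrary shift and scale

split_raw = best_split(feature, labels)
split_log = best_split(logged, labels)
split_scaled = best_split(rescaled, labels)

print(sorted(split_raw))
print(split_raw == split_log == split_scaled)
```

One caveat: the numeric threshold itself lands at a different place after a nonlinear transform, so predictions for unseen values that fall between two training values can occasionally differ. The grouping of the training samples, however, stays the same.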

So it all boils down to what we intend to do with the data after processing it.