I wonder how MAE loss is optimized with an SGD optimizer. Specifically, how is the derivative of a sum of absolute values calculated? Is a numerical approximation used, or something else?
CodePudding user response:
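I've found that in sklearn.linear_model.SGDRegressor, MAE loss is a special case of the 'epsilon_insensitive' loss with epsilon equal to 0. According to the source code of this loss, no numerical approximation is involved: the derivative is computed analytically by applying the sign(x) function to the difference between the ground truth and predicted values. For |y - p|, the derivative with respect to the prediction p is -sign(y - p), i.e. +1 or -1. At y = p, where the absolute value is not differentiable, 0 is used, which is a valid subgradient.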
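To make the sign-based update concrete, here is a minimal pure-Python sketch of SGD on MAE for a one-feature linear model. It is not sklearn's implementation: the function names (`sign`, `mae_grad_wrt_pred`, `sgd_mae_fit`), the constant learning rate, and the toy data are all assumptions chosen for illustration.

```python
import random

def sign(x):
    # Returns -1, 0, or +1 (0 exactly at x == 0).
    return (x > 0) - (x < 0)

def mae_grad_wrt_pred(y_true, y_pred):
    # Derivative of |y_true - y_pred| with respect to y_pred.
    # At y_true == y_pred the function is not differentiable; 0 is a valid subgradient.
    return -sign(y_true - y_pred)

def sgd_mae_fit(xs, ys, lr=0.01, epochs=50, seed=0):
    # Hypothetical helper: fits y ~ w * x + b by per-sample (sub)gradient steps.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            g = mae_grad_wrt_pred(ys[i], w * xs[i] + b)
            # Chain rule: d(pred)/dw = x, d(pred)/db = 1.
            w -= lr * g * xs[i]
            b -= lr * g
    return w, b

# Toy noiseless data generated from y = 2x + 1 (an assumption for the demo).
xs = [i / 100 - 1 for i in range(201)]
ys = [2 * x + 1 for x in xs]
w, b = sgd_mae_fit(xs, ys)
print(w, b)
```

Because the gradient magnitude is always 1 regardless of how large the error is, a constant learning rate makes the parameters oscillate around the solution in steps of about `lr`; a decaying learning rate schedule is the usual way to reduce that.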