I am trying to use (preferably stochastic) gradient descent to minimize a custom loss function. I tried the scikit-learn SGDRegressor
class. However, SGDRegressor
doesn't seem to let me minimize a custom loss function without data, and even when a custom loss function is supported, it can only be used for regression, fitting data with the fit()
method.
Is there a way to use scikit-learn's implementation, or any other Python implementation, of stochastic gradient descent to minimize a custom function without data?
CodePudding user response:
Implementation of Basic Gradient Descent

Now that you know how basic gradient descent works, you can implement it in Python. You'll use only plain Python and NumPy, which lets you write concise code when working with arrays (or vectors) and gain a performance boost.
This is a basic implementation of the algorithm that starts with an arbitrary point, start, iteratively moves it toward the minimum, and returns a point that is hopefully at or near the minimum:
def gradient_descent(gradient, start, learn_rate, n_iter):
    vector = start
    for _ in range(n_iter):
        diff = -learn_rate * gradient(vector)
        vector += diff
    return vector
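As a quick check (an illustrative example, not from the original post), you can minimize C(v) = v**2, whose gradient is 2 * v and whose true minimum is at v = 0:

```python
def gradient_descent(gradient, start, learn_rate, n_iter):
    vector = start
    for _ in range(n_iter):
        diff = -learn_rate * gradient(vector)
        vector += diff
    return vector

# Minimize C(v) = v**2; its gradient is 2 * v.
result = gradient_descent(gradient=lambda v: 2 * v, start=10.0,
                          learn_rate=0.2, n_iter=50)
print(result)  # a value very close to 0
```

Each iteration multiplies the current position by (1 - 2 * learn_rate), so the result shrinks toward 0 geometrically.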
gradient_descent() takes four arguments:

gradient is the function or any Python callable object that takes a vector and returns the gradient of the function you're trying to minimize.

start is the point where the algorithm starts its search, given as a sequence (tuple, list, NumPy array, and so on) or scalar (in the case of a one-dimensional problem).

learn_rate is the learning rate that controls the magnitude of the vector update.

n_iter is the number of iterations.
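Because start can also be a NumPy array, the same function handles multivariable problems unchanged. A sketch (illustrative, not from the original post) minimizing f(v) = v[0]**2 + v[1]**2, whose gradient is 2 * v:

```python
import numpy as np

def gradient_descent(gradient, start, learn_rate, n_iter):
    vector = start
    for _ in range(n_iter):
        diff = -learn_rate * gradient(vector)
        vector += diff
    return vector

# The gradient of f(v) = v[0]**2 + v[1]**2 is 2 * v;
# NumPy applies the update to every component at once.
result = gradient_descent(gradient=lambda v: 2 * v,
                          start=np.array([4.0, -2.0]),
                          learn_rate=0.2, n_iter=50)
print(result)  # both components approach 0
```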
This function does exactly what’s described above: it takes a starting point (line 2), iteratively updates it according to the learning rate and the value of the gradient (lines 3 to 5), and finally returns the last position found.
Before you apply gradient_descent(), you can add another termination criterion:
import numpy as np

def gradient_descent(
    gradient, start, learn_rate, n_iter=50, tolerance=1e-06
):
    vector = start
    for _ in range(n_iter):
        diff = -learn_rate * gradient(vector)
        if np.all(np.abs(diff) <= tolerance):
            break
        vector += diff
    return vector
You now have the additional parameter tolerance (line 4), which specifies the minimal allowed movement in each iteration. You've also defined the default values for tolerance and n_iter, so you don't have to specify them each time you call gradient_descent().
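With the defaults in place, a call needs only the gradient, the start point, and the learning rate. For C(v) = v**2, the update falls below tolerance well before the 50th iteration (illustrative example):

```python
import numpy as np

def gradient_descent(
    gradient, start, learn_rate, n_iter=50, tolerance=1e-06
):
    vector = start
    for _ in range(n_iter):
        diff = -learn_rate * gradient(vector)
        if np.all(np.abs(diff) <= tolerance):
            break
        vector += diff
    return vector

# n_iter and tolerance take their defaults (50 and 1e-06).
result = gradient_descent(gradient=lambda v: 2 * v, start=10.0, learn_rate=0.2)
print(result)  # roughly 2e-06: the loop broke on tolerance, not on n_iter
```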
Lines 9 and 10 enable gradient_descent() to stop iterating and return the result before n_iter is reached if the vector update in the current iteration is less than or equal to tolerance. This often happens near the minimum, where gradients are usually very small. Unfortunately, it can also happen near a local minimum or a saddle point.
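You can see the downside directly: with a very small learning rate, the very first update already falls below tolerance, and the search stops at the starting point, far from the minimum (illustrative example, using the tolerance version of the function):

```python
import numpy as np

def gradient_descent(
    gradient, start, learn_rate, n_iter=50, tolerance=1e-06
):
    vector = start
    for _ in range(n_iter):
        diff = -learn_rate * gradient(vector)
        if np.all(np.abs(diff) <= tolerance):
            break
        vector += diff
    return vector

# With learn_rate=4e-08, the first update is |-4e-08 * 2 * 10| = 8e-07 <= tolerance,
# so the loop breaks immediately and returns the untouched starting point.
result = gradient_descent(gradient=lambda v: 2 * v, start=10.0, learn_rate=4e-08)
print(result)  # 10.0 — stopped before making any progress
```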
Line 9 uses the convenient NumPy functions numpy.all() and numpy.abs() to compare the absolute values of diff and tolerance in a single statement. That's why you import numpy on line 1.
Now that you have the first version of gradient_descent(), it’s time to test your function. You’ll start with a small example and find the minimum of the function