I am trying to make a simple gradient descent algorithm in Python, but it goes back up after passing the lowest point


Problem

I am trying to build a simple gradient descent algorithm and plot it on a heatmap.

I assume there are better ways to do this, but I have to use this methodology.

My professor and I have very similar code, but we cannot understand why mine behaves differently. Once the lowest point is reached, the descent should simply circle around it. It works for my professor, but mine goes back up and stops in the middle of nowhere, for a reason we cannot figure out.

Let me be precise: it does not stop because the number of iterations (passed as a parameter of the function) is reached; rather, the gradient vector becomes very small at a seemingly random place, the way it should only at the lowest point.

Here is my code:

from math import cos, sin, exp
import numpy as np
import matplotlib.pyplot as plt
from scipy.misc import derivative

def f(x, y):
    return 4 * exp(-((x**2)/2 + (y**2)/2)) * sin(x*(y-1/2)) * cos(x/2 + y)

precision = 10e-5

# Partial derivative with respect to x
def ddx(f):
    return lambda x, y: derivative(f, x, dx=precision, n=1, args=(y,))

# Partial derivative with respect to y
def ddy(f):
    return lambda x, y: derivative(f, y, dx=precision, n=1, args=(x,))

# grad(f) returns the vector-valued function of two real variables: (x, y) -> ∇f(x, y)
def grad(f):
    return lambda x, y: np.array([ddx(f)(x, y), ddy(f)(x, y)])

def display_heatmap_with_gradient_descent(f, a, b, c, d, n, x0, y0, iterations, h):
    x = np.arange(a,b,n)
    y = np.arange(c,d,n)
    X, Y = np.meshgrid(x, y) # build the mesh grid
    f_vect = np.vectorize(f) # turn f into a vectorized function
    Z = f_vect(X,Y) # compute the function values

    # Draw the heatmap
    fig, ax = plt.subplots()
    colormap = ax.pcolormesh(X, Y, Z, cmap='YlGnBu')
    fig.colorbar(colormap)

    # Gradient function for f
    gradient = grad(f)

    # Plot the gradient descent path
    x = x0
    y = y0
    for i in range(iterations):
        vecteur = -gradient(x, y)
        new_x = x + h * vecteur[0]
        new_y = y + h * vecteur[1]
        print("Iteration " + str(i) + ": (" + str(x) + ", " + str(y) + ") -> (" + str(new_x) + ", " + str(new_y) + ")")
        print("\tVector: " + str(vecteur))
        ax.plot([x, new_x], [y, new_y], color='black')
        x = new_x
        y = new_y

    # Display the stopping point
    ax.plot([x], [y], marker='o', markersize=3, color="red")
        
    plt.show()

display_heatmap_with_gradient_descent(f, -5, 5, -5, 5, 0.05, -0.36, -0.39, 100, 0.1)

Update

I do not know if this is the only problem, but I ran some tests on my gradient function, and there is apparently something wrong with it.

For example, I tested the following code :

print(grad(lambda x, y : x**2 - y**2)(1., 1.))

And it gives me [2., 2.], but it should be [2., -2.], since ddx is 2x and ddy is -2y for f = x^2 - y^2.

So I guess this is where the problem lies but I do not understand what is wrong with my grad function.

In addition, if the problem is in the grad function, it is strange that my algorithm almost works and passes through the lowest point.

With such a fundamental problem, I would expect it to behave erratically from the very first step.

Lead

Again, this might not be the only problem, but at least it should be part of it.

If I call grad(f)(x, y) with x = y, then the ddx and ddy built by grad end up making the exact same call: scipy's derivative differentiates f with respect to its first positional argument, and my ddy merely swaps x and y, so it actually computes ∂f/∂x at (y, x) rather than ∂f/∂y at (x, y). This explains why I get [2., 2.] instead of [2., -2.] for grad(lambda x, y : x**2 - y**2)(1., 1.); a check with x ≠ y, shown below, makes the flaw obvious.
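
Here is that check, a minimal sketch of the calls that the original ddx and ddy actually make (dx=1e-4 stands in for precision):

from scipy.misc import derivative

g = lambda x, y: x**2 - y**2

# derivative() differentiates with respect to the first positional argument,
# so these are exactly the calls made by ddx(g)(2., 3.) and ddy(g)(2., 3.):
print(derivative(g, 2.0, dx=1e-4, args=(3.0,)))  # ~4.0: correct, 2x at x=2
print(derivative(g, 3.0, dx=1e-4, args=(2.0,)))  # ~6.0: wrong, this is 2x at x=3 instead of -2y = -6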

How can I proceed to get the correct partial derivatives even when x = y?

Let me know if you need any further explanation.

Thank you in advance.

CodePudding user response:

Solution

The problem was indeed with the grad function, or rather with the ddx and ddy functions: the two were actually the same computation, just with x and y swapped.

In other words, I wasn't computing the partial derivatives correctly, which made my gradient wrong and kept the algorithm from working properly.

Here is the part of the code that was modified; the rest is unchanged:

# Returns f with y fixed, as a function of x only
def f_replace_y(f, y):
    return lambda x: f(x, y)

# Returns f with x fixed, as a function of y only
def f_replace_x(f, x):
    return lambda y: f(x, y)

# Partial derivative of f with respect to x
def ddx(f):
    return lambda x, y : derivative(f_replace_y(f, y), x, dx=precision, n=1)

# Partial derivative of f with respect to y
def ddy(f):
    return lambda x, y : derivative(f_replace_x(f, x), y, dx=precision, n=1)
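
With this fix, the earlier sanity check behaves as expected (a quick check reusing the corrected grad; everything else is unchanged):

print(grad(lambda x, y : x**2 - y**2)(1., 1.))  # now prints [ 2. -2.] (up to floating-point error)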

However, I still wonder how my previous code could pass through the lowest point given that my partial derivatives were wrong.
