I am currently working on the spectral gradient method. I have developed two similar algorithms that I expected to yield the same results, but they actually return different values after some iterations.
Here are the variables:
v   (vector at the kth iteration)
v_1 (vector at the (k+1)th iteration)
g   (gradient at the kth iteration)
g_1 (gradient at the (k+1)th iteration)
The first algorithm is as follows:
import numpy as np

def spectral_grad1(
    v: np.ndarray, v_1: np.ndarray,
    g: np.ndarray, g_1: np.ndarray
) -> np.ndarray:
    s = v_1 - v                 # step between iterates
    y = g_1 - g                 # change in the gradient
    sT_y = s.T @ y
    sT_s = s.T @ s
    if sT_s > sT_y:
        s1 = s**4
        s2 = s**2
        w_k = (sT_s - sT_y) / s1.sum()
        # B_k holds the reciprocals of the diagonal entries
        B_k = np.array([1. / (1. + w_k * i) for i in s2])
        return np.diag(B_k)
    return np.identity(len(s))  # rescaling

direction = np.negative(np.linalg.inv(B) @ g)  # B is the matrix returned above
On the other hand, here is the second algorithm:
def spectral_grad2(
    v: np.ndarray, v_1: np.ndarray,
    g: np.ndarray, g_1: np.ndarray
) -> np.ndarray:
    s = v_1 - v
    y = g_1 - g
    sT_y = s.T @ y
    sT_s = s.T @ s
    if sT_s <= sT_y:
        return np.identity(len(v))  # rescaling
    s1 = s**4
    s2 = s**2
    w_k = (sT_s - sT_y) / s1.sum()
    B_k = 1. + w_k * s2             # the diagonal entries themselves, not reciprocals
    return np.diag(B_k)             # diagonal matrix B

direction = -(B @ g)  # B is the matrix returned above
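To see the discrepancy concretely, here is a minimal, hypothetical driver (not from my actual code; it assumes the two fixed functions above are in scope and uses random vectors with a fixed seed purely for illustration). It forms each direction exactly as in the snippets above and prints the largest elementwise difference:

import numpy as np

# Hypothetical test data; any four vectors of equal length will do.
rng = np.random.default_rng(0)
n = 5
v, v_1 = rng.standard_normal(n), rng.standard_normal(n)
g, g_1 = rng.standard_normal(n), rng.standard_normal(n)

B1 = spectral_grad1(v, v_1, g, g_1)   # reciprocal diagonal
B2 = spectral_grad2(v, v_1, g, g_1)   # plain diagonal

direction1 = np.negative(np.linalg.inv(B1) @ g)  # as in the first snippet
direction2 = -(B2 @ g)                           # as in the second snippet

# inv(B1) equals B2 exactly in math, so any difference printed here is
# floating-point error introduced by np.linalg.inv.
print(np.max(np.abs(direction1 - direction2)))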
If you look at these two algorithms, the only difference is how B_k is built and how direction is computed from it.
I'm sorry I can't give you much data to reproduce this, since it is just one part of a gradient descent algorithm. What I'm more curious about is: what causes B_k and direction to produce different results in the two algorithms?
Edit
I wish to find the inverse of B_k. Any insight about using the elementwise reciprocal versus np.linalg.inv would be greatly appreciated.
CodePudding user response:
In one of these, the operator you effectively apply to g is

diag(1. + w_k*s2)

and in the other

linalg.inv(diag([1. / (1. + w_k*i) ...]))

Note that the exact inverse of a diagonal matrix is the diagonal matrix of the reciprocals, so these are the same matrix in theory. But linalg.inv cannot know its input is diagonal, so it runs a general-purpose routine, and the numerical inverse it produces is likely to have some very small nonzero values off the diagonal.
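To make that concrete, here is a small sketch (the diagonal entries are arbitrary illustrative values, not from your post) comparing the two ways of inverting a diagonal matrix:

import numpy as np

d = np.array([1e-8, 3.0, 7.5e4])   # arbitrary diagonal entries
B = np.diag(d)

inv_recip = np.diag(1.0 / d)       # reciprocal: exact inverse by construction
inv_general = np.linalg.inv(B)     # general LU-based inverse

# Largest disagreement between the two inverses.
print(np.max(np.abs(inv_recip - inv_general)))
# Largest off-diagonal entry left behind by linalg.inv (may be tiny but nonzero).
off_diag = inv_general - np.diag(np.diag(inv_general))
print(np.max(np.abs(off_diag)))

If you know B_k is diagonal, the cheapest and most accurate option is to never form the full matrix at all: keep the diagonal as a 1-D array and use 1. / B_k for the inverse (and B_k * g instead of B @ g), which avoids the general O(n^3) inverse entirely.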