GradientTape returning None when run in a loop


The following gradient descent loop is failing because the gradients returned by tape.gradient() are None the second time the loop runs.

import tensorflow as tf

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = tf.constant([[1., 2., 3.]])


for i in range(10):
  print("iter {}".format(i))
  with tf.GradientTape() as tape:
    #forward prop
    y = x @ w + b
    loss = tf.reduce_mean(y**2)
    print("loss is \n{}".format(loss))
    print("output- y is \n{}".format(y))
    #vars getting dropped after a couple of iterations
    print(tape.watched_variables()) 
  
  #get the gradients to minimize the loss
  dl_dw, dl_db = tape.gradient(loss,[w,b]) 

  #descend the gradients
  w = w.assign_sub(0.001*dl_dw)
  b = b.assign_sub(0.001*dl_db)
iter 0
loss is 
23.328645706176758
output- y is 
[[ 6.8125362  -0.49663293]]
(<tf.Variable 'w:0' shape=(3, 2) dtype=float32, numpy=
array([[-1.3461215 ,  0.43708783],
       [ 1.5931423 ,  0.31951016],
       [ 1.6574576 , -0.52424705]], dtype=float32)>, <tf.Variable 'b:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>)
iter 1
loss is 
22.634033203125
output- y is 
[[ 6.7103477  -0.48918355]]
()

TypeError                                 Traceback (most recent call last)
c:\projects\pyspace\mltest\test.ipynb Cell 7' in <cell line: 1>()
     11 dl_dw, dl_db = tape.gradient(loss,[w,b]) 
     13 #descend the gradients
---> 14 w = w.assign_sub(0.001*dl_dw)
     15 b = b.assign_sub(0.001*dl_db)

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

I checked the documentation, which explains the cases in which gradients can become None, but none of them seem to apply here.

CodePudding user response:

This is because assign_sub returns a Tensor. In the line w = w.assign_sub(0.001*dl_dw) you are therefore overwriting w with a Tensor holding the new value. On the next iteration w is no longer a Variable, so the gradient tape does not track it by default, which is why the gradient comes back as None (a plain Tensor also has no assign_sub method, so that line would eventually crash as well).

Instead, simply write w.assign_sub(0.001*dl_dw), and likewise for b. The assign methods update the variable in place, so no reassignment is necessary.
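
For reference, here is a minimal sketch of the corrected loop, keeping the same variables, shapes, and 0.001 learning rate as in the question:

import tensorflow as tf

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = tf.constant([[1., 2., 3.]])

for i in range(10):
  with tf.GradientTape() as tape:
    y = x @ w + b                      # forward prop
    loss = tf.reduce_mean(y**2)

  # gradients of the loss w.r.t. the variables
  dl_dw, dl_db = tape.gradient(loss, [w, b])

  # in-place updates: w and b remain tf.Variable objects,
  # so the tape keeps watching them on the next iteration
  w.assign_sub(0.001 * dl_dw)
  b.assign_sub(0.001 * dl_db)

  print("iter {}: loss {}".format(i, loss.numpy()))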
