I am trying to implement Grad-CAM for a TensorFlow 2.0 model (created using the Keras API), but the gradients returned by the tape are always None.
I am following the example given at https://keras.io/examples/vision/grad_cam/.
My model is fairly simple, but I swapped it out for the built-in Xception model provided by tf.keras.applications in order to debug (no difference in behavior, so the problem must be with my code).
# model (not shown here) is Xception from tf.keras.applications
cam_model = tf.keras.models.Model(
    [model.inputs],
    [model.get_layer(conv_layer_name).output, model.output]  # conv_layer_name = 'block14_sepconv2_act'
)
with tf.GradientTape() as tape:
    conv_out, predictions = cam_model(image)
    class_out = tf.argmax(predictions, axis=-1)

grads = tape.gradient(class_out, conv_out)
if grads is None:  # grads is None
    raise Exception("Grad cam has recorded no gradient")
This is simple enough, yet I fail to see why the gradients are None. I suspect the tape might not be recording, but judging by the colab at https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/grad_cam.ipynb no extra setup seems to be required.
There is a related question, but in that case the convolution layer was incorrect, whereas here it is indeed the correct layer.
EDIT
So the argmax was problematic in the case of Xception, but fixing this (using predictions directly, for instance) does not work for my model. Here is the model definition code:
backbone = VGG16(
    include_top=False,
    weights='imagenet',
    input_shape=(*size, 3),
    pooling='max'
)
backbone.trainable = False

net = Sequential()
for layer in backbone.layers:
    net.add(layer)
net.add(Flatten())
net.add(Dense(256, activation='relu'))
net.add(Dense(128, activation='relu'))
net.add(Dense(len(CLASSES), activation='softmax'))
This is with TensorFlow 2.8.0, on GPU.
CodePudding user response:
Like @AloneTogether mentioned, the result of argmax is not differentiable, so the None returned by tape.gradient(...) is expected: no gradient can be computed through that operation.
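As a quick standalone check (a minimal sketch using a random tensor, not the original model), you can see that anything produced by argmax has no gradient path back to the inputs:

import tensorflow as tf

x = tf.random.normal((1, 5))
with tf.GradientTape() as tape:
    tape.watch(x)                # x is not a Variable, so watch it explicitly
    idx = tf.argmax(x, axis=-1)  # integer-valued output: no gradient path

print(tape.gradient(idx, x))     # prints None (TF may also warn that the target dtype is not floating)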
While the result of argmax cannot be differentiated, it can be used to select the correct activation, as long as the selection happens inside the GradientTape context:

class_pred = tf.argmax(predictions[0])  # scalar index of the predicted class
class_out = predictions[:, class_pred]  # differentiable selection of that class score
This solves the problem when using Xception.
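For completeness, here is a sketch of how the corrected tape block fits together (cam_model and image as defined above; the pooling and heatmap post-processing follow the keras.io example rather than anything specific to this question):

with tf.GradientTape() as tape:
    conv_out, predictions = cam_model(image)
    class_pred = tf.argmax(predictions[0])   # scalar class index
    class_out = predictions[:, class_pred]   # differentiable selection, inside the tape

grads = tape.gradient(class_out, conv_out)             # no longer None
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))   # per-channel importance

heatmap = tf.reduce_sum(conv_out[0] * pooled_grads, axis=-1)
heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)  # ReLU + normalize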
The other problem, with my full model, was disconnected graph errors when trying to access the inner layers of VGG16. I was able to fix this in a somewhat unsatisfying way, by building the model with the functional API, using the input of VGG16 as the model input and the output of VGG16 as the input to the following layers:
x = vgg16.output
x = Flatten()(x)
...
return Model(vgg16.input, x)
The graph of the network will be fully expanded, meaning you will not have a single "vgg" block, but all the layers of VGG16 unrolled. I think it is possible to keep a non-unrolled version, but I was unable to achieve it. This answer hints at it being possible: Heatmap on custom model with transfer learning
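For reference, here is a minimal sketch of that functional rebuild (the layer sizes mirror the Sequential definition above, and 'block5_conv3', the last convolution of VGG16, is the layer to target for Grad-CAM):

from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten

vgg16 = VGG16(include_top=False, weights='imagenet',
              input_shape=(*size, 3), pooling='max')
vgg16.trainable = False

x = vgg16.output                                    # shape (None, 512) because of pooling='max'
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
x = Dense(128, activation='relu')(x)
out = Dense(len(CLASSES), activation='softmax')(x)
model = Model(vgg16.input, out)

# With the graph expanded, the inner conv layer is reachable by name:
cam_model = Model([model.inputs],
                  [model.get_layer('block5_conv3').output, model.output])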