I am training a multi-label classification model based on VGG-16, with 25 labels. I am trying to replicate the code at https://towardsdatascience.com/multi-label-classification-and-class-activation-map-on-fashion-mnist-1454f09f5925 to generate class activation maps (CAM) from the trained model:
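# Imports inferred from the names used below (assuming tf.keras in TensorFlow 2.x)
import numpy as np
import scipy as sp
import scipy.ndimage
import matplotlib.pyplot as plt
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import SGD

model = load_model('weights/vgg16_multilabel.09-0.3833.h5')
model.summary()
sgd = SGD(learning_rate=0.001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy',
              metrics=['accuracy'])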
# labels
columns = ['Action', 'Adventure', 'Animation', 'Biography', 'Comedy',
           'Crime', 'Documentary', 'Drama', 'Family', 'Fantasy',
           'History', 'Horror', 'Music', 'Musical', 'Mystery',
           'N/A', 'News', 'Reality-TV', 'Romance', 'Sci-Fi', 'Short',
           'Sport', 'Thriller', 'War', 'Western']
gap_weights = model.layers[-1].get_weights()[0]  # weights of the final dense layer
print(" >>> size(gap_weights) = ", gap_weights.size)
# extract from the deepest convolutional layer
cam_model = Model(inputs=model.input,
                  outputs=(model.layers[-3].output,
                           model.layers[-1].output))
print(" >>> K.int_shape(model.layers[-3].output) = ", K.int_shape(model.layers[-3].output))
print(" >>> K.int_shape(model.layers[-1].output) = ", K.int_shape(model.layers[-1].output))
#--- make the prediction
features, results = cam_model.predict(X_test)
# check the CAM activations for 10 test images
for idx in range(10):
    # get the feature map of the test image
    features_for_one_img = features[idx, :, :, :]
    # map the feature map to the original size
    height_roomout = train_img_size_h / features_for_one_img.shape[0]
    width_roomout = train_img_size_w / features_for_one_img.shape[1]
    cam_features = sp.ndimage.zoom(features_for_one_img, (height_roomout, width_roomout, 1), order=2)
    # get the predicted label with the maximum probability
    pred = np.argmax(results[idx])
    # prepare the final display
    plt.figure(facecolor='white')
    ax = plt.gca()  # axes of the new figure
    # get the weights of class activation map
    cam_weights = gap_weights[:, pred]
    # create the class activation map
    cam_output = np.dot(cam_features, cam_weights)
    # draw the class activation map
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    buf = 'Predicted Class = ' + columns[pred] + ', Probability = ' + str(results[idx][pred])
    plt.xlabel(buf)
    plt.imshow(t_pic[idx], alpha=0.5)
    plt.imshow(cam_output, cmap='jet', alpha=0.5)
    plt.show()
This is the output:
size(gap_weights) = 12800
K.int_shape(model.layers[-4].output) = (None, 512)
K.int_shape(model.layers[-1].output) = (None, 25)
I get the following error:
Traceback (most recent call last):
  File "/project/1/complete_code.py", line 1295, in <module>
    features_for_one_img = features[idx, :, :, :]
IndexError: too many indices for array: array is 2-dimensional, but 4 were indexed
I get this error in TensorFlow 2.x, but I had no problems in TensorFlow 1.x.
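The traceback can be reproduced without the model: indexing a 2-D array with four indices raises the same IndexError. The following is a minimal sketch that assumes only NumPy. The array shapes are stand-ins and the helper name `check_feature_maps` is hypothetical, not taken from the original script.

```python
import numpy as np


def check_feature_maps(features):
    """Return `features` unchanged if it is (batch, height, width, channels)."""
    if features.ndim != 4:
        raise ValueError(
            f"expected 4-D feature maps, got shape {features.shape}; "
            "the chosen layer is probably not a convolutional or pooling layer"
        )
    return features


# Stand-in for the output of a 512-unit dense layer on 10 images (hypothetical data).
dense_out = np.zeros((10, 512))
# Stand-in for the output of a final pooling layer on 10 images (hypothetical data).
pool_out = np.zeros((10, 7, 7, 512))

try:
    dense_out[0, :, :, :]  # same indexing pattern as in the loop above
except IndexError as err:
    print("IndexError:", err)

print(check_feature_maps(pool_out)[0, :, :, :].shape)  # (7, 7, 512)
```

Calling such a check on `features` right after `cam_model.predict(...)` would report the wrong layer choice before the plotting loop starts.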
Answer:
With VGG16 as the base of your model, model.layers[-3].output is the output of a dense layer, i.e., a tensor of shape (None, 512). CAM, however, needs spatial feature maps: the output of the last MaxPooling2D layer, a tensor of shape (None, 7, 7, 512) for a 224x224 input. Print model.summary() to find the correct layer. I think you should use model.layers[-6].output in cam_model.
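One way to avoid counting negative indices by hand is to search backwards for the last layer whose output is 4-D. The helper below is a sketch, not part of Keras: `last_4d_layer` is a hypothetical name, and it assumes each layer exposes `name` and `output.shape` the way Keras layers do. The stand-in layer list only mimics a plausible tail of such a model so the snippet runs without TensorFlow; its layer names and shapes are made up for illustration.

```python
from types import SimpleNamespace


def last_4d_layer(layers):
    """Return the last layer whose output is 4-D (batch, height, width, channels)."""
    for layer in reversed(layers):
        if len(layer.output.shape) == 4:
            return layer
    raise ValueError("no layer with a 4-D output found")


def fake_layer(name, shape):
    # Hypothetical stand-in for a Keras layer, for illustration only.
    return SimpleNamespace(name=name, output=SimpleNamespace(shape=shape))


# Made-up tail of a VGG16-style multi-label model (names and shapes are assumptions).
layers = [
    fake_layer("block5_conv3", (None, 14, 14, 512)),
    fake_layer("block5_pool", (None, 7, 7, 512)),
    fake_layer("flatten", (None, 25088)),
    fake_layer("dense_1", (None, 512)),
    fake_layer("dropout_1", (None, 512)),
    fake_layer("dense_2", (None, 512)),
    fake_layer("predictions", (None, 25)),
]

layer = last_4d_layer(layers)
print(layer.name, layer.output.shape)  # block5_pool (None, 7, 7, 512)
```

With the real model, the call would presumably be `last_4d_layer(model.layers)`, and the returned layer's `.output` would replace `model.layers[-3].output` in `cam_model`.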