After building a VGG16-based classifier, I would like to predict a bounding box around the detected object.
I found online that I can do this by removing the layers after the last max-pooling layer
and adding some fully connected layers:
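from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# vgg16 is the convolutional base built earlier; its output is block5_pool
flatten = Flatten()(vgg16.output)
bboxhead = Dense(128, activation="relu")(flatten)
bboxhead = Dense(64, activation="relu")(bboxhead)
bboxhead = Dense(32, activation="relu")(bboxhead)
bboxhead = Dense(4, activation="relu")(bboxhead)  # 4 outputs: one per box coordinate
box_model = Model(inputs=vgg16.input, outputs=bboxhead)
box_model.summary()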
The resulting model looks like this, which matches what I found online:
Model: "box_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0
 flatten (Flatten)           (None, 25088)             0
 dense (Dense)               (None, 128)               3211392
 dense_1 (Dense)             (None, 64)                8256
 dense_2 (Dense)             (None, 32)                2080
 dense_3 (Dense)             (None, 4)                 132
=================================================================
Total params: 17,936,548
Trainable params: 3,221,860
Non-trainable params: 14,714,688
_________________________________________________________________
Then I train the model:
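import math
from tensorflow.keras.optimizers import Adam

opt = Adam(1e-4)
box_model.compile(loss='mse', optimizer=opt)
# Step counts must be integers; round up so the last partial batch is included
steps, val_steps = math.ceil(train_gen.n / batch_size), math.ceil(val_gen.n / batch_size)
num_epochs = 30
# The generators already yield batches, so batch_size is not passed to fit()
history = box_model.fit(train_gen, steps_per_epoch=steps, validation_data=val_gen,
                        validation_steps=val_steps, epochs=num_epochs, verbose=1)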
But I noticed that the last Dense layer has 4 outputs, which does not match my number of classes (5). After changing it to 5, training runs, but the model does not learn anything: the 5-value output array is not reasonable (all zeros).
Or is my implementation incorrect?
Answer:
In short: your implementation is fine, but your data is wrong.
In order to train a new output, you need new labels. The input need not change, but somehow you need to acquire the x, y, height and width of the bounding box you are trying to detect. If the data set does not provide this, you will need to label them yourself.
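As a minimal sketch of what such a label could look like, assume each box is annotated in pixels as a top-left corner plus width and height. The function name `normalize_box` and the scaling to the [0, 1] range are illustrative choices, not something your dataset or Keras prescribes:

```python
def normalize_box(x, y, width, height, image_width, image_height):
    """Scale a pixel-space box (top-left x, y, width, height) to the [0, 1] range."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return (
        x / image_width,
        y / image_height,
        width / image_width,
        height / image_height,
    )


# Hypothetical annotation: a 112x56 box whose top-left corner is at (56, 28)
# in a 224x224 image.
label = normalize_box(56, 28, 112, 56, 224, 224)
print(label)  # (0.25, 0.125, 0.5, 0.25)
```

Each image then gets a 4-value target, which is the shape a `Dense(4)` output is compared against.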
If you want to train on bounding box coordinates, your label needs to be bounding box coordinates. You can't keep training with the class labels of your dataset. Whatever your model is trying to learn in supervised learning, that is what you need to supply as a label.
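To make this concrete, here is a rough sketch of a batch generator that pairs each image with its box coordinates instead of its class index. The `box_batches` helper, the array names, and the placeholder data are all assumptions for illustration; the real images and boxes would come from your own annotated data:

```python
import numpy as np


def box_batches(images, boxes, batch_size):
    """Yield (image_batch, box_batch) pairs; the targets are boxes, not class labels."""
    if len(images) != len(boxes):
        raise ValueError("each image needs exactly one box label")
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size], boxes[start:start + batch_size]


# Placeholder data: 5 blank 224x224 RGB images and 5 normalized (x, y, w, h) boxes.
images = np.zeros((5, 224, 224, 3), dtype=np.float32)
boxes = np.tile(np.array([0.25, 0.125, 0.5, 0.25], dtype=np.float32), (5, 1))

first_images, first_boxes = next(box_batches(images, boxes, batch_size=2))
print(first_images.shape, first_boxes.shape)  # (2, 224, 224, 3) (2, 4)
```

The second element of each batch has shape `(batch_size, 4)`, which is what a model ending in `Dense(4)` with an MSE loss expects as its target. A class label per image would not fit that shape.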