How to extract features using the VGG16 model and use them as input for another model (say ResNet, vit-keras)


I am a bit new to deep learning and image classification. I want to extract features from an image using VGG16 and give them as input to my vit-keras model. The following is my code:

import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.applications.vgg16 import VGG16

IMAGE_SIZE = 256  # assumed value; the (8, 8, 512) shape in the error below implies a 256x256 input
vgg_model = VGG16(include_top=False, weights = 'imagenet', input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))

for layer in vgg_model.layers:
    layer.trainable = False

from vit_keras import vit
vit_model = vit.vit_b16(
        image_size = IMAGE_SIZE,
        activation = 'sigmoid',
        pretrained = True,
        include_top = False,
        pretrained_top = False,
        classes = 2)

model = tf.keras.Sequential([
        vgg_model,
        vit_model,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation = tfa.activations.gelu),
        tf.keras.layers.Dense(256, activation = tfa.activations.gelu),
        tf.keras.layers.Dense(64, activation = tfa.activations.gelu),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(1, 'sigmoid')
    ],
    name = 'vision_transformer')

model.summary()

But I'm getting the following error:

ValueError: Input 0 of layer embedding is incompatible with the layer: expected axis -1 of input shape to have value 3 but received input with shape (None, 8, 8, 512)

I'm assuming this error occurs where VGG16 and vit-keras are joined. How can I rectify this error in this situation?

CodePudding user response:

You cannot feed the output of the VGG16 model into vit_model, because both models expect an input of shape (224, 224, 3) (or whatever input shape you defined), while the VGG16 feature extractor outputs a tensor of shape (8, 8, 512). You could try upsampling / reshaping / resizing that output to fit the expected shape (a sketch of that route is shown after the working example), but I would not recommend it. Instead, feed the same input to both models and concatenate their outputs afterwards. Here is a working example:

import tensorflow as tf
import tensorflow_addons as tfa
from vit_keras import vit

IMAGE_SIZE = 224

# Frozen VGG16 feature extractor
vgg_model = tf.keras.applications.vgg16.VGG16(include_top=False, weights = 'imagenet', input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
for layer in vgg_model.layers:
    layer.trainable = False

# Pretrained ViT backbone without its classification head
vit_model = vit.vit_b16(
        image_size = IMAGE_SIZE,
        activation = 'sigmoid',
        pretrained = True,
        include_top = False,
        pretrained_top = False,
        classes = 2)

# Feed the same image to both backbones and concatenate their features
inputs = tf.keras.layers.Input((IMAGE_SIZE, IMAGE_SIZE, 3))
vgg_output = tf.keras.layers.Flatten()(vgg_model(inputs))   # (None, 7, 7, 512) flattened to a vector
vit_output = vit_model(inputs)                               # ViT embedding vector for the same image
x = tf.keras.layers.Concatenate(axis=-1)([vgg_output, vit_output])
x = tf.keras.layers.Dense(512, activation = tfa.activations.gelu)(x)
x = tf.keras.layers.Dense(256, activation = tfa.activations.gelu)(x)
x = tf.keras.layers.Dense(64, activation = tfa.activations.gelu)(x)
x = tf.keras.layers.BatchNormalization()(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
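
For comparison, the upsampling / resizing route mentioned above would look roughly like the sketch below. The 1x1 Conv2D projection down to 3 channels and the upsampling factor of 32 are assumptions of mine, chosen only to force the (7, 7, 512) VGG feature map back into a (224, 224, 3) tensor that the ViT will accept; it is an illustrative sketch, not the recommended solution.

import tensorflow as tf
from vit_keras import vit

IMAGE_SIZE = 224
vgg_model = tf.keras.applications.vgg16.VGG16(include_top=False, weights='imagenet', input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
vgg_model.trainable = False

vit_model = vit.vit_b16(
        image_size = IMAGE_SIZE,
        activation = 'sigmoid',
        pretrained = True,
        include_top = False,
        pretrained_top = False,
        classes = 2)

inputs = tf.keras.layers.Input((IMAGE_SIZE, IMAGE_SIZE, 3))
x = vgg_model(inputs)                          # (None, 7, 7, 512) feature map
x = tf.keras.layers.Conv2D(3, 1)(x)            # assumed 1x1 conv: 512 channels -> 3
x = tf.keras.layers.UpSampling2D(size=32)(x)   # 7 * 32 = 224 -> (None, 224, 224, 3)
x = vit_model(x)                               # ViT now accepts the resized feature map
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
model.summary()

The 1x1 projection throws away most of the VGG feature information, which is one reason the concatenation approach in the working example above is usually the better choice.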