I am using a CLIP model, where I have two models. One model's output is (20, 128, 256) and the other one's output is (20, 256):
image_model_output = (20, 256)
text_model_output = (20, 128, 256)
I use the following to calculate the logits:
logits = tf.matmul(caption_embeddings, image_embeddings, transpose_b=True)
so it is effectively `(20, 128, 256) * (256, 20)`, with the 2-D tensor broadcast across the batch, and its output will be `(20, 128, 20)`.
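As a sanity check (a sketch with dummy tensors standing in for the real embeddings, so the values here are placeholders), tf.matmul does broadcast the 2-D tensor across the batch dimension:

import tensorflow as tf

caption_embeddings = tf.random.normal((20, 128, 256))  # text model output
image_embeddings = tf.random.normal((20, 256))         # image model output

# transpose_b turns the image tensor into (256, 20), which is then
# multiplied against each of the 20 caption matrices of shape (128, 256)
logits = tf.matmul(caption_embeddings, image_embeddings, transpose_b=True)
print(logits.shape)  # (20, 128, 20)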
Similarly, I calculate the image-to-image similarity like this:
images_similarity = tf.matmul(
image_embeddings, image_embeddings, transpose_b=True
)
(Output) --> (20, 256) * (256, 20) = (20, 20)
and the caption-to-caption similarity like this:
captions_similarity = tf.matmul(
caption_embeddings, caption_embeddings, transpose_b=True
)
(Output) --> (20, 128, 256) * (20, 256, 128) = (20, 128, 128), since transpose_b swaps only the last two axes
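To double-check with dummy tensors (again a sketch, not the original code), both self-similarity shapes come out as described:

import tensorflow as tf

image_embeddings = tf.random.normal((20, 256))
caption_embeddings = tf.random.normal((20, 128, 256))

# (20, 256) x (256, 20) -> (20, 20)
images_similarity = tf.matmul(image_embeddings, image_embeddings, transpose_b=True)
# (20, 128, 256) x (20, 256, 128) -> (20, 128, 128)
captions_similarity = tf.matmul(caption_embeddings, caption_embeddings, transpose_b=True)
print(images_similarity.shape, captions_similarity.shape)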
The problem arises here:
targets = keras.activations.softmax(
    (captions_similarity + images_similarity) / (2 * self.temperature)
)
So do I need to change the activation function, or is there any way to add these 3-D matrices with different shapes? Sorry for explaining it this technically, but people with a solid deep learning and machine learning background will understand.
NOTE: After adding an axis at position 1, like this:
tf.expand_dims(image_embeddings, axis=1)
the part below runs successfully:
targets = keras.activations.softmax(
    (captions_similarity + images_similarity) / (2 * self.temperature)
)
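Here is a minimal sketch of why the expand_dims makes this work, and of the side effect it has on the logits (the temperature value below is a placeholder, not taken from the original code):

import tensorflow as tf
from tensorflow import keras

temperature = 0.05  # placeholder value

caption_embeddings = tf.random.normal((20, 128, 256))
image_embeddings = tf.expand_dims(tf.random.normal((20, 256)), axis=1)  # (20, 1, 256)

# (20, 1, 256) x (20, 256, 1) -> (20, 1, 1), which broadcasts against (20, 128, 128)
images_similarity = tf.matmul(image_embeddings, image_embeddings, transpose_b=True)
captions_similarity = tf.matmul(caption_embeddings, caption_embeddings, transpose_b=True)
targets = keras.activations.softmax(
    (captions_similarity + images_similarity) / (2 * temperature)
)
print(targets.shape)  # (20, 128, 128)

# Note, however, that the logits shape also changes:
logits = tf.matmul(caption_embeddings, image_embeddings, transpose_b=True)
print(logits.shape)  # (20, 128, 1) -- no longer matches targets, hence the error below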
However, after this there is a loss function like the one below:
captions_loss = keras.losses.categorical_crossentropy(
y_true=targets, y_pred=logits, from_logits=True
)
which generates this error
ValueError: Shapes (2, 128, 128) and (2, 128, 1) are incompatible
Is it possible to solve this error?
CodePudding user response:
To handle the above error I used a different loss function: categorical_crossentropy reduces over the last axis, so y_true and y_pred must have matching shapes, while here they are (2, 128, 128) and (2, 128, 1). I changed the code from:
captions_loss = keras.losses.categorical_crossentropy(
y_true=targets, y_pred=logits, from_logits=True
)
to:
captions_loss = keras.losses.kl_divergence(
y_true=targets, y_pred=logits
)
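One caveat worth flagging (an observation about the Keras API, not part of the original code): keras.losses.kl_divergence has no from_logits argument and assumes both y_true and y_pred are probability distributions over the last axis, so raw logits may need a softmax first. A minimal illustration:

import tensorflow as tf
from tensorflow import keras

y_true = tf.constant([[0.7, 0.2, 0.1]])                             # already a distribution
y_pred = keras.activations.softmax(tf.constant([[2.0, 1.0, 0.5]]))  # logits -> distribution
print(keras.losses.kl_divergence(y_true, y_pred))                   # per-sample KL divergence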
To save other developers' time, I have answered my own question. I am available to discuss it further if anyone is interested.