Note: this is not an issue report; I just want to understand the reason.
I am implementing a Keras CLIP model in which a text encoder and a vision encoder generate the text and image embeddings. When I print the shape of the embeddings inside the compiled model, it only shows the symbolic output shape, like (None, 256), and not the actual number of images or texts it has pre-processed, even though there are about 3000 images and captions in my dataset.
def call(self, features, training=False):
    # Compute the caption embeddings on the first GPU.
    with tf.device("/gpu:0"):
        caption_embeddings = text_encoder(features["caption"], training=training)
        print(caption_embeddings.shape)  # prints (None, 256)
    # Compute the image embeddings on the second GPU.
    with tf.device("/gpu:1"):
        image_embeddings = vision_encoder(features["image"], training=training)
        print(image_embeddings.shape)  # prints (None, 256)
    return caption_embeddings, image_embeddings
Any TensorFlow or Keras developer will understand this question better. The full code of my Colab notebook is HERE; the original implementation is HERE.
CodePudding user response:
call is invoked once before training to trace and build the computation graph, so at that point the batch dimension is not yet known and shows up as None.
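A minimal, self-contained sketch of this tracing behavior (a toy model with made-up dimensions, not the CLIP model from the question):

import tensorflow as tf

class Toy(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(256)

    def call(self, x, training=False):
        out = self.dense(x)
        print(out.shape)  # executes only while tracing: prints (None, 256)
        return out

model = Toy()
model.compile(optimizer="adam", loss="mse")
x = tf.random.normal((3000, 8))
y = tf.random.normal((3000, 256))
model.fit(x, y, batch_size=64, epochs=1, verbose=0)
# The print fires only during tracing and shows (None, 256),
# never the per-batch size, exactly as in the question.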
When you compile your model, you can enable eager execution:

model.compile(..., run_eagerly=True)

(run_eagerly is an argument of compile, not fit). You will then see the print you expect on every batch, but training will be very slow.
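For example (a sketch; the optimizer, loss, and train_dataset are placeholders, not taken from the original post):

# With run_eagerly=True, call() runs in plain Python on every batch,
# so print(caption_embeddings.shape) shows the real batch size, e.g. (64, 256).
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse", run_eagerly=True)
model.fit(train_dataset, epochs=1)  # train_dataset is a hypothetical name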
Alternatively, keep graph mode and use tf.print(tf.shape(caption_embeddings)) inside call: tf.print executes when the graph runs, so it reports the dynamic shape of every batch.
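Applied to the call method from the question, only the print lines change (a sketch of the same code with tf.print):

def call(self, features, training=False):
    with tf.device("/gpu:0"):
        caption_embeddings = text_encoder(features["caption"], training=training)
        # tf.print runs at graph execution time, so it sees the dynamic shape.
        tf.print(tf.shape(caption_embeddings))  # e.g. [64 256]
    with tf.device("/gpu:1"):
        image_embeddings = vision_encoder(features["image"], training=training)
        tf.print(tf.shape(image_embeddings))  # e.g. [64 256]
    return caption_embeddings, image_embeddings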