Simplest example which replicates the error:
import tensorflow as tf

def loss(y, logits):
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    return loss
Input = tf.keras.layers.Input(dtype=tf.float32, shape=(20,), name="X")
hidden = tf.keras.layers.Dense(40, activation=tf.keras.activations.relu, name="hidden1")(Input)
logits = tf.keras.layers.Dense(10, name="outputs")(hidden)
optimizer = tf.keras.optimizers.Adam()
model = tf.keras.Model(inputs=Input, outputs=logits)
model.summary()
model.compile(optimizer=optimizer, loss=loss)
I understand that in this case the model's output has shape (batch_size, 10) while my labels have shape (batch_size,). This is why I use tf.nn.sparse_softmax_cross_entropy_with_logits.
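For reference, calling the op directly with the shapes it expects works fine; a minimal standalone check, independent of the model above:
labels = tf.constant([3, 1, 4])     # shape (3,): one integer class id per example
logits = tf.random.normal((3, 10))  # shape (3, 10): rank of labels == rank of logits - 1
per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(per_example.shape)            # (3,)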
Before I even provide any labels to this model, compilation fails with the following error:
C:\Stas\Development\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in sparse_softmax_cross_entropy_with_logits(_sentinel, labels, logits, name)
3445 raise ValueError("Rank mismatch: Rank of labels (received %s) should "
3446 "equal rank of logits minus 1 (received %s)." %
-> 3447 (labels_static_shape.ndims, logits.get_shape().ndims))
3448 if (static_shapes_fully_defined and
3449 labels_static_shape != logits.get_shape()[:-1]):
ValueError: Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2).
After some investigation, I see that compilation fails because TensorFlow somehow assumes my target output has shape (None, None), while my model's output has shape (None, 10); since both have rank 2, sparse cross-entropy cannot be applied.
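The same rank check can be triggered standalone by giving the op rank-2 labels, matching what Keras infers here; a minimal sketch:
labels = tf.zeros((4, 1), dtype=tf.int32)  # rank 2, like the inferred (None, None) target
logits = tf.random.normal((4, 10))         # rank 2
# Raises ValueError: Rank mismatch: Rank of labels (received 2) should
# equal rank of logits minus 1 (received 2).
tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)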
I learned that in TF 2.1 it was possible to pass the target output directly as a parameter to compile, which is no longer possible.
What would be the correct way for me to proceed?
CodePudding user response:
According to the docs, you just have to make sure your labels have the shape [batch_size]. Here is a working example using tf.squeeze:
import tensorflow as tf

def loss(y, logits):
    # Keras passes targets with shape (batch_size, 1); squeeze them to (batch_size,)
    y = tf.squeeze(y, axis=-1)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    return loss
Input = tf.keras.layers.Input(dtype=tf.float32, shape=(20,), name="X")
hidden = tf.keras.layers.Dense(40, activation=tf.keras.activations.relu, name="hidden1")(Input)
logits = tf.keras.layers.Dense(10, name="outputs")(hidden)
optimizer = tf.keras.optimizers.Adam()
model = tf.keras.Model(inputs=Input, outputs=logits)
model.summary()
model.compile(optimizer=optimizer, loss=loss)
x = tf.random.normal((50, 20))                             # dummy inputs
y = tf.random.uniform((50, 1), maxval=10, dtype=tf.int32)  # integer class labels in [0, 10)
model.fit(x, y, epochs=2)
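Continuing the example above, an alternative that avoids the custom loss entirely is Keras' built-in sparse cross-entropy; with from_logits=True it applies the softmax internally, and it should also accept the (50, 1) labels:
# Same model, built-in loss instead of the custom one
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(x, y, epochs=2)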