TensorFlow Keras model throwing error for categorical classification

I have a 2D NumPy array of training data (49,000 entries with 784 feature columns) and a corresponding label array (y_train) whose categorical values fall into 10 classes (labelled 0 to 9, as the np.unique output below shows).

NumPy array details:

print(X_train.shape, "X_train.shape")
print(y_train.shape, "y_train.shape")
print(X_val.shape, "X_val.shape")
print(y_val.shape, "y_val.shape")
print(np.unique(y_train))

Output:
(49000, 784) X_train.shape
(49000,) y_train.shape
(1000, 784) X_val.shape
(1000,) y_val.shape
[0 1 2 3 4 5 6 7 8 9]

This is the code I am running:

y_train_one_hot = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_val_one_hot = tf.keras.utils.to_categorical(y_val, num_classes=10)
y_test_one_hot = tf.keras.utils.to_categorical(y_test, num_classes=10)

dataset_train = tf.data.Dataset.from_tensor_slices((X_train, y_train_one_hot )).batch(32)
dataset_validate = tf.data.Dataset.from_tensor_slices((X_val, y_val_one_hot )).batch(32)
dataset_test = tf.data.Dataset.from_tensor_slices((X_test, y_test_one_hot )).batch(32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(784, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.Accuracy()],
)

model.fit(dataset_train, epochs=10, validation_data=dataset_validate)

I get the following output:

Epoch 1/10
1532/1532 [==============================] - 16s 10ms/step - loss: nan - accuracy: 0.0294 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/10
1532/1532 [==============================] - 12s 8ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/10
1532/1532 [==============================] - 14s 9ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 4/10
1532/1532 [==============================] - 11s 7ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 5/10
1532/1532 [==============================] - 13s 9ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00

Can anyone say what the problem is in my code? Please note that the y array has categorical labels so this is NOT a regression model.

CodePudding user response：

The error probably comes from the loss function you are using, tf.keras.losses.CategoricalCrossentropy. Try the SparseCategoricalCrossentropy loss function instead, which takes integer labels. As the documentation for CategoricalCrossentropy states:

Use the CategoricalCrossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation.
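To make the difference between the two losses concrete, here is a minimal NumPy sketch of what each one computes. It is an illustration under simplifying assumptions (no clipping, no label smoothing), not the Keras implementation, and the function names and the small 4-class batch are made up for the example.

```python
import numpy as np

def categorical_crossentropy(y_one_hot, probs):
    # Expects one-hot labels: mean over the batch of -sum_k y_k * log(p_k).
    return float(np.mean(-np.sum(y_one_hot * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(y_int, probs):
    # Expects integer class indices: mean over the batch of -log(p[label]).
    picked = probs[np.arange(len(y_int)), y_int]
    return float(np.mean(-np.log(picked)))

# Hypothetical batch: 3 samples, 4 classes; each row of probs sums to 1.
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])
y_int = np.array([0, 1, 3])    # integer labels, shape (3,)
y_one_hot = np.eye(4)[y_int]   # one-hot labels, shape (3, 4)

loss_one_hot = categorical_crossentropy(y_one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(y_int, probs)
print(loss_one_hot, loss_sparse)
```

Both functions return the same value here. The only difference is the label format each one expects, so the loss has to match the way the labels are encoded.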

CodePudding user response：

It is because your labels are integers (e.g. 0, 1, 2, ..., 9) with shape (batch_size, 1), while the output of your model has shape (batch_size, 10), i.e. one probability per class. Either convert your labels to one-hot, which turns each single label into a vector of size [num_classes]. The syntax is given below.

tf.keras.utils.to_categorical(
    y, num_classes=None, dtype='float32'
)

(in your case num_classes=10)
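As a rough picture of what the one-hot conversion does to the label shape, here is a small NumPy sketch. It uses np.eye as a stand-in for to_categorical, and the four sample labels are hypothetical.

```python
import numpy as np

num_classes = 10
y = np.array([0, 3, 9, 3])  # hypothetical integer labels, shape (4,)

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing it with the labels gives one row per sample.
y_one_hot = np.eye(num_classes, dtype="float32")[y]

print(y.shape)          # (4,)
print(y_one_hot.shape)  # (4, 10)
print(y_one_hot[1])     # 1.0 at index 3, 0.0 everywhere else

# argmax along the class axis recovers the original integer labels.
recovered = y_one_hot.argmax(axis=1)
```

After the conversion the labels have the same trailing dimension as the model's softmax output, which is the format CategoricalCrossentropy expects.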

Or use SparseCategoricalCrossentropy, which expects integer labels, as mentioned by @AloneTogether. The syntax is as follows.

tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=False, reduction=losses_utils.ReductionV2.AUTO,
    name='sparse_categorical_crossentropy'
)
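The two options can be compared directly by calling both Keras losses on the same predictions. This is only a sketch: the random softmax outputs and the three labels are invented for illustration, and it assumes TensorFlow 2.x with eager execution.

```python
import numpy as np
import tensorflow as tf

# Hypothetical softmax outputs for 3 samples over 10 classes (rows sum to 1).
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 10)).astype("float32")
probs = tf.nn.softmax(logits).numpy()

y_int = np.array([0, 3, 9])                                       # shape (3,)
y_one_hot = tf.keras.utils.to_categorical(y_int, num_classes=10)  # shape (3, 10)

cce = tf.keras.losses.CategoricalCrossentropy()         # one-hot labels
scce = tf.keras.losses.SparseCategoricalCrossentropy()  # integer labels

loss_one_hot = float(cce(y_one_hot, probs))
loss_sparse = float(scce(y_int, probs))
print(loss_one_hot, loss_sparse)
```

The two printed values should agree up to floating-point error. Which option to pick comes down to which label format is already on hand: keep the integer labels and use the sparse loss, or convert to one-hot and use the categorical loss.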