How does the output layer of this network which has 10 nodes correspond to an integer?

Time:09-22

ffnn = Sequential([
    Flatten(input_shape=X_train.shape[1:]),
    Dense(512, activation='relu'),
    Dropout(0.2),
    Dense(512, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax')
])
ffnn_history = ffnn.fit(X_train,
                        y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_split=0.2,
                        callbacks=[checkpointer, early_stopping],
                        verbose=1,
                        shuffle=True)
ffnn_accuracy = ffnn.evaluate(X_test, y_test, verbose=0)[1]

This code is from https://github.com/stefan-jansen/machine-learning-for-trading/blob/main/18_convolutional_neural_nets/02_digit_classification_with_lenet5.ipynb.

I understand this network and how the softmax function works. My question is about the output layer, which has 10 nodes: the output should be a vector of length 10 whose entries sum to 1. How does it match the label y, which is an integer, during training and evaluation? Shouldn't the output vector first be transformed into the corresponding integer?

Does TensorFlow automatically map the length-10 output vector to the corresponding integer, or does something else happen?

CodePudding user response:

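In your case this is handled by the loss function sparse_categorical_crossentropy(): it takes the integer labels directly and treats each one as the index of the true class, which is equivalent to one-hot encoding them: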

>>> import tensorflow as tf
>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred).numpy()
array([0.05129344, 2.3025851 ], dtype=float32)
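To make the mechanics concrete, here is a minimal NumPy sketch of what a sparse cross-entropy computes. It is not the Keras implementation; the helper names `sparse_cce` and `one_hot_cce` and the clipping constant `EPS` are assumptions made for illustration (the clipping only guards against log(0)). It shows that reading off the predicted probability at the integer label's index gives the same value as a full cross-entropy against a one-hot target.

```python
import numpy as np

EPS = 1e-7  # assumed small constant to avoid log(0); chosen for this sketch


def sparse_cce(y_true, y_pred):
    """Cross-entropy from integer labels: -log(probability at the label's index)."""
    y_true = np.asarray(y_true)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), EPS, 1.0)
    rows = np.arange(len(y_true))
    return -np.log(y_pred[rows, y_true])


def one_hot_cce(y_true, y_pred):
    """Same loss, but with the labels expanded to one-hot vectors first."""
    y_true = np.asarray(y_true)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), EPS, 1.0)
    one_hot = np.eye(y_pred.shape[1])[y_true]  # row i has a 1 at column y_true[i]
    return -np.sum(one_hot * np.log(y_pred), axis=1)


y_true = [1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
print(sparse_cce(y_true, y_pred))   # approximately [0.0513, 2.3026]
print(one_hot_cce(y_true, y_pred))  # same values
```

The integer label is never turned into a vector explicitly in the sparse version; it is only used to select one entry of the softmax output.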

The output softmax(x) can be interpreted as a probability distribution (Σ softmax(x) = 1.0), so argmax(softmax(x)) returns the index of the most probable class.
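A small NumPy sketch of this property, using made-up logits for four classes (the `softmax` helper is written here for illustration and is not taken from the notebook):

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax along the last axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


logits = np.array([1.0, 3.0, 0.5, 2.0])  # hypothetical pre-activation values
probs = softmax(logits)

print(probs.sum())       # 1.0 up to floating-point rounding
print(np.argmax(probs))  # 1, the index of the largest logit
```

Because softmax preserves the ordering of its inputs, the argmax of the probabilities is the same index as the argmax of the raw logits.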

Hence, the target for your neural network is conceptually 10-dimensional: each integer label in [0, 1, ..., 8, 9] corresponds to one node of the softmax output.

That said, the target vector you are trying to predict is simply the one-hot encoding of the label:

[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # == 0
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # == 1
..
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # == 9

In other words: if you feed a batch of n images to your network, the output has shape (n, num_classes) (here num_classes is 10), and it is up to you to do the final interpretation of the output, e.g. by using np.argmax to get your final predictions.

import numpy as np

predictions = model(images)  # shape (n, num_classes)
predicted_ids = np.argmax(predictions, axis=1)

# Each index is the predicted integer label
print(predicted_ids)
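The comparison against integer labels that an accuracy metric performs during evaluation can be sketched in NumPy as follows. This is an illustration with made-up probabilities, not the Keras source, and it assumes the model was compiled with an accuracy metric alongside the sparse loss (the compile call is not shown in the question).

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes (each row sums to 1)
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
y_true = np.array([0, 1, 2, 1])  # integer labels, as in the question

predicted_ids = np.argmax(probs, axis=1)      # most probable class per row
accuracy = np.mean(predicted_ids == y_true)   # fraction of matching labels

print(predicted_ids)  # [0 1 2 0]
print(accuracy)       # 0.75
```

The last sample is predicted as class 0 while its label is 1, so three of the four predictions match.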

Also, note how tf.one_hot maps integer labels to one-hot vectors:

>>> tf.one_hot([1, 2, 9], depth=10)
<tf.Tensor: shape=(3, 10), dtype=float32, numpy=
array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]], dtype=float32)>