I am working on a binary classification problem using transfer learning with image inputs, and have a question regarding the output activation and how to obtain predictions from it.
I have been working through choosing the correct output activation (e.g. sigmoid vs. softmax: sigmoid for binary, softmax for multiclass) and noticed that when I specify 'sigmoid' in the final Dense() layer, I no longer need to pass from_logits=True to the loss in model.compile(). This means that when obtaining predictions, I don't apply the tf.nn.sigmoid() function and instead simply check whether the value is greater than 0.5 (if so, 1, else 0). Is this correct? Here is my code:
import tensorflow as tf
from tensorflow import keras

i = keras.Input(shape=(150, 150, 3))
# Rescale pixel values from [0, 255] to [-1, 1]
scale_layer = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)
mt = scale_layer(i)
# Run the pretrained base in inference mode so its BatchNorm statistics stay frozen
mt = base_model(mt, training=False)
mt = keras.layers.GlobalAveragePooling2D()(mt)
mt = keras.layers.Dropout(dropout)(mt)  # Regularize with dropout
# Single sigmoid unit: the output is a probability in [0, 1], not a logit
o = keras.layers.Dense(1, activation='sigmoid')(mt)
model = keras.Model(i, o)
....
model.compile(
    optimizer=keras.optimizers.Adam(lr),
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
)
And then when I obtain predictions, I have the following:
pred = model.predict(test)
# Threshold the sigmoid probabilities at 0.5 to get hard 0/1 labels
pred = tf.where(pred < 0.5, 0, 1)
pred = pred.numpy()
My intuition is that since I specify the sigmoid activation in the Dense layer, the model's outputs are already probabilities rather than logits, so I do not need to apply the sigmoid function afterwards. I've seen both patterns used in the documentation, but it's quite sparse on how they interact with model.predict(); I would appreciate any guidance.
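For comparison, this is my understanding of the logits variant I would otherwise be using (a sketch reusing the same mt, lr, and test as above):

# No activation on the output layer, so the model emits raw logits
o = keras.layers.Dense(1)(mt)
model = keras.Model(i, o)
model.compile(
    optimizer=keras.optimizers.Adam(lr),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
)
# At prediction time the logits must be passed through sigmoid before thresholding
pred = tf.nn.sigmoid(model.predict(test))
pred = tf.where(pred < 0.5, 0, 1).numpy()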
CodePudding user response:
This means that when obtaining predictions, I don't apply the tf.nn.sigmoid() function and instead simply check whether the value is greater than 0.5 (if so, 1, else 0). Is this correct?
Yes. In fact you don't even need the from_logits parameter at all, since you're using the sigmoid activation: it defaults to False.
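In other words, assuming the rest of your compile call is unchanged, these two are equivalent:

loss=keras.losses.BinaryCrossentropy(from_logits=False)
loss=keras.losses.BinaryCrossentropy()  # from_logits defaults to False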
And then when I obtain predictions, I have the following:
That depends on how (un)balanced your training data is. Ideally, if it's balanced, you're correct: pred > 0.5 means the model thinks the image belongs closer to class 1. If your training data contains a disproportionately large amount of 1s, though, the model may be biased toward classifying an image as 1, and you may want to move the decision threshold accordingly (see the sketch below).
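A minimal sketch of thresholding at a value other than 0.5 (the 0.7 here is purely a hypothetical choice; in practice you'd pick it using a validation set):

# Raise the threshold when the model over-predicts class 1
threshold = 0.7  # hypothetical value, tune on validation data
pred = model.predict(test)
pred = tf.where(pred < threshold, 0, 1).numpy()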
Conversely, if you choose to use the softmax function, you'll get an array of length num_of_classes, with each prediction summing to 1.0 and each element representing the model's confidence that the image belongs to that class.
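For a binary problem, that setup would look roughly like this (a sketch, again assuming the same mt, lr, and test from your code):

# Two output units with softmax: each prediction is a [p_class0, p_class1] pair
o = keras.layers.Dense(2, activation='softmax')(mt)
model = keras.Model(i, o)
model.compile(
    optimizer=keras.optimizers.Adam(lr),
    # Integer labels (0 or 1); outputs are probabilities, not logits
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
)
pred = model.predict(test)              # shape (num_samples, 2), rows sum to 1.0
pred = tf.argmax(pred, axis=1).numpy()  # index of the highest-confidence class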