Keras: Using Dice coefficient Loss Function, val loss is not improving


Problem

I am doing two-class image segmentation and want to use the Dice coefficient as the loss function. However, the validation loss does not improve. How can I solve this problem?

What I did

I one-hot encoded the label images and dropped the background channel, so the labels do not include a background class.

Code

Shape of X is (num of data, 256, 256, 1) # grayscale

Shape of y is (num of data, 256, 256, 2) # two classes, background label excluded

import numpy as np
from tensorflow.keras.utils import to_categorical

one_hot_y = np.zeros((len(y), image_height, image_width, 2))
for i in range(len(y)):
  one_hot = to_categorical(y[i])    # (256, 256, 3): background + two classes
  one_hot_y[i] = one_hot[:, :, 1:]  # drop the background channel
one_hot_y.shape  #->  (566, 256, 256, 2)
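
As a quick sanity check (a sketch, assuming the raw y[i] are integer class maps with 0 = background, 1 and 2 = the two foreground classes, as the to_categorical call implies), dropping channel 0 means every background pixel becomes an all-zero target vector:

# Sanity check on the first image (hypothetical, see assumption above).
bg = (y[0] == 0)                                   # background pixels of image 0
assert np.all(one_hot_y[0][bg].sum(axis=-1) == 0)  # background -> all-zero vector
assert np.all(one_hot_y[0][~bg].sum(axis=-1) == 1) # foreground -> exactly one hot channel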

#### <-- Unet Model --> ####

from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Concatenate, Conv2DTranspose
from tensorflow.keras import Model

def unet(image_height, image_width, num_classes):
    # inputs = Input(input_size)
    inputs = Input(shape=(image_height, image_width, 1),name='U-net')
    
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool2)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

    conv4 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool3)
    conv4 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    conv5 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool4)
    conv5 = Conv2D(512, (3, 3), activation='relu', padding='same')(conv5)

    up6 = Concatenate()([Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(conv5), conv4])
    conv6 = Conv2D(256, (3, 3), activation='relu', padding='same')(up6)
    conv6 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv6)

    up7 = Concatenate()([Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(conv6), conv3])
    conv7 = Conv2D(128, (3, 3), activation='relu', padding='same')(up7)
    conv7 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv7)

    up8 = Concatenate()([Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(conv7), conv2])
    conv8 = Conv2D(64, (3, 3), activation='relu', padding='same')(up8)
    conv8 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv8)

    up9 = Concatenate()([Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(conv8), conv1])
    conv9 = Conv2D(32, (3, 3), activation='relu', padding='same')(up9)
    conv9 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv9)

    outputs = Conv2D(num_classes, (1, 1), activation='softmax')(conv9)
    
    return Model(inputs=[inputs], outputs=[outputs])
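
The training code below uses unet_model, which is created from this function; presumably along these lines (num_classes=2 to match the label shape):

# Presumed instantiation to match the fit() call below.
unet_model = unet(image_height=256, image_width=256, num_classes=2)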


#### <-- Dice Score --> ####

from tensorflow.keras import backend as K
def dice_coef(y_true, y_pred):
  y_true_f = K.flatten(y_true)
  y_pred_f = K.flatten(y_pred)
  intersection = K.sum(y_true_f * y_pred_f)
  return (2. * intersection + 0.0001) / (K.sum(y_true_f) + K.sum(y_pred_f) + 0.0001)

def dice_coef_loss(y_true, y_pred):
  return 1 - dice_coef(y_true, y_pred)


#### <-- Fit the Model --> ####

from tensorflow.keras import optimizers
adam = optimizers.Adam(learning_rate=0.0001)
unet_model.compile(optimizer=adam, loss=[dice_coef_loss], metrics=[dice_coef])
hist = unet_model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                      validation_data=(X_val, y_val), callbacks=[checkpoint, earlystopping])
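
(checkpoint and earlystopping are defined elsewhere; plausible definitions, both monitoring val_loss, would be:)

from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# Hypothetical callback definitions; not shown in the question.
checkpoint = ModelCheckpoint('unet_best.h5', monitor='val_loss', save_best_only=True)
earlystopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)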

CodePudding user response:

I tried to replicate your experiment. I used the Oxford-IIIT Pet dataset, whose masks have three classes: 1: Foreground, 2: Background, 3: Not classified. If class 1 ("Foreground") is removed, as you removed your background class, then val_loss does not change during training. On the other hand, if the "Not classified" class is removed instead, the optimization works. Presumably the model fails to discriminate between "Background" and "Not classified", which is conceivable.
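
In code, the three variants look like this (a sketch; mask is assumed to be a NumPy integer array of shape (H, W) following the Oxford-IIIT Pet convention):

from tensorflow.keras.utils import to_categorical

# Oxford-IIIT Pet masks use 1=Foreground, 2=Background, 3=Not classified
one_hot = to_categorical(mask - 1, num_classes=3)  # channels: fg, bg, not classified
y_all          = one_hot            # keep all three classes: trains fine
y_no_unlabeled = one_hot[..., :2]   # drop "Not classified": also trains fine
y_no_fg        = one_hot[..., 1:]   # drop "Foreground" (the analogue of your setup): val_loss stalls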
There is also a small error in the calculation of the Dice coefficient: in the denominator, you should take the sums of the squares. For the one-hot y_true this changes nothing, but for the soft y_pred it does.
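
A corrected version of the coefficient from the question (squared sums in the denominator, same smoothing constant) would read:

from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=0.0001):
  y_true_f = K.flatten(y_true)
  y_pred_f = K.flatten(y_pred)
  intersection = K.sum(y_true_f * y_pred_f)
  # sums of squares in the denominator instead of plain sums
  return (2. * intersection + smooth) / (K.sum(K.square(y_true_f)) + K.sum(K.square(y_pred_f)) + smooth)

def dice_coef_loss(y_true, y_pred):
  return 1 - dice_coef(y_true, y_pred)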

CodePudding user response:

I can't say why your code doesn't work, but I can tell you how I do it. The differences are that I exclude the background and one-hot encode the targets inside the Dice coefficient function itself.

Then I define my Dice coefficient as follows:

def dice_coef(y_true, y_pred, smooth=1):
    # flatten
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    # one-hot encode y with 3 labels: 0=background, 1=label1, 2=label2
    y_true_f = K.one_hot(K.cast(y_true_f, np.uint8), 3)
    y_pred_f = K.one_hot(K.cast(y_pred_f, np.uint8), 3)
    # calculate intersection and union excluding background using y[:,1:]
    intersection = K.sum(y_true_f[:,1:] * y_pred_f[:,1:], axis=[-1])
    union = K.sum(y_true_f[:,1:], axis=[-1]) + K.sum(y_pred_f[:,1:], axis=[-1])
    # apply the Dice formula
    dice = K.mean((2. * intersection + smooth) / (union + smooth), axis=0)
    return dice

def dice_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)
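
With this formulation the masks stay integer-encoded (values 0, 1, 2) rather than one-hot, and the model keeps its background channel; the loss is then wired in roughly like this (a sketch; unet and optimizers are assumed from the question):

# Hypothetical wiring, assuming integer-encoded masks with values 0, 1, 2.
model = unet(256, 256, num_classes=3)   # keep the background channel in the output
model.compile(optimizer=optimizers.Adam(learning_rate=0.0001),
              loss=dice_loss, metrics=[dice_coef])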