Keras U-Net multi-label segmentation with two input binary masks

I am working on a multi-label segmentation problem using a U-Net in Keras. For every input image, I have two masks, one for each of two different objects. The images and masks are 224 x 224; the images are RGB and the masks are grayscale. The folder structure is as follows:

data
 |_train
     |_image 
     |_label1 (binary masks of object 1)
     |_label2 (binary masks of object 2)

I am using qubvel's segmentation_models library (https://github.com/qubvel/segmentation_models) with a VGG16 backbone. My training pipeline is shown below:

img_width, img_height = 224,224
input_shape = (img_width, img_height, 3)
model_input = Input(shape=input_shape)
n_classes=2 # masks of object 1 and object 2 
activation='sigmoid' #since I want multi-label output and not multi-class
batch_size = 16
n_epochs = 128

BACKBONE = 'vgg16'
model1 = sm.Unet(BACKBONE, 
                 encoder_weights='imagenet', 
                 classes=n_classes, 
                 activation=activation)
opt = keras.optimizers.Adam(lr=0.001) 
loss_func='binary_crossentropy'
model1.compile(optimizer=opt, 
              loss=loss_func, 
              metrics=['binary_accuracy'])

callbacks = [ModelCheckpoint(monitor='val_loss', 
                             filepath='model1.hdf5', 
                             save_best_only=True, 
                             save_weights_only=True, 
                             mode='min', 
                             verbose = 1)]
history1 = model1.fit(X_tr, Y_tr, 
                    batch_size=batch_size, 
                    epochs=n_epochs, 
                    callbacks=callbacks,
                    validation_data=(X_val, Y_val))
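
As a side note on the `activation='sigmoid'` choice above, here is a minimal NumPy sketch (not part of the original pipeline) of why sigmoid suits multi-label output. The `logits` values are made up for illustration, and `sigmoid` and `softmax` are hand-written helpers, not Keras calls.

```python
import numpy as np

def sigmoid(z):
    # Scores every channel independently of the others.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    # Normalizes across channels, so the scores compete and sum to 1.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical logits for a single pixel, one value per object channel.
logits = np.array([3.0, 2.0])

multi_label = sigmoid(logits)  # both channels can be close to 1 at once
multi_class = softmax(logits)  # only one channel can dominate

# With a 0.5 threshold, the sigmoid output marks both objects as present.
both_objects_present = bool((multi_label > 0.5).all())
```

With sigmoid, a pixel where the two objects overlap can be labeled as belonging to both, which a softmax output cannot express.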

The shape of each layer of the model is given below:

[(None, None, None, 3)]
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 1024)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 768)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 384)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 192)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 2)
(None, None, None, 2)

My data preparation pipeline, with two masks for each image, is shown below. I am trying to stack mask 1 and mask 2 for every input image:

ids = next(os.walk("data/train/image"))[2] 
print("No. of images = ", len(ids))
X = np.zeros((len(ids), im_height, im_width, 3), dtype=np.float32) #RGB input
Y = np.zeros((len(ids), im_height, im_width, 1), dtype=np.float32) #grayscale input for the masks
for n, id_ in tqdm(enumerate(ids), total=len(ids)):
    img = load_img("data/train/image/" + id_, color_mode = "rgb")
    x_img = img_to_array(img)
    x_img = resize(x_img, (224,224,3), 
                   mode = 'constant', preserve_range = True)
    # Load mask
    mask1 = img_to_array(load_img("data/train/label1/" + id_, color_mode = "grayscale"))
    mask2 = img_to_array(load_img("data/train/label2/" + id_, color_mode = "grayscale"))
    mask1 = resize(mask1, (224,224,1), 
                  mode = 'constant', preserve_range = True)
    mask2 = resize(mask2, (224,224,1), 
                  mode = 'constant', preserve_range = True)
    mask = np.stack([mask1,mask2], axis=-1)
    # Save images
    X[n] = x_img/255.0
    Y[n] = mask/255.0

X_tr, X_val, Y_tr, Y_val = train_test_split(X, Y, test_size=0.3, random_state=42)

I get the following error:

Traceback (most recent call last):

  File "/home/codes/untitled1.py", line 482, in <module>
    Y[n] = mask/255.0

ValueError: could not broadcast input array from shape (224,224,1,2) into shape (224,224,1)

What is the proper syntax for stacking the masks, and how should I modify the code to train a multi-label model? Thanks, and I look forward to a corrected version of the code.

Answer:

You need to update the definition of Y, since it holds two masks, and the shape should match the output of your model:

Y = np.zeros((len(ids), im_height, im_width, 2), dtype=np.float32)

Then reshape the stacked mask, because np.stack on two (224,224,1) arrays produces shape (224,224,1,2):

mask = np.stack([mask1,mask2], axis=-1)
# Save images
X[n] = x_img/255.0
Y[n] = np.reshape(mask/255.0, (224,224,2))
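
To see why the reshape is needed, and that it keeps the two masks in separate channels, here is a small check in plain NumPy. The arrays are synthetic stand-ins for the resized masks, and `Y_old` and `target` are names made up for this sketch.

```python
import numpy as np

# Synthetic stand-ins for the two resized masks (hypothetical values).
mask1 = np.full((224, 224, 1), 255.0, dtype=np.float32)  # object 1 everywhere
mask2 = np.zeros((224, 224, 1), dtype=np.float32)        # object 2 nowhere

stacked = np.stack([mask1, mask2], axis=-1)
print(stacked.shape)  # (224, 224, 1, 2): np.stack adds a new trailing axis

# Assigning this into a (224, 224, 1) slot is what raised the ValueError.
Y_old = np.zeros((1, 224, 224, 1), dtype=np.float32)
try:
    Y_old[0] = stacked / 255.0
    raised = False
except ValueError:
    raised = True

# Dropping the singleton axis keeps mask1 in channel 0 and mask2 in channel 1.
target = np.reshape(stacked / 255.0, (224, 224, 2))
```

The reshape only removes the length-1 axis, so channel 0 still holds object 1 and channel 1 still holds object 2.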

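(Alternatively, instead of reshaping, you could write directly into Y[n]. Since np.stack would add a new axis, np.concatenate along the existing last axis is the call that matches the shape of Y[n]:

np.concatenate([mask1, mask2], axis=-1, out=Y[n])
# Save images
X[n] = x_img/255.0
Y[n] = Y[n] / 255.0

in which case no reshaping is needed.)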
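
Putting the pieces together, the target array could be built roughly as follows. This is only a sketch under assumptions: `build_targets` and `mask_pairs` are hypothetical names, and random synthetic masks stand in for the files under label1 and label2, so no image loading or resizing is shown.

```python
import numpy as np

def build_targets(mask_pairs, height=224, width=224):
    """Pack (mask1, mask2) pairs into one (N, height, width, 2) float32 array.

    `build_targets` and `mask_pairs` are hypothetical names for illustration.
    Each mask is expected as (height, width, 1) with values in [0, 255].
    """
    Y = np.zeros((len(mask_pairs), height, width, 2), dtype=np.float32)
    for n, (mask1, mask2) in enumerate(mask_pairs):
        # Join along the existing channel axis and write straight into Y[n].
        np.concatenate([mask1, mask2], axis=-1, out=Y[n])
    Y /= 255.0  # scale once, after all masks are in place
    return Y

# Synthetic masks standing in for the files under label1/ and label2/.
rng = np.random.default_rng(0)
pairs = [
    (rng.integers(0, 2, (224, 224, 1)) * 255.0,
     rng.integers(0, 2, (224, 224, 1)) * 255.0)
    for _ in range(3)
]
Y = build_targets(pairs)
n_classes = 2  # must equal the last dimension of the model output
assert Y.shape == (3, 224, 224, n_classes)
```

The final assertion mirrors the point made in the answer: the last dimension of Y has to equal the number of output channels of the model, which is 2 here.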