I am working on a multi-label segmentation problem using U-Net with Keras backend. For every input image, I have two masks, belonging to two different objects. The images and masks are of size 224 x 224 and are RGB and grayscale respectively. The folder structure is as follows:
data
|_train
   |_image
   |_label1 (binary masks of object 1)
   |_label2 (binary masks of object 2)
I am using the Qubvel segmentation models https://github.com/qubvel/segmentation_models with vgg-16 backbone. Shown below is my training pipeline:
# Imports assumed from the names used below (not shown in the original post)
import keras
from keras.layers import Input
from keras.callbacks import ModelCheckpoint
import segmentation_models as sm

img_width, img_height = 224, 224
input_shape = (img_width, img_height, 3)
model_input = Input(shape=input_shape)
n_classes = 2  # masks of object 1 and object 2
activation = 'sigmoid'  # since I want multi-label output and not multi-class
batch_size = 16
n_epochs = 128
BACKBONE = 'vgg16'
model1 = sm.Unet(BACKBONE,
                 encoder_weights='imagenet',
                 classes=n_classes,
                 activation=activation)
opt = keras.optimizers.Adam(lr=0.001)
loss_func = 'binary_crossentropy'
model1.compile(optimizer=opt,
               loss=loss_func,
               metrics=['binary_accuracy'])
callbacks = [ModelCheckpoint(monitor='val_loss',
                             filepath='model1.hdf5',
                             save_best_only=True,
                             save_weights_only=True,
                             mode='min',
                             verbose=1)]
history1 = model1.fit(X_tr, Y_tr,
                      batch_size=batch_size,
                      epochs=n_epochs,
                      callbacks=callbacks,
                      validation_data=(X_val, Y_val))
The shape of each layer of the model is given below:
[(None, None, None, 3)]
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 1024)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 768)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 384)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 192)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 2)
(None, None, None, 2)
Shown below is my data preparation pipeline with two masks for each image. I am trying to stack the mask 1 and mask 2 for every input image:
# Imports assumed from the names used below (not shown in the original post)
import os
import numpy as np
from tqdm import tqdm
from skimage.transform import resize
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import load_img, img_to_array

ids = next(os.walk("data/train/image"))[2]
print("No. of images = ", len(ids))
X = np.zeros((len(ids), im_height, im_width, 3), dtype=np.float32)  # RGB input
Y = np.zeros((len(ids), im_height, im_width, 1), dtype=np.float32)  # grayscale input for the masks
for n, id_ in tqdm(enumerate(ids), total=len(ids)):
    img = load_img("data/train/image/" + id_, color_mode="rgb")
    x_img = img_to_array(img)
    x_img = resize(x_img, (224, 224, 3),
                   mode='constant', preserve_range=True)
    # Load masks
    mask1 = img_to_array(load_img("data/train/label1/" + id_, color_mode="grayscale"))
    mask2 = img_to_array(load_img("data/train/label2/" + id_, color_mode="grayscale"))
    mask1 = resize(mask1, (224, 224, 1),
                   mode='constant', preserve_range=True)
    mask2 = resize(mask2, (224, 224, 1),
                   mode='constant', preserve_range=True)
    mask = np.stack([mask1, mask2], axis=-1)
    # Save images
    X[n] = x_img/255.0
    Y[n] = mask/255.0
X_tr, X_val, Y_tr, Y_val = train_test_split(X, Y, test_size=0.3, random_state=42)
I get the following error:
Traceback (most recent call last):
File "/home/codes/untitled1.py", line 482, in <module>
Y[n] = mask/255.0
ValueError: could not broadcast input array from shape (224,224,1,2) into shape (224,224,1)
What is the proper syntax to stack the masks, and how should I modify the code to train a multi-label model? Thanks in advance for any correction to the code.
Answer:
You need to update the definition of Y, since it holds two masks and its shape should match the output of your model:
Y = np.zeros((len(ids), im_height, im_width, 2), dtype=np.float32)
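Then reshape the stacked mask. Each mask has shape (224,224,1), so np.stack with axis=-1 adds a new trailing axis and yields (224,224,1,2), which has to be collapsed to (224,224,2):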
mask = np.stack([mask1,mask2], axis=-1)
# Save images
X[n] = x_img/255.0
Y[n] = np.reshape(mask/255.0, (224,224,2))
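As a quick check of the shapes involved, here is a minimal sketch using dummy arrays in place of the real masks (the constant-valued mask1 and mask2 below are stand-ins, not data from the question). It also shows that np.concatenate along the existing channel axis gives the same result as stack followed by reshape:

```python
import numpy as np

# Dummy stand-ins for two resized grayscale masks (values in 0..255).
mask1 = np.full((224, 224, 1), 255.0, dtype=np.float32)
mask2 = np.zeros((224, 224, 1), dtype=np.float32)

# np.stack inserts a NEW trailing axis, so the singleton channel axis is kept.
stacked = np.stack([mask1, mask2], axis=-1)
# Collapsing the singleton axis gives one channel per mask.
reshaped = np.reshape(stacked / 255.0, (224, 224, 2))
# np.concatenate joins along the EXISTING last axis instead.
joined = np.concatenate([mask1, mask2], axis=-1) / 255.0

print(stacked.shape)                     # (224, 224, 1, 2)
print(reshaped.shape)                    # (224, 224, 2)
print(np.array_equal(reshaped, joined))  # True
```

Channel 0 holds mask 1 and channel 1 holds mask 2 in both variants, which is the layout a two-channel sigmoid output would be compared against.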
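(Instead of the above, you could also write directly into Y[n]. np.stack would not work for this, because its result has shape (224,224,1,2) and does not fit Y[n]; np.concatenate joins the masks along the existing last axis and does fit:
np.concatenate([mask1, mask2], axis=-1, out=Y[n])
# Save images
X[n] = x_img/255.0
Y[n] /= 255.0
in which case no reshaping is needed.)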
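To illustrate filling the whole target array, here is a minimal, self-contained sketch under stated assumptions: fake_mask is a hypothetical helper that stands in for loading and resizing a real binary mask, and the sizes are arbitrary. It joins the two masks along the channel axis and writes them into Y in place:

```python
import numpy as np

n_images, height, width, n_labels = 4, 224, 224, 2
rng = np.random.default_rng(0)

def fake_mask():
    """Hypothetical stand-in for a loaded, resized binary mask (0 or 255)."""
    return (rng.integers(0, 2, size=(height, width, 1)) * 255).astype(np.float32)

# One channel per object, matching a model whose last layer has 2 channels.
Y = np.zeros((n_images, height, width, n_labels), dtype=np.float32)
for n in range(n_images):
    mask1, mask2 = fake_mask(), fake_mask()
    # Y[n] is a view into Y, so writing through `out` fills Y in place.
    np.concatenate([mask1, mask2], axis=-1, out=Y[n])
    Y[n] /= 255.0  # scale 0/255 to 0/1 for binary cross-entropy

print(Y.shape)       # (4, 224, 224, 2)
print(np.unique(Y))  # [0. 1.]
```

With real data the loop body would use the masks loaded from label1 and label2 instead of fake_mask.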