Keras CNN, Incompatible shapes: [32,20,20,1] vs. [32,1]


I'm trying to reconstruct in Python the Gradient Transformation Network (GTN) model from the paper "Single Image Super-Resolution Based on Deep Learning and Gradient Transformation" by Chen et al. (2016).

Here is the code I've written so far:

# Loading of data
from tensorflow.keras.preprocessing.image import ImageDataGenerator

trdata = ImageDataGenerator()
traindata = trdata.flow_from_directory(directory='train', target_size=(36, 36))

tsdata = ImageDataGenerator()
testdata = tsdata.flow_from_directory(directory='test', target_size=(20, 20))

The target-size values, like all the other model parameters, are taken from the paper. The input images are extracted X-gradient images, each sliced into 36x36 blocks. The validation images are prepared the same way but sliced into 20x20 blocks, since the paper states that the model outputs 20x20 blocks. I'm training the model on the BSDS500 dataset, as in the paper.
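For reference, the 20x20 output size follows from the three valid convolutions (36 - 8 - 4 - 4 = 20), so each 20x20 target corresponds to the centre region of a 36x36 input block. A rough sketch of the kind of slicing I mean, assuming non-overlapping blocks and a ground-truth gradient image of the same size as the input gradient image (the paper's exact stride may differ):

# Rough sketch only: pair each 36x36 input block with the matching 20x20
# centre block from the ground-truth gradient image.
import numpy as np

def make_blocks(in_grad, gt_grad, in_size=36, out_size=20):
    crop = (in_size - out_size) // 2      # 8 px removed per side by the valid convs
    xs, ys = [], []
    h, w = in_grad.shape[:2]
    for r in range(0, h - in_size + 1, in_size):
        for c in range(0, w - in_size + 1, in_size):
            xs.append(in_grad[r:r + in_size, c:c + in_size])
            ys.append(gt_grad[r + crop:r + crop + out_size,
                              c + crop:c + crop + out_size, :1])
    return np.array(xs), np.array(ys)     # shapes (N, 36, 36, 3) and (N, 20, 20, 1)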

# Define GTN Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

GTN = Sequential()

# add model layers
GTN.add(Conv2D(filters=64, kernel_size=(9, 9), activation='relu', input_shape=(36, 36, 3)))
GTN.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
GTN.add(Conv2D(filters=1, kernel_size=(5, 5), activation='relu'))
GTN.summary()

from tensorflow.keras.optimizers import SGD

# define optimizer
sgd = SGD()

# compile model
GTN.compile(optimizer=sgd, loss='mean_squared_error', metrics=['MeanSquaredError'])

# model fitting
history = GTN.fit(traindata, validation_data=testdata, epochs=10, steps_per_epoch=20)

The epoch and step counts are kept low for testing purposes. The `GTN.summary()` output:

Found 23400 images belonging to 1 classes.
Found 25200 images belonging to 1 classes.
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 28, 28, 64)        15616     
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 32)        51232     
                                                                 
 conv2d_2 (Conv2D)           (None, 20, 20, 1)         801       
                                                                 
=================================================================
Total params: 67,649
Trainable params: 67,649
Non-trainable params: 0
_________________________________________________________________

The error I get when running the code is:

Incompatible shapes: [32,20,20,1] vs. [32,1]

I have tried adding a Flatten() layer, but that makes the final layer:

flatten (Flatten) (None, 400) 0

It finishes one epoch but then displays an error:

Input to reshape is a tensor with 512 values, but the requested shape requires a multiple of 400

Do I have to reshape the input manually? Or is the problem the way I load traindata and testdata in the GTN.fit() line? I'm also unsure about adding the Flatten() layer, since the paper specifically describes three Conv2D() layers.

CodePudding user response:

You seem to be confusing validation images with training labels.

For a 36x36x3 input image, your model produces a 20x20x1 output. Since you are using MSE loss, the ground truth for each image must have the same shape as that output. And because you specified the input shape (36x36x3) in the model definition, the validation input images must be 36x36x3 as well.
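A quick way to see the mismatch yourself, using the generators from your question (shapes only):

print(GTN.output_shape)              # (None, 20, 20, 1): one prediction per output pixel
x_batch, y_batch = next(traindata)
print(x_batch.shape, y_batch.shape)  # (32, 36, 36, 3) (32, 1): class labels, not images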

The trdata.flow_from_directory generator produces a class label for each image (here a length-1 vector, which is where the [32,1] comes from), so this (image, label) pair cannot be used to train your model. A proper data loader should produce (36x36x3 input, 20x20x1 target) pairs. Please review your data loader.
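A minimal sketch of such a loader, assuming the input blocks and the ground-truth blocks sit in two directories with matching filenames ('train_x' and 'train_y' are placeholder names, not from your setup):

import os
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def pair_generator(x_dir='train_x', y_dir='train_y', batch_size=32):
    # Yields (input, target) batches shaped (batch, 36, 36, 3) and (batch, 20, 20, 1).
    names = sorted(os.listdir(x_dir))
    while True:
        np.random.shuffle(names)
        for i in range(0, len(names) - batch_size + 1, batch_size):
            xs, ys = [], []
            for name in names[i:i + batch_size]:
                x = load_img(os.path.join(x_dir, name), target_size=(36, 36))
                y = load_img(os.path.join(y_dir, name), color_mode='grayscale',
                             target_size=(20, 20))
                xs.append(img_to_array(x) / 255.0)
                ys.append(img_to_array(y) / 255.0)
            yield np.stack(xs), np.stack(ys)

history = GTN.fit(pair_generator(), steps_per_epoch=20, epochs=10)

The validation split needs the same kind of pairing, with 36x36x3 inputs there as well.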

CodePudding user response:

Based on your error logs and the loss function you used, you may need to modify your network as follows. Note that your current model's output shape is (20, 20, 1) while your target label shape is (1,), and the two are incompatible for computing the loss. If a single scalar target per image is actually what you want, then you need to change the output shape of your model, for example like this:

# Define GTN Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

GTN = Sequential()

# add model layers
GTN.add(Conv2D(filters=64, kernel_size=(9, 9),
               activation='relu', input_shape=(36, 36, 3)))
GTN.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
GTN.add(GlobalAveragePooling2D())   # collapse the 24x24x32 feature maps to 32 values
GTN.add(Dense(1, activation=None))  # single scalar output per image
GTN.summary()
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_18 (Conv2D)          (None, 28, 28, 64)        15616     
                                                                 
 conv2d_19 (Conv2D)          (None, 24, 24, 32)        51232     
                                                                 
 global_average_pooling2d_6   (None, 32)               0         
 (GlobalAveragePooling2D)                                        
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
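For completeness, compiling and fitting this scalar-output variant against the (batch, 1) labels coming from the question's traindata generator would look like the following; this only makes sense if a single value per image really is the intended target:

GTN.compile(optimizer=SGD(), loss='mean_squared_error', metrics=['MeanSquaredError'])
history = GTN.fit(traindata, epochs=10, steps_per_epoch=20)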

Otherwise, it's entirely possible that your model construction is fine and it's the target shape you need to revisit: the model's output shape and the label shape must match.
