I have a VGG16 network without the last max pooling, fully connected, and softmax layers. The network summary says the last layer's output should have a size of (batchsize, 512, 14, 14), but putting an image through the network gives me an output of (batchsize, 512, 15, 15). How do I fix this?
import torch
import torch.nn as nn
from torchsummary import summary

vgg16 = torch.hub.load('pytorch/vision:v0.10.0', 'vgg16', pretrained=True)
# keep only the convolutional feature extractor, dropping the final MaxPool2d
vgg16withoutLastFewLayers = nn.Sequential(*list(vgg16.children())[:-2][0][0:30]).cuda()

image = torch.zeros((1, 3, 244, 244)).cuda()
output = vgg16withoutLastFewLayers(image)

summary(vgg16withoutLastFewLayers, (3, 224, 224))
print(output.shape)
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36,928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73,856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147,584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295,168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590,080
ReLU-14 [-1, 256, 56, 56] 0
Conv2d-15 [-1, 256, 56, 56] 590,080
ReLU-16 [-1, 256, 56, 56] 0
MaxPool2d-17 [-1, 256, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 1,180,160
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2,359,808
ReLU-21 [-1, 512, 28, 28] 0
Conv2d-22 [-1, 512, 28, 28] 2,359,808
ReLU-23 [-1, 512, 28, 28] 0
MaxPool2d-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2,359,808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2,359,808
ReLU-28 [-1, 512, 14, 14] 0
Conv2d-29 [-1, 512, 14, 14] 2,359,808
ReLU-30 [-1, 512, 14, 14] 0
================================================================

torch.Size([1, 512, 15, 15])
Answer:
The output shape would be [512, 14, 14], assuming the input image is [3, 224, 224]. Your actual input image is [3, 244, 244]. For example,

image = torch.zeros((1, 3, 224, 224)).cuda()
output = vgg16withoutLastFewLayers(image)
print(output.shape)  # torch.Size([1, 512, 14, 14])

By increasing the image size, the spatial size [W, H] of your output tensor also increases.
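For intuition, only the four remaining MaxPool2d layers change the spatial size (VGG16's 3x3 convolutions use padding=1), and each pool halves the side with a floor. A minimal sketch of that arithmetic, with an illustrative helper name:

import math

def feature_side(side, num_pools=4):
    # each MaxPool2d(kernel_size=2, stride=2) halves the side, rounding down;
    # the 3x3 convolutions (padding=1) leave the spatial size unchanged
    for _ in range(num_pools):
        side = math.floor(side / 2)
    return side

print(feature_side(224))  # 14  (224 -> 112 -> 56 -> 28 -> 14)
print(feature_side(244))  # 15  (244 -> 122 -> 61 -> 30 -> 15)

That is exactly where the extra pixel comes from: 244 does not divide cleanly through four halvings.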
Answer:
Your input sizes are not the same:

image = torch.zeros((1,3,244,244)).cuda()        # forward pass uses 244 x 244
output = vgg16withoutLastFewLayers(image)
summary(vgg16withoutLastFewLayers, (3,224,224))  # summary uses 224 x 224
print(output.shape)

Difference: 244 vs. 224.
Because the remaining VGG layers are purely convolutional (plus pooling), increasing the size of the input image also increases the size of the output feature map. This would cause issues if a classification head (with no global pooling, etc.) were applied directly on top, since fully connected layers expect a fixed-size input. You're not doing that here, but it's something to keep in mind.
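As a small sketch of that last point: torchvision's VGG puts an nn.AdaptiveAvgPool2d layer before its classifier for exactly this reason, and the same trick works here if you ever attach a fixed-size head on top of these features. The shapes below just mirror the two outputs discussed above; the 14x14 target is only an example:

import torch
import torch.nn as nn

# illustrative only: pool both feature maps down to a fixed 14 x 14 grid
pool = nn.AdaptiveAvgPool2d((14, 14))

features_224 = torch.zeros((1, 512, 14, 14))  # from a 224 x 224 input
features_244 = torch.zeros((1, 512, 15, 15))  # from a 244 x 244 input

print(pool(features_224).shape)  # torch.Size([1, 512, 14, 14])
print(pool(features_244).shape)  # torch.Size([1, 512, 14, 14])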