How is the output shape of submodules in PyTorch determined? Why is the output shape of a certain submodule modified in the code below?
When I separate the head of a classical classifier from its backbone in the following way:
import torch, torchvision
from torchsummary import summary
effnet = torchvision.models.efficientnet_b0(num_classes = 2)
backbone = torch.nn.Sequential(*(list(effnet.children())[0]))
adaptive_pool = list(effnet.children())[1]
head = list(effnet.children())[2]
model = torch.nn.Sequential(*[backbone, adaptive_pool, head])
summary(model, (3,256,256), device = 'cpu') # <== Error
I get the following error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2560x1 and 1280x2)
This error is due to the modified output shape of the sub-module adaptive_pool. To work around this problem, a flatten layer can be used as follows:
class flatten(torch.nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)
model = torch.nn.Sequential(*[backbone, adaptive_pool, flatten(), head])
summary(model, (3,256,256), device = 'cpu')
Why is the output shape of the sub-module adaptive_pool modified?
CodePudding user response:
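The output of an nn.AdaptiveAvgPool2d is 4D even if the average is computed globally, i.e. output_size=1. In other words, the output shape of your global pooling layer is (N, C, 1, 1), so nothing is being modified: this is the shape the layer always produces. This means you do need to flatten it before the fully connected layer. Without flattening, the linear layer treats the trailing dimension of size 1 as the feature dimension, which is why the error reports a 2560x1 matrix (torchsummary appears to run a batch of 2 samples, and 2 × 1280 = 2560).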
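The shapes involved can be checked in isolation. The following is a minimal sketch, not the actual EfficientNet: the random tensor is an assumed stand-in for the backbone output (1280 channels on an 8x8 grid, which is what a 256x256 input should give), and the bare nn.Linear(1280, 2) stands in for the classifier head.

```python
import torch
from torch import nn

pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
head = nn.Linear(1280, 2)        # stand-in for the classifier head

# Assumed stand-in for the backbone output: batch of 2, 1280 channels, 8x8 map.
features = torch.rand(2, 1280, 8, 8)

pooled = pool(features)
pooled_shape = tuple(pooled.shape)   # still 4D: (2, 1280, 1, 1)

# Feeding the 4D tensor straight into the linear layer fails, because
# nn.Linear applies to the last dimension, which has size 1 here.
try:
    head(pooled)
    linear_error = None
except RuntimeError as exc:
    linear_error = str(exc)

# Flattening everything after the batch dimension gives (2, 1280).
flat = torch.flatten(pooled, 1)
logits_shape = tuple(head(flat).shape)

print(pooled_shape, logits_shape)
print(linear_error)
```

The pooled tensor keeps its two trailing singleton dimensions, and only the flattened version has the (N, 1280) layout the linear layer expects.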
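In the original torchvision EfficientNet classification network, the flattening is done directly in the forward logic (a torch.flatten(x, 1) call between the pooling layer and the classifier) rather than in a dedicated layer. Because it is a plain function call and not a module, it does not appear in effnet.children() and is lost when you rebuild the model from its children.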
Instead of implementing your own flattening layer, you can use the built-in nn.Flatten. More details about this module can be found in the PyTorch documentation.
>>> model = nn.Sequential(backbone, adaptive_pool, nn.Flatten(1), head)
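To illustrate why the flatten step goes missing and that nn.Flatten(1) restores it, here is a small sketch. TinyNet is a hypothetical toy model, not torchvision code; it only mimics the features / avgpool / classifier layout with a functional flatten inside forward.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical toy model mirroring the features/avgpool/classifier layout."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(8, 2)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # functional call, so not a registered child module
        return self.classifier(x)


torch.manual_seed(0)
net = TinyNet().eval()

# Only registered submodules are listed; the flatten call is not among them.
child_names = [name for name, _ in net.named_children()]

# Rebuild from the children, adding the flatten step back as a module.
rebuilt = nn.Sequential(
    net.features, net.avgpool, nn.Flatten(1), net.classifier
).eval()

x = torch.rand(4, 3, 16, 16)
with torch.no_grad():
    same_output = torch.allclose(net(x), rebuilt(x))

print(child_names)
print(same_output)
```

The same check (comparing effnet(x) against the rebuilt model(x) in eval mode) should apply to the EfficientNet from the question, since the rebuilt model reuses the original submodules and weights.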