I have a 3D ResNet model from PyTorch (torchvision's r2plus1d_18). I also commented out the flatten line in the resnet.py source code, so my output shouldn't be 1D.
Here is the code I have:
class VideoModel(nn.Module):
    def __init__(self, num_channels=3):
        super(VideoModel, self).__init__()
        self.r2plus1d = models.video.r2plus1d_18(pretrained=True)
        self.r2plus1d.fc = Identity()
        for layer in self.r2plus1d.children():
            layer.requires_grad_(False)  # requires_grad_ is a method, not an attribute

    def forward(self, x):
        print(x.shape)
        x = self.r2plus1d(x)
        print(x.shape)
        return x
My identity class exists just to ignore a layer:
class Identity(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x
When I run torch.randn(1, 3, 8, 112, 112) as my input, I get the following output:
torch.Size([1, 3, 8, 112, 112])
torch.Size([1, 512, 1, 1, 1])
Why do I have a 1D output even though I removed the fc layer and the flatten operation? Is there a better way to remove the flatten operation?
CodePudding user response:
The cause is the AdaptiveAvgPool3d layer right before the flatten step. It is called with output_size=(1, 1, 1), and so pools the last three dimensions down to (1, 1, 1) regardless of their original sizes.
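You can see this collapsing behavior in isolation; here the input spatial/temporal sizes (2, 7, 7) are just an illustrative example:

```python
import torch
from torch import nn

# Adaptive pooling squeezes whatever (D, H, W) it receives down to output_size
pool = nn.AdaptiveAvgPool3d(output_size=(1, 1, 1))

x = torch.randn(1, 512, 2, 7, 7)  # example feature map
y = pool(x)
print(y.shape)  # torch.Size([1, 512, 1, 1, 1])
```

The channel dimension is untouched, which is why you still see 512 in your output.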
In your case, the output after the average pool has the shape (1, 512, 1, 1, 1); in the unmodified model, the output after flatten would have the shape (1, 512), and after the fc layer the shape (1, 400).
So the flatten operation is not responsible; disable the average pool and all subsequent steps to get the desired result.