I have a 3D ResNet model from PyTorch (torchvision's r2plus1d_18). I also commented out the flatten line in the resnet.py source code, so my output shouldn't be 1D.
Here is the code I have:
class VideoModel(nn.Module):
    def __init__(self, num_channels=3):
        super(VideoModel, self).__init__()
        self.r2plus1d = models.video.r2plus1d_18(pretrained=True)
        self.r2plus1d.fc = Identity()
        for layer in self.r2plus1d.children():
            layer.requires_grad_(False)  # requires_grad_ is a method, not an attribute

    def forward(self, x):
        print(x.shape)
        x = self.r2plus1d(x)
        print(x.shape)
        return x
My identity class exists just to ignore a layer:
class Identity(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x
When I run torch.randn(1, 3, 8, 112, 112) as my input, I get the following output:
torch.Size([1, 3, 8, 112, 112])
torch.Size([1, 512, 1, 1, 1])
Why do I have a 1D output even though I removed the fc layer and the flatten operation? Is there a better way to remove the flatten operation?
CodePudding user response:
The cause is the AdaptiveAvgPool3d layer right before the flatten step. It is called with output_size=(1, 1, 1), and so pools the last three dimensions down to (1, 1, 1) regardless of their original sizes.
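You can see this collapsing behavior in isolation; here the input spatial/temporal sizes (2, 7, 7) are just an illustrative example:

```python
import torch
from torch import nn

# Adaptive pooling squeezes whatever (D, H, W) it receives down to output_size
pool = nn.AdaptiveAvgPool3d(output_size=(1, 1, 1))

x = torch.randn(1, 512, 2, 7, 7)  # example feature map
y = pool(x)
print(y.shape)  # torch.Size([1, 512, 1, 1, 1])
```

The channel dimension is untouched, which is why you still see 512 in your output.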
In your case, the output after the average pool has the shape (1, 512, 1, 1, 1); in the unmodified model, the output after flatten would have the shape (1, 512), and after the fc layer the shape (1, 400).
So the flatten operation is not responsible; disable the average pool and all subsequent steps to get the desired result.