PyTorch convolutional autoencoder


Hi, I have a project where I need to create a convolutional autoencoder trained on the MNIST database, but my constraint is that I must not use pooling. My embedding dim is 16, and I need a 256 * 16 * 1 * 1 tensor as the output of my encoder.

I have written the following class to define my autoencoder:

class AutoEncoderCNN(nn.Module):
    def __init__(self, nb_channels, embedding_dim):
        super(AutoEncoderCNN, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=5, stride=1),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=5, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=5, stride=1),
            nn.Sigmoid()
        )

    def encode(self, x):
        x = self.encoder(x)  # TO COMPLETE
        return x

    def decode(self, x):
        x = self.decoder(x)  # TO COMPLETE
        return x

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

But I get this dimension error when I try to train my network:

RuntimeError: Given groups=1, weight of size [32, 1, 5, 5], expected input[1, 256, 28, 28] to have 1 channels, but got 256 channels instead
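
One way to localize this kind of shape mismatch (a debugging sketch, assuming modelcnn is an instance of the class above) is to push a dummy batch through the encoder one layer at a time and print the intermediate shapes; the loop raises at the offending layer:

import torch

x = torch.zeros(1, 1, 28, 28)  # one fake grayscale 28x28 image, with an explicit channel dim
for layer in modelcnn.encoder:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))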

My loss function:

loss_function = nn.MSELoss(size_average=None, reduce=None, reduction='mean')

My optimizer:

optimizer = optim.Adam(modelcnn.parameters(), lr=learning_rate)

My dataloader:

mnistTrainLoader = DataLoader(mnistTrainSet_clean, batch_size=batch_size, shuffle=True, num_workers=0)

My training loop:

# Training procedure for the model, using a dataloader, an optimizer, and a number of epochs
def train(model, data_loader, opt, n_epochs):
    losses = []
    i = 0
    for epoch in range(n_epochs):  # loop over the epochs
        running_loss = 0.0

        for features, labels in data_loader:

            # TO COMPLETE
            # Forward pass
            labels_pred = model(features)  # equivalent to model.forward(features)

            # Compute the loss
            loss = loss_function(labels_pred, labels)

            # Save the loss for later plotting
            losses.append(loss.item())

            # Clear the previous gradients
            opt.zero_grad()

            # Compute the gradients (backpropagation)
            loss.backward()

            # Update the weights: one optimizer step
            opt.step()

            # print statistics
            running_loss += loss.item()
            if i % 10 == 9:
                print('[Epoch: %d, iteration: %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 10))
                running_loss = 0.0
            i += 1

    print('Training finished')
    return losses

I have tried many things to solve it, but nothing works. Can anyone help me, please?

CodePudding user response:

In the encoder, you're repeating:

nn.Conv2d(128, 256, kernel_size=5, stride=1),
nn.ReLU(),
nn.Conv2d(128, 256, kernel_size=5, stride=1),
nn.ReLU()

Just delete the duplicate layer, and the shapes will fit.

Note: as the output of your encoder you'll have a shape of batch_size * 256 * h' * w'. 256 is the number of channels output by the last convolution in the encoder, and h', w' depend on the size h, w of the input image after it passes through the convolutional layers.
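
For example (a sketch assuming 28 * 28 MNIST inputs and the duplicate layer removed): each 5 * 5 convolution with stride 1 and no padding shrinks the spatial size by 4 (out = in - kernel_size + 1), so the five encoder convolutions give 28 -> 24 -> 20 -> 16 -> 12 -> 8, i.e. a batch_size * 256 * 8 * 8 output:

import torch

model = AutoEncoderCNN(nb_channels=1, embedding_dim=16)  # the class above, duplicate conv removed
dummy = torch.zeros(1, 1, 28, 28)                        # one fake MNIST image
print(model.encode(dummy).shape)                         # torch.Size([1, 256, 8, 8])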

You're not using nb_channels or embedding_dim anywhere. And I can't see what you mean by embedding_dim, since you're only using convolutions and no fully connected layers.
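
If you really need an embedding_dim-channel code of spatial size 1 * 1 without pooling or fully connected layers, one option (a sketch, not part of the code above) is to end the encoder with a convolution whose kernel covers the whole remaining feature map. Assuming the deduplicated encoder leaves a 256 * 8 * 8 map for 28 * 28 inputs:

from torch import nn
import torch

to_embedding = nn.Conv2d(256, 16, kernel_size=8)  # 16 = embedding_dim; the 8x8 kernel collapses the map to 1x1

feature_map = torch.zeros(4, 256, 8, 8)           # fake encoder output, batch of 4
print(to_embedding(feature_map).shape)            # torch.Size([4, 16, 1, 1])

The decoder would then need a matching nn.ConvTranspose2d(16, 256, kernel_size=8) as its first layer to restore the 8 * 8 map.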

===========EDIT===========

After the discussion in the comments below, I'll leave this code here to inspire you, I hope (and tell me if it works):

from torch import nn
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

data = datasets.MNIST(root='data', train=True, download=True, transform=ToTensor())

class AutoEncoderCNN(nn.Module):
  def __init__(self):
    super(AutoEncoderCNN, self).__init__()
    self.encoder = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(128, 256, kernel_size=5, stride=1),
        nn.ReLU(),
    )
    self.decoder = nn.Sequential(
        nn.ConvTranspose2d(256, 128, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(128, 64, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(64, 32, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(32, 1, kernel_size=5, stride=1),
        nn.Sigmoid()      
    )
          
  def forward(self, x):
      x = self.encoder(x)
      x = self.decoder(x)
      return x
  
model = AutoEncoderCNN()
mnistTrainLoader = DataLoader(data,
                              batch_size=32, shuffle=True, num_workers=0)

loss_function = nn.MSELoss(reduction='mean')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
losses = []
i = 0
running_loss = .0
for epoch in range(100):
  for features, _ in mnistTrainLoader:
    y = model(features)
    loss = loss_function(y, features)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    if i % 10 == 9:
        print('[Epoch: %d, iteration: %5d] loss: %.3f' %
              (epoch + 1, i + 1, running_loss / 10))
        running_loss = 0.0
    i += 1

=======Adding a channel dimension=======

The problem was actually in how the dataset was created: since the dataset contains grayscale images, the PyTorch MNIST dataset helper returns each image without a channel dimension. Convolutions need this dimension, so we have to add it.

Instead of loading the dataset this way:

import torchvision
from torchvision import transforms

# Note: .data returns the raw uint8 images; the ToTensor transform is not applied here
X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor()).data
print(X_train.shape) # torch.Size([60000, 28, 28])

We load it this way:

X_train = torchvision.datasets.MNIST(root='./data', train=True, download=True).data[:,None,:,:]/255.
# /255. to have floats between 0 and 1 instead of unsigned int
print(X_train.shape) # torch.Size([60000, 1, 28, 28])
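
Since this returns a plain tensor rather than a Dataset, one way (an untested sketch) to keep a DataLoader-based training loop working is to wrap it in a TensorDataset; for an autoencoder the target is the input itself:

from torch.utils.data import TensorDataset, DataLoader

train_ds = TensorDataset(X_train, X_train)  # (input, target) pairs
mnistTrainLoader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=0)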

Another way to handle this problem is in the model class, by adding the channel dimension to the input x.
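
For example, a minimal sketch of that variant, adapting the forward above:

def forward(self, x):
    # If the batch arrives as (batch, 28, 28), insert the missing channel
    # dimension so the convolutions see (batch, 1, 28, 28)
    if x.dim() == 3:
        x = x.unsqueeze(1)
    x = self.encoder(x)
    x = self.decoder(x)
    return x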
