I'm trying to find road lanes using PyTorch. I created a dataset and my model, but when I try to train the model I get the error: mat1 and mat2 shapes cannot be multiplied (4x460800 and 80000x16)
I've tried solutions from other topics, but they didn't help me much.
My dataset is a bunch of road images with their validation images. I have a .csv file that contains the names of the images (such as 'image1.jpg, image2.jpg'). The original size of the images and validation images is 1280x720. I convert them to 200x200 in my dataset code.
Here's my dataset:
import os
import pandas as pd
import random
import torch
import torchvision.transforms.functional as TF
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image
class RNetDataset(Dataset):
    def __init__(self, csv_file, root_dir, val_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.val_dir = val_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        image = Image.open(img_path).convert('RGB')
        mask_path = os.path.join(self.val_dir, self.annotations.iloc[index, 0])
        mask = Image.open(mask_path).convert('RGB')

        transform = transforms.Compose([
            transforms.Resize((200, 200)),
            transforms.ToTensor()
        ])

        if self.transform:
            image = self.transform(image)
            mask = self.transform(mask)

        return image, mask
My model:
import torch
import torch.nn as nn
class RNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn_layers = nn.Sequential(
            # Conv2d, 3 inputs, 128 outputs
            # 200x200 image size
            nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Conv2d, 128 inputs, 64 outputs
            # 100x100 image size
            nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Conv2d, 64 inputs, 32 outputs
            # 50x50 image size
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.linear_layers = nn.Sequential(
            # Linear, 32*50*50 inputs, 16 outputs
            nn.Linear(32 * 50 * 50, 16),
            # Linear, 16 inputs, 3 outputs
            nn.Linear(16, 3)
        )

    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
How can I avoid this error and train my model on these images and their validation images?
CodePudding user response:
The answer: in your case the input to the network has shape (3, 720, 1280) (the full-size image), not (3, 200, 200) as you want. You have probably forgotten to pass the transform argument when creating RNetDataset. It stays None, so the transforms are not applied and the image is not resized. Another possibility is that it happens because of these lines:
transform = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor()
])

if self.transform:
    image = self.transform(image)
    mask = self.transform(mask)
You have two variables named transform, but only self.transform is ever applied; maybe you mixed them up. Verify this and the problem should go away.
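For example (a minimal sketch; the file and directory names below are placeholders), you could build the transform once and pass it to the dataset, so that self.transform is no longer None:
from torchvision import transforms

# Build the resize + tensor transform once...
transform = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor()
])

# ...and hand it to the dataset, so __getitem__ actually applies it.
dataset = RNetDataset(
    csv_file='annotations.csv',  # placeholder path
    root_dir='images/',          # placeholder path
    val_dir='masks/',            # placeholder path
    transform=transform
)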
How I came up with it: 460800 is clearly the tensor size after reshaping, right before the linear layers. According to the architecture, the tensor produced by self.cnn_layers has 32 channels, so its height multiplied by its width must give 460800 / 32 = 14400. Suppose its height is H and its width is W, so H x W = 14400. What was the original input size in this case? Each nn.MaxPool2d(kernel_size=2, stride=2) layer divides height and width by 2, and this happens three times. So the original input had 8H x 8W = 64 x 14400 = 921600 pixels. Finally, notice that 921600 = 1280 * 720. This can't be a magical coincidence. Case closed!
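If you want to verify this yourself (a quick sketch using the RNet class from the question), push a dummy full-size image through the convolutional part and check the flattened size:
import torch

model = RNet()
# One dummy full-size image: (N, C, H, W) = (1, 3, 720, 1280)
x = torch.randn(1, 3, 720, 1280)
features = model.cnn_layers(x)
print(features.shape)                              # torch.Size([1, 32, 90, 160])
print(features.view(features.size(0), -1).shape)   # torch.Size([1, 460800])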
Another suggestion: even if you apply the transforms correctly, your code might not work. Suppose you have an input of size (4, 3, 200, 200), where 4 is the batch size. The layers in your architecture will process this input as follows:
nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1) # -> (4, 128, 200, 200)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 128, 100, 100)
nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1) # -> (4, 64, 100, 100)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 64, 50, 50)
nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1) # -> (4, 32, 50, 50)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 32, 25, 25)
So the first layer in self.linear_layers should not be nn.Linear(32 * 50 * 50, 16) but nn.Linear(32 * 25 * 25, 16). With this change, everything should be fine.