I'm trying to find road lanes using PyTorch. I created a dataset and my model, but when I try to train the model I get the error: mat1 and mat2 shapes cannot be multiplied (4x460800 and 80000x16)
I've tried solutions from other topics, but they didn't help me much.
My dataset is a bunch of road images with their validation images. I have a .csv file that contains the names of the images (such as 'image1.jpg, image2.jpg'). The original size of the images and validation images is 1280x720. I convert them to 200x200 in my dataset code.
Here's my dataset:
import os
import pandas as pd
import random
import torch
import torchvision.transforms.functional as TF
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image
class RNetDataset(Dataset):
    def __init__(self, csv_file, root_dir, val_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.val_dir = val_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        image = Image.open(img_path).convert('RGB')
        mask_path = os.path.join(self.val_dir, self.annotations.iloc[index, 0])
        mask = Image.open(mask_path).convert('RGB')

        transform = transforms.Compose([
            transforms.Resize((200, 200)),
            transforms.ToTensor()
        ])

        if self.transform:
            image = self.transform(image)
            mask = self.transform(mask)

        return image, mask
My model:
import torch
import torch.nn as nn
class RNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn_layers = nn.Sequential(
            # Conv2d, 3 inputs, 128 outputs
            # 200x200 image size
            nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Conv2d, 128 inputs, 64 outputs
            # 100x100 image size
            nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Conv2d, 64 inputs, 32 outputs
            # 50x50 image size
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.linear_layers = nn.Sequential(
            # Linear, 32*50*50 inputs, 16 outputs
            nn.Linear(32 * 50 * 50, 16),
            # Linear, 16 inputs, 3 outputs
            nn.Linear(16, 3)
        )

    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
How can I avoid this error and train my model on these images and their validation images?
CodePudding user response:
The answer: in your case the input to the network has shape (3, 720, 1280) (the full-size image), not (3, 200, 200) as you want. You have probably forgotten to pass the transform argument when creating RNetDataset. It stays None, so the transforms are not applied and the image is not resized. Another possibility is that it happens because of these lines:
transform = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor()
])

if self.transform:
    image = self.transform(image)
    mask = self.transform(mask)
You have two variables named transform, but only self.transform is ever applied; maybe you mixed them up. Verify this and the problem should go away.
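For example (a minimal sketch; the file and directory names below are placeholders), you could build the transform once and pass it to the dataset, so that self.transform is no longer None:
from torchvision import transforms

# Build the resize + tensor transform once...
transform = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor()
])

# ...and hand it to the dataset, so __getitem__ actually applies it.
dataset = RNetDataset(
    csv_file='annotations.csv',  # placeholder path
    root_dir='images/',          # placeholder path
    val_dir='masks/',            # placeholder path
    transform=transform
)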
How I came up with it: 460800 is clearly the tensor size after reshaping, right before the linear layers. According to the architecture, the tensor produced by self.cnn_layers has 32 channels, so its height multiplied by its width must give 460800 / 32 = 14400. Suppose its height is H and its width is W, so H x W = 14400. What was the original input size in this case? Each nn.MaxPool2d(kernel_size=2, stride=2) layer divides height and width by 2, and this happens three times. So the original input had 8H x 8W = 64 x 14400 = 921600 pixels. Finally, notice that 921600 = 1280 * 720. This can't be a magical coincidence. Case closed!
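If you want to verify this yourself (a quick sketch using the RNet class from the question), push a dummy full-size image through the convolutional part and check the flattened size:
import torch

model = RNet()
# One dummy full-size image: (N, C, H, W) = (1, 3, 720, 1280)
x = torch.randn(1, 3, 720, 1280)
features = model.cnn_layers(x)
print(features.shape)                              # torch.Size([1, 32, 90, 160])
print(features.view(features.size(0), -1).shape)   # torch.Size([1, 460800])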
Another suggestion: even if you apply the transforms correctly, your code might not work. Suppose you have an input of size (4, 3, 200, 200), where 4 is the batch size. The layers in your architecture will process this input as follows:
nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1) # -> (4, 128, 200, 200)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 128, 100, 100)
nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1) # -> (4, 64, 100, 100)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 64, 50, 50)
nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1) # -> (4, 32, 50, 50)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 32, 25, 25)
So the first layer in self.linear_layers should not be nn.Linear(32 * 50 * 50, 16) but nn.Linear(32 * 25 * 25, 16). With this change, everything should be fine.