I am working on a binary image classifier and am now trying a Vision Transformer (ViT) for the task. As a reference, I am using the reimplementation by lucidrains (vit_pytorch). Does the code snippet below, which declares the model, give me a pretrained model, or do I have to train it from scratch? Thanks!
import torch
from vit_pytorch import SimpleViT

v = SimpleViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048
)
CodePudding user response:
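If you look into the implementation code, there is no weight loading involved: the constructor only builds the layers, and PyTorch initializes them randomly. So the model is not pretrained, and you have to train it from scratch.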