A Classifier Network Seems to be "Forgetting" older samples-CodePudding

This is a strange problem: Imagine a neural network classifier. It is a simple linear layer followed by a sigmoid activation that has an input size of 64, and an output size of 112. There also are 112 training samples, where I expect the output to be a one-hot vector. So the basic structure of a training loop is as follows, where samples is a list of integer indices:

model = nn.Sequential(nn.Linear(64,112),nn.Sequential())
loss_fn = nn.BCELoss()
optimizer = optim.AdamW(model.parameters(),lr=3e-4)
for epoch in range(500):
     for input_state, index in samples:
          one_hot = torch.zeros(112).float()
          one_hot[index] = 1.0
          optimizer.zero_grad()
          prediction = model(input_state)
          loss = loss_fn(prediction,one_hot)
          loss.backward()
          optimizer.step()

This model does not perform well, but I don't think it's a problem with the model itself, but rather how it's trained. I think that this is happening because for the most part, all of the one_hot tensor is zeros, that the model just tends to gravitate toward all of the outputs being zeros, which is what's happening. The question becomes: "How does this get solved?" I tried using the average loss with all the samples, to no avail. So what do I do?

CodePudding user response：

So this is very embarrassing, but the answer actually lies in how I process my data. This is a text-input project, so I used basic python lists to create blocks of messages, but when I did this, I ended up making it so that all of the inputs the net got were the same, but the output was different every time. I solved tho s problem with the copy method.