I am iterating over the training samples in batches, but the last batch always returns fewer samples.
Is it possible to specify the step size in PyTorch according to the current batch length?
For example, most batches are of size 64, but the last batch has only 6 samples.
If I do the usual routine:
optimizer.zero_grad()
loss.backward()
optimizer.step()
It seems that the last 6 samples carry the same weight in the gradient update as the 64-sample batches, but they should really carry only about 1/10 of the weight, since they contain fewer samples.
In MXNet I could specify the step size accordingly, but I don't know how to do it in PyTorch.
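For context, a minimal self-contained version of my loop looks like the following (the model and data are just stand-ins, sized so that the last batch has 6 samples):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# toy data: 390 samples -> six batches of 64 and a final batch of 6
dataset = TensorDataset(torch.randn(390, 10), torch.randint(0, 2, (390,)))
train_loader = DataLoader(dataset, batch_size=64)

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for inputs, labels in train_loader:
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)   # mean over the current batch, whatever its size
    loss.backward()
    optimizer.step()                    # identical update for the 64- and the 6-sample batch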
CodePudding user response:
You can define a custom loss function and reweight it based on the batch size, e.g.:
import torch.nn as nn

def reweighted_cross_entropy(my_outputs, my_labels):
    # compute the size of the current batch
    my_batch_size = my_outputs.size(0)
    original_loss = nn.CrossEntropyLoss()
    loss = original_loss(my_outputs, my_labels)
    # reweight accordingly: scaling the mean loss by the batch size means
    # smaller batches produce proportionally smaller gradients
    return my_batch_size * loss
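A minimal usage sketch (model, optimizer and train_loader are placeholders for whatever you already have in your training loop):

for inputs, labels in train_loader:
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = reweighted_cross_entropy(outputs, labels)  # scaled by the current batch size
    loss.backward()
    optimizer.step()

Since this multiplies every batch's loss by its size, you will typically want to divide the learning rate by the nominal batch size (64 here) to keep the effective step comparable to before.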
If you are using something like plain gradient descent, then it is easy to see that

(1/10 * lr) * grad(loss) = lr * grad(1/10 * loss)

so reweighting the loss is equivalent to reweighting your learning rate. This won't be exactly true for more complex optimisers (e.g. Adam, which normalises gradient magnitudes), but it can be good enough in practice.
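Equivalently, you can rescale the learning rate itself before each step instead of scaling the loss. A rough sketch, assuming plain SGD with a nominal batch size of 64 (model, optimizer and train_loader are again placeholders):

import torch.nn as nn

base_lr = 0.01
nominal_batch_size = 64
criterion = nn.CrossEntropyLoss()

for inputs, labels in train_loader:
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    # scale the step by the fraction of a full batch that this batch represents,
    # so the final 6-sample batch gets roughly a 1/10-sized update
    for param_group in optimizer.param_groups:
        param_group['lr'] = base_lr * inputs.size(0) / nominal_batch_size
    optimizer.step()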
CodePudding user response:
I suggest just ignoring the last batch. The PyTorch DataLoader has a parameter for exactly that behaviour:

drop_last=True  # (False by default)
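For example (dataset is whatever Dataset you are already iterating over):

from torch.utils.data import DataLoader

# drop_last=True silently discards the final, smaller batch of each epoch
train_loader = DataLoader(dataset, batch_size=64, shuffle=True, drop_last=True)

Note that the dropped samples are simply not seen in that epoch; with shuffle=True a different subset is dropped each time, so over many epochs all samples are still used.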