While using the standard Transformer model for a machine translation task, I ran into what looks like a logical problem. In the official PyTorch Transformer code, both train and eval do this:
pred = model(src_seq, trg_seq)
Doesn't this mean that, when predicting, the model takes the entire ground-truth trg_seq as input? Isn't that a logical flaw?
I know the sequence mask in the Decoder ensures that, when predicting the t-th output, only the trg_seq information before position t is used in the attention computation. But that only fixes the logic of the training process; in the eval stage, and certainly in practical use (actual machine translation), the model should never be able to use the ground-truth trg_seq as input!
So, is my understanding wrong, or is there a real logical problem here?
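For context, the usual answer is exactly the split described above: teacher forcing during training (feed the whole ground-truth target, rely on the causal mask), and autoregressive decoding at inference (feed the model's own previous outputs back in, one token at a time). Below is a minimal sketch of that contrast; `toy_model` is a hypothetical stand-in for the real Transformer (it just echoes the source), not the official code:

```python
# Hypothetical sketch: teacher forcing (training) vs. autoregressive decoding
# (inference). `toy_model` stands in for the real Transformer: given the source
# tokens and a target prefix, it returns a "next token" for every prefix
# position (a real model would return logits instead).

BOS, EOS = 0, 1

def toy_model(src_seq, trg_prefix):
    # Dummy model: "translates" by echoing the source one token per step.
    # The per-position loop plays the role of the causal mask: position t
    # depends only on information available before t.
    return [src_seq[t] if t < len(src_seq) else EOS
            for t in range(len(trg_prefix))]

def train_step(src_seq, trg_seq):
    # Teacher forcing: the whole ground-truth trg_seq (shifted right by one,
    # BOS prepended) is fed in at once; the causal mask guarantees each
    # position only attends to earlier ground-truth tokens.
    decoder_input = [BOS] + trg_seq[:-1]
    pred = toy_model(src_seq, decoder_input)
    return pred  # in real training, compared against trg_seq by the loss

def greedy_decode(src_seq, max_len=10):
    # Inference: no ground truth exists, so the model's own previous outputs
    # are fed back in, one step at a time, until EOS.
    prefix = [BOS]
    for _ in range(max_len):
        next_tok = toy_model(src_seq, prefix)[-1]
        if next_tok == EOS:
            break
        prefix.append(next_tok)
    return prefix[1:]  # drop BOS

src = [5, 6, 7]
trg = [5, 6, 7, EOS]
print(train_step(src, trg))  # → [5, 6, 7, 1]
print(greedy_decode(src))    # → [5, 6, 7]
```

So there is no logical contradiction in the code itself; eval with `model(src_seq, trg_seq)` is only valid for computing a teacher-forced loss or perplexity, while actual translation must use a decode loop like the one sketched above.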