I am building a neural collaborative filtering recommendation model using tensorflow, using binary cross entropy as the loss function. The labels to be predicted are, of course, binary.
Upon training each epoch, the loss function is printed. I have a for loop that trains the model epoch by epoch, then uses the model at that current state to predict the test labels, and calculates the loss again using the log_loss function of sci-kit learn.
I notice that the loss calculated by tensorflow (shown by loss:) is consistently higher than that calculated by sklearn (shown by train_loss:):
Is this due to slightly different math involved in the two functions?
CodePudding user response:
In the training loop, Keras measures the average loss throughout the epoch. During that time, the model is adjusted and improved, so by the time an epoch is finished, the reported loss is an overestimation of the loss at that time (assuming that the model is still learning). With sklearn
, you're calculating the loss at the end of the epoch only, with the model as it is at the end of an epoch. If the model is still learning, the loss with sklearn
will be slightly lower, since it sees only the model that has been adjusted during the epoch.