What is the loss function used in Trainer from the Transformers library of Hugging Face?
I am trying to fine tine a BERT model using the Trainer class from the Transformers library of Hugging Face.
In their documentation, they mention that one can specify a customized loss function by overriding the compute_loss
method in the class. However, if I do not do the method override and use the Trainer to fine tine a BERT model directly for sentiment classification, what is the default loss function being use? Is it the categorical crossentropy? Thanks!
CodePudding user response:
It depends!
Especially given your relatively vague setup description, it is not clear what loss will be used. But to start from the beginning, let's first check how the default compute_loss()
function in the Trainer
class looks like.
You can find the corresponding function here, if you want to have a look for yourself (current version at time of writing is 4.17). The actual loss that will be returned with default parameters is taken from the model's output values:
loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
which means that the model itself is (by default) responsible for computing some sort of loss and returning it in outputs
.
Following this, we can then look into the actual model definitions for BERT (source: here, and in particular check out the model that will be used in your Sentiment Analysis task (I assume a BertForSequenceClassification
model.
The code relevant for defining a loss function looks like this:
if labels is not None:
if self.config.problem_type is None:
if self.num_labels == 1:
self.config.problem_type = "regression"
elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
self.config.problem_type = "single_label_classification"
else:
self.config.problem_type = "multi_label_classification"
if self.config.problem_type == "regression":
loss_fct = MSELoss()
if self.num_labels == 1:
loss = loss_fct(logits.squeeze(), labels.squeeze())
else:
loss = loss_fct(logits, labels)
elif self.config.problem_type == "single_label_classification":
loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
elif self.config.problem_type == "multi_label_classification":
loss_fct = BCEWithLogitsLoss()
loss = loss_fct(logits, labels)
Based on this information, you should be able to either set the correct loss function yourself (by changing model.config.problem_type
accordingly), or otherwise at least be able to determine whichever loss will be chosen, based on the hyperparameters of your task (number of labels, label scores, etc.)