What is the loss function used in Trainer from the Transformers library of Hugging Face?-CodePudding

What is the loss function used in Trainer from the Transformers library of Hugging Face?

I am trying to fine tine a BERT model using the Trainer class from the Transformers library of Hugging Face.

In their documentation, they mention that one can specify a customized loss function by overriding the compute_loss method in the class. However, if I do not do the method override and use the Trainer to fine tine a BERT model directly for sentiment classification, what is the default loss function being use? Is it the categorical crossentropy? Thanks!

CodePudding user response：

It depends! Especially given your relatively vague setup description, it is not clear what loss will be used. But to start from the beginning, let's first check how the default compute_loss() function in the Trainer class looks like.

You can find the corresponding function here, if you want to have a look for yourself (current version at time of writing is 4.17). The actual loss that will be returned with default parameters is taken from the model's output values:

loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]

which means that the model itself is (by default) responsible for computing some sort of loss and returning it in outputs.

Following this, we can then look into the actual model definitions for BERT (source: here, and in particular check out the model that will be used in your Sentiment Analysis task (I assume a BertForSequenceClassification model.

The code relevant for defining a loss function looks like this:

if labels is not None:
    if self.config.problem_type is None:
        if self.num_labels == 1:
            self.config.problem_type = "regression"
        elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
            self.config.problem_type = "single_label_classification"
        else:
            self.config.problem_type = "multi_label_classification"

    if self.config.problem_type == "regression":
        loss_fct = MSELoss()
        if self.num_labels == 1:
            loss = loss_fct(logits.squeeze(), labels.squeeze())
        else:
            loss = loss_fct(logits, labels)
    elif self.config.problem_type == "single_label_classification":
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
    elif self.config.problem_type == "multi_label_classification":
        loss_fct = BCEWithLogitsLoss()
        loss = loss_fct(logits, labels)

Based on this information, you should be able to either set the correct loss function yourself (by changing model.config.problem_type accordingly), or otherwise at least be able to determine whichever loss will be chosen, based on the hyperparameters of your task (number of labels, label scores, etc.)