This is probably an easy one for most of you pros. I have miraculously managed to fine-tune an ELECTRA model on some data and got decent F-scores, and now I want to apply the model to other text. It's a multi-label classification model.
This is how I do it with just one sentence:
test_comment = ["We’re exceptionally proud of the 62,000 employees who work in our restaurants, along with the hundreds of Russian suppliers who support our business, and our local franchisees. "]
# tokenizing comment ^
encoding = tokenizer.encode_plus(
    test_comment,
    add_special_tokens=True,
    max_length=512,
    return_token_type_ids=False,
    padding="max_length",
    return_attention_mask=True,
    return_tensors='pt',
)
# returning probability values for each label
_, test_prediction = trained_model(encoding["input_ids"], encoding["attention_mask"])
test_prediction = test_prediction.flatten().numpy()
for label, prediction in zip(LABEL_COLUMNS, test_prediction):
    print(f"{label}: {prediction}")
#[0 if x <= 0.5 else 1 for x in test_prediction]
Which returns
morality_binary: 0.12542158365249634
emotion_binary: 0.16170987486839294
positive_binary: 0.13724404573440552
negative_binary: 0.06993409991264343
care_binary: 0.06901352107524872
fairness_binary: 0.0649697408080101
authority_binary: 0.05470539629459381
sanctity_binary: 0.03908411040902138
harm_binary: 0.05327978357672691
injustice_binary: 0.057351987808942795
betrayal_binary: 0.03698693960905075
subversion_binary: 0.05460885167121887
degradation_binary: 0.04987286403775215
Now, say I have a dataset with a structure such as
ID  sample_text
1   lorem ipsum dala dulu
2   lorem ipsum dala dulu etc
3   lorem ipsum dala dulu etc
4   lorem ipsum dala dulu etc
5   lorem ipsum dala dulu etc
I want the model to make a prediction for each row and add each label's prediction as a new column, something like:
ID  sample_text                morality_binary  positive_binary  negative_binary
1   lorem ipsum dala dulu      0.13455          0.43455          0.26455
2   lorem ipsum dala dulu etc  0.12145          0.43455          0.87455
3   lorem ipsum dala dulu etc  0.03455          0.63455          0.37455
4   lorem ipsum dala dulu etc  0.41455          0.83455          0.81455
5   lorem ipsum dala dulu etc  0.73455          0.93455          0.5455
I have a feeling that it isn't too difficult; I just can't wrap my head around it.
Thanks a million for any help you might provide!
Answer:
Since you didn't provide a minimal reproducible example, I can't confirm this will work, but in theory it should, assuming your model's output is list-like (one value per label). I'm also treating your model as a black box:
First, wrap your model in a function:
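# outputs some list-like result, one value per label
# input_data is a single row (a pandas Series), since apply is called with axis=1
def run_model(input_data):
    ...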
Then, apply the function to each row:
df[LABEL_COLUMNS] = df[['sample_text']].apply(run_model, axis=1, result_type='expand')
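To make the two steps concrete, here is a minimal sketch of the row-wise approach. Note that `predict_one` is a hypothetical stand-in that returns dummy scores, not your ELECTRA model, and the three labels are just the ones from your example table; you would swap in your own tokenizer and model call there.

```python
import pandas as pd

# Only the three labels from the example table; use your full LABEL_COLUMNS list.
LABEL_COLUMNS = ["morality_binary", "positive_binary", "negative_binary"]


def predict_one(text):
    """Hypothetical stand-in for the real model: one dummy score per label."""
    return [min(len(text) / 100, 1.0), 0.5, 0.0]


def run_model(row):
    # With axis=1, apply passes each row as a Series,
    # so the text has to be pulled out by column name.
    return predict_one(row["sample_text"])


df = pd.DataFrame(
    {
        "ID": [1, 2],
        "sample_text": ["lorem ipsum dala dulu", "lorem ipsum dala dulu etc"],
    }
)

# result_type="expand" turns each returned list into one column per element.
df[LABEL_COLUMNS] = df[["sample_text"]].apply(run_model, axis=1, result_type="expand")
print(df)
```

The order of the values returned by `run_model` has to match the order of `LABEL_COLUMNS`, because the expanded columns are assigned by position.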
However, it's not clear how your model works, what input it expects, or whether it can operate on multiple inputs at once.
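If the model can take several texts at once, batching tends to be faster than calling it row by row. The sketch below is one way to do that, under some assumptions: `add_predictions` and `electra_predict_batch` are names I made up, the model is assumed to return a `(loss, probabilities)` tuple as in your single-sentence snippet with probabilities of shape (batch, labels), and everything is assumed to live on the CPU. I haven't run it against your model.

```python
import pandas as pd


def add_predictions(df, text_column, label_columns, predict_batch, batch_size=32):
    """Return a copy of df with one new column per label.

    predict_batch is any callable that maps a list of strings to a
    sequence of per-text score lists (one score per label).
    """
    texts = df[text_column].astype(str).tolist()
    rows = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        rows.extend(predict_batch(batch))
    # Reuse df's index so the new columns line up with the original rows.
    preds = pd.DataFrame(rows, columns=label_columns, index=df.index)
    return pd.concat([df, preds], axis=1)


def electra_predict_batch(texts, tokenizer, trained_model):
    """Assumed wrapper around the tokenizer/model from the question (untested)."""
    import torch  # imported lazily so add_predictions works without PyTorch

    encoding = tokenizer(
        list(texts),
        add_special_tokens=True,
        max_length=512,
        truncation=True,  # cut texts longer than max_length
        padding=True,  # pad to the longest text in this batch
        return_token_type_ids=False,
        return_attention_mask=True,
        return_tensors="pt",
    )
    with torch.no_grad():  # inference only, no gradients needed
        # Assumption: the model returns (loss, probabilities), as in the question.
        _, predictions = trained_model(
            encoding["input_ids"], encoding["attention_mask"]
        )
    return predictions.detach().cpu().numpy()
```

With the objects from your question, this would be called roughly as `df = add_predictions(df, "sample_text", LABEL_COLUMNS, lambda batch: electra_predict_batch(batch, tokenizer, trained_model))`. Calling `trained_model.eval()` beforehand is probably a good idea so that dropout is disabled during prediction.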