Create a for-loop to run a model over every row of a dataset

This is probably an easy one for most of you pros. I have managed to fine-tune an ELECTRA model on some data and got decent F-scores, and now I want to apply the model to other text. It is a multi-label classification model.

This is how I do it with just one sentence:

test_comment = ["We’re exceptionally proud of the 62,000 employees who work in our restaurants, along with the hundreds of Russian suppliers who support our business, and our local franchisees. "]


# tokenize the comment defined above
encoding = tokenizer.encode_plus(
  test_comment,
  add_special_tokens=True,
  max_length=512,
  truncation=True,  # cut inputs longer than max_length instead of letting them overflow
  return_token_type_ids=False,
  padding="max_length",
  return_attention_mask=True,
  return_tensors='pt',
)

# returning probability values for each label
_, test_prediction = trained_model(encoding["input_ids"], encoding["attention_mask"])
test_prediction = test_prediction.flatten().numpy()

for label, prediction in zip(LABEL_COLUMNS, test_prediction):
  print(f"{label}: {prediction}")

# optional: binarize at a 0.5 threshold
# [0 if x <= 0.5 else 1 for x in test_prediction]

Which returns

morality_binary: 0.12542158365249634
emotion_binary: 0.16170987486839294
positive_binary: 0.13724404573440552
negative_binary: 0.06993409991264343
care_binary: 0.06901352107524872
fairness_binary: 0.0649697408080101
authority_binary: 0.05470539629459381
sanctity_binary: 0.03908411040902138
harm_binary: 0.05327978357672691
injustice_binary: 0.057351987808942795
betrayal_binary: 0.03698693960905075
subversion_binary: 0.05460885167121887
degradation_binary: 0.04987286403775215

Now, say I have a dataset with a structure such as:

ID      sample_text
1       lorem ipsum dala dulu
2       lorem ipsum dala dulu etc
3       lorem ipsum dala dulu etc
4       lorem ipsum dala dulu etc
5       lorem ipsum dala dulu etc

I want the model to make a prediction for each row and add each prediction as a new column, something like:

ID      sample_text                 morality_binary    positive_binary   negative_binary 
1       lorem ipsum dala dulu       0.13455            0.43455           0.26455
2       lorem ipsum dala dulu etc   0.12145            0.43455           0.87455
3       lorem ipsum dala dulu etc   0.03455            0.63455           0.37455
4       lorem ipsum dala dulu etc   0.41455            0.83455           0.81455
5       lorem ipsum dala dulu etc   0.73455            0.93455           0.5455

I have a feeling it is not too difficult; I just can't wrap my head around it.

Thanks a million for any help you might provide!

Answer:

Since you didn't provide a minimal reproducible example, I cannot confirm this will work, but in principle it should, assuming your model's output is list-like. I also treat your model as a black box:

First, wrap your model in a function:

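# receives one row of the DataFrame (a pandas Series when called with axis=1)
# and returns a list-like result with one score per label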
def run_model(input_data):
    ...

Then, apply the function to each row:

df[LABEL_COLUMNS] = df[['sample_text']].apply(run_model, axis=1, result_type='expand')
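
To make the pattern concrete, here is a minimal sketch that runs without the actual model. The body of run_model is a stand-in that returns made-up scores derived from the text length, and LABEL_COLUMNS is shortened to three labels; both are assumptions for illustration, and the real function would call your tokenizer and trained_model instead. The part that should carry over is that, with axis=1, apply hands run_model a row (a pandas Series), so the text has to be read from row["sample_text"]:

```python
import pandas as pd

# Shortened, illustrative label list; the question's LABEL_COLUMNS has 13 entries.
LABEL_COLUMNS = ["morality_binary", "positive_binary", "negative_binary"]


def run_model(row):
    # With axis=1, apply passes each row as a pandas Series,
    # so the text is pulled out by column name.
    text = row["sample_text"]
    # Stand-in for the real tokenizer + model call (an assumption):
    # fake scores derived from the text length, one per label.
    n = len(text)
    return [n / 100, n / 200, n / 400]


df = pd.DataFrame({
    "ID": [1, 2],
    "sample_text": ["lorem ipsum dala dulu", "lorem ipsum dala dulu etc"],
})

# One new column per label; the returned list must match LABEL_COLUMNS in length and order.
df[LABEL_COLUMNS] = df[["sample_text"]].apply(run_model, axis=1, result_type="expand")
print(df)
```

result_type='expand' turns each returned list into one column per element, which is why the list needs the same length and order as LABEL_COLUMNS.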

However, it isn't clear how your model works, what input it expects, or whether it can process multiple inputs at once.
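
If the model can score several texts in one call, a batched variant avoids calling it once per row. The sketch below assumes a hypothetical predict_batch that takes a list of strings and returns an array of shape (number of texts, number of labels). Here it is again a stand-in producing made-up scores, and add_predictions and batch_size are illustrative names rather than anything from your code:

```python
import numpy as np
import pandas as pd

# Shortened, illustrative label list; the question's LABEL_COLUMNS has 13 entries.
LABEL_COLUMNS = ["morality_binary", "positive_binary", "negative_binary"]


def predict_batch(texts):
    # Hypothetical stand-in for a batched model call (an assumption).
    # Returns one row of scores per text: shape (len(texts), len(LABEL_COLUMNS)).
    lengths = np.array([len(t) for t in texts], dtype=float)
    return np.column_stack([lengths / 100, lengths / 200, lengths / 400])


def add_predictions(df, text_column="sample_text", batch_size=2):
    # Assumes df has at least one row.
    texts = df[text_column].tolist()
    # Score the texts in chunks of batch_size to keep memory use bounded.
    chunks = [
        predict_batch(texts[start:start + batch_size])
        for start in range(0, len(texts), batch_size)
    ]
    # Reuse df's index so the score rows line up with the original rows.
    scores = pd.DataFrame(np.vstack(chunks), columns=LABEL_COLUMNS, index=df.index)
    return df.join(scores)


df = pd.DataFrame({
    "ID": [1, 2, 3],
    "sample_text": [
        "lorem ipsum dala dulu",
        "lorem ipsum dala dulu etc",
        "lorem ipsum dala dulu etc",
    ],
})

result = add_predictions(df)
print(result)
```

With a Hugging Face tokenizer, the stand-in would typically tokenize the whole list in one call, since calling the tokenizer directly accepts a list of strings, for example tokenizer(texts, padding=True, truncation=True, return_tensors="pt"). Whether trained_model accepts such a batch depends on how it was defined.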