Map probabilities and classes (multiclass) to a dataframe most efficiently

I have trained an XGBoost multiclass classifier. I want predictions for both the class and the probabilities.

Let's say I have:

import pandas as pd
import numpy as np

result = pd.DataFrame({'id': [1,2,3,4], 'Pred class': ['a', 'b', 'c', 'c']})
predictions = np.array([[0.2, 0.3, 0.5],
                        [0.1, 0.5, 0.4], 
                        [0.7, 0.2, 0.1],
                        [0.4, 0.2, 0.6]])

I am finding the indices of the maximum probabilities:

max_probs = np.argmax(predictions, axis=1)

and I am creating a list with the maximum probability for each row:

res = []
for idx in range(len(predictions)):
    res.append(predictions[idx, max_probs[idx]] * 100)

Then, I add the result to original dataframe:

result['probs'] = res

and I have:

   id Pred class  probs
0   1          a   50.0
1   2          b   50.0
2   3          c   70.0
3   4          c   60.0

What is the most efficient way to do this for a larger dataframe?

CodePudding user response:

Here is one way to do it using the pandas max method, which replaces the Python loop with a single vectorized call:

result["probs"] = pd.DataFrame(predictions).max(axis=1) * 100

print(result)
# Output
   id Pred class  probs
0   1          a   50.0
1   2          b   50.0
2   3          c   70.0
3   4          c   60.0
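
If the argmax indices are needed anyway (for example, to look up class names), the same result can probably be obtained in NumPy alone, without building an intermediate DataFrame. The sketch below is one possible approach, not the only one. The `labels` array and its order are assumptions made for illustration: the question does not say which column of `predictions` belongs to which class. With this assumed order the labels come out as c, b, a, c, which differs from the sample 'Pred class' column in the question, so `pred_labels` is kept as a separate variable.

```python
import numpy as np
import pandas as pd

result = pd.DataFrame({'id': [1, 2, 3, 4]})
predictions = np.array([[0.2, 0.3, 0.5],
                        [0.1, 0.5, 0.4],
                        [0.7, 0.2, 0.1],
                        [0.4, 0.2, 0.6]])

# ASSUMPTION: column i of `predictions` corresponds to labels[i].
# The real order depends on how the classifier encoded its classes.
labels = np.array(['a', 'b', 'c'])

# Column index of the largest probability in each row.
max_idx = predictions.argmax(axis=1)

# Integer array indexing: picks predictions[i, max_idx[i]] for every row i
# in one vectorized step, replacing the explicit Python loop.
rows = np.arange(len(predictions))
result['probs'] = predictions[rows, max_idx] * 100

# Map each winning column index to its (assumed) class label.
pred_labels = labels[max_idx]

print(result)
print(pred_labels)
```

When only the probabilities are needed, `predictions.max(axis=1) * 100` gives the same values in a single call; the indexing form is mainly useful when the argmax result is reused for the class lookup.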