I have trained an XGBoost multiclass classifier. I want predictions for both the class and its probability.
Let's say I have:
import pandas as pd
import numpy as np
result = pd.DataFrame({'id': [1,2,3,4], 'Pred class': ['a', 'b', 'c', 'c']})
predictions = np.array([[0.2, 0.3, 0.5],
                        [0.1, 0.5, 0.4],
                        [0.7, 0.2, 0.1],
                        [0.4, 0.2, 0.6]])
I find the column index of the maximum probability in each row:
max_idx = np.argmax(predictions, axis=1)
and then build a list with the maximum probability of each row, as a percentage:
res = []
for idx, col in enumerate(max_idx):
    res.append(predictions[idx, col] * 100)
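The per-row Python loop can be replaced by a single vectorized indexing step. This is a minimal sketch on the same example array: fancy indexing pairs row i with the column chosen by argmax for that row, and np.take_along_axis is shown as an equivalent alternative. The names max_idx, res and res_alt are illustrative.

```python
import numpy as np

# Same example probabilities as above: one row per sample, one column per class
predictions = np.array([[0.2, 0.3, 0.5],
                        [0.1, 0.5, 0.4],
                        [0.7, 0.2, 0.1],
                        [0.4, 0.2, 0.6]])

# Column index of the highest probability in each row
max_idx = np.argmax(predictions, axis=1)

# Fancy indexing: pick element (i, max_idx[i]) for every row i, no Python loop
res = predictions[np.arange(len(predictions)), max_idx] * 100

# Equivalent alternative: gather along axis 1, then flatten back to 1-D
res_alt = np.take_along_axis(predictions, max_idx[:, None], axis=1).ravel() * 100
```

If only the probabilities are needed (not the indices), predictions.max(axis=1) gives the same values directly.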
Then I add the result to the original DataFrame:
result['probs'] = res
which gives:
id Pred class probs
 1          a  50.0
 2          b  50.0
 3          c  70.0
 4          c  60.0
What is the most efficient way to do this for a larger DataFrame?
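Since the goal is both the predicted class and its probability, one option is to derive both from the same probability array in a fully vectorized way. This is a sketch under an assumption: class_names is a hypothetical array where class_names[j] is the label of column j of predictions. With XGBoost that order normally follows the label encoding used during training (for example a scikit-learn LabelEncoder's classes_), so verify it against your own model.

```python
import numpy as np
import pandas as pd

predictions = np.array([[0.2, 0.3, 0.5],
                        [0.1, 0.5, 0.4],
                        [0.7, 0.2, 0.1],
                        [0.4, 0.2, 0.6]])

# Hypothetical mapping: class_names[j] labels column j of `predictions`
class_names = np.array(['a', 'b', 'c'])

result = pd.DataFrame({'id': np.arange(1, len(predictions) + 1)})

best = predictions.argmax(axis=1)                 # winning column per row
result['Pred class'] = class_names[best]          # vectorized label lookup
result['probs'] = predictions.max(axis=1) * 100   # winning probability, in percent
```

With this example array, the third row is labelled 'a', because its largest probability (0.7) sits in the first column.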
Answer:
Here is one way to do it using pandas' DataFrame.max:
result["probs"] = pd.DataFrame(predictions).max(axis=1) * 100
print(result)
# Output
   id Pred class  probs
0   1          a   50.0
1   2          b   50.0
2   3          c   70.0
3   4          c   60.0
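One caveat with the pandas version: assigning a Series to a column aligns on index labels, and pd.DataFrame(predictions) always has the index 0..n-1. If result has any other index (after filtering or set_index, say), the new column silently becomes NaN. A minimal sketch, assuming such a non-default index (10..13 here is purely illustrative), shows that taking the row maximum in NumPy and assigning the plain array sidesteps this. Positional assignment only requires the lengths to match, and it also skips building an intermediate DataFrame.

```python
import numpy as np
import pandas as pd

predictions = np.array([[0.2, 0.3, 0.5],
                        [0.1, 0.5, 0.4],
                        [0.7, 0.2, 0.1],
                        [0.4, 0.2, 0.6]])

# Illustrative frame whose index is not 0..n-1
result = pd.DataFrame({'id': [1, 2, 3, 4], 'Pred class': ['a', 'b', 'c', 'c']},
                      index=[10, 11, 12, 13])

# Series assignment aligns on labels: 0..3 never match 10..13, so every value is NaN
result['probs_series'] = pd.DataFrame(predictions).max(axis=1) * 100

# ndarray assignment is positional: values land row by row regardless of the index
result['probs'] = predictions.max(axis=1) * 100
```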