I have the following pandas dataframe:
ID | Class | LR | XG | SV | BEST_R2 |
---|---|---|---|---|---|
1 | Class1 | .76 | .78 | .99 | .99 |
2 | Class2 | .92 | .89 | .91 | .92 |
3 | Class3 | .87 | .95 | .87 | .95 |
This is a dataframe with the R2 of each of a series of machine learning models (LR/XG/SV) for each ID. The column "BEST_R2" represents the best R2 score for that ID across models (.max(axis=1)). I need another column with the model name for best score. For example, the dataframe below. Any tips on how to achieve this programmatically?
ID | Class | LR | XG | SV | BEST_R2 | BEST MODEL |
---|---|---|---|---|---|---|
1 | Class1 | .76 | .78 | .99 | .99 | SV |
2 | Class2 | .92 | .89 | .91 | .92 | LR |
3 | Class3 | .87 | .95 | .87 | .95 | XG |
CodePudding user response:
Assuming that ID
is the index, you can do
df["Best Model"] = df[["LR", "XG", "SV"]].idxmax(axis=1)
Result:
LR XG SV BEST_R2 Best Model
ID
1 0.76 0.78 0.99 0.99 SV
2 0.92 0.89 0.91 0.92 LR
3 0.87 0.95 0.87 0.95 XG
CodePudding user response:
Here is a way to do it:
best_models = []
for row in df.itertuples():
tuples = sorted([('LR',row.LR), ('XG',row.XG), ('SV',row.SV)],reverse = True, key = lambda x:x[1])
best_models.append(tuples[0][0])
df['Best Models'] = best_models
df