Home > database >  New dataframe column using existing column names as values
New dataframe column using existing column names as values

Time:08-13

I have the following pandas dataframe:

ID Class LR XG SV BEST_R2
1 Class1 .76 .78 .99 .99
2 Class2 .92 .89 .91 .92
3 Class3 .87 .95 .87 .95

This is a dataframe with the R2 of each of a series of machine learning models (LR/XG/SV) for each ID. The column "BEST_R2" represents the best R2 score for that ID across models (.max(axis=1)). I need another column with the model name for best score. For example, the dataframe below. Any tips on how to achieve this programmatically?

ID Class LR XG SV BEST_R2 BEST MODEL
1 Class1 .76 .78 .99 .99 SV
2 Class2 .92 .89 .91 .92 LR
3 Class3 .87 .95 .87 .95 XG

CodePudding user response:

Assuming that ID is the index, you can do

df["Best Model"] = df[["LR", "XG", "SV"]].idxmax(axis=1)

Result:

      LR    XG    SV  BEST_R2 Best Model
ID                                      
1   0.76  0.78  0.99     0.99         SV
2   0.92  0.89  0.91     0.92         LR
3   0.87  0.95  0.87     0.95         XG

CodePudding user response:

Here is a way to do it:

best_models = []
for row in df.itertuples():
    tuples = sorted([('LR',row.LR), ('XG',row.XG), ('SV',row.SV)],reverse = True, key = lambda x:x[1])
    best_models.append(tuples[0][0])
df['Best Models'] = best_models
df

enter image description here

  • Related