I understand that the following finds the highest value in each row and stores the name of that column in a new column called 'Top Value':
df['Top Value'] = df[['VolumeOIH','Volume KRE']].idxmax(axis=1)
How do I do the same for the top 3 values, so that I get the names of the three columns with the highest values in each row?
This is my dataframe:
    VolumeKBE  VolumeKRE  VolumeIYR  VolumeITB  VolumeSMH
0      2722.0    51852.0    10873.0    28562.0    84673.0
1      2500.0    54027.0     7157.0    11278.0    42034.0
2      2279.0    46517.0     1700.0    20291.0    64202.0
3      8200.0    43994.0     7500.0    34564.0   260018.0
4      9688.0    52993.0     4400.0    25912.0    79126.0
..        ...        ...        ...        ...        ...
64     1200.0    11411.0    19891.0    29535.0    37648.0
65     3500.0    17334.0    24248.0    25006.0    58842.0
66     1200.0    16353.0    23023.0    30704.0   118051.0
67     5700.0    13611.0    12139.0    22182.0    35798.0
68      578.0    11291.0     5780.0    27310.0    68584.0
CodePudding user response:
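To get the top 3 columns by value in each row, convert the column names and the DataFrame values to NumPy arrays and use numpy.argsort. Negating the values makes the ascending sort put the largest values first: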
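import numpy as np
import pandas as pd

N = 3
c = df.columns.to_numpy()
# negate the values so argsort orders each row from largest to smallest
topN = c[np.argsort(-df.to_numpy())[:, :N]]
cols = [f'top{x + 1}' for x in range(N)]
# note: this replaces df with the frame of top-N column names
df = pd.DataFrame(topN, index=df.index, columns=cols)
print(df)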
         top1       top2       top3
0   VolumeSMH  VolumeKRE  VolumeITB
1   VolumeKRE  VolumeSMH  VolumeITB
2   VolumeSMH  VolumeKRE  VolumeITB
3   VolumeSMH  VolumeKRE  VolumeITB
4   VolumeSMH  VolumeKRE  VolumeITB
64  VolumeSMH  VolumeITB  VolumeIYR
65  VolumeSMH  VolumeITB  VolumeIYR
66  VolumeSMH  VolumeITB  VolumeIYR
67  VolumeSMH  VolumeITB  VolumeKRE
68  VolumeSMH  VolumeITB  VolumeKRE
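If you would rather keep the original volume columns next to the result, as the 'Top Value' example in the question does, the same argsort approach can be wrapped in a small function and joined back onto the frame. This is only a sketch: the helper name top_n_columns and the variable names are made up for illustration, and the sample frame contains just the first five rows shown in the question.

```python
import numpy as np
import pandas as pd

# Sample data: the first five rows from the question.
df = pd.DataFrame({
    'VolumeKBE': [2722.0, 2500.0, 2279.0, 8200.0, 9688.0],
    'VolumeKRE': [51852.0, 54027.0, 46517.0, 43994.0, 52993.0],
    'VolumeIYR': [10873.0, 7157.0, 1700.0, 7500.0, 4400.0],
    'VolumeITB': [28562.0, 11278.0, 20291.0, 34564.0, 25912.0],
    'VolumeSMH': [84673.0, 42034.0, 64202.0, 260018.0, 79126.0],
})


def top_n_columns(frame, n=3):
    """Hypothetical helper: names of the n largest columns in each row."""
    names = frame.columns.to_numpy()
    # Negating the values makes the ascending argsort rank the largest first.
    order = np.argsort(-frame.to_numpy(), axis=1)[:, :n]
    labels = [f'top{i + 1}' for i in range(n)]
    return pd.DataFrame(names[order], index=frame.index, columns=labels)


top3 = top_n_columns(df, n=3)
# Join instead of overwriting, so the original volumes stay available.
result = df.join(top3)
print(result)
```

Compute the top-N frame before adding any non-numeric columns to df, otherwise the negation of the values would fail on the string columns.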