Select Pandas dataframe row where two or more columns have their maximum value together-CodePudding

Suppose you have a pandas.DataFrame like so:

Institution	Feat1	Feat2	Feat3	...
ID1	14.5	0	0.32	...
ID2	322.12	1	0.94	...
ID3	27.08	0	1.47	...

My question is simple: how would one select rows from this dataframe based on the maximum combined values from two or more columns. For example:

I want to select rows where the columns Feat1and Feat3 have their maximum value together, returning:

Institution	Feat1	Feat2	Feat3	...
ID2	322.12	1	0.94	...

I am certain a good old for loop can take care of the problem given a little time, but I believe there must be a Pandas function for that, hope someone point me in the right direction.

CodePudding user response：

You can play arround with:

df.sum(axis=1)

df['row_sum'] = df.sum(axis=1)

df['sum'] = df['col1' ] df['col3']

And then:

df.sort(['sum' ],ascending=[False or True])

df.sort_index()

CodePudding user response：

You can do it with slicing:

output = df.loc[(df['Feat1']   df['Feat3']).to_frame().idxmax(),:]

This outputs:

  Institution   Feat1  Feat2  Feat3
1         ID2  322.12      1   0.94

Alternatively you can always create a column and slice through it, but this would require a bit of an extra effort.

df['filter'] = df['Feat1']   df['Feat3']
output = df[df['filter'] == df['filter'].max()]