Home > Net >  Select Pandas dataframe row where two or more columns have their maximum value together
Select Pandas dataframe row where two or more columns have their maximum value together

Time:11-19

Suppose you have a pandas.DataFrame like so:

Institution Feat1 Feat2 Feat3 ...
ID1 14.5 0 0.32 ...
ID2 322.12 1 0.94 ...
ID3 27.08 0 1.47 ...

My question is simple: how would one select rows from this dataframe based on the maximum combined values from two or more columns. For example:

  • I want to select rows where the columns Feat1and Feat3 have their maximum value together, returning:
Institution Feat1 Feat2 Feat3 ...
ID2 322.12 1 0.94 ...

I am certain a good old for loop can take care of the problem given a little time, but I believe there must be a Pandas function for that, hope someone point me in the right direction.

CodePudding user response:

You can play arround with:

df.sum(axis=1)

df['row_sum'] = df.sum(axis=1)

or

df['sum'] = df['col1' ] df['col3']

And then:

df.sort(['sum' ],ascending=[False or True])

df.sort_index()

CodePudding user response:

You can do it with slicing:

output = df.loc[(df['Feat1']   df['Feat3']).to_frame().idxmax(),:]

This outputs:

  Institution   Feat1  Feat2  Feat3
1         ID2  322.12      1   0.94

Alternatively you can always create a column and slice through it, but this would require a bit of an extra effort.

df['filter'] = df['Feat1']   df['Feat3']
output = df[df['filter'] == df['filter'].max()]
  • Related