Suppose you have a pandas.DataFrame
like so:
Institution | Feat1 | Feat2 | Feat3 | ... |
---|---|---|---|---|
ID1 | 14.5 | 0 | 0.32 | ... |
ID2 | 322.12 | 1 | 0.94 | ... |
ID3 | 27.08 | 0 | 1.47 | ... |
My question is simple: how would one select rows from this dataframe based on the maximum combined values from two or more columns. For example:
- I want to select rows where the columns
Feat1
andFeat3
have their maximum value together, returning:
Institution | Feat1 | Feat2 | Feat3 | ... |
---|---|---|---|---|
ID2 | 322.12 | 1 | 0.94 | ... |
I am certain a good old for loop can take care of the problem given a little time, but I believe there must be a Pandas function for that, hope someone point me in the right direction.
CodePudding user response:
You can play arround with:
df.sum(axis=1)
df['row_sum'] = df.sum(axis=1)
or
df['sum'] = df['col1' ] df['col3']
And then:
df.sort(['sum' ],ascending=[False or True])
df.sort_index()
CodePudding user response:
You can do it with slicing:
output = df.loc[(df['Feat1'] df['Feat3']).to_frame().idxmax(),:]
This outputs:
Institution Feat1 Feat2 Feat3
1 ID2 322.12 1 0.94
Alternatively you can always create a column and slice through it, but this would require a bit of an extra effort.
df['filter'] = df['Feat1'] df['Feat3']
output = df[df['filter'] == df['filter'].max()]