So I have a dataframe as follows:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2, 3], [4, 3, 6], [7, 2, 9]]),
                  columns=['a', 'b', 'c'])
df
Output:
   a  b  c
0  1  2  3
1  4  3  6
2  7  2  9
I want to keep only the two columns with the highest values in the last row. What is the best way to approach this? In this example, I just want to keep column 'a' (value 7) and column 'c' (value 9).
Answer:
Try:
df = df[df.iloc[-1].nlargest(2).index]
Output:
   c  a
0  3  1
1  6  4
2  9  7
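To see why this works, here is a sketch that breaks the one-liner into its intermediate steps (the variable names last_row, top_two, cols and result are just for illustration). Note that the selected columns come out ordered by descending last-row value, which is why 'c' appears before 'a'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 3, 6], [7, 2, 9]]),
                  columns=['a', 'b', 'c'])

# Step 1: the last row as a Series indexed by column name (a=7, b=2, c=9)
last_row = df.iloc[-1]

# Step 2: the two largest values, sorted in descending order (c=9, a=7)
top_two = last_row.nlargest(2)

# Step 3: the column labels of those values
cols = top_two.index

# Step 4: select those columns; they keep the descending-value order
result = df[cols]
print(result)
```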
Answer:
If you also want to keep the original column order, you can use Index.intersection() together with .nlargest(), as follows:
df[df.columns.intersection(df.iloc[-1].nlargest(2).index, sort=False)]
Result:
   a  c
0  1  3
1  4  6
2  7  9
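If you need this more than once, both answers can be wrapped in a small helper. This is a sketch: keep_top_columns and its keep_order flag are hypothetical names, not part of pandas. For the order-preserving case it uses a boolean mask built with Index.isin(), which is an alternative to Index.intersection() that also keeps the columns in their original left-to-right order.

```python
import numpy as np
import pandas as pd


def keep_top_columns(df: pd.DataFrame, n: int, keep_order: bool = True) -> pd.DataFrame:
    """Hypothetical helper: keep the n columns with the largest last-row values.

    keep_order=True keeps the original left-to-right column order;
    keep_order=False orders the columns by descending last-row value.
    """
    # Labels of the n largest values in the last row
    top = df.iloc[-1].nlargest(n).index
    if keep_order:
        # A boolean mask over df.columns preserves the original column order
        return df.loc[:, df.columns.isin(top)]
    # Indexing with the labels directly keeps nlargest's descending order
    return df[top]


df = pd.DataFrame(np.array([[1, 2, 3], [4, 3, 6], [7, 2, 9]]),
                  columns=['a', 'b', 'c'])
print(keep_top_columns(df, 2))                    # columns a, c
print(keep_top_columns(df, 2, keep_order=False))  # columns c, a
```

One thing to watch for is ties: Series.nlargest() defaults to keep='first', so if two columns tie for the last spot, the one that appears first wins. Passing keep='all' instead returns every tied column, which can give you more than n columns.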