Hello I have prepared a MultiIndex table in Pandas that looks like this:
Lang C java python All
Corp Name
ASW ASW 0.0 7.0 8.0 15
Cristiano NaN NaN 8.0 8
Michael NaN 7.0 0 7
Facebook Facebook 8.0 1.0 5.0 14
Piter 8.0 NaN NaN 8
Cristiano NaN NaN 3.0 3
Michael NaN 1.0 2.0 3
Google Google 2.0 24.0 1.0 27
Michael NaN 24.0 NaN 24
Piter 2.0 NaN NaN 2
Cristiano NaN NaN 1.0 1
Now I would like to Sort group of rows where sum of Corp(in column "All') is sorted decsending, then I would like to select only the two index "Corp"(and their rows) which are the largest, It should looks like:
Lang C java python All
Corp Name
Google Google 2.0 24.0 1.0 27
Michael NaN 24.0 NaN 24
Piter 2.0 NaN NaN 2
Cristiano NaN NaN 1.0 1
ASW ASW 0.0 7.0 8.0 15
Cristiano NaN NaN 8.0 8
Michael NaN 7.0 0 7
Thank You!
CodePudding user response:
IIUC, you can sort_values
per group, then slice using the max sum of All per group:
out = (df
# for each company, sort the values using the All column in descending order
.groupby(level=0).apply(lambda g: g.sort_values('All', ascending=False))
# calculate the sum or All per company
# get the index of the top 2 companies (nlargest(2))
# slice to keep only those
.loc[lambda d: d.groupby(level=0)['All'].sum().nlargest(2).index]
)
output:
Lang C java python All
Corp Name
Google Google 2.0 24.0 1.0 27
Michael NaN 24.0 NaN 24
Piter 2.0 NaN NaN 2
Cristiano NaN NaN 1.0 1
Facebook Facebook 8.0 1.0 5.0 14
Piter 8.0 NaN NaN 8
Cristiano NaN NaN 3.0 3
Michael NaN 1.0 2.0 3