Aggregate Groups in Pandas by highest value from column group-CodePudding

Python 3.10 with Pandas 1.3.4

Using the following code I create a dataframe, and then sort it to "group" the Id's together, and only display each unique Id one time.

import pandas as pd

arrays = [['123', '228', '358', '123', '358', '358','433','496', '496', '496'],
         ['James Smith', 'John Smith', 'John Smith', 'Johnny Smith', 'Jonathan Smith', 'John B. Smith', 'Jon Smithson', 'John Smythe','John Smith', 'John A. Smithy'],
         [0.56, 0.98, 1.0, 0.65, 0.77, 1.0, 0.80, 0.97, 1.0, 0.99],
         ['AM', 'WL','WL', 'PP', 'PP', 'AM', 'AM', 'WL', 'WL', 'PP']]
index = pd.MultiIndex.from_arrays(arrays, names=('Id', 'Name', 'Score', 'Dept'))
df = pd.DataFrame(index=index)
df = df.sort_values('Id')
df

Which yields:

Now I need to take this a step further, and display a dataframe which:

Gets the highest score from each group.

Using that list of highest scores from each group, sort the groups in descending order.

If two groups' highest scores are identical, then sort by Id in ascending order.

Create a new column 0 called "Highest Score" which displays that highest score from that group - display each highest score value only once per group, similar to Id.

Each group's set of scores should also be sorted in descending order.

So it would look like this:

CodePudding user response：

you can do using set_index to append the highest score per group, calculated with transform and max. then reorder_levels of the index as wanted, and sort_values, specifying ascending False or True for each level.

new_df = (
    df.set_index(df.reset_index('Score').groupby('Id')
                   ['Score'].transform(max).rename('Highest_Score'), 
                 append=True)
      .reorder_levels(['Highest_Score']   df.index.names)
      .sort_values(['Highest_Score','Id','Score'], ascending=[False,True,False])
)