Grouping multiple columns to find max value from other column and representing them columnwise-CodePudding

Existing dataframe :

id          sub-id          amount
01            A              100
01            A               50
01            B               10
01            B                5
01            B               50
01            C               10
02            A               25
02            A               10

Expected Dataframe :

id          sub-id         A_max        B_max      C_max
01            A             100          50         10
02            A             25            0          0

Please note their can total 5 sub-id which may or may not be present for every id

with this code df.loc[df.groupby(['id','sub-id'])['amount'].idxmax()].reset_index(drop=True) i can get the answer but i need it in the above expected format

please guide

@jezrael

adding up further if we have below table :

         sub-id         A_max        B_max      C_max
           A             100          50         10
           A             25            0          0
           B             250          10         50
           B             100          50         70

and expected output we need is :

         sub-id         A_max        B_max      C_max
           A             100          50         10
           B             250          50         70

how to get the same..?

CodePudding user response：

Use DataFrame.pivot_table with aggregate max and then add to sub-id values - if need first value per id use:

df2 = df.pivot_table(index='id', 
                     columns='sub-id', 
                     values='amount', 
                     aggfunc='max', fill_value=0).add_suffix('_max')
df = df.groupby('id', as_index=False)['sub-id'].first().join(df2, on='id')
print (df)
   id sub-id  A_max  B_max  C_max
0   1      A    100     50     10
1   2      A     25      0      0

Or if need id with maximal amount use DataFrameGroupBy.idxmax:

df2 = df.pivot_table(index='id', 
                     columns='sub-id', 
                     values='amount', 
                     aggfunc='max', fill_value=0).add_suffix('_max')

df = df.loc[df.groupby('id')['amount'].idxmax(), ['id','sub-id']].join(df2, on='id')
print (df)
   id sub-id  A_max  B_max  C_max
0   1      A    100     50     10
6   2      A     25      0      0

EDIT: If need maximal values per sub-id use:

df1 = df.groupby('sub-id', as_index=False).max()