Aggregate functions on a 3-level pandas grupby object-CodePudding

I want to make a new df with simple metrics like mean, sum, min, max calculated on the Value column in the df visible below, grouped by ID, Date and Key.

index	ID	Key	Date	Value	x	y	z
0	655	321	2021-01-01	50	546	235	252345
1	675	321	2021-01-01	50	345	345	34545
2	654	356	2021-02-02	70	345	346	543

I am doing it like this:

final = df.groupby(['ID','Date','Key'])['Value'].first().mean(level=[0,1]).reset_index().rename(columns={'Value':'Value_Mean'})

I use .first() because one Key can occur multiple times in the df but they all have the same Value. I want to aggregate on ID and Date so I am using level=[0,1].

and then I am adding next metrics with pandas merge as:

final = final.merge(df.groupby(['ID','Date','Key'])['Value'].first().max(level=[0,1]).reset_index().rename(columns={'Value':'Value_Max'}), on=['ID','Date'])

And I go like that with other metrics. I wonder if there is a more sophisticated way to do it than repeat it in multiple lines. I know that you can use .agg() and pass a dict with functions but it seems like in that way it isn't possible to specify the level which is important here.

CodePudding user response：

Use DataFrame.drop_duplicates with named aggregation:

df = pd.DataFrame({'ID':[655,655,655,675,654], 'Key':[321,321,333,321,356], 
                  'Date':['2021-01-01','2021-01-01','2021-01-01','2021-01-01','2021-02-02'],
                   'Value':[50,30,10,50,70]})
print (df)
    ID  Key        Date  Value
0  655  321  2021-01-01     50
1  655  321  2021-01-01     30
2  655  333  2021-01-01     10
3  675  321  2021-01-01     50
4  654  356  2021-02-02     70

final = (df.drop_duplicates(['ID','Date','Key'])
           .groupby(['ID','Date'], as_index=False).agg(Value_Mean=('Value','mean'),
                                                       Value_Max=('Value','max')))
print (final)
    ID        Date  Value_Mean  Value_Max
0  654  2021-02-02          70         70
1  655  2021-01-01          30         50
2  675  2021-01-01          50         50

final = (df.groupby(['ID','Date','Key'], as_index=False)
           .first()
           .groupby(['ID','Date'], as_index=False).agg(Value_Mean=('Value','mean'),
                                                       Value_Max=('Value','max')))

print (final)
    ID        Date  Value_Mean  Value_Max
0  654  2021-02-02          70         70
1  655  2021-01-01          30         50
2  675  2021-01-01          50         50

df = (df.groupby(['ID','Date','Key'], as_index=False)
      .first()
        .groupby(['ID','Date'], as_index=False)['Value']
        .agg(['mean', 'max'])
        .add_prefix('Value_')
        .reset_index())
print (df)
    ID        Date  Value_Mean  Value_Max
0  654  2021-02-02          70         70
1  655  2021-01-01          30         50
2  675  2021-01-01          50         50