Home > Mobile >  Aggregate functions on a 3-level pandas grupby object
Aggregate functions on a 3-level pandas grupby object

Time:06-02

I want to make a new df with simple metrics like mean, sum, min, max calculated on the Value column in the df visible below, grouped by ID, Date and Key.

index ID Key Date Value x y z
0 655 321 2021-01-01 50 546 235 252345
1 675 321 2021-01-01 50 345 345 34545
2 654 356 2021-02-02 70 345 346 543

I am doing it like this:

final = df.groupby(['ID','Date','Key'])['Value'].first().mean(level=[0,1]).reset_index().rename(columns={'Value':'Value_Mean'})

I use .first() because one Key can occur multiple times in the df but they all have the same Value. I want to aggregate on ID and Date so I am using level=[0,1].

and then I am adding next metrics with pandas merge as:

final = final.merge(df.groupby(['ID','Date','Key'])['Value'].first().max(level=[0,1]).reset_index().rename(columns={'Value':'Value_Max'}), on=['ID','Date'])

And I go like that with other metrics. I wonder if there is a more sophisticated way to do it than repeat it in multiple lines. I know that you can use .agg() and pass a dict with functions but it seems like in that way it isn't possible to specify the level which is important here.

CodePudding user response:

Use DataFrame.drop_duplicates with named aggregation:

df = pd.DataFrame({'ID':[655,655,655,675,654], 'Key':[321,321,333,321,356], 
                  'Date':['2021-01-01','2021-01-01','2021-01-01','2021-01-01','2021-02-02'],
                   'Value':[50,30,10,50,70]})
print (df)
    ID  Key        Date  Value
0  655  321  2021-01-01     50
1  655  321  2021-01-01     30
2  655  333  2021-01-01     10
3  675  321  2021-01-01     50
4  654  356  2021-02-02     70

final = (df.drop_duplicates(['ID','Date','Key'])
           .groupby(['ID','Date'], as_index=False).agg(Value_Mean=('Value','mean'),
                                                       Value_Max=('Value','max')))
print (final)
    ID        Date  Value_Mean  Value_Max
0  654  2021-02-02          70         70
1  655  2021-01-01          30         50
2  675  2021-01-01          50         50

final = (df.groupby(['ID','Date','Key'], as_index=False)
           .first()
           .groupby(['ID','Date'], as_index=False).agg(Value_Mean=('Value','mean'),
                                                       Value_Max=('Value','max')))

print (final)
    ID        Date  Value_Mean  Value_Max
0  654  2021-02-02          70         70
1  655  2021-01-01          30         50
2  675  2021-01-01          50         50

df = (df.groupby(['ID','Date','Key'], as_index=False)
      .first()
        .groupby(['ID','Date'], as_index=False)['Value']
        .agg(['mean', 'max'])
        .add_prefix('Value_')
        .reset_index())
print (df)
    ID        Date  Value_Mean  Value_Max
0  654  2021-02-02          70         70
1  655  2021-01-01          30         50
2  675  2021-01-01          50         50
  • Related